

Research
I work with the Artificial Intelligence group
at the Department of Information Engineering and Mathematical Sciences of the University of Siena.
I am mainly a theoretical researcher, focused on understanding the fundamental theory of machine learning
and, more generally, of artificial intelligence. But after a theory has been
developed, I also like to put it into practice, developing new methods and facing practical
applications with the acquired knowledge. Summing up my activity in a list, my principal interests
include machine learning algorithms, artificial neural networks,
adaptive processing of data structures, web information retrieval, and any application of the techniques developed
in those fields.
In the following, a few main topics of my research are described. More details can be obtained
from my publication list.
Why are deep neural networks better than shallow architectures?
The recent success of Deep Neural Networks (DNNs) seems to suggest that the depth of the employed architecture
is a key ingredient of modern machine learning, required to face complex applications,
e.g. image understanding, speech recognition, and natural language processing.
From a theoretical point of view, the above claim can be reformulated as
"DNNs can implement functions with higher complexity than shallow ones, when using the same number of resources."
But how can the concept of a "high complexity function" be formally defined? In fact,
the commonly used complexity measures are not useful for this purpose, since they deal with different concepts,
such as architectural complexity (number of parameters, units, and layers)
or the generalization capability of a model (the Vapnik-Chervonenkis dimension).
In [1], we introduced a novel measure aimed at evaluating the complexity of the function implemented
by a neural network used for classification purposes. Then, deep and shallow neural architectures were compared.
The obtained results support the idea that deep networks actually implement more complex functions or,
in other words, that they are able, with the same number of resources, to address more difficult problems.
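The phrase "same number of resources" can be made concrete by counting parameters. The sketch below, with hypothetical layer sizes, compares a shallow one-hidden-layer network with a deeper, narrower one under a roughly equal parameter budget:

```python
def num_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network whose
    layers have the given sizes (input layer first, output layer last)."""
    return sum((n_in + 1) * n_out            # +1 accounts for the bias
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# One wide hidden layer versus three narrow ones (hypothetical sizes,
# chosen so the two budgets are comparable):
shallow = num_parameters([10, 120, 1])        # 10 -> 120 -> 1
deep = num_parameters([10, 24, 24, 24, 1])    # 10 -> 24 -> 24 -> 24 -> 1
```

Under such a comparable budget, the measure introduced in [1] is what distinguishes the complexity of the functions the two architectures can implement.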

M. Bianchini, F. Scarselli.
On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures.
IEEE Transactions on Neural Networks and Learning Systems, 25 (8), pp. 1553-1565, 2014.

Graph Neural Networks: learning to classify patterns along with their relationships
A labelled graph is a general data structure that allows representing a complex application
domain where the patterns (the nodes) come along with their relationships (the edges).
For example, a portion of the Web can be denoted by
a directed graph whose nodes stand for the webpages and whose edges represent the hyperlinks.
In such a case, each node may have a vectorial label
containing a description of the webpage, e.g. the words contained in the page.
Labelled graphs extend standard machine learning data structures:
a graph without edges denotes a common set of patterns,
while a graph where each node is linked to (only) the following node denotes a sequence.
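As an illustration, a labelled graph and the two special cases above can be represented with elementary data structures (the node names and label vectors are hypothetical, not tied to any particular library):

```python
# A labelled graph as a pair: node labels plus a set of directed edges.
labels = {
    "page_a": [1.0, 0.0],   # e.g. a bag-of-words vector describing the page
    "page_b": [0.0, 1.0],
    "page_c": [1.0, 1.0],
}
edges = {("page_a", "page_b"), ("page_b", "page_c")}  # the hyperlinks

# The special cases mentioned in the text:
set_of_patterns = (labels, set())   # no edges: a plain set of patterns
chain = (labels, {("page_a", "page_b"), ("page_b", "page_c")})  # a sequence
```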
The Graph Neural Network (GNN) [2] is a supervised connectionist model able to face the problem of
learning to classify a pattern (a node) using not only its label, but also its direct and
indirect relationships with other patterns (nodes).
A GNN can take a graph as input and return an output for each node of the graph.
For example, the model can learn to classify a webpage as spam or non-spam using the information about the page, its connectivity and, more generally, the whole information in the web graph.
GNNs extend the previous connectionist approaches for graph processing since they can deal
with a wider class of graphs, including cyclic and acyclic, directed and undirected,
and positional and non-positional graphs. It has also been proved that GNNs are a sort of universal
approximators for functions defined on graphs [3]. Finally, the model
has been applied to several practical problems including bioinformatics,
image understanding, sentence extraction, web page ranking, and web
document classification. See this page for the software and more references.
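A minimal sketch of the idea behind the model: each node state is computed from the neighbours' states and the node labels by iterating a transition function to a fixed point. Here the transition is a simple tanh-affine map with small random weights (chosen so the map is a contraction), standing in for the learned network of [2]:

```python
import numpy as np

def gnn_states(adj, labels, W, B, iters=50):
    """Sketch of the GNN fixed-point iteration: each node state is
    repeatedly recomputed from its neighbours' states and its own label."""
    n, d = labels.shape
    x = np.zeros((n, d))
    for _ in range(iters):
        # neighbour aggregation followed by an affine transition function
        x = np.tanh(adj @ x @ W + labels @ B)
    return x

rng = np.random.default_rng(0)
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])           # undirected 3-node path graph
labels = rng.normal(size=(3, 4))         # one label vector per node
W = 0.1 * rng.normal(size=(4, 4))        # small weights -> contraction
B = rng.normal(size=(4, 4))
states = gnn_states(adj, labels, W, B)   # converged node states
```

In the full model, an output network then maps each node state (together with the node label) to the node's classification.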
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini.
The Graph Neural Network Model.
IEEE Transactions on Neural Networks, vol. 20 (1), pp. 61-80, 2009.
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini.
Computational Capabilities of Graph Neural Networks.
IEEE Transactions on Neural Networks, vol. 20 (1), pp. 81-102, 2009.

Inside Google's PageRank
PageRank is the algorithm originally used by Google to measure the authority (importance)
of webpages. The authority, along with other measures of the closeness of the query to
the webpage content, is used by search engines to sort the URLs returned in response to user queries.
The introduction of PageRank
produced a breakthrough in the capability of search engines to deal with web spam.
In [5], we studied the fundamental properties of PageRank concerning stability
and the complexity of its computational scheme. Moreover, we introduced a circuit analysis that allows us
to understand the distribution of the page score, the way different Web communities interact, and some
secrets about the information promotion of Web pages.
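For reference, the classic power-iteration computation of PageRank (the standard formulation, not the circuit analysis of [5]) can be sketched as follows:

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """PageRank by power iteration. adj[i, j] = 1 if page i links to page j;
    d is the usual damping factor."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # row-normalize; dangling pages (no out-links) spread their score uniformly
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        x = (1 - d) / n + d * (x @ P)
    return x

# A tiny hypothetical web: a -> b, b -> c, c -> a and c -> b
adj = np.array([[0., 1., 0.],
                [0., 0., 1.],
                [1., 1., 0.]])
scores = pagerank(adj)   # one authority score per page, summing to 1
```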

M. Bianchini, M. Gori, F. Scarselli. Inside PageRank.
ACM Transactions on Internet Technology (TOIT), 5 (1), pp. 92-128, 2005.

Why are autoencoders
better than multilayer networks in verification problems?
Autoencoders (also called autoassociators) can be used for several purposes:
for classification, for verification, to recognize outliers, and for information compression.
Moreover, deep neural networks use autoencoders to construct a deep architecture.
An autoencoder is a three-layered neural
network (one hidden layer) with the same number of input and output neurons. The network is trained to reproduce (to autoassociate)
the input into the output: the target of each pattern is just its input.
By such a training, and since the number of hidden
neurons is smaller than the number of inputs,
the autoencoder is forced to compress the input information into the hidden neurons.
Moreover, the network is expected to produce the best autoassociation for
the most frequent patterns in the training set. Thus, the error computed on a novel pattern is small in
the domain regions that are dense with patterns and large in regions where patterns are not frequent.
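This behaviour can be reproduced with a toy autoencoder. The sketch below, with arbitrary sizes and synthetic data, trains an 8-3-8 network by gradient descent and compares the reconstruction error inside and far from the dense region of the training patterns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: patterns concentrated in a dense region of the input space
X = rng.normal(loc=0.0, scale=0.5, size=(200, 8))

# Three-layered autoencoder: 8 inputs -> 3 hidden neurons -> 8 outputs
W1 = 0.1 * rng.normal(size=(8, 3)); b1 = np.zeros(3)
W2 = 0.1 * rng.normal(size=(3, 8)); b2 = np.zeros(8)

def forward(X):
    H = np.tanh(X @ W1 + b1)           # compression into the hidden layer
    return H, H @ W2 + b2              # reconstruction of the input

lr = 0.05
for _ in range(500):
    H, Y = forward(X)
    E = Y - X                          # the target of each pattern is its input
    # backpropagation through the two layers
    gW2 = H.T @ E / len(X); gb2 = E.mean(axis=0)
    dH = (E @ W2.T) * (1 - H**2)
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# reconstruction error: small near the training data, large far from it
err_in = np.mean((forward(X)[1] - X) ** 2)
far = rng.normal(loc=5.0, scale=0.5, size=(200, 8))
err_out = np.mean((forward(far)[1] - far) ** 2)
```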
In [6], the representation capability of autoencoders has been studied and
compared with that of multilayer networks.
It has been discovered that in autoencoders the domain regions
containing the positively classified patterns are always bounded,
whereas they may be unbounded in multilayer networks.
Thus, an autoencoder tends to classify as negative both outlier patterns and those patterns that are very different
from the examples of the training set. On the contrary, the classification of those patterns
is completely unpredictable in multilayer perceptrons.
Such a difference is particularly relevant in verification problems.
A verification problem is a particular type of binary classification problem,
where the purpose is to verify whether the input
pattern belongs to a given class: for instance, to verify a signature or a banknote.
In verification problems, often
only positive examples are available and/or the number of negative examples is not enough
to represent the underlying distribution.
For example, it is impossible to create a complete dataset of all the possible images
that do not represent a 5-euro banknote.
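A verification rule of this kind can be sketched as follows. Only positive examples are used, and a pattern is accepted when its score stays below a threshold estimated on the training class; the score here is a simple distance, standing in for the reconstruction error of a trained autoencoder (the data and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Positive examples only: feature vectors of the genuine class
genuine = rng.normal(loc=[2.0, -1.0], scale=0.1, size=(100, 2))

# Stand-in for the autoencoder's reconstruction error:
# distance from the region covered by the positive examples.
center = genuine.mean(axis=0)
def score(x):
    return float(np.linalg.norm(x - center))

# Threshold chosen from the positive examples alone (no negatives needed)
threshold = np.quantile([score(x) for x in genuine], 0.99)

def verify(x):
    """Accept the pattern only if it resembles the training class."""
    return score(x) <= threshold
```

A pattern near the genuine class is accepted, while one far from every training example is rejected, which is exactly the bounded-acceptance-region behaviour described above.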

M. Gori, F. Scarselli.
Are Multilayer Perceptrons Adequate for Pattern Recognition and Verification?
IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (11), pp. 1121-1132, 1998.

Understanding the approximation capability of multilayer neural networks
It is well known that multilayer neural networks are universal approximators, i.e., they
can approximate, to any degree of accuracy, any continuous function ƒ: ℝ^{n} → ℝ^{m}.
However, the most common theoretical results are existential in nature and do not clarify
how a network approximating a given function can be constructed.
In [7], the results on the approximation capability of neural networks are reviewed
and an intuitive explanation is provided. Moreover, it is shown that, in three-layered neural networks
(one hidden layer), only the hidden-to-output weights have to be optimized
based on the target function, whereas the input-to-hidden weights can be chosen randomly or
by using a grid of values. Such an observation suggests how to implement a constructive algorithm
that does not suffer from the problem of local minima.
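The observation can be illustrated directly: draw the input-to-hidden weights at random, and fit only the hidden-to-output weights by least squares, a convex problem with no local minima (the target function and sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target function to approximate
def f(x):
    return np.sin(3 * x)

X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = f(X).ravel()

# Input-to-hidden weights chosen at random, never trained
n_hidden = 100
W = rng.normal(size=(1, n_hidden))
b = rng.uniform(-1, 1, size=n_hidden)
H = np.tanh(X @ W + b)                 # random hidden-layer features

# Only the hidden-to-output weights are fitted, by least squares
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
mse = np.mean((H @ w_out - y) ** 2)    # small: the random features suffice
```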

F. Scarselli, A. C. Tsoi.
Universal Approximation Using Feedforward Neural Networks:
A Survey of Some Existing Methods, and Some New Results.
Neural Networks, 11 (1), pp. 15-37, 1998.

