Research
I work with the Artificial Intelligence group
at the Department of Information Engineering and Mathematical Sciences of the University of Siena.
I am mainly a theoretical researcher, focused on understanding the fundamental theory of machine learning
and, more generally, of artificial intelligence. However, once a theory has been
developed, I also like to put it into practice, developing new methods and facing practical
applications with the acquired knowledge. Summing up, my principal interests
include machine learning algorithms, artificial neural networks,
adaptive processing of data structures, web information retrieval, and the applications of the techniques developed
in those fields.
In the following, a few main topics of my research are described. More details can be found
in my publication list.
Why are deep neural networks better than shallow architectures?
The recent success of Deep Neural Networks (DNNs) suggests that the depth of the employed architecture
is a key ingredient of modern machine learning and that it is required to face complex applications,
e.g. image understanding, speech recognition, and natural language processing.
From a theoretical point of view, the above claim can be reformulated as
"DNNs can implement functions with higher complexity than shallow ones, when using the same number of resources."
But how can the concept of a "high complexity function" be formally defined? In fact,
the complexity measures commonly used are not useful for this purpose, since they deal with different concepts,
such as the architectural complexity (number of parameters, units and layers)
or the generalization capability of a model (Vapnik-Chervonenkis dimension).
In [1], we introduced a novel measure aimed at evaluating the complexity of the function implemented
by a neural network used for classification purposes, and we compared deep and shallow architectures under this measure.
The obtained results support the idea that deep networks actually implement more complex functions or,
in other words, that with the same number of resources they are able to address more difficult problems.
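As a rough illustration of the "same number of resources" comparison, the toy Python sketch below builds a shallow and a deep network with exactly the same number of parameters and probes how often the (randomly initialized, untrained) output changes sign along a random line in input space. This is only a crude proxy for functional complexity, invented here for illustration; it is not the measure introduced in [1], and untrained random weights need not exhibit the depth advantage.

import numpy as np

rng = np.random.default_rng(0)

def init_mlp(layer_sizes):
    # Random weights for a fully connected tanh network with the given layer sizes.
    return [(rng.standard_normal((m, n)) / np.sqrt(n), rng.standard_normal(m))
            for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = W @ x + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def n_params(params):
    return sum(W.size + b.size for W, b in params)

def sign_changes(params, dim=10, n_points=2000):
    # Crude complexity proxy: sign changes of the output along a random line.
    direction = rng.standard_normal(dim)
    outs = np.array([forward(params, t * direction)[0]
                     for t in np.linspace(-5.0, 5.0, n_points)])
    return int(np.sum(np.diff(np.sign(outs)) != 0))

shallow = init_mlp([10, 90, 1])        # one hidden layer: 1081 parameters
deep = init_mlp([10, 20, 20, 20, 1])   # three hidden layers: also 1081 parameters

print("shallow:", n_params(shallow), "parameters,", sign_changes(shallow), "sign changes")
print("deep:   ", n_params(deep), "parameters,", sign_changes(deep), "sign changes")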
- M. Bianchini, F. Scarselli. On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures. IEEE Transactions on Neural Networks and Learning Systems, 25(8), pp. 1553-1565, 2014.
Graph Neural Networks: learning to classify patterns along their relationships
A labelled graph is a general data structure that can represent a complex application
domain where the patterns (the nodes) come along with their relationships (the edges).
For example, a portion of the Web can be represented by
a directed graph whose nodes stand for the web-pages and whose edges represent the hyperlinks.
In such a case, each node may have a vectorial label
containing a description of the web-page, e.g. the words contained in the page.
Labelled graphs extend the standard machine learning data structures:
a graph without edges denotes a common set of patterns,
and a graph where each node is linked only to the following node denotes a sequence.
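For concreteness, such a labelled directed graph might be encoded as follows (a purely hypothetical Python sketch; the page names and label vectors are invented for illustration):

# Nodes are web-pages, edges are hyperlinks, and each node carries
# a vectorial label (here, a tiny bag-of-words count vector).
web_graph = {
    "nodes": {
        "page_a": [1, 0, 1],
        "page_b": [0, 2, 0],
        "page_c": [1, 1, 1],
    },
    "edges": [                 # (source, target) hyperlinks
        ("page_a", "page_b"),
        ("page_b", "page_c"),
        ("page_c", "page_a"),  # cycles are allowed
    ],
}
# With no edges the structure reduces to a plain set of patterns;
# with only the chain ("page_a", "page_b"), ("page_b", "page_c") it is a sequence.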
The Graph Neural Network (GNN) [2] is a supervised connectionist model that addresses the problem of
learning to classify a pattern (a node) using not only its label, but also its direct and
indirect relationships with other patterns (nodes).
A GNN takes a graph as input and returns an output for each node of the graph.
For example, the model can learn to classify a web-page as spam or non-spam using the information about the page, its connectivity and, more generally, the information contained in the whole web graph.
GNNs extend previous connectionist approaches for graph processing, since they can deal
with a wider class of graphs, including cyclic and acyclic, directed and undirected,
and positional and non-positional graphs. It has also been proved that GNNs are a sort of universal
approximator for functions defined on graphs [3]. Finally, the model
has been applied to several practical problems, including bioinformatics,
image understanding, sentence extraction, web page ranking, and web
document classification. See this page for the software and more references.
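The core of the model is an iterative computation of node states: a transition function combines a node's label with the states of its neighbouring nodes and is applied repeatedly until a fixed point is reached, after which an output function produces a prediction for each node. The minimal numpy sketch below mimics this scheme with random, untrained parameters on an invented toy graph, so the printed numbers are meaningless; it only illustrates the fixed-point iteration, not the full learning algorithm of [2].

import numpy as np

rng = np.random.default_rng(0)

# Toy labelled directed graph: node labels and hyperlink edges (source -> target).
labels = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0]), "c": np.array([1.0, 1.0])}
edges = [("a", "b"), ("b", "c"), ("c", "a")]
state_dim = 4

# Random (untrained) parameters of the transition and output functions.
W_label = rng.standard_normal((state_dim, 2)) * 0.5
W_neigh = rng.standard_normal((state_dim, state_dim)) * 0.1  # small norm, contraction-like
W_out = rng.standard_normal((1, state_dim))

def predecessors(node):
    # Nodes with an edge pointing to `node`.
    return [src for src, dst in edges if dst == node]

# Iterate the state update until (approximate) convergence to a fixed point.
states = {v: np.zeros(state_dim) for v in labels}
for _ in range(50):
    new_states = {}
    for v in labels:
        neigh_sum = sum((states[u] for u in predecessors(v)), np.zeros(state_dim))
        new_states[v] = np.tanh(W_label @ labels[v] + W_neigh @ neigh_sum)
    converged = max(np.linalg.norm(new_states[v] - states[v]) for v in labels) < 1e-6
    states = new_states
    if converged:
        break

# One output per node, e.g. an (untrained, meaningless) spam score.
outputs = {v: float(W_out @ states[v]) for v in labels}
print(outputs)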
- F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini. The Graph Neural Network Model. IEEE Transactions on Neural Networks, 20(1), pp. 61-80, 2009.
- F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini. Computational Capabilities of Graph Neural Networks. IEEE Transactions on Neural Networks, 20(1), pp. 81-102, 2009.
Inside Google's PageRank
PageRank is the algorithm originally used by Google to measure the authority (importance)
of web-pages. The authority, along with other measures of the closeness of the query to
the web-page content, is used by search engines to sort the URLs returned in response to user queries.
The introduction of PageRank
produced a breakthrough in the capability of search engines to deal with web spam.
In [5], we studied the fundamental properties of PageRank concerning its stability
and the complexity of its computational scheme. Moreover, we introduced a circuit analysis that allows us
to understand the distribution of the page score, the way different Web communities interact, and some
secrets behind the promotion of Web pages.
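For reference, the basic PageRank score can be computed by a simple power iteration with a damping factor (the standard "random surfer" formulation, sketched below in Python on a tiny invented graph). The analysis in [5] concerns the properties of this kind of computation, not the code itself.

import numpy as np

# Hyperlink structure of a tiny, invented web graph: page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4
d = 0.85                           # damping factor of the random surfer

# Column-stochastic transition matrix (uniform choice among out-links).
M = np.zeros((n, n))
for src, dsts in links.items():
    for dst in dsts:
        M[dst, src] = 1.0 / len(dsts)

# Power iteration: x <- d * M x + (1 - d) / n.
x = np.full(n, 1.0 / n)
for _ in range(100):
    x_new = d * (M @ x) + (1.0 - d) / n
    if np.abs(x_new - x).sum() < 1e-12:
        x = x_new
        break
    x = x_new

print(dict(enumerate(np.round(x, 4))))   # authority scores, summing to 1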
- M. Bianchini, M. Gori, F. Scarselli. Inside PageRank. ACM Transactions on Internet Technology (TOIT), 5(1), pp. 92-128, 2005.
Why are autoencoders better than multilayer networks in verification problems?
Autoencoders (also called autoassociators) can be used for several purposes:
for classification, for verification, to recognize outliers, and for information compression.
Moreover, autoencoders are commonly used as building blocks to construct deep architectures.
An autoencoder is a three-layer neural
network (one hidden layer) with the same number of input and output neurons. The network is trained to reproduce (to autoassociate)
the input at the output: the target of each pattern is just its input.
Through such training, and since the number of hidden
neurons is smaller than the number of inputs,
the autoencoder is forced to compress the input information into the hidden neurons.
Moreover, the network is expected to produce the best autoassociation for
the most frequent patterns in the training set. Thus, the error computed on a novel pattern is small in
the domain regions that are dense with patterns and large in the regions where patterns are infrequent.
In [6], the representation capability of autoencoders was studied and
compared with that of multilayer networks.
It was discovered that in autoencoders the domain regions
containing the positively classified patterns are always bounded,
whereas they may be unbounded in multilayer networks.
Thus, an autoencoder tends to classify as negative the outlier patterns and the patterns that are very different
from the examples of the training set. On the contrary, the classification of those patterns
is completely unpredictable in multilayer perceptrons.
Such a difference is particularly relevant in verification problems.
A verification problem is a particular type of binary classification problem,
where the purpose is to verify whether the input
pattern belongs to a given class: for instance, to verify a signature or a banknote.
In verification problems, often
only positive examples are available and/or the number of negative examples is not sufficient
to represent the underlying distribution.
For example, it is impossible to create a complete dataset of all the possible images
that do not represent a 5-euro banknote.
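A minimal numpy sketch of this verification-by-reconstruction idea follows: a small autoencoder is trained on positive examples only (the "banknote" feature vectors below are invented), and a new pattern is accepted only if its reconstruction error stays below a threshold estimated on the training data. It illustrates the mechanism discussed above, not the analysis or the experiments of [6].

import numpy as np

rng = np.random.default_rng(0)

# Positive examples only: points scattered around an invented prototype in R^4.
prototype = np.array([1.0, -0.5, 0.3, 0.8])
X = prototype + 0.1 * rng.standard_normal((200, 4))

n_in, n_hidden, lr = 4, 2, 0.05            # fewer hidden units than inputs
W1 = rng.standard_normal((n_hidden, n_in)) * 0.1; b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_in, n_hidden)) * 0.1; b2 = np.zeros(n_in)

# Train the autoencoder to reproduce its input (target = input) by gradient descent.
for _ in range(2000):
    H = np.tanh(X @ W1.T + b1)             # hidden code
    E = (H @ W2.T + b2) - X                # reconstruction error
    gW2 = E.T @ H / len(X); gb2 = E.mean(axis=0)
    dH = (E @ W2) * (1.0 - H ** 2)
    gW1 = dH.T @ X / len(X); gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def reconstruction_error(x):
    h = np.tanh(W1 @ x + b1)
    return float(np.sum((W2 @ h + b2 - x) ** 2))

# Verification: accept a pattern only if its reconstruction error is small.
threshold = np.quantile([reconstruction_error(x) for x in X], 0.95)
genuine = prototype + 0.1 * rng.standard_normal(4)
forged = np.array([5.0, 5.0, -5.0, 5.0])   # far from the training distribution
print(reconstruction_error(genuine) <= threshold)   # typically True
print(reconstruction_error(forged) <= threshold)    # typically False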
- M. Gori, F. Scarselli. Are multilayer perceptrons adequate for pattern recognition and verification? IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), pp. 1121-1132, 1998.
Understanding the approximation capability of multilayer neural networks
It is well known that multilayer neural networks are universal approximators, i.e., they
can approximate any function ƒ: ℝⁿ → ℝⁿ.
However, the most common theoretical results are existential in nature and do not clarify
how a network approximating a given function can be constructed.
In [7], the results on the approximation capability of neural networks are reviewed
and an intuitive explanation is provided. Moreover, it is shown that, in three-layer neural networks
(one hidden layer), only the hidden-to-output layer weights have to be optimized
on the basis of the target function, whereas the input-to-hidden layer weights can be chosen randomly or
by using a grid of values. This observation suggests how to implement a constructive algorithm
that does not suffer from the problem of local minima.
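This observation can be illustrated with a short numpy sketch: the input-to-hidden weights are drawn at random and kept fixed, and only the hidden-to-output weights are fitted, here by linear least squares, so no gradient descent (and hence no local minima) is involved. It is a simplified illustration of the idea, with an invented target function, not the constructive procedure of [7].

import numpy as np

rng = np.random.default_rng(0)

# Target function to approximate on [-3, 3] (invented for the example).
f = lambda x: np.sin(x) + 0.5 * x

x = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = f(x).ravel()

# Input-to-hidden weights: chosen at random and never trained.
n_hidden = 50
W_in = rng.standard_normal((1, n_hidden)) * 2.0
b_in = rng.uniform(-3.0, 3.0, n_hidden)
H = np.tanh(x @ W_in + b_in)                 # hidden activations

# Hidden-to-output weights: the only ones fitted, by linear least squares.
H1 = np.hstack([H, np.ones((len(x), 1))])    # column of ones for the output bias
w_out, *_ = np.linalg.lstsq(H1, y, rcond=None)

# Check the approximation on fresh points.
x_test = np.linspace(-3.0, 3.0, 57).reshape(-1, 1)
H_test = np.hstack([np.tanh(x_test @ W_in + b_in), np.ones((len(x_test), 1))])
print("max abs error:", np.abs(H_test @ w_out - f(x_test).ravel()).max())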
- F. Scarselli, A. C. Tsoi. Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural Networks, 11(1), pp. 15-37, 1998.