Exponential of Real World Network



Abstract
With expanding social networks and large online social data, large graphical representations of useful data are getting prevalent. A lot of graph mining tasks exist which when run on graphs extract useful information. But with the growing graph size, issues like scalability and computational cost of these tasks arise. So, here in this project we explore usefulness of exponential of graph matrices to observe any interesting properties. Here, we explore Eigen Vector decomposition properties on exponential matrix. In addition, several important mathematical properties relevant to graph mining which can be modelled using exponential of a matrix have been discussed. Specifically, we aim at improving Clustering and Link Prediction algorithms using mathematical intuition obtained from exponential of a matrix.

Background

Exponential of Matrix

Exponential of a matrix is a matrix function defined on a square matrix. Exponential of a matrix can be computed using power series given by:

For any matrix A, the above power series converges (for matrix with finite norm) and hence exponential of a matrix is well defined. In case of graphs, we represent graph using an n x n adjacency matrix, where n is the number of nodes. Hence, matrix exponential is computed on adjacency matrix.

Exponential Features

Exponential of a matrix could be used to model certain useful features which can be used in Graph Mining Tasks.

  • Centrality - Denotes the connectivity of a node in the network. Defined for every node. Denotes the number of closed walks associated to a node.

  • Communicability - Denotes the connectivity between two nodes in a graph. Defined for every pair of nodes in the graph. Denotes the number of walks between 2 nodes.

  • Betweenness - Quantifies the influence of a particular node in a network. Defined for every node. Computes the effectiveness of the node in information exchange.

  • Inbound - Measures the amount of inbound traffic onto a node. Defined for every node. Affectively computes the weighted connectivity of inward edges onto a node.

  • Outbound - Measures the amount of outbound traffic from a node. Defined for every node. Affectively computes the weighted connectivity of outward edges from a node.

  • Modified Katz Measure - Computes the weighted sum of paths varying lengths between pair of nodes. Defined for every pair of nodes. With weights appropriately chosen, it is similar to communicability between two nodes.

  • Eigen Decomposition and Clustering
  • For a matrix and its exponential, Eigen vectors would remain same while Eigen values in exponential case would be exponentiated values of that in normal matrix.
  • This property could be useful in scenarios where Eigen decomposition is relatively difficult for normal matrices. This is observed in sparse directed matrices.
  • As Eigen vector computation speeds up, a significant impact occurs in spectral clustering performance.

  • Link Prediction
  • Link Prediction is a graph mining task that deals with predicting missing edges and also suggests possible links.
  • For every pair of nodes, certain set of path based and node based features are computed and decision is accordingly taken to predict if an edge exists.
  • Few of the conventional features such as node degrees, Jaccard and Cosine Similarity between nodes based on their neighbors are used.
  • In addition, features derived from exponential such as centrality, communicability, betweenness, inbound, outbound which represent graph connectivity are used.

  • Procedure
    Several directed and undirected datasets from SNAP Dataset Collection were used for our experiments.
    1. Data Preprocessing – Adjacency matrix has been constructed out of edge data of graph.
    2. Exponential Features – Exponential of the matrix has been computed and features have been extracted for Link Prediction.
    3. Training and Test Sets – We have sampled equal number of edge and non-edge node pairs for classification. Further, a 80-20 split is considered to build training and test sets.
    4. Classifiers –Various classifiers such as Random Forest, Logistic Regression, Support Vector Machine, and Neural Networks have ben used for Edge Classification task.
    Results
  • Eigen Decomposition is faster for exponential matrix in case of Sparse Directed matrix.
  • Various features derived from exponential of matrix contribute in making link prediction task more accurate. Out of these Katz measure and Outboundness of source node play dominant role in improving link prediction task.
  • Exponential of matrix computation gets expensive for large matrices making it memory and time intensive. A map-reduce memory intensive method needs to be explored for very large matrices.




  • Reference
    • • M. Benzi, E. Estrada, and C. Klymko, "Ranking hubs and authorities using matrix functions," Linear Algebra and its Applications 438, 5(2013), 2447-2474
    • • E. Estrada, and N. Hatano, "Communicability in complex networks," Physical Review E 77, 3(2008), 036111.