Some Advancements in Statistical Modeling of Complex Networks

Doctoral Dissertation

Abstract

The increasing prevalence of network data in a wide variety of fields, and the need to extract useful information from it, have spurred rapid development of models and algorithms for network inference. Among the various learning tasks involving network data, community detection, which divides nodes into clusters or 'communities', and change-point detection, which detects possible change points in a sequence of networks, have arguably received the most attention in the scientific community.

In many real-world networks, the network data come with additional covariate information, which should ideally be leveraged for inference tasks such as community detection. We add to the limited literature on community detection for networks with covariates by proposing a Bayesian model in which the effects of the covariates are incorporated via a covariate-dependent random partition prior, under which a block model is assumed. Under our prior, the covariate information enters explicitly into the prior distribution on the cluster membership. Our model quantifies the uncertainty of all parameter estimates, including the community membership. Another key feature of our model compared with existing work is that it learns the number of communities via posterior inference without having to assume it is known. Our model can be applied to community detection in both dense and sparse networks. We carry out a comprehensive simulation study and apply our model to two real data sets; the results demonstrate superior performance of our model over existing methods.

For community detection in sparse weighted networks, we propose a novel Bayesian model, which we call the spike and slab stochastic block model (ssSBM). A random partition prior, such as the Chinese restaurant process, is imposed on the community structure, under which the edge weights are assumed to follow a cluster-dependent spike and slab distribution; this combination models both the community structure and the sparsity of the networks. One of the key novelties of our model is that it explicitly models the sparsity levels of edge weights, and the sparsity patterns are allowed to vary according to the community structure. Another appealing feature of our model is that it automatically learns the number of communities in a network without having to assume it is known or estimate it beforehand. Efficient MCMC algorithms are developed for sampling from the posterior distribution of the parameters. Extensive simulation studies and data analyses have been carried out, demonstrating the efficiency as well as the utility of our algorithms.
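
To make the generative structure described above concrete, the following is a minimal sketch, not the dissertation's implementation. It draws community labels from a Chinese restaurant process and then draws edge weights from a block-dependent spike and slab distribution. The function names, the point mass at zero as the spike, the Gaussian slab, and the Beta and Gaussian draws for the block-level parameters are all illustrative assumptions.

```python
import random


def sample_crp(n, alpha, rng):
    """Draw community labels for n nodes from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (hypothetical name).
    """
    labels = []
    counts = []  # counts[k] = number of nodes already in community k
    for _ in range(n):
        # Join existing community k with weight counts[k],
        # or open a new community with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def sample_ssbm_weights(labels, rng, slab_sd=1.0):
    """Draw a symmetric weighted adjacency matrix given community labels.

    Assumed (illustrative) spike and slab form for a pair in blocks (a, b):
    weight = 0 with probability 1 - pi[a][b]           (spike)
    weight ~ Normal(mu[a][b], slab_sd) otherwise       (slab)
    """
    n = len(labels)
    k = max(labels) + 1
    pi = [[0.0] * k for _ in range(k)]  # block-level nonzero probabilities
    mu = [[0.0] * k for _ in range(k)]  # block-level slab means
    for a in range(k):
        for b in range(a, k):
            pi[a][b] = pi[b][a] = rng.betavariate(1.0, 1.0)
            mu[a][b] = mu[b][a] = rng.gauss(0.0, 2.0)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = labels[i], labels[j]
            if rng.random() < pi[a][b]:
                w = rng.gauss(mu[a][b], slab_sd)
                weights[i][j] = weights[j][i] = w
    return weights
```

Because the labels come from a random partition, the number of communities is not fixed in advance; posterior inference over such a model would require an MCMC sampler, which this sketch does not attempt.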

In the change-point problem, one studies a series of networks indexed by time and asks whether there is a significant change in the structure of these networks at some time point. Unlike classical network inference methods, which are mainly based on nodes, edges, or correlations, our work also takes the underlying topological structure into account. We extract topological features from these networks using persistent homology and then employ tools from topological data analysis to perform change-point detection on networks. We illustrate our method using both simulated and real data sets.
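
As a rough illustration of the pipeline sketched above, and not the dissertation's actual procedure, the code below computes the simplest topological summary of a weighted network, its 0-dimensional persistent homology under an edge-weight filtration, and then locates a single change point in the resulting series with a CUSUM-type statistic. The choice of total persistence as the feature, the CUSUM statistic, and all function names are assumptions made for this example.

```python
def h0_death_times(n, edges):
    """Finite death times of 0-dimensional persistence classes.

    n: number of nodes, all born at filtration value 0.
    edges: list of (weight, i, j); edges enter in increasing weight order.
    A connected component dies when an edge merges it into another one.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)
    return deaths


def total_persistence(n, edges):
    """Sum of finite H0 lifetimes (births are all 0)."""
    return sum(h0_death_times(n, edges))


def best_split(series):
    """Return (t, stat): the split into series[:t] and series[t:] that
    maximizes a CUSUM-type scaled difference of segment means."""
    n = len(series)
    best_t, best_stat = None, -1.0
    for t in range(1, n):
        left = sum(series[:t]) / t
        right = sum(series[t:]) / (n - t)
        stat = (t * (n - t) / n) ** 0.5 * abs(left - right)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat
```

In use, one would compute `total_persistence` for each network in the time series and pass the resulting list to `best_split`; assessing whether the detected split is statistically significant would need an additional step, such as a permutation test, that is not shown here.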

Attributes

Attribute Name: Values
Author: Luyi Shen
Contributor: Lizhen Lin, Research Director
Contributor: Jun Li, Committee Member
Contributor: Fang Liu, Committee Member
Degree Level: Doctoral Dissertation
Degree Discipline: Applied and Computational Mathematics and Statistics
Degree Name: Doctor of Philosophy
Banner Code: PHD-ACMS
Defense Date: 2021-12-14
Submission Date: 2022-04-11
Record Visibility: Public
Content License: All rights reserved


Digital Object Identifier

doi:10.7274/0v838052d15

This DOI is the best way to cite this doctoral dissertation.
