Some Advancements in Statistical Modeling of Complex Networks

Shen, Luyi

doi:10.7274/0v838052d15

ShenL042022D.pdf (12.36 MB)

thesis

posted on 2022-04-11, 00:00 authored by Luyi Shen

The increasing prevalence of network data across a wide variety of fields, and the need to extract useful information from such data, have spurred rapid development of models and algorithms for network inference. Among the various learning tasks on network data, community detection, which divides nodes into clusters or "communities", and change-point detection, which detects possible change points in a sequence of networks, have arguably received the most attention in the scientific community.

Many real-world networks come with additional covariate information, which should ideally be leveraged for inference tasks such as community detection. We add to the limited literature on community detection for networks with covariates by proposing a Bayesian model in which the effects of the covariates are incorporated via a covariate-dependent random partition prior, conditional on which a block model is assumed. Under our prior, the covariate information enters explicitly through the prior distribution on the cluster membership. Our model can quantify the uncertainty of all parameter estimates, including the community membership. Another key feature of our model, compared with existing work, is that it learns the number of communities via posterior inference rather than assuming it to be known. Our model can be applied to community detection in both dense and sparse networks. We carry out a comprehensive simulation study and apply our model to two real data sets; the results demonstrate superior performance of our model over existing methods.

For community detection in sparse weighted networks, we propose a novel Bayesian model, which we call the spike and slab stochastic block model (ssSBM). A random partition prior, such as the Chinese restaurant process, is imposed on the community structure, and conditional on this partition the edge weights are assumed to follow a cluster-dependent spike and slab distribution, so that the model captures both the community structure and the sparsity of the network. One of the key novelties of our model is that it explicitly models the sparsity levels of the edge weights, and the sparsity patterns are allowed to vary according to the community structure. Another appealing feature of our model is that it automatically learns the number of communities in a network without having to assume it is known or estimate it beforehand. Efficient MCMC algorithms are developed for sampling from the posterior distribution of the parameters. Extensive simulation studies and data analyses have been carried out, demonstrating the efficiency and utility of our algorithms.
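To make the generative structure described above concrete, the following is a minimal simulation sketch, not the thesis's actual model specification or sampler. It draws a partition from a Chinese restaurant process and then draws edge weights that are exactly zero (the spike) or positive (the slab), with block-dependent sparsity. The function names, the Beta and Gamma distributions for the block parameters, and the exponential slab are all illustrative assumptions; the abstract does not state which distributions are used.

```python
import random


def sample_crp(n, alpha, rng):
    """Sample community labels for n nodes from a Chinese restaurant process."""
    labels = []
    counts = []  # counts[k] = number of nodes already placed in community k
    for _ in range(n):
        # An existing community k is chosen with weight counts[k];
        # a brand-new community is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def sample_spike_slab_network(labels, rng):
    """Sample a symmetric weighted adjacency matrix given community labels.

    Assumed (hypothetical) choices: nonzero-edge probability ~ Beta(1, 3) and
    slab rate ~ Gamma(2, 1) per block pair; nonzero weights ~ Exponential(rate).
    """
    n = len(labels)
    k = max(labels) + 1
    nonzero_prob = [[0.0] * k for _ in range(k)]
    slab_rate = [[1.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            nonzero_prob[a][b] = nonzero_prob[b][a] = rng.betavariate(1.0, 3.0)
            slab_rate[a][b] = slab_rate[b][a] = rng.gammavariate(2.0, 1.0)

    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = labels[i], labels[j]
            # Spike: the weight stays exactly 0. Slab: draw a positive weight.
            if rng.random() < nonzero_prob[a][b]:
                weights[i][j] = weights[j][i] = rng.expovariate(slab_rate[a][b])
    return weights, nonzero_prob


rng = random.Random(0)
labels = sample_crp(30, alpha=1.0, rng=rng)
weights, nonzero_prob = sample_spike_slab_network(labels, rng)
```

Because the number of communities is an outcome of the partition draw rather than an input, a sketch like this also illustrates why the number of communities does not need to be fixed in advance. Posterior inference (the MCMC step) is not shown.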

In the change-point problem, one studies a series of networks indexed by time and asks whether there is a significant change in the structure of these networks at some time point. Unlike classical network inference methods, which are mainly node-, edge-, or correlation-based, our approach also takes the underlying topological structure into account. We extract topological features from the networks using persistent homology and then employ tools from topological data analysis to perform change-point detection on networks. We illustrate our method using both simulated and real data sets.
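As a rough illustration of this pipeline, the sketch below summarizes each weighted network by a single topological feature and then looks for the time point that best separates the resulting series. It is a deliberately simplified stand-in, not the method of the thesis. It uses only 0-dimensional persistent homology of the edge-weight filtration (computed with union-find, where the death times are the weights of the minimum spanning forest edges), and a scaled difference-in-means statistic to pick the split. The function names and the choice of total persistence as the feature are assumptions made for illustration.

```python
def h0_death_times(n, weighted_edges):
    """Death times of 0-dimensional classes in an edge-weight filtration.

    All n vertices are born at time 0; an edge (w, i, j) enters at time w.
    A component dies when an entering edge merges it into another one.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for w, i, j in sorted(weighted_edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)
    return deaths


def total_persistence(n, weighted_edges):
    """Sum of finite H0 lifetimes (births are all 0, so this sums the deaths)."""
    return sum(h0_death_times(n, weighted_edges))


def best_split(series):
    """Return (t, statistic) for the split series[:t] | series[t:] that
    maximizes the scaled absolute difference in means."""
    total = len(series)
    best_t, best_stat = None, -1.0
    for t in range(1, total):
        left_mean = sum(series[:t]) / t
        right_mean = sum(series[t:]) / (total - t)
        stat = abs(left_mean - right_mean) * (t * (total - t) / total) ** 0.5
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


# Toy sequence: three tightly connected triangles, then three loosely connected ones.
tight = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 0, 2)]
loose = [(3.0, 0, 1), (3.0, 1, 2), (3.0, 0, 2)]
networks = [tight, tight, tight, loose, loose, loose]
features = [total_persistence(3, edges) for edges in networks]
change_point, statistic = best_split(features)
```

In practice one would use higher-dimensional persistence diagrams from a dedicated library and a proper significance test for the detected split; both are omitted here to keep the sketch self-contained.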

History

Date Modified

2022-06-22

Defense Date

2021-12-14

CIP Code

27.9999

Research Director(s)

Lizhen Lin

Committee Members

Jun Li, Fang Liu

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Alternate Identifier

1327697977

Library Record

6234218

OCLC Number

1327697977

Program Name

Applied and Computational Mathematics and Statistics
