Communities in Social Networks: Detection, Heterogeneity and Experimentation
The study of network data in the social and health sciences frequently concentrates on understanding how and why connections form. In particular, the task of determining the latent mechanisms that drive connection has received considerable attention across statistics, machine learning, and information theory. In social networks, this mechanism often manifests as community structure. This work therefore provides methods for discovering and leveraging these communities to better understand networks and the data they generate.
We provide three main contributions. First, we present methodology for performing community detection in challenging regimes. Existing literature has focused on modeling the spectral embedding of a network using Gaussian mixture models (GMMs) in scaling regimes where the ability to detect community memberships improves with the size of the network. These regimes, however, are often unrealistic in practice. We therefore provide tractable methodology, motivated by new theoretical results for networks with non-vanishing noise, that uses GMMs incorporating truncation and shrinkage effects.
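As a point of reference, the baseline approach described above (fit a GMM to a spectral embedding of the network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the truncated and shrunken GMM methodology developed in this work: the two-community stochastic block model, the one-dimensional embedding, and the function names `simulate_sbm`, `spectral_embedding_1d`, and `fit_gmm_1d` are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_sbm(n, p_in, p_out, rng):
    """Simulate a two-community stochastic block model (assumed toy setting).

    n should be even; the first n/2 nodes form community 0, the rest community 1.
    """
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p_in, p_out)
    # Sample the strict upper triangle, then symmetrize (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(float)
    return adj, labels


def spectral_embedding_1d(adj):
    """Eigenvector of the adjacency matrix with second-largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(adj)
    order = np.argsort(np.abs(vals))[::-1]
    return vecs[:, order[1]]


def fit_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM and return hard labels."""
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + 1e-12)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_dens = (
            -0.5 * (x[:, None] - mu) ** 2 / var
            - 0.5 * np.log(2 * np.pi * var)
            + np.log(weight)
        )
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
    return resp.argmax(axis=1)
```

For example, calling `simulate_sbm(200, 0.5, 0.1, np.random.default_rng(0))`, embedding the adjacency matrix with `spectral_embedding_1d`, and passing the result to `fit_gmm_1d` yields estimated memberships that can be compared with the true labels up to a relabeling of the two communities. The edge probabilities here give a strong signal; the regimes with non-vanishing noise targeted by this work are harder than this toy setting.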
Further, when covariate information is available, we often want to understand how covariates affect connections. The effects of covariates on edge formation are likely to differ between communities (e.g., age might play a different role in friendship formation in communities across a city). To address this issue, we introduce a latent space network model in which the coefficients associated with certain covariates can depend on the latent community membership of the nodes. We show that ignoring such structure can lead to either over- or under-estimation of the importance of covariates to edge formation, and we propose a Markov chain Monte Carlo approach for simultaneously learning the latent community structure and the community-specific coefficients.
Finally, we consider how community structure can impact experimentation. Communities can behave in different ways, and this naturally propagates into experimental design. This observation motivates our development of community-informed experimental design. This design recognizes that information between individuals likely flows along within-community edges rather than across-community edges. We demonstrate that this design improves estimation of the global average treatment effect, even when the community structure of the graph needs to be estimated.
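To make the idea of randomizing with respect to communities concrete, the sketch below assigns treatment at the community level, so that all members of a community share one arm. This is only an illustration of community-level (cluster) randomization in general; the abstract does not specify the design, so the function name `community_randomize`, the `p_treat` parameter, and the all-or-nothing assignment rule are assumptions rather than the design developed in this work.

```python
import random


def community_randomize(communities, p_treat=0.5, seed=None):
    """Assign treatment per community (assumed illustrative design).

    communities: dict mapping node -> community label (labels must be sortable).
    Returns a dict mapping node -> 0 (control) or 1 (treated), where every
    node in the same community receives the same arm.
    """
    rng = random.Random(seed)
    # Draw one Bernoulli(p_treat) arm per community, in a fixed label order
    # so that results are reproducible for a given seed.
    arm_by_community = {
        label: int(rng.random() < p_treat)
        for label in sorted(set(communities.values()))
    }
    return {node: arm_by_community[label] for node, label in communities.items()}
```

Because treated and control individuals are then separated mostly by across-community edges, less information would be expected to flow between arms than under node-level randomization. In practice the `communities` mapping might itself come from an estimated community structure.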
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations