On Graph Clustering or Community Detection: A Characteristic Analysis and Its Implications
Graphs or networks represent relational data in an abstract and unifying form in terms of vertices and edges. Graph clustering in graph study language, or community detection in network science and engineering language, is fundamental to exploratory analysis of relational data, at different levels of depth and in broad applications. The main objectives of graph clustering are to capture, characterize and categorize heterogeneous clusters or diverse community patterns intrinsic to the data, and beyond that, to discover hidden structures or functional relationships.In the present work, we investigate clustering models and methodologies themselves. We make several original and impactful contributions in three key stages. (I) We introduce the dialectical component principle (DCP) for graph clustering. For the DCP realization, we formulate a general model representation in terms of two dialectical encoding functions (interdependent and opposing potentials in coupling and decoupling, or attraction and repulsion) and a trainable balance/bias ratio between the encoding components. The general representation encompasses notable existing models or clustering functions. We analyze the fundamental properties of DCP models acting on graph data. The analysis offers new insight into the known and previously unknown advantages or disadvantages of existing models. Furthermore, the study leads to our design of a novel, competitive and rich family of DCP models, named the BlueRed family. (II) We introduce an innovative characteristic analysis of DCP models on undirected, connected graphs. Conventional comparison of cluster configurations by a clustering function on a graph relies typically on a fixed or lucky value of the so-called resolution parameter, which is related to the balance/bias ratio of DCP components. In contrast, we investigate the relationship among cluster configurations across the entire, infinite range of the resolution parameter. With the characteristic analysis, we address and answer for the first time critical yet previously unasked questions: the existence of global minimum cluster configurations, their locations, their stability and sensitivity, and resolution quantization. (III) We present an algorithmic architecture based on the model family and characteristic analysis. The architecture assembles a set of effective graph operation constituents, deploys a set of economic search strategies for optimal solutions, and is equipped with a set of iteration criteria and post-evaluation measures. It accommodates existing clustering algorithms. We provide empirical evidence and comparison results with synthetic graphs and real-world networks.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content. More info