On Graph Clustering or Community Detection: A Characteristic Analysis and Its Implications
Date
2021
Authors
Advisors
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Abstract
Graphs or networks represent relational data in an abstract and unifying form in terms of vertices and edges. Graph clustering in graph study language, or community detection in network science and engineering language, is fundamental to exploratory analysis of relational data, at different levels of depth and in broad applications. The main objectives of graph clustering are to capture, characterize and categorize heterogeneous clusters or diverse community patterns intrinsic to the data, and beyond that, to discover hidden structures or functional relationships.In the present work, we investigate clustering models and methodologies themselves. We make several original and impactful contributions in three key stages. (I) We introduce the dialectical component principle (DCP) for graph clustering. For the DCP realization, we formulate a general model representation in terms of two dialectical encoding functions (interdependent and opposing potentials in coupling and decoupling, or attraction and repulsion) and a trainable balance/bias ratio between the encoding components. The general representation encompasses notable existing models or clustering functions. We analyze the fundamental properties of DCP models acting on graph data. The analysis offers new insight into the known and previously unknown advantages or disadvantages of existing models. Furthermore, the study leads to our design of a novel, competitive and rich family of DCP models, named the BlueRed family. (II) We introduce an innovative characteristic analysis of DCP models on undirected, connected graphs. Conventional comparison of cluster configurations by a clustering function on a graph relies typically on a fixed or lucky value of the so-called resolution parameter, which is related to the balance/bias ratio of DCP components. In contrast, we investigate the relationship among cluster configurations across the entire, infinite range of the resolution parameter. With the characteristic analysis, we address and answer for the first time critical yet previously unasked questions: the existence of global minimum cluster configurations, their locations, their stability and sensitivity, and resolution quantization. (III) We present an algorithmic architecture based on the model family and characteristic analysis. The architecture assembles a set of effective graph operation constituents, deploys a set of economic search strategies for optimal solutions, and is equipped with a set of iteration criteria and post-evaluation measures. It accommodates existing clustering algorithms. We provide empirical evidence and comparison results with synthetic graphs and real-world networks.
Type
Department
Description
Provenance
Subjects
Citation
Permalink
Citation
Liu, Tiancheng (2021). On Graph Clustering or Community Detection: A Characteristic Analysis and Its Implications. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/23740.
Collections
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.