Novel Identification of Radiomic Biomarkers with Langevin Annealing


Yin, Fang-Fang

Lafata, Kyle





Medical Physics


As modern diagnostic imaging systems become increasingly quantitative, new techniques and scientific disciplines are emerging as powerful avenues to personalized medicine. Leading this paradigm shift is the field of radiomics, which attempts to identify computational biomarkers hidden within high-throughput imaging data. Radiomic biomarkers may be able to non-invasively detect the underlying phenotype of an image, leading to new insights and innovation. Such insights may include correlations between radiomic features and pathological information, treatment response, and functional characteristics. Searching for meaningful structure within these quantitative datasets is therefore fundamental to contemporary imaging science.

However, imaging data are being generated at a rapidly increasing rate, and understanding the hyper-dimensional relationships between radiomic features is often non-trivial. This is a major challenge for radiomic applications to clinical medicine. There is an urgent need to investigate novel technologies to manage this challenge, so that radiomics can be effectively and efficiently used to solve complex clinical problems.

Major contributions of this dissertation research include: (1) the development of a novel data clustering algorithm called Langevin annealing; (2) the development of a translational research environment to use this clustering algorithm for oncological imaging characterization; and (3) applications of the developed technique for clinical diagnosis and treatment evaluation.

Cluster analysis – i.e., the grouping of similar data objects together based on their intrinsic properties – is a common approach to understanding otherwise non-trivial data. Although data clustering is a hallmark of many fields, it is generally an ill-defined practice. Notable limitations and challenges include: (1) defining the appropriate number of clusters, (2) poor optimization near local minima, and (3) black-box approaches that often make interpretation difficult.

To overcome some of these challenges, data clustering may benefit from physics intuition. Langevin annealing models radiomics data as a dynamical system in equilibrium with a heat bath. The method is briefly summarized as follows. (1) A radial basis function is used to construct a density distribution, ψ(x), from the radiomics data. (2) A potential, V(x), is then constructed such that ψ(x) is the ground-state solution to the time-independent Schrödinger equation. (3) Using V(x), Langevin dynamics are formulated at sub-critical temperature to avoid ergodicity, and the radiomic feature vectors are propagated as the system evolves.
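
Steps (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: it assumes a Gaussian radial basis function with a single width `sigma`, and it assumes the Hamiltonian takes the form -(sigma^2/2)∇² + V, as in standard quantum clustering. Under those assumptions the potential follows in closed form as V(x) - E = -d/2 + Σ r_i² exp(-r_i²/2σ²) / (2σ² ψ(x)), where r_i is the distance from x to data point i and d is the number of features. The function names are hypothetical.

```python
import numpy as np

def parzen_density(x, data, sigma):
    """Gaussian radial-basis estimate psi(x) at query points x (unnormalized)."""
    # Squared distances between every query point and every data point: shape (m, n)
    d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2)).sum(axis=1)

def schrodinger_potential(x, data, sigma):
    """V(x) - E such that psi solves (-(sigma^2 / 2) * laplacian + V) psi = E psi.

    Assumed form (standard quantum clustering), not taken from the source.
    Far from all data points psi can underflow to zero; this sketch does not guard against that.
    """
    d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma**2))
    psi = weights.sum(axis=1)
    dim = data.shape[1]
    return -dim / 2.0 + (weights * d2).sum(axis=1) / (2.0 * sigma**2 * psi)

# Toy data: two feature vectors standing in for two well-separated groups.
toy_data = np.array([[0.0, 0.0], [4.0, 0.0]])
queries = np.array([[0.0, 0.0], [2.0, 0.0]])  # on a data point, and midway between
potential = schrodinger_potential(queries, toy_data, sigma=1.0)
```

In this toy case the potential is lower at the data point than at the midpoint between the two points, which is the property that lets minima of V(x) act as cluster centers.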

The time dynamics of individual radiomic feature vectors lead to different metastable states, which are interpreted as clusters. Clustering is achieved when subsets of the data aggregate near minima of the potential, V(x). While the radiomic feature vectors are pushed towards potential minima by the potential gradient, ∇V(x), Brownian motion allows them to effectively tunnel through local potential barriers and escape saddle points into locations of the potential surface that are otherwise forbidden. Nearly degenerate local minima can merge, allowing hyper-dimensional radiomics data to be explored at high resolution, while still maintaining a reasonably narrow impulse response.
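
The interplay between the gradient drift and Brownian motion can be illustrated with a short simulation. The sketch below assumes overdamped Langevin dynamics integrated with a plain Euler-Maruyama step, x ← x - ∇V(x)·dt + sqrt(2·T·dt)·ξ, and uses a one-dimensional double-well potential V(x) = (x² - 1)² as a stand-in for the data-derived potential. The integrator, the toy potential, the parameter values, and the function names are all assumptions made for illustration; the dissertation's own formulation may differ.

```python
import numpy as np

def langevin_propagate(x0, grad_v, temperature, dt, n_steps, rng):
    """Propagate points with overdamped Langevin dynamics (Euler-Maruyama)."""
    x = np.array(x0, dtype=float)
    noise_scale = np.sqrt(2.0 * temperature * dt)
    for _ in range(n_steps):
        # Deterministic drift down the potential gradient plus a Brownian kick
        x += -grad_v(x) * dt + noise_scale * rng.standard_normal(x.shape)
    return x

def double_well_grad(x):
    """Gradient of the toy potential V(x) = (x^2 - 1)^2, with minima at x = -1 and x = +1."""
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
start = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)  # 200 one-dimensional "feature vectors"
final = langevin_propagate(start, double_well_grad,
                           temperature=0.01, dt=0.01, n_steps=2000, rng=rng)

# Each point is labeled by the well it ends up in.
labels = (final[:, 0] > 0).astype(int)
```

At this low temperature the barrier between the wells is far larger than the thermal energy, so points settle into one of the two wells and stay there. That is the sense in which a sub-critical temperature avoids ergodic mixing and lets metastable states be read as clusters.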

Since radiomics is still a rather immature field, there is currently a lack of commercially available software. Therefore, a radiomic feature extraction platform was developed to facilitate this dissertation research. The extraction code, which is the primary focus of Chapter 2, is the means of converting unstructured data (i.e., images) into structured data (i.e., features). It therefore serves as a translational research environment that provides the necessary input to subsequent radiomic analyses.
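
To make the image-to-feature conversion concrete, the sketch below computes a few first-order intensity features from a region of interest. It is a hypothetical, minimal example and not the platform described in Chapter 2: the choice of features, the histogram bin count, and the function name are assumptions.

```python
import numpy as np

def first_order_features(image, mask, n_bins=8):
    """Summarize the voxels inside a region of interest as a small feature dictionary."""
    voxels = image[mask].astype(float)
    mean = voxels.mean()
    std = voxels.std()
    # Skewness is undefined for a constant region; report 0.0 in that case
    skewness = ((voxels - mean) ** 3).mean() / std**3 if std > 0 else 0.0
    # Shannon entropy (in bits) of the binned intensity histogram
    counts, _ = np.histogram(voxels, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -(p * np.log2(p)).sum()
    return {"mean": float(mean), "std": float(std),
            "skewness": float(skewness), "entropy": float(entropy)}

# Toy 4x4 "image" with the whole array treated as the region of interest.
toy_image = np.arange(16).reshape(4, 4)
toy_mask = np.ones_like(toy_image, dtype=bool)
features = first_order_features(toy_image, toy_mask, n_bins=4)
```

Stacking such dictionaries across patients yields the structured feature matrix that a clustering method takes as input.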

Imaging features derived from dynamic environments – such as the lungs – are highly susceptible to variability and motion artifacts. Before implementing major analyses and new techniques, Chapter 3 investigates the spatial-temporal variability of radiomic features. This problem is approached based on computational experiments using both (a) a simulated dynamic digital phantom and (b) real patient CT data. Key findings demonstrate that radiomic features are sensitive to spatial-temporal changes, which may influence the quality of feature analyses. In general, radiomic feature-sensitivity is shown to be broad and inherently feature-specific.

The theory and development of Langevin annealing are covered in Chapter 4, where a complete mathematical derivation is formulated. Several illustrative examples and computational simulations are used to demonstrate the clustering technique. Chapter 5 provides a comprehensive validation of Langevin annealing using a common benchmark dataset. Accurate ergodic sampling is achieved, clustering performance is evaluated, hyper-parameters are characterized, and the approach is compared to several well-known clustering algorithms.

While this dissertation has broader application to many aspects of medical imaging, a majority of the analysis is conducted on patients with non-small cell lung cancer (NSCLC). In particular, two classes of CT-based radiomic biomarkers are considered throughout this work: (1) radiomic biomarkers derived from lungs, and (2) radiomic biomarkers derived from tumors. These radiomic biomarkers are characterized in Chapter 6 and Chapter 7, where the emphasis of the dissertation shifts from highly theoretical work to a more application-driven focus.

Radiomic lung biomarkers are investigated in Chapter 6, where several associations are identified linking the quantitative imaging data to pulmonary function. In general, patients with larger lungs of homogeneous, low-attenuating pulmonary tissue are shown to have worse pulmonary function. Radiomic tumor biomarkers are investigated in Chapter 7, where several associations are identified linking the quantitative imaging data to treatment response. In general, relatively dense tumors with a homogeneous, coarse texture are shown to be linked with higher rates of local cancer recurrence following stereotactic body radiation therapy.



Medical imaging


Computational physics




Data Clustering


Langevin Dynamics


Quantitative Imaging




Stochastic Dynamical Systems

