Modeling Biological Systems from Heterogeneous Data

The past decades have seen rapid development of numerous high-throughput technologies to observe biomolecular phenomena. High-throughput biological data are inherently heterogeneous, providing information at the various levels at which organisms integrate inputs to arrive at an observable phenotype. Approaches are needed to not only analyze heterogeneous biological data, but also model the complex experimental observation procedures.

We first present an algorithm for learning dynamic cell cycle transcriptional regulatory networks from gene expression and transcription factor binding data. We learn regulatory networks using dynamic Bayesian network inference algorithms that combine evidence from gene expression data through the likelihood and evidence from binding data through an informative structure prior.
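
The idea of combining the two evidence sources can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes a simple linear-Gaussian model for expression at consecutive time points and a hypothetical matrix `edge_prob` of per-edge prior probabilities derived from binding data. The function names and the synthetic data are illustrative only. A candidate parent set is scored by adding a log-likelihood term (expression data) to a log structure prior term (binding data).

```python
import numpy as np

def log_likelihood(expr, parents, target):
    """Gaussian log-likelihood of the target gene at time t+1 given its
    parents at time t, using a least-squares fit (an assumed model)."""
    y = expr[1:, target]
    cols = [np.ones(len(y))] + [expr[:-1, p] for p in parents]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = max(resid @ resid / len(y), 1e-9)  # ML variance estimate
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)

def log_structure_prior(parents, target, edge_prob):
    """Log prior of a parent set: each possible edge into the target is
    present with probability edge_prob[regulator, target] (hypothetical)."""
    idx = np.asarray(list(parents), dtype=int)
    mask = np.zeros(edge_prob.shape[0], dtype=bool)
    mask[idx] = True
    p = edge_prob[:, target]
    return np.sum(np.log(p[mask])) + np.sum(np.log1p(-p[~mask]))

def score(expr, parents, target, edge_prob):
    """Unnormalized log posterior: likelihood term plus prior term."""
    return (log_likelihood(expr, parents, target)
            + log_structure_prior(parents, target, edge_prob))

# Synthetic example: gene 0 drives gene 1; gene 2 is unrelated noise.
rng = np.random.default_rng(0)
T = 50
expr = rng.normal(size=(T, 3))
expr[1:, 1] = 0.9 * expr[:-1, 0] + 0.1 * rng.normal(size=T - 1)

# Assumed binding-derived prior: strong support for edge 0 -> 1 only.
edge_prob = np.full((3, 3), 0.1)
edge_prob[0, 1] = 0.9

for parents in ([], [0], [2]):
    print(parents, score(expr, parents, 1, edge_prob))
```

In this toy setting the parent set `[0]` is favored by both terms. With real data the two sources can disagree, and the prior then acts as a soft preference rather than a hard constraint.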

We next demonstrate how analysis of cell cycle measurements, such as gene expression data, is obstructed by synchrony loss in synchronized cell populations. Due to synchrony loss, population-level cell cycle measurements are convolutions of the true measurements that would have been observed when monitoring individual cells. We introduce a fully parametric, probabilistic model, CLOCCS, capable of characterizing multiple sources of asynchrony in synchronized cell populations. Using CLOCCS, we formulate a constrained convex optimization deconvolution algorithm that recovers single-cell estimates from observed population-level measurements. Our algorithm offers a way to monitor individual cells rather than a population of cells that loses synchrony over time. Using our deconvolution algorithm, we provide a global, high-resolution view of cell cycle gene expression in budding yeast, following an initial cell through its cell cycle and on into the newly created mother and daughter cells.
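
The deconvolution idea can be sketched under stated assumptions. The code below does not implement CLOCCS: the kernel is a hypothetical stand-in in which cells at time t are spread around cell-cycle position t with a Gaussian width that grows over time, and all parameter values are invented for illustration. The population measurement is modeled as y = Kx, and a non-negative, smooth single-cell profile x is estimated by projected gradient descent on a regularized least-squares objective, which is one simple instance of constrained convex optimization.

```python
import numpy as np

def synchrony_kernel(times, positions, spread0=0.05, spread_rate=0.2):
    """Hypothetical kernel: row i holds the fraction of the population at
    each cell-cycle position at time times[i]; the spread grows with time."""
    K = np.empty((len(times), len(positions)))
    for i, t in enumerate(times):
        sd = spread0 + spread_rate * t
        w = np.exp(-0.5 * ((positions - t) / sd) ** 2)
        K[i] = w / w.sum()
    return K

def stacked_system(K, y, lam):
    """Stack the data-fit and smoothness terms into one least-squares system."""
    n = K.shape[1]
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator
    A = np.vstack([K, np.sqrt(lam) * D])
    b = np.concatenate([y, np.zeros(D.shape[0])])
    return A, b

def deconvolve(K, y, lam=1e-3, n_iter=5000):
    """Minimize 0.5*||Kx - y||^2 + 0.5*lam*||Dx||^2 subject to x >= 0."""
    A, b = stacked_system(K, y, lam)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(0.0, x - step * grad)  # project onto x >= 0
    return x

# Synthetic example: a periodic single-cell profile blurred by asynchrony.
positions = np.linspace(0.0, 1.0, 40)
times = np.linspace(0.0, 1.0, 25)
truth = 1.0 + np.sin(2 * np.pi * positions)
K = synchrony_kernel(times, positions)
y = K @ truth
estimate = deconvolve(K, y)
```

The smoothness penalty is a design choice for this sketch: deconvolution is typically ill-posed, so some regularization or constraint is needed to obtain a stable estimate.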

Proteins, and not gene expression, are responsible for all cellular functions, and we need to understand how proteins and protein complexes operate. We introduce PROCTOR, a statistical approach capable of learning the hidden interaction topology of protein complexes from direct protein-protein interaction data and indirect co-complexed protein interaction data. We provide a global view of the budding yeast interactome depicting how proteins interact with each other via their interfaces to form macromolecular complexes.

We conclude by demonstrating how our algorithms, utilizing information from heterogeneous biological data, can provide a dynamic view of regulatory control in the budding yeast cell cycle.





Bernard, Allister P. (2008). Modeling Biological Systems from Heterogeneous Data. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.