Modeling Biological Systems from Heterogeneous Data
Abstract
The past decades have seen rapid development of numerous high-throughput
technologies to observe biomolecular phenomena. High-throughput biological data are
inherently heterogeneous, providing information at the various levels at which organisms
integrate inputs to arrive at an observable phenotype. Approaches are needed not
only to analyze heterogeneous biological data, but also to model the complex
experimental procedures through which these data are observed.
We first present an algorithm for learning dynamic cell cycle
transcriptional regulatory networks from gene expression and
transcription factor binding data. We learn regulatory networks using
dynamic Bayesian network inference algorithms that combine evidence from
gene expression data through the likelihood and evidence from binding data through
an informative structure prior.
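To illustrate how the two evidence sources combine, the sketch below scores candidate regulators of one gene as a log marginal likelihood plus a log structure prior. It is a minimal sketch that rests on assumptions the abstract does not state. Expression is discretized to binary values, the likelihood is the K2 score, and the prior treats edges independently. The edge probabilities (`edge_prob`) are hypothetical stand-ins for confidences derived from binding data. This is not the dissertation's exact scoring function.

```python
import math
from itertools import combinations

def k2_family_score(series, child, parents):
    """Log K2 marginal likelihood of a binary child's value at t+1 given its
    parents' values at t (uniform Beta(1,1) parameter prior)."""
    counts = {}  # parent configuration -> [child=0 count, child=1 count]
    for t in range(len(series) - 1):
        config = tuple(series[t][p] for p in parents)
        counts.setdefault(config, [0, 0])[series[t + 1][child]] += 1
    # Binary K2 term per configuration: log(n0! * n1! / (n0 + n1 + 1)!)
    return sum(math.lgamma(n0 + 1) + math.lgamma(n1 + 1) - math.lgamma(n0 + n1 + 2)
               for n0, n1 in counts.values())

def log_structure_prior(child, parents, n_genes, edge_prob, default_p=0.5):
    """Edge-wise independent prior: regulator -> child is present with
    probability edge_prob[(regulator, child)] (HYPOTHETICAL values standing in
    for binding-derived confidences); unlisted edges get default_p."""
    total = 0.0
    for reg in range(n_genes):
        p = edge_prob.get((reg, child), default_p)
        total += math.log(p) if reg in parents else math.log(1.0 - p)
    return total

def best_parents(series, child, n_genes, edge_prob, max_parents=2):
    """Exhaustive search over parent sets. Edges run only from slice t to
    slice t+1, so each gene's parent set can be optimized independently."""
    best, best_score = (), -math.inf
    for k in range(max_parents + 1):
        for parents in combinations(range(n_genes), k):
            score = (k2_family_score(series, child, parents)
                     + log_structure_prior(child, parents, n_genes, edge_prob))
            if score > best_score:
                best, best_score = parents, score
    return best, best_score

# Toy data: gene 1 at t+1 copies gene 0 at t; gene 2 is unrelated.
g0 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
g1 = [0] + g0[:-1]
g2 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
series = [list(row) for row in zip(g0, g1, g2)]
edge_prob = {(0, 1): 0.9}  # hypothetical: strong binding evidence for 0 -> 1
parents, score = best_parents(series, 1, 3, edge_prob)
```

A larger `edge_prob` value raises the score of parent sets containing that edge. It shifts the balance rather than overriding the data. A strong likelihood signal can still outweigh a weak prior, and the reverse also holds.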
We next demonstrate how the analysis of cell cycle measurements, such as gene
expression data, is obstructed by synchrony loss in synchronized cell
populations. Due to synchrony loss, population-level cell cycle
measurements are convolutions of the true measurements that would have
been observed when monitoring individual cells. We introduce a fully
parametric, probabilistic model, CLOCCS, capable of characterizing multiple
sources of asynchrony in synchronized cell populations. Using CLOCCS, we
formulate a constrained convex optimization deconvolution algorithm that recovers
single cell estimates from observed population-level measurements.
Our algorithm thus makes it possible to study individual cells rather than
a population of cells that lose synchrony over time. Using our
deconvolution algorithm, we provide a global, high-resolution view of cell
cycle gene expression in budding yeast, from an initial cell progressing
through its cell cycle to the newly created mother and daughter cells.
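The convolution and its inversion can be made concrete with a small example. In the forward model, each population value at time t is a weighted average of the single-cell profile over cell cycle positions. The weights are the fraction of cells at each position. The position distribution here (`kernel_matrix`, `speed`, `spread_rate`) is a hypothetical Gaussian whose width grows with time. It is a stand-in for synchrony loss that ignores division and is not CLOCCS. The inversion poses deconvolution as one possible constrained convex problem: non-negative least squares with a smoothness penalty, solved by projected gradient descent. The abstract does not specify the objective or constraints actually used, so treat this formulation as an assumption.

```python
import math

def kernel_matrix(times, n_positions, speed=1.0, spread_rate=0.5):
    """HYPOTHETICAL A[i][x] = P(cell cycle position x | times[i]): a discretized
    Gaussian whose mean advances and whose width grows with time, mimicking
    synchrony loss. A stand-in only; it ignores division and is not CLOCCS."""
    rows = []
    for t in times:
        mean, sd = speed * t, 0.5 + spread_rate * t
        w = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in range(n_positions)]
        total = sum(w)
        rows.append([v / total for v in w])
    return rows

def convolve(A, f):
    """Population signal y(t) = sum_x P(x | t) * f(x)."""
    return [sum(a * v for a, v in zip(row, f)) for row in A]

def deconvolve(A, y, smooth=0.1, iters=2000):
    """Projected gradient descent on the convex problem
        min_f ||A f - y||^2 + smooth * sum_x (f[x+1] - f[x])^2  subject to f >= 0."""
    m, n = len(A), len(A[0])
    # Step 1/L, where L = 2(||A||_F^2 + 4*smooth) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * (sum(a * a for row in A for a in row) + 4.0 * smooth))
    f = [0.0] * n
    for _ in range(iters):
        resid = [r - yi for r, yi in zip(convolve(A, f), y)]
        grad = [2.0 * sum(A[i][j] * resid[i] for i in range(m)) for j in range(n)]
        for j in range(n - 1):  # gradient of the smoothness penalty
            diff = f[j + 1] - f[j]
            grad[j] -= 2.0 * smooth * diff
            grad[j + 1] += 2.0 * smooth * diff
        # Gradient step, then projection onto the constraint set f >= 0.
        f = [max(0.0, fj - step * gj) for fj, gj in zip(f, grad)]
    return f

# Toy single-cell profile: a transient peak around position 8 of 20.
A = kernel_matrix(range(15), 20)
true_profile = [math.exp(-0.5 * ((x - 8) / 1.5) ** 2) for x in range(20)]
y = convolve(A, true_profile)   # what a population-level assay would report
estimate = deconvolve(A, y)
```

In this toy model, the sharp single-cell peak appears in `y` as a lower, broader bump. With step size 1/L, no projected gradient step increases the objective, so the fit to `y` can only improve from the all-zero start. The smoothness weight trades fidelity against the noise amplification that direct inversion of an ill-conditioned kernel would cause.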
Proteins, not gene expression, are ultimately responsible for carrying out cellular
functions, so we need to understand how proteins and protein complexes
operate. We introduce PROCTOR, a statistical approach capable of learning
the hidden interaction topology of protein complexes from direct
protein-protein interaction data and indirect co-complexed protein
interaction data. We provide a global view of the budding yeast interactome
depicting how proteins interact with each other via their interfaces to
form macromolecular complexes.
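The abstract does not describe PROCTOR's model. The toy sketch below only illustrates the general idea of inferring a hidden topology by scoring candidate topologies against both direct and co-complex evidence. Every modeling choice here is hypothetical. A direct assay reports a true contact with probability `tpr` and a non-contact with probability `fpr`. A pull-down detects a prey with a probability that decays with its hop distance from the bait. Complex members must form one connected unit.

```python
import math
from itertools import combinations

def bfs_distances(n, edges, source):
    """Shortest-path hop counts from `source` in an undirected graph."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist, frontier = {source: 0}, [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

def log_likelihood(n, edges, direct, cocomplex,
                   tpr=0.8, fpr=0.05, p_adjacent=0.9, decay=0.5):
    """Score a candidate topology against both data types (HYPOTHETICAL noise model)."""
    score = 0.0
    # Direct assay: detects a true contact with prob. tpr, a non-contact with prob. fpr.
    for pair, observed in direct.items():
        p = tpr if pair in edges else fpr
        score += math.log(p if observed else 1.0 - p)
    # Co-complex pull-down: detection probability decays with hop distance from the bait.
    for (bait, prey), observed in cocomplex.items():
        hops = bfs_distances(n, edges, bait)[prey]
        p = p_adjacent * decay ** (hops - 1)
        score += math.log(p if observed else 1.0 - p)
    return score

def map_topology(n, direct, cocomplex):
    """Brute-force MAP search over all connected edge sets (feasible only for tiny n)."""
    pairs = list(combinations(range(n), 2))
    best, best_score = None, -math.inf
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if (mask >> i) & 1}
        if len(bfs_distances(n, edges, 0)) < n:
            continue  # assume the complex's members form one connected unit
        score = log_likelihood(n, edges, direct, cocomplex)
        if score > best_score:
            best, best_score = edges, score
    return best, best_score

# Toy 4-protein complex; pairs (1, 3) and (2, 3) were never tested directly.
direct = {(0, 1): 1, (1, 2): 1, (0, 2): 0, (0, 3): 0}
cocomplex = {(0, 1): 1, (0, 2): 1, (0, 3): 0}  # bait 0 pulls down 1 and 2, but not 3
topology, score = map_topology(4, direct, cocomplex)
```

In this example, the direct data alone cannot decide whether protein 3 attaches to protein 1 or to protein 2. The pull-down fails to recover 3 from bait 0, which favors the longer path 0-1-2-3. Enumerating all edge subsets grows as 2^(n(n-1)/2), so any realistic method would need a smarter search than this brute-force loop.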
We conclude by demonstrating how our algorithms, utilizing
information from heterogeneous biological data, can provide a dynamic view of regulatory
control in the budding yeast cell cycle.
Type
Dissertation
Department
Computer Science
Subject
Computer Science
machine learning
computational biology
cell cycle
information integration
regulatory networks
dynamic models
Permalink
https://hdl.handle.net/10161/615
Citation
Bernard, Allister P. (2008). Modeling Biological Systems from Heterogeneous Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/615.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.