Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.
Abstract
Protein-protein interactions (PPIs) are essential to most fundamental cellular processes.
There has been increasing interest in reconstructing PPI networks. However, several
critical difficulties exist in obtaining reliable predictions. Notably, false positive
rates can exceed 80%. Correcting errors separately at each generating source can be
both time-consuming and inefficient, because it is difficult to cover errors arising at
multiple levels of data processing within a single test. We propose a novel
Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL),
to lower the misclassification rate (both false positives and false negatives) by automatically
up-weighting the most informative data sources while down-weighting less informative
and biased sources. Extensive studies indicate that NBEL is significantly more robust
than classic naïve Bayes to unreliable, error-prone and contaminated data. On
a large human data set, our NBEL approach predicts many more PPIs than naïve Bayes.
This suggests that previous studies may contain large numbers of not only false positives
but also false negatives. Validation on two high-quality human PPI datasets
supports our observations. Our experiments demonstrate that it is feasible to predict
high-throughput PPIs computationally with substantially reduced false positives and
false negatives. The ability to predict large numbers of PPIs both reliably and
automatically may encourage the use of computational approaches to correct data errors
in general, and may speed up high-quality PPI prediction. Such reliable predictions
may provide a solid platform for other studies, such as protein function prediction
and the roles of PPIs in disease susceptibility.
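The contrast the abstract draws between classic naïve Bayes and a scheme that up-weights informative sources can be illustrated with a minimal sketch. This is not the NBEL method: the per-source weights below are fixed, hypothetical numbers supplied by hand, whereas the abstract describes weights learned automatically through a nonparametric Bayes model. The function names, the binary-evidence encoding and the example probabilities are all assumptions made for illustration.

```python
import math


def log_likelihood_ratio(p_given_pos, p_given_neg, observed):
    """Log likelihood ratio contributed by one binary evidence source.

    p_given_pos: P(evidence = 1 | pair interacts)
    p_given_neg: P(evidence = 1 | pair does not interact)
    observed:    1 if the source reports evidence for the pair, else 0
    """
    if observed:
        return math.log(p_given_pos / p_given_neg)
    return math.log((1 - p_given_pos) / (1 - p_given_neg))


def posterior_interaction(prior, sources, evidence, weights=None):
    """Posterior probability of interaction given evidence from several sources.

    With weights=None every source counts fully, which is classic naive Bayes.
    Hypothetical weights in [0, 1] shrink the contribution of sources judged
    less informative (an illustrative stand-in, not the NBEL weighting).
    """
    if weights is None:
        weights = [1.0] * len(sources)
    log_odds = math.log(prior / (1 - prior))
    for (p_pos, p_neg), e, w in zip(sources, evidence, weights):
        log_odds += w * log_likelihood_ratio(p_pos, p_neg, e)
    return 1 / (1 + math.exp(-log_odds))


def misclassification_rate(predicted, truth):
    """Fraction of pairs labelled wrongly (false positives plus false negatives)."""
    errors = sum(p != t for p, t in zip(predicted, truth))
    return errors / len(truth)


# Two hypothetical sources: one informative, one close to uninformative.
sources = [(0.8, 0.1), (0.6, 0.5)]
evidence = [1, 1]

naive = posterior_interaction(0.5, sources, evidence)
weighted = posterior_interaction(0.5, sources, evidence, weights=[1.0, 0.0])
rate = misclassification_rate([1, 0, 1, 1], [1, 1, 0, 1])
```

In this toy case, setting the weight of the weak second source to zero removes its contribution to the posterior odds, so the posterior depends only on the informative source. How such weights should be chosen is exactly what the paper's method addresses and is not reproduced here.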
Type
Journal article
Subject
Algorithms
Bayes Theorem
Computational Biology
Databases, Protein
Humans
Logistic Models
Protein Interaction Mapping
Proteins
ROC Curve
Reproducibility of Results
Permalink
https://hdl.handle.net/10161/15602
Published Version (Please cite this version)
10.1371/journal.pcbi.1002110
Publication Info
Xing, Chuanhua; Dunson, David B (2011). Bayesian inference for genomic data integration reduces misclassification rate in
predicting protein-protein interactions. PLoS Comput Biol, 7(7). pp. e1002110. 10.1371/journal.pcbi.1002110. Retrieved from https://hdl.handle.net/10161/15602.
This is constructed from limited available data and may be imprecise. To cite this
article, please review & use the official citation provided by the journal.
Scholars@Duke
David B. Dunson
Arts and Sciences Distinguished Professor of Statistical Science
My research focuses on developing new tools for probabilistic learning from complex
data. Methods development is directly motivated by challenging applications in ecology/biodiversity,
neuroscience, environmental health, criminal justice/fairness, and more. We seek
to develop new modeling frameworks, algorithms and corresponding code that can be
used routinely by scientists and decision makers. We are also interested in new inference
frameworks and in studying theoretical properties
