Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

dc.contributor.author Xing, Chuanhua
dc.contributor.author Dunson, David B
dc.coverage.spatial United States
dc.date.accessioned 2017-10-01T21:19:00Z
dc.date.available 2017-10-01T21:19:00Z
dc.date.issued 2011-07
dc.identifier https://www.ncbi.nlm.nih.gov/pubmed/21829334
dc.identifier PCOMPBIOL-D-11-00024
dc.identifier.uri https://hdl.handle.net/10161/15602
dc.description.abstract Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because errors arising at multiple levels of data processing are difficult to cover within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes approach to unreliable, error-prone and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes. This suggests that previous studies may contain large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports these observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.
dc.language eng
dc.publisher Public Library of Science (PLoS)
dc.relation.ispartof PLoS Comput Biol
dc.relation.isversionof 10.1371/journal.pcbi.1002110
dc.subject Algorithms
dc.subject Bayes Theorem
dc.subject Computational Biology
dc.subject Databases, Protein
dc.subject Humans
dc.subject Logistic Models
dc.subject Protein Interaction Mapping
dc.subject Proteins
dc.subject ROC Curve
dc.subject Reproducibility of Results
dc.title Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.
dc.type Journal article
duke.contributor.id Dunson, David B|0277221
pubs.author-url https://www.ncbi.nlm.nih.gov/pubmed/21829334
pubs.begin-page e1002110
pubs.issue 7
pubs.organisational-group Duke
pubs.organisational-group Duke Institute for Brain Sciences
pubs.organisational-group Electrical and Computer Engineering
pubs.organisational-group Institutes and Provost's Academic Units
pubs.organisational-group Pratt School of Engineering
pubs.organisational-group Statistical Science
pubs.organisational-group Trinity College of Arts & Sciences
pubs.organisational-group University Institutes and Centers
pubs.publication-status Published
pubs.volume 7
dc.identifier.eissn 1553-7358
