Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

dc.contributor.author

Xing, Chuanhua

dc.contributor.author

Dunson, David B

dc.coverage.spatial

United States

dc.date.accessioned

2017-10-01T21:19:00Z

dc.date.available

2017-10-01T21:19:00Z

dc.date.issued

2011-07

dc.description.abstract

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes, and there has been increasing interest in reconstructing PPI networks. However, several critical difficulties stand in the way of reliable predictions; notably, false positive rates can exceed 80%. Correcting errors at each generating source can be both time-consuming and inefficient, because errors arising at multiple levels of data processing are difficult to cover within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes approach to unreliable, error-prone and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes, which suggests that previous studies may contain large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports these observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially fewer false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the role of PPIs in disease susceptibility.

dc.identifier

https://www.ncbi.nlm.nih.gov/pubmed/21829334

dc.identifier

PCOMPBIOL-D-11-00024

dc.identifier.eissn

1553-7358

dc.identifier.uri

https://hdl.handle.net/10161/15602

dc.language

eng

dc.publisher

Public Library of Science (PLoS)

dc.relation.ispartof

PLoS Comput Biol

dc.relation.isversionof

10.1371/journal.pcbi.1002110

dc.subject

Algorithms

dc.subject

Bayes Theorem

dc.subject

Computational Biology

dc.subject

Databases, Protein

dc.subject

Humans

dc.subject

Logistic Models

dc.subject

Protein Interaction Mapping

dc.subject

Proteins

dc.subject

ROC Curve

dc.subject

Reproducibility of Results

dc.title

Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

dc.type

Journal article

pubs.author-url

https://www.ncbi.nlm.nih.gov/pubmed/21829334

pubs.begin-page

e1002110

pubs.issue

7

pubs.organisational-group

Duke

pubs.organisational-group

Duke Institute for Brain Sciences

pubs.organisational-group

Electrical and Computer Engineering

pubs.organisational-group

Institutes and Provost's Academic Units

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

Statistical Science

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

University Institutes and Centers

pubs.publication-status

Published

pubs.volume

7

Files

Original bundle

Name:
Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.pdf
Size:
543 KB
Format:
Adobe Portable Document Format