dc.contributor.author |
Xing, Chuanhua |
|
dc.contributor.author |
Dunson, David B |
|
dc.coverage.spatial |
United States |
|
dc.date.accessioned |
2017-10-01T21:19:00Z |
|
dc.date.available |
2017-10-01T21:19:00Z |
|
dc.date.issued |
2011-07 |
|
dc.identifier |
https://www.ncbi.nlm.nih.gov/pubmed/21829334 |
|
dc.identifier |
PCOMPBIOL-D-11-00024 |
|
dc.identifier.uri |
https://hdl.handle.net/10161/15602 |
|
dc.description.abstract |
Protein-protein interactions (PPIs) are essential to most fundamental cellular processes.
There has been increasing interest in reconstructing PPI networks. However, several
critical difficulties exist in obtaining reliable predictions. Notably, false positive
rates can exceed 80%. Error correction from each generating source can be
both time-consuming and inefficient due to the difficulty of covering the errors from
multiple levels of data processing procedures within a single test. We propose a novel
Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL),
to lower the misclassification rate (both false positives and negatives) through automatically
up-weighting data sources that are most informative, while down-weighting less informative
and biased sources. Extensive studies indicate that NBEL is significantly more robust
than the classic naïve Bayes to unreliable, error-prone and contaminated data. On
a large human data set, our NBEL approach predicts many more PPIs than naïve Bayes.
This suggests that previous studies may have large numbers of not only false positives
but also false negatives. Validation on two high-quality human PPI datasets
supports our observations. Our experiments demonstrate that it is feasible to predict
high-throughput PPIs computationally with substantially reduced false positives and
false negatives. The ability to predict large numbers of PPIs both reliably and
automatically may encourage the use of computational approaches to correct data errors
in general, and may speed up high-quality PPI prediction. Such reliable predictions
may provide a solid platform for other studies, such as protein function prediction
and the roles of PPIs in disease susceptibility.
|
|
dc.language |
eng |
|
dc.publisher |
Public Library of Science (PLoS) |
|
dc.relation.ispartof |
PLoS Comput Biol |
|
dc.relation.isversionof |
10.1371/journal.pcbi.1002110 |
|
dc.subject |
Algorithms |
|
dc.subject |
Bayes Theorem |
|
dc.subject |
Computational Biology |
|
dc.subject |
Databases, Protein |
|
dc.subject |
Humans |
|
dc.subject |
Logistic Models |
|
dc.subject |
Protein Interaction Mapping |
|
dc.subject |
Proteins |
|
dc.subject |
ROC Curve |
|
dc.subject |
Reproducibility of Results |
|
dc.title |
Bayesian inference for genomic data integration reduces misclassification rate in
predicting protein-protein interactions.
|
|
dc.type |
Journal article |
|
duke.contributor.id |
Dunson, David B|0277221 |
|
pubs.author-url |
https://www.ncbi.nlm.nih.gov/pubmed/21829334 |
|
pubs.begin-page |
e1002110 |
|
pubs.issue |
7 |
|
pubs.organisational-group |
Duke |
|
pubs.organisational-group |
Duke Institute for Brain Sciences |
|
pubs.organisational-group |
Electrical and Computer Engineering |
|
pubs.organisational-group |
Institutes and Provost's Academic Units |
|
pubs.organisational-group |
Pratt School of Engineering |
|
pubs.organisational-group |
Statistical Science |
|
pubs.organisational-group |
Trinity College of Arts & Sciences |
|
pubs.organisational-group |
University Institutes and Centers |
|
pubs.publication-status |
Published |
|
pubs.volume |
7 |
|
dc.identifier.eissn |
1553-7358 |
|