Browsing by Subject "Databases, Protein"
Now showing 1 - 5 of 5
Results Per Page
Sort Options
Item Open Access Accelerating crystal structure determination with iterative AlphaFold prediction.(Acta crystallographica. Section D, Structural biology, 2023-03) Terwilliger, Thomas C; Afonine, Pavel V; Liebschner, Dorothee; Croll, Tristan I; McCoy, Airlie J; Oeffner, Robert D; Williams, Christopher J; Poon, Billy K; Richardson, Jane S; Read, Randy J; Adams, Paul DExperimental structure determination can be accelerated with artificial intelligence (AI)-based structure-prediction methods such as AlphaFold. Here, an automatic procedure requiring only sequence information and crystallographic data is presented that uses AlphaFold predictions to produce an electron-density map and a structural model. Iterating through cycles of structure prediction is a key element of this procedure: a predicted model rebuilt in one cycle is used as a template for prediction in the next cycle. This procedure was applied to X-ray data for 215 structures released by the Protein Data Bank in a recent six-month period. In 87% of cases our procedure yielded a model with at least 50% of Cα atoms matching those in the deposited models within 2 Å. Predictions from the iterative template-guided prediction procedure were more accurate than those obtained without templates. It is concluded that AlphaFold predictions obtained based on sequence information alone are usually accurate enough to solve the crystallographic phase problem with molecular replacement, and a general strategy for macromolecular structure determination that includes AI-based prediction both as a starting point and as a method of model optimization is suggested.Item Open Access Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.(PLoS Comput Biol, 2011-07) Xing, Chuanhua; Dunson, David BProtein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPIs networks. However, several critical difficulties exist in obtaining reliable predictions. Noticeably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, deemed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. The validation on two human PPIs datasets having high quality supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability of predicting large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up PPIs prediction with high quality. Such a reliable prediction may provide a solid platform to other studies such as protein functions prediction and roles of PPIs in disease susceptibility.Item Open Access DOMINE: a comprehensive collection of known and predicted domain-domain interactions.(Nucleic acids research, 2011-01) Yellaboina, Sailu; Tasneem, Asba; Zaykin, Dmitri V; Raghavachari, Balaji; Jothi, RajaDOMINE is a comprehensive collection of known and predicted domain-domain interactions (DDIs) compiled from 15 different sources. The updated DOMINE includes 2285 new domain-domain interactions (DDIs) inferred from experimentally characterized high-resolution three-dimensional structures, and about 3500 novel predictions by five computational approaches published over the last 3 years. These additions bring the total number of unique DDIs in the updated version to 26,219 among 5140 unique Pfam domains, a 23% increase compared to 20,513 unique DDIs among 4346 unique domains in the previous version. The updated version now contains 6634 known DDIs, and features a new classification scheme to assign confidence levels to predicted DDIs. DOMINE will serve as a valuable resource to those studying protein and domain interactions. Most importantly, DOMINE will not only serve as an excellent reference to bench scientists testing for new interactions but also to bioinformaticans seeking to predict novel protein-protein interactions based on the DDIs. The contents of the DOMINE are available at http://domine.utdallas.edu.Item Open Access Statistical analysis of crystallization database links protein physico-chemical features with crystallization mechanisms.(PLoS One, 2014) Fusco, Diana; Barnum, Timothy J; Bruno, Andrew E; Luft, Joseph R; Snell, Edward H; Mukherjee, Sayan; Charbonneau, PatrickX-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.Item Open Access The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.(Protein science : a publication of the Protein Society, 2022-01) Williams, Christopher J; Richardson, David C; Richardson, Jane SWe have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files were selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we illustrate both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.