Browsing by Author "Richardson, Jane S"
Item Open Access Accelerating crystal structure determination with iterative AlphaFold prediction. (Acta crystallographica. Section D, Structural biology, 2023-03) Terwilliger, Thomas C; Afonine, Pavel V; Liebschner, Dorothee; Croll, Tristan I; McCoy, Airlie J; Oeffner, Robert D; Williams, Christopher J; Poon, Billy K; Richardson, Jane S; Read, Randy J; Adams, Paul D
Experimental structure determination can be accelerated with artificial intelligence (AI)-based structure-prediction methods such as AlphaFold. Here, an automatic procedure requiring only sequence information and crystallographic data is presented that uses AlphaFold predictions to produce an electron-density map and a structural model. Iterating through cycles of structure prediction is a key element of this procedure: a predicted model rebuilt in one cycle is used as a template for prediction in the next cycle. This procedure was applied to X-ray data for 215 structures released by the Protein Data Bank in a recent six-month period. In 87% of cases our procedure yielded a model with at least 50% of Cα atoms matching those in the deposited models within 2 Å. Predictions from the iterative template-guided prediction procedure were more accurate than those obtained without templates. It is concluded that AlphaFold predictions obtained based on sequence information alone are usually accurate enough to solve the crystallographic phase problem with molecular replacement, and a general strategy for macromolecular structure determination that includes AI-based prediction both as a starting point and as a method of model optimization is suggested.
Item Open Access Analysis and Error Correction in Structures of Macromolecular Interiors and Interfaces (2009) Headd, Jeffrey John
As of late 2009, the Protein Data Bank (PDB) has grown to contain over 70,000 models.
This recent increase in the amount of structural data allows for more extensive explication of the governing principles of macromolecular folding and association to complement traditional studies focused on a single molecule or complex. PDB-wide characterization of structural features yields insights that are useful in prediction and validation of the 3D structure of macromolecules and their complexes. Here, these insights lead to a deeper understanding of protein--protein interfaces, full-atom critical assessment of increasingly more accurate structure predictions, a better defined library of RNA backbone conformers for validation and building 3D models, and knowledge-based automatic correction of errors in protein sidechain rotamers.
My study of protein--protein interfaces identifies amino acid pairing preferences in a set of 146 transient interfaces. Using a geometric interface surface definition devoid of arbitrary cutoffs common to previous studies of interface composition, I calculate inter- and intrachain amino acid pairing preferences. As expected, salt-bridges and hydrophobic patches are prevalent, but likelihood correction of observed pairing frequencies reveals some surprising pairing preferences, such as Cys-His interchain pairs and Met-Met intrachain pairs. To complement my statistical observations, I introduce a 2D visualization of the 3D interface surface that can display a variety of interface characteristics, including residue type, atomic distance and backbone/sidechain composition.
My study of protein interiors finds that 3D structure prediction from sequence (as part of the CASP experiment) is very close to full-atom accuracy. Validation of structure prediction should therefore consider all atom positions instead of the traditional Calpha-only evaluation. I introduce six new full-model quality criteria to assess the accuracy of CASP predictions, which demonstrate that groups who use structural knowledge culled from the PDB to inform their prediction protocols produce the most accurate results.
My study of RNA backbone introduces a set of rotamer-like "suite" conformers. Initially hand-identified by the Richardson laboratory, these 7D conformers represent backbone segments that are found to be genuine and favorable. X-ray crystallographers can use backbone conformers for model building in often poor backbone density and in validation after refinement. Increasing amounts of high quality RNA data allow for improved conformer identification, but also complicate hand-curation. I demonstrate that affinity propagation successfully differentiates between two related but distinct suite conformers, and is a useful tool for automated conformer clustering.
My study of protein sidechain rotamers in X-ray structures identifies a class of systematic errors that results in sidechains misfit by approximately 180 degrees. I introduce Autofix, a method for automated detection and correction of such errors. Autofix corrects over 40% of errors for Leu, Thr, and Val residues, and a significant number of Arg residues. On average, Autofix made four corrections per PDB file in 945 X-ray structures. Autofix will be implemented into MolProbity and PHENIX for easy integration into X-ray crystallography workflows.
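The idea of a sidechain dihedral that is misfit by roughly 180 degrees can be illustrated with a little angle arithmetic. This is a minimal sketch only: the abstract does not describe how Autofix detects or corrects such errors, so the function names and the 30-degree tolerance below are hypothetical choices made for illustration.

```python
def wrap_angle(deg):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def angular_difference(a, b):
    """Smallest absolute difference between two dihedral angles, in degrees."""
    return abs(wrap_angle(a - b))


def is_flipped(chi_model, chi_expected, tolerance=30.0):
    """True if the modeled dihedral is about 180 degrees away from the expected one.

    `tolerance` is a hypothetical illustration value, not a published Autofix setting.
    """
    return angular_difference(chi_model, chi_expected) >= 180.0 - tolerance


def flip(chi):
    """Return the dihedral rotated by 180 degrees, wrapped to [-180, 180)."""
    return wrap_angle(chi + 180.0)
```

Wrapping before comparing matters because dihedrals are periodic: 170 and -170 degrees are only 20 degrees apart, not 340.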
Item Open Access Building Better Backbones: Visualizations, Analyses, and Tools for Higher Quality Macromolecular Structure Models (2010) Chen, Vincent Bin-Han
In this work, I develop computational and visual tools for analyzing and manipulating the backbone of macromolecules, and I demonstrate that these tools support building better structures than currently done. These visualization and analysis tools belong to an "Intelligence Amplification" (IA) tradition (rather than complete Artificial Intelligence (AI) automation), empowering users to improve structures.
Proteins and nucleic acids are among the most important molecules in biology, mediating the majority of biochemical processes that comprise a living organism. Therefore, these macromolecules are important targets, both for basic research to improve understanding of how life works, and for medical research as possible drug targets.
The function of these macromolecules is largely determined by their 3D structure. Although these macromolecules are chemically fairly simple, made up of linear sequences of a few possible subunits, they physically fold into complex, compact structures. Overall, structural biology aims to determine the general relationship between sequence and structure of these macromolecules, through determination of the positions of the atoms within individual macromolecules.
Because it is currently impossible to directly see the position of atoms in a molecule, all structural determination techniques, including X-ray crystallography, NMR, and homology modeling, result in an interpreted model of a structure. Nearly all of these models contain mistakes, in which atoms are fit in incorrect or impossible positions. These mistakes, especially at a functionally-important location in a structure, can mislead both basic and medical research, making it critical for structural biologists to build the highest quality models possible.
This document details how my dissertation work enables the building of better macromolecular structure models. This work follows an iterative development cycle, where visual analysis of models spurs development of better tools, which in turn improves the analysis. First, I describe how my analysis of protein loops from X-ray crystal structures reveals that the traditional definition of loop endpoints is too restrictive. Second, I create a protein backbone analysis and modeling tool, using a new peptide-centric division system. I show how this tool makes it easier to study protein loops, and also how it improves an algorithm for calculating core protein models from NMR residual dipolar coupling (RDC) data. Third, I describe how 3D visualization of RDCs in their structural context improves understanding of RDCs and validates NMR models in a novel way. Fourth, I describe how local quality analysis can diagnose problems in homology models. Fifth, I demonstrate that local quality analysis can be successfully used in conjunction with model rebuilding software to correct errors in low resolution structures. The various tools and software packages I created during the course of my work are freely available and have already made a positive impact on structures being generated by the community.
Archive versions of several of these software packages (JiffiLoop, RDCvis, and KiNG) should be included with this document; current versions can be downloaded from http://kinemage.biochem.duke.edu.
Item Open Access Computational Methods for RNA Structure Validation and Improvement. (Methods Enzymol, 2015) Jain, Swati; Richardson, David C; Richardson, Jane S
With increasing recognition of the roles RNA molecules and RNA/protein complexes play in an unexpected variety of biological processes, understanding of RNA structure-function relationships is of high current importance. To make clean biological interpretations from three-dimensional structures, it is imperative to have high-quality, accurate RNA crystal structures available, and the community has thoroughly embraced that goal. However, due to the many degrees of freedom inherent in RNA structure (especially for the backbone), it is a significant challenge to succeed in building accurate experimental models for RNA structures. This chapter describes the tools and techniques our research group and our collaborators have developed over the years to help RNA structural biologists both evaluate and achieve better accuracy. Expert analysis of large, high-resolution, quality-conscious RNA datasets provides the fundamental information that enables automated methods for robust and efficient error diagnosis in validating RNA structures at all resolutions. The even more crucial goal of correcting the diagnosed outliers has steadily developed toward highly effective, computationally based techniques. Automation enables solving complex issues in large RNA structures, but cannot circumvent the need for thoughtful examination of local details, and so we also provide some guidance for interpreting and acting on the results of current structure validation for RNA.
Item Open Access Development of new approaches to NMR data collection for protein structure determination (2007-05-10T16:02:04Z) Coggins, Brian E.
Multidimensional nuclear magnetic resonance (NMR) spectroscopy has become one of the most important techniques available for studying the structure and function of biological macromolecules at atomic resolution.
The conventional approach to multidimensional NMR involves the sampling of the time domain on a Cartesian grid followed by a multidimensional Fourier transform (FT). While this approach yields high quality spectra, as the number of dimensions is increased the time needed for sampling on a Cartesian grid increases exponentially, making it impractical to record 4-D spectra at high resolution and impossible to record 5-D spectra at all. This thesis describes new approaches to data collection and processing that make it possible to obtain spectra at higher resolution and/or with a higher dimensionality than was previously feasible with the conventional method. The central focus of this work has been the sampling of the time domain along radial spokes, which was recently introduced into the NMR community. If each radial spoke is processed by an FT with respect to radius, a set of projections of the higher-dimensional spectrum is obtained. Full spectra at high resolution can be generated from these projections via tomographic reconstruction. We generalized the lower-value reconstruction algorithm from the literature, and later integrated it with the backprojection algorithm in a hybrid reconstruction method. These methods permit the reconstruction of accurate 4-D and 5-D spectra at very high resolution, from only a small number of projections, as we demonstrated in the reconstruction of 4-D and 5-D sequential assignment spectra on small and large proteins. For nuclear Overhauser spectroscopy (NOESY), used to measure interproton distances in proteins, one requires quantitative reconstructions. We have successfully obtained these using filtered backprojection, which we found was equivalent to processing the radially sampled data by a polar FT. All of these methods represent significant gains in data collection efficiency over conventional approaches.
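The exponential cost of Cartesian sampling, and the saving from radial spokes, can be made concrete with a small counting sketch. The numbers used in the usage note (64 increments per indirect dimension, 20 spokes) are assumed illustration values, not figures from the thesis, and the helper names are hypothetical.

```python
def cartesian_points(points_per_dim, indirect_dims):
    """Number of time-domain points on a full Cartesian grid.

    Every indirect dimension multiplies the count by `points_per_dim`,
    so the total grows exponentially with dimensionality.
    """
    return points_per_dim ** indirect_dims


def radial_points(points_per_spoke, n_spokes):
    """Number of time-domain points when sampling only along radial spokes.

    The count grows linearly with the number of spokes, independent of
    how many indirect dimensions the spokes pass through.
    """
    return points_per_spoke * n_spokes
```

With 64 increments per indirect dimension, a 3-D experiment (two indirect dimensions) needs `cartesian_points(64, 2) == 4096` points, a 4-D experiment needs 262144, and a 5-D experiment needs 16777216. Under the same assumption, 20 spokes of 64 points each need only `radial_points(64, 20) == 1280` points, whatever the dimensionality.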
The polar FT interpretation suggested that the problem could be analyzed using FT theory, to design even more efficient methods. We have developed a new approach to sampling, using concentric rings of sampling points, which represents a further improvement in efficiency and sensitivity over radial sampling.
Item Open Access Exploring the structurial diversity and engineering potential of thermophilic periplasmic binding proteins (2007-05-02T17:37:41Z) Cuneo, Matthew Joseph
The periplasmic binding protein (PBP) superfamily is found throughout the genosphere of both prokaryotic and eukaryotic organisms. PBPs function as receptors in bacterial solute transport and chemotaxis systems; however the same fold is also used in transcriptional regulators, enzymes, and eukaryotic neurotransmitter receptors. This versatility has been exploited for structure-based computational protein design experiments where PBPs have been engineered to bind novel ligands and serve as biosensors for the detection of small-molecule ligands relevant to biomedical or defense-related interests. In order to further understand functional adaptation from a structural biology perspective, and to provide a set of robust starting points for engineering novel biosensors by structure-based design, I have characterized the ligand-binding properties and solved the structure of nine PBPs from various thermophilic bacteria. Analysis of these structures reveals a variety of mechanisms by which diverse function can be encoded in a common fold. It is observed that re-modeling of secondary structure elements (such as insertions, deletions, and loop movements), and re-decoration of amino acid side-chains are common diversification mechanisms in PBPs. Furthermore, the relationship between hinge-bending motion and ligand binding is critical to understanding the function of natural or engineered adaptations in PBPs.
Three of these proteins were solved in both the presence and absence of ligand which allowed for the first time the observation and analysis of ligand-induced structural rearrangements in thermophilic PBPs. This work revealed that the magnitude and transduction of local and global ligand-induced motions are diverse throughout the PBP superfamily. Through the analysis of the open-to-closed transition, and the identification of natural structural adaptations in thermophilic members of the PBP superfamily, I reveal strategies which can be applied to computational protein design to significantly improve current strategies.
Item Open Access Improved AlphaFold modeling with implicit experimental information. (Nature methods, 2022-11) Terwilliger, Thomas C; Poon, Billy K; Afonine, Pavel V; Schlicksup, Christopher J; Croll, Tristan I; Millán, Claudia; Richardson, Jane S; Read, Randy J; Adams, Paul D
Machine-learning prediction algorithms such as AlphaFold and RoseTTAFold can create remarkably accurate protein models, but these models usually have some regions that are predicted with low confidence or poor accuracy. We hypothesized that by implicitly including new experimental information such as a density map, a greater portion of a model could be predicted accurately, and that this might synergistically improve parts of the model that were not fully addressed by either machine learning or experiment alone. An iterative procedure was developed in which AlphaFold models are automatically rebuilt on the basis of experimental density maps and the rebuilt models are used as templates in new AlphaFold predictions. We show that including experimental information improves prediction beyond the improvement obtained with simple rebuilding guided by the experimental data.
This procedure for AlphaFold modeling with density has been incorporated into an automated procedure for interpretation of crystallographic and electron cryo-microscopy maps.
Item Open Access Local Motion And Local Accuracy In Protein Backbone (2006-09) Davis, Ian Wheeler
Proteins are chemically simple molecules, being unbranched polymers of uncomplicated organic compounds. Nonetheless, they fold up into a dazzling variety of complex and beautiful configurations with a dizzying array of structural, regulatory, and catalytic functions. Despite great progress, we still have very limited ability to predict the folded conformation of an amino acid sequence, and limited understanding of its dynamics and motions. Thus, this work presents a quartet of interrelated studies that address some aspects of the detailed local conformations and motions of protein backbone. First, I used a density-dependent smoothing algorithm and a high-quality, B-filtered data set to construct highly accurate conformational distributions for protein backbone (Ramachandran plots) and sidechains (rotamers). These distributions are the most accurate and restrictive produced to date, with improved discrimination between rare-but-real conformations and artifactual ones. Second, I analyzed hundreds of alternate conformations in atomic resolution crystal structures, and discovered that dramatic conformational change in a protein sidechain is often coupled to a subtle but very common mode of conformational change in its backbone -- the backrub motion. Examination of other biophysical data further supports the ubiquity of this motion. Third, I applied a model of backrub motion to protein design calculations. Although experimental characterization of the designs showed them to be unstable and/or inactive, the computational results proved to be very sensitive to changes in the backbone.
Finally, I describe how MolProbity uses my conformational distributions together with all-atom contacts and other tools to validate protein structures, and how those quality metrics can be combined visually or analytically to provide "multi-criterion" validation summaries.
Item Restricted Multiscale conformational heterogeneity in staphylococcal protein A: possible determinant of functional plasticity. (Structure, 2014-10-07) Deis, Lindsay N; Pemble, Charles W; Qi, Yang; Hagarman, Andrew; Richardson, David C; Richardson, Jane S; Oas, Terrence G
The Staphylococcus aureus virulence factor staphylococcal protein A (SpA) is a major contributor to bacterial evasion of the host immune system, through high-affinity binding to host proteins such as antibodies. SpA includes five small three-helix-bundle domains (E-D-A-B-C) separated by conserved flexible linkers. Prior attempts to crystallize individual domains in the absence of a binding partner have apparently been unsuccessful. There have also been no previous structures of tandem domains. Here we report the high-resolution crystal structures of a single C domain, and of two B domains connected by the conserved linker. Both structures exhibit extensive multiscale conformational heterogeneity, which required novel modeling protocols. Comparison of domain structures shows that helix1 orientation is especially heterogeneous, coordinated with changes in side chain conformational networks and contacting protein interfaces. This represents the kind of structural plasticity that could enable SpA to bind multiple partners.
Item Open Access New tools provide a second look at HDV ribozyme structure, dynamics and cleavage. (Nucleic Acids Res, 2014-11-10) Kapral, Gary J; Jain, Swati; Noeske, Jonas; Doudna, Jennifer A; Richardson, David C; Richardson, Jane S
The hepatitis delta virus (HDV) ribozyme is a self-cleaving RNA enzyme essential for processing viral transcripts during rolling circle viral replication.
The first crystal structure of the cleaved ribozyme was solved in 1998, followed by structures of uncleaved, mutant-inhibited and ion-complexed forms. Recently, methods have been developed that make the task of modeling RNA structure and dynamics significantly easier and more reliable. We have used ERRASER and PHENIX to rebuild and re-refine the cleaved and cis-acting C75U-inhibited structures of the HDV ribozyme. The results correct local conformations and identify alternates for RNA residues, many in functionally important regions, leading to improved R values and model validation statistics for both structures. We compare the rebuilt structures to a higher resolution, trans-acting deoxy-inhibited structure of the ribozyme, and conclude that although both inhibited structures are consistent with the currently accepted hammerhead-like mechanism of cleavage, they do not add direct structural evidence to the biochemical and modeling data. However, the rebuilt structures (PDBs: 4PR6, 4PRF) provide a more robust starting point for research on the dynamics and catalytic mechanism of the HDV ribozyme and demonstrate the power of new techniques to make significant improvements in RNA structures that impact biologically relevant conclusions.
Item Open Access NMR Structure Improvement: A Structural Bioinformatics & Visualization Approach (2010) Block, Jeremy
The overall goal of this project is to enhance the physical accuracy of individual models in macromolecular NMR (Nuclear Magnetic Resonance) structures and the realism of variation within NMR ensembles of models, while improving agreement with the experimental data. A secondary overall goal is to combine synergistically the best aspects of NMR and crystallographic methodologies to better illuminate the underlying joint molecular reality.
This is accomplished by using the powerful method of all-atom contact analysis (describing detailed sterics between atoms, including hydrogens); new graphical representations and interactive tools in 3D and virtual reality; and structural bioinformatics approaches to the expanded and enhanced data now available.
The resulting better descriptions of macromolecular structure and its dynamic variation enhance the effectiveness of the many biomedical applications that depend on detailed molecular structure, such as mutational analysis, homology modeling, molecular simulations, protein design, and drug design.
Item Open Access Rare Sidechain Conformations in Proteins and DNA (2015) Hintze, Bradley Joel
Medical advances often come as a result of understanding the underlying mechanisms of life. Life, in this sense, happens at various scales. A very complex and interesting one is the molecular scale. Understanding life’s mechanistic details at this level will provide the most promising therapies to modern ailments. Because of structure and function’s close relationship, knowledge of macromolecular structure provides invaluable insight into molecular mechanism.
A major tool used to get structural information at the molecular scale is X-ray crystallography. Such experiments result in an electron density map from which a model is built. Building such a model is a difficult task, especially at low resolution where detailed features in the electron density deteriorate making it difficult to interpret. However, many advances in the field have greatly eased the model building task, in fact, at high resolutions it has become automated. However, human inspection is still required to get a correct solution.
The largest boon to model building has been the application of structural knowledge. A prominent example is bond and dihedral angles. We often know what is absolutely not allowed and often convince ourselves we know everything that is allowed. This work focuses on the fuzzy border between allowed and disallowed. The hypothesis is that rare structural conformations exist but one needs to take great care in modeling them.
This work has two major components – rotamers (protein sidechain conformation) and Hoogsteen base pairing in DNA. I first describe methods used to gain empirical knowledge about rotamers and how that knowledge is used in model validation. Part of this knowledge is rotamer-dependent bond angle deviations. I describe how the observation and quantitation of these deviations is used in a novel set of restraints in protein structure refinement. To provide structural context to rare rotamers, I describe where and why some occur.
My DNA work has focused on Hoogsteen base pairing. I describe a collaborative survey of existing Hoogsteen base pairs in the PDB. Lessons learned during the survey led to the other DNA topic, the detection and correction of mismodeled purines. I identified Hoogsteens in the PDB mismodeled as Watson-Crick base pairs. This work underscores that Hoogsteens are extremely rare but nonetheless do occur.
The fuzzy borderland between allowed and disallowed is a strange place filled with the most interesting structural features. My work here has focused on this area, bringing into view many rare conformations. Going forward we need to ensure that conformational frequency is taken into account during model building, refinement, and validation.
Item Open Access RNA 3D Structure Analysis and Validation, and Design Algorithms for Proteins and RNA (2015) Jain, Swati
RNA, or ribonucleic acid, is one of the three biological macromolecule types essential for all known life forms, and is a critical part of a variety of cellular processes. The well known functions of RNA molecules include acting as carriers of genetic information in the form of mRNAs, and then assisting in translation of that information to protein molecules as tRNAs and rRNAs. In recent years, many other kinds of non-coding RNAs have been found, like miRNAs and siRNAs, that are important for gene regulation. Some RNA molecules, called ribozymes, are also known to catalyze biochemical reactions. Functions carried out by these recently discovered RNAs, coupled with the traditionally known functions of tRNAs, mRNAs, and rRNAs make RNA molecules even more crucial and essential components in biology.
Most of the functions mentioned above are carried out by RNA molecules associating themselves with proteins to form Ribonucleoprotein (RNP) complexes, e.g. the ribosome or the spliceosome. RNA molecules also bind a variety of small molecules, such as metabolites, and their binding can turn on or off gene expression. These RNP complexes and small molecule binding RNAs are increasingly being recognized as potential therapeutic targets for drug design. The technique of computational structure-based rational design has been successfully used for designing drugs and inhibitors for protein function, but its potential has not been tapped for design of RNA or RNP complexes. For the success of computational structure-based design, it is important to both understand the features of RNA three-dimensional structure and develop new and improved algorithms for protein and RNA design.
This document details my thesis work that covers both the above mentioned areas. The first part of my thesis work characterizes and analyzes RNA three-dimensional structure, in order to develop new methods for RNA validation and refinement, and new tools for correction of modeling errors in already solved RNA structures. I collaborated to assemble non-redundant and quality-conscious datasets of RNA crystal structures (RNA09 and RNA11), and I analyzed the range of values occupied by the RNA backbone and base dihedral angles to improve methods for RNA structure correction, validation, and refinement in MolProbity and PHENIX. I rebuilt and corrected the pre-cleaved structure of the HDV ribozyme and parts of the 50S ribosomal subunit to demonstrate the potential of new tools and techniques to improve RNA structures and help crystallographers to make correct biological interpretations. I also extended the previous work of characterizing RNA backbone conformers by the RNA Ontology Consortium (ROC) to define new conformers using the data from the larger RNA11 dataset, supplemented by ERRASER runs that optimize data points to add new conformers or improve cluster separation.
The second part of my thesis work develops novel algorithms for structure-based protein redesign when interactions between distant residue pairs are neglected and the design problem is represented by a sparse residue interaction graph. I analyzed the sequence and energy differences caused by using sparse residue interaction graphs (using the protein redesign package OSPREY), and proposed a novel use of ensemble-based provable design algorithms to mitigate the effects caused by sparse residue interaction graphs. I collaborated to develop a novel branch-decomposition based dynamic programming algorithm, called BWM*, that returns the Global Minimum Energy Conformation (GMEC) for sparse residue interaction graphs much faster than the traditional A* search algorithm. As the final step, I used the results of my analysis of the RNA base dihedral angle and implemented the capability of RNA design and RNA structural flexibility in OSPREY. My work enables OSPREY to design not only RNA, but also simultaneously design both the RNA and the protein chains in an RNA-protein interface.
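To illustrate why neglecting distant residue pairs can change the answer, here is a toy sketch that finds the minimum-energy rotamer assignment by brute force, once with all pairwise terms and once with one "distant" pair dropped. It is not BWM*, A*, or any OSPREY API; the energies, the three-residue system, and the function names are all invented for illustration.

```python
from itertools import product


def total_energy(assignment, self_e, pair_e, edges):
    """Sum of per-residue energies plus pairwise energies over the given edges."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i, j in edges:
        energy += pair_e[(i, j)][assignment[i]][assignment[j]]
    return energy


def gmec(self_e, pair_e, edges):
    """Brute-force global minimum: enumerate every rotamer assignment."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    return min(product(*choices),
               key=lambda a: total_energy(a, self_e, pair_e, edges))


# Invented toy data: 3 residues, 2 rotamers each.
self_e = [[0.0, 0.1], [0.0, 0.1], [0.0, 0.1]]
pair_e = {
    (0, 1): [[-1.0, 0.0], [0.0, -1.0]],
    (1, 2): [[-1.0, 0.0], [0.0, -1.0]],
    (0, 2): [[0.0, 0.0], [0.0, -0.5]],  # the "distant" pair
}

full_edges = [(0, 1), (1, 2), (0, 2)]
sparse_edges = [(0, 1), (1, 2)]  # distant pair (0, 2) neglected

full_gmec = gmec(self_e, pair_e, full_edges)
sparse_gmec = gmec(self_e, pair_e, sparse_edges)
```

In this toy case the full graph prefers rotamer 1 at every position, while the sparse graph prefers rotamer 0 everywhere, so even a weak neglected interaction can shift the computed minimum. Brute-force enumeration scales exponentially with the number of residues, which is the cost that specialized search algorithms aim to avoid.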
Item Open Access RNA Backbone Rotamers and Chiropraxis (2007-07-25) Murray, Laura Weston
RNA backbone is biologically important with many roles in reactions and interactions, but has historically been a challenge in structural determination. It has many atoms and torsions to place, and often there is less data on it than one might wish. This problem leads to both random and systematic error, producing noise in an already high-dimensional and complex distribution to further complicate data-driven analysis. With the advent of the ribosomal subunit structures published in 2000, large RNA structures at good resolution, it became possible to apply the Richardson laboratory's quality-filtering, visualization, and analysis techniques to RNA and develop new tools for RNA as well. A first set of 42 RNA backbone rotamers was identified, developed, and published in 2003; it has since been thoroughly overhauled in conjunction with the backbone group of the RNA Ontology Consortium to combine the strengths of different approaches, incorporate new data, and produce a consensus set of 46 conformers. Meanwhile, extensive work has taken place on developing validation and remodeling tools to correct and improve existing structures as well as to assist in initial fitting. The use of base-phosphate perpendicular distances to identify sugar pucker has proven very useful in both hand-refitting and the semi-automated process of using RNABC (RNA Backbone Correction), a program developed in conjunction with Dr. Jack Snoeyink's laboratory. The guanine riboswitch structure ur0039/1U8D, by Dr. Rob Batey's laboratory, has been collaboratively refit and rerefined as a successful test case of the utility of these tools and techniques.
Their testing and development will continue, and they are expected to help to improve RNA structure determination in both ease and quality.
Item Open Access RNA Backbone Validation, Correction, and Implications for RNA-Protein Interfaces (2013) Kapral, Gary Joseph
RNA is the molecular workhorse of nature, capable of doing many cellular tasks, from genetic data storage and regulation, to enzymatic synthesis--even to the point of self-catalyzing its own replication. While RNA can act as a catalyst on its own, as in the hammerhead ribozyme, the added efficiency of proteins is often a necessity; the ribosome--the large ribozyme responsible for peptide chain formation, is aided by proteins which ensure correct assembly and structural stability. These complexes of RNA and proteins feature in many essential cellular processes, including the RISC silencing complex and in the spliceosome. Despite its enormous utility, structural determination of RNA is notoriously difficult--particularly in the backbone, since a nucleotide standardly has 12 torsion angles (including χ) and 12 non-hydrogen atoms, compared to 4 torsions (including χ1) and 4 non-H atoms in a typical amino acid. The abundance of backbone atoms, their conformational flexibility, and experimental resolution limitations often result in systematic errors that can have a significant impact on the interpretation. False trails due to structural errors can lead to significant loss of time and effort, especially with such high-profile complexes as the ribosome and the RISC complex.
My research has focused on harnessing the recently discovered ribosome structures and the Richardsons' RNA dataset to find trends in RNA backbone conformations and motifs that were then used to develop structural validation techniques and provide improved diagnosis and correction techniques for RNA backbone. Methods for fixing RNA structure have been developed for both NMR and X-ray crystallography. For NMR structures, a method for assigning RNA backbone structure based on NOE data was developed, leading to improved identification and building of RNA backbone conformation in NMR ensembles. For crystallography, our method of diagnosing the correct ribose pucker from clear observables allows reliable assessment of pucker in validation or refinement. Observed differences in bond-lengths, bond-angles, and dihedrals have been categorized by sugar pucker in the PHENIX refinement package. I have shown that this improves the refinement behavior of both pucker and geometry.
There have also been improvements in identifying structural motifs. Many previously identified structural motifs have now been defined in terms of backbone suitestrings, a series of 2-character code divisions of RNA backbone that show the best clustering of dihedral angle correlations. Combined with a BLAST-like alignment program called SuiteAlign, these suitestrings were quickly and easily identified in a number of structures, eventually leading to the discovery of multiple instances of TψC-loop structures in the ribosome.
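As a rough illustration of the suitestring idea described above, the sketch below splits a string into 2-character suite codes and scans it for a motif. It performs exact matching with an optional wildcard, which is far simpler than a BLAST-like alignment and is not SuiteAlign's algorithm. The function names, the `??` wildcard, and the example string are hypothetical and chosen only for illustration.

```python
def split_suites(suitestring):
    """Split a suitestring into its consecutive 2-character suite codes."""
    if len(suitestring) % 2 != 0:
        raise ValueError("suitestring length must be a multiple of 2")
    return [suitestring[i:i + 2] for i in range(0, len(suitestring), 2)]


def find_motif(suitestring, motif, wildcard="??"):
    """Return the suite indices at which `motif` matches `suitestring`.

    Matching is exact, code by code; a `wildcard` code in the motif
    matches any suite. (Hypothetical helper, not part of SuiteAlign.)
    """
    suites = split_suites(suitestring)
    pattern = split_suites(motif)
    hits = []
    for start in range(len(suites) - len(pattern) + 1):
        window = suites[start:start + len(pattern)]
        if all(p == wildcard or p == s for p, s in zip(pattern, window)):
            hits.append(start)
    return hits


# Made-up suitestring for illustration only, not taken from a real structure.
example = "1a1a1g7r1a1a"
exact_hits = find_motif(example, "1g7r")
wildcard_hits = find_motif(example, "1a??")
```

A real alignment tool would also score near matches and gaps; this sketch only shows why a fixed-width code alphabet makes motif search a simple string problem.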
To facilitate error diagnosis and corrections in RNA-protein complexes, as well as to expand the knowledge base of the scientific community as a whole, a database of RNA-protein interaction motifs has been developed. This database is rooted in the quality-filtering, visualization, and analysis techniques of the Richardson lab, particularly those developed by Laura Murray specifically for RNA structures.
The consensus backbone conformers, pucker diagnosis, and all-atom contacts have been combined to develop first manual and then automated tools for RNA structure correction. I have applied all these techniques to improve the accuracy of a number of important RNA and RNA/protein complex structures.
Item Open Access The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues. (Protein science : a publication of the Protein Society, 2022-01) Williams, Christopher J; Richardson, David C; Richardson, Jane S
We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we illustrate both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data.
Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
Item Open Access Using C-Alpha Geometry to Describe Protein Secondary Structure and Motifs (2015) Williams, Christopher Joseph
X-ray crystallography 3D atomic models are used in a variety of research areas to understand and manipulate protein structure. Research and application are dependent on the quality of the models. Low-resolution experimental data is a common problem in crystallography, one that makes it difficult to solve structures and to produce the reliable models that many scientists depend on.
In this work, I develop new, automated tools for validation and correction of low-resolution structures. These tools are gathered under the name CaBLAM, for C-alpha Based Low-resolution Annotation Method. CaBLAM uses a unique, C-alpha-geometry-based parameter space to identify outliers in protein backbone geometry, and to identify secondary structure that may be masked by modeling errors.
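To make the idea of a C-alpha-geometry-based measure concrete, the sketch below computes the pseudo-dihedral angle defined by four consecutive C-alpha positions. The abstract does not spell out CaBLAM's parameters, so treating this pseudo-dihedral as one such measure is an assumption. The function names are hypothetical and do not come from the CCTBX or Phenix APIs.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def ca_pseudo_dihedrals(ca_coords):
    """Pseudo-dihedral for every run of four consecutive C-alpha atoms.

    `ca_coords` is a list of (x, y, z) tuples in chain order.
    (Hypothetical helper for illustration only.)
    """
    return [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]


# Toy coordinates, not from a real structure: the first four points are
# coplanar (cis), and the last four define a right-angle twist.
toy_trace = [(1, 1, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)]
angles = ca_pseudo_dihedrals(toy_trace)
```

Because such a measure needs only C-alpha positions, it can still be computed when the rest of the main chain is poorly determined.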
CaBLAM was developed in the Python programming language as part of the Phenix crystallography suite and the open CCTBX Project. It makes use of architecture and methods available in the CCTBX toolbox. Quality-filtered databases of high-resolution protein structures, especially the Top8000, were used to construct contours of expected protein behavior for CaBLAM. CaBLAM has also been integrated into the codebase for the Richardson Lab's online MolProbity validation service.
CaBLAM succeeds in providing useful validation feedback for protein structures in the 2.5-4.0 Å resolution range. This success demonstrates the relative reliability of the C-alpha trace of a protein in this resolution range. Full mainchain information can be extrapolated from the C-alpha trace, especially for regular secondary structure elements.
CaBLAM has also informed our approach to validation for low-resolution structures. One of our goals is to moderate feedback, both to reduce validation overload and to focus user attention on modeling errors that are significant and correctable. CaBLAM and the related methods that have grown around it demonstrate progress towards this goal.
Item Open Access Using Protein-Likeness to Validate Conformational Alternatives (2012) Keedy, Daniel Austin
Proteins are among the most complex entities known to science. Composed of just 20 fundamental building blocks arranged in simple linear strings, they nonetheless fold into a dizzying array of architectures that carry out the machinations of life at the molecular level.
Despite this central role in biology, we cannot reliably predict the structure of a protein from its sequence, and therefore rely on time-consuming and expensive experimental techniques to determine their structures. Although these methods can reveal equilibrium structures with great accuracy, they unfortunately mask much of the inherent molecular flexibility that enables proteins to dynamically perform biochemical tasks. As a result, much of the field of structural biology is mired in a static perspective; indeed, most attempts to naively model increased structural flexibility still end in failure.
This document details my work to validate alternative protein conformations beyond the primary or equilibrium conformation. The underlying hypothesis is that more realistic modeling of flexibility will enhance our understanding of how natural proteins function, and thereby improve our ability to design new proteins that perform desired novel functions.
During the course of my work, I used structure validation techniques to validate conformational alternatives in a variety of settings. First, I extended previous work introducing the backrub, a local, sidechain-coupled backbone motion, by demonstrating that backrubs also accompany sequence changes and therefore are useful for modeling conformational changes associated with mutations in protein design. Second, I extensively studied a new local backbone motion, helix shear, by documenting its occurrence in both crystal and NMR structures and showing its suitability for expanding conformational search space in protein design. Third, I integrated many types of local alternate conformations in an ultra-high-resolution crystal structure and discovered the combinatorial complexity that arises when adjacent flexible segments combine into networks. Fourth, I used structural bioinformatics techniques to construct smoothed, multi-dimensional torsional distributions that can be used to validate trial conformations or to propose new ones. Fifth, I participated in judging a structure prediction competition by using validation of geometrical and all-atom contact criteria to help define correctness across thousands of submitted conformations. Sixth, using similar tools plus collation of multiple comparable structures from the public database, I determined that low-energy states identified by the popular structure modeling suite Rosetta sometimes are valid conformations likely to be populated in the cell, but more often are invalid conformations attributable to artifacts in the physical/statistical hybrid energy function.
Unified by the theme of validating conformational alternatives by reference to high-quality experimental structures, my cumulative work advances our fundamental understanding of protein structural variability, and will benefit future endeavors to design useful proteins for biomedicine or industrial chemistry.