Browsing by Author "Richardson, David C"
Item Open Access Building Better Backbones: Visualizations, Analyses, and Tools for Higher Quality Macromolecular Structure Models (2010) Chen, Vincent Bin-Han
In this work, I develop computational and visual tools for analyzing and manipulating the backbone of macromolecules, and I demonstrate that these tools support building better structures than currently done. These visualization and analysis tools belong to an "Intelligence Amplification" (IA) tradition (rather than complete Artificial Intelligence (AI) automation), empowering users to improve structures.
Proteins and nucleic acids are among the most important molecules in biology, mediating the majority of biochemical processes that comprise a living organism. Therefore, these macromolecules are important targets, both for basic research to improve understanding of how life works, and for medical research as possible drug targets.
The function of these macromolecules is largely determined by their 3D structure. Although these macromolecules are chemically fairly simple, made up of linear sequences of a few possible subunits, they physically fold into complex, compact structures. Overall, structural biology aims to determine the general relationship between sequence and structure of these macromolecules, through determination of the positions of the atoms within individual macromolecules.
Because it is currently impossible to directly see the position of atoms in a molecule, all structural determination techniques, including X-ray crystallography, NMR, and homology modeling, result in an interpreted model of a structure. Nearly all of these models contain mistakes, in which atoms are fit in incorrect or impossible positions. These mistakes, especially at a functionally-important location in a structure, can mislead both basic and medical research, making it critical for structural biologists to build the highest quality models possible.
This document details how my dissertation work enables the building of better macromolecular structure models. This work follows an iterative development cycle, where visual analysis of models spurs development of better tools, which in turn improves the analysis. First, I describe how my analysis of protein loops from X-ray crystal structures reveals that the traditional definition of loop endpoints is too restrictive. Second, I create a protein backbone analysis and modeling tool, using a new peptide-centric division system. I show how this tool makes it easier to study protein loops, and also how it improves an algorithm for calculating core protein models from NMR residual dipolar coupling (RDC) data. Third, I describe how 3D visualization of RDCs in their structural context improves understanding of RDCs and validates NMR models in a novel way. Fourth, I describe how local quality analysis can diagnose problems in homology models. Fifth, I demonstrate that local quality analysis can be successfully used in conjunction with model rebuilding software to correct errors in low resolution structures. The various tools and software packages I created during the course of my work are freely available and have already made a positive impact on structures being generated by the community.
Archive versions of several of these software packages (JiffiLoop, RDCvis, and KiNG) should be included with this document; current versions can be downloaded from http://kinemage.biochem.duke.edu.
Item Open Access Computational Methods for RNA Structure Validation and Improvement. (Methods Enzymol, 2015) Jain, Swati; Richardson, David C; Richardson, Jane S
With increasing recognition of the roles RNA molecules and RNA/protein complexes play in an unexpected variety of biological processes, understanding of RNA structure-function relationships is of high current importance. To make clean biological interpretations from three-dimensional structures, it is imperative to have high-quality, accurate RNA crystal structures available, and the community has thoroughly embraced that goal. However, due to the many degrees of freedom inherent in RNA structure (especially for the backbone), it is a significant challenge to succeed in building accurate experimental models for RNA structures. This chapter describes the tools and techniques our research group and our collaborators have developed over the years to help RNA structural biologists both evaluate and achieve better accuracy. Expert analysis of large, high-resolution, quality-conscious RNA datasets provides the fundamental information that enables automated methods for robust and efficient error diagnosis in validating RNA structures at all resolutions. The even more crucial goal of correcting the diagnosed outliers has steadily developed toward highly effective, computationally based techniques. Automation enables solving complex issues in large RNA structures, but cannot circumvent the need for thoughtful examination of local details, and so we also provide some guidance for interpreting and acting on the results of current structure validation for RNA.
Item Open Access Describing the Statistical Conformation of Highly Flexible Proteins by Small-Angle X-ray Scattering (2014) Wiersma Capp, Jo Anna
Small-angle X-ray scattering (SAXS) is a biophysical technique that allows one to study the statistical conformation of a biopolymer in solution.
The two-dimensional data obtained from SAXS is a low-resolution probe of the statistical conformation: it is a population-weighted orientational average of all conformers within a conformational ensemble. Traditional biological SAXS experiments seek to describe an "average" structure of a protein, or enumerate a "minimal ensemble" of a protein at the atomic resolution scale. However, for highly flexible proteins, an average structure or minimal ensemble may be insufficient for enumeration of conformational space, and may be an over-parameterized model of the statistical conformation. This work describes a SAXS analysis of highly flexible proteins and presents a protocol for describing the statistical conformation based on minimally parameterized polymer physics models and judicious use of ensemble modeling. This protocol is applied to the structural characterization of S. aureus protein A, a crucial virulence factor, and Fibronectin III domains 1-2, an important structural protein.
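As an illustration of the population-weighted average described in this abstract, the following is a minimal sketch, not the thesis's protocol. It assumes each conformer already has an orientationally averaged intensity profile sampled on a shared grid of scattering-vector values; the function name `ensemble_intensity` and its inputs are hypothetical.

```python
def ensemble_intensity(weights, profiles):
    """Population-weighted average of per-conformer scattering profiles.

    weights  -- relative populations, one per conformer (need not sum to 1)
    profiles -- one intensity list per conformer, all on the same q grid
    Both inputs are illustrative assumptions, not taken from the source.
    """
    if len(weights) != len(profiles) or not profiles:
        raise ValueError("need one weight per profile, and at least one profile")
    n_points = len(profiles[0])
    if any(len(p) != n_points for p in profiles):
        raise ValueError("all profiles must share the same q grid")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted sum at each q point, normalized by the total population.
    return [
        sum(w * p[i] for w, p in zip(weights, profiles)) / total
        for i in range(n_points)
    ]
```

For example, `ensemble_intensity([1, 3], [[4.0, 2.0], [0.0, 2.0]])` weights the second conformer three times as heavily as the first.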
Item Open Access Local Motion And Local Accuracy In Protein Backbone (2006-09) Davis, Ian Wheeler
Proteins are chemically simple molecules, being unbranched polymers of uncomplicated organic compounds. Nonetheless, they fold up into a dazzling variety of complex and beautiful configurations with a dizzying array of structural, regulatory, and catalytic functions. Despite great progress, we still have very limited ability to predict the folded conformation of an amino acid sequence, and limited understanding of its dynamics and motions. Thus, this work presents a quartet of interrelated studies that address some aspects of the detailed local conformations and motions of protein backbone. First, I used a density-dependent smoothing algorithm and a high-quality, B-filtered data set to construct highly accurate conformational distributions for protein backbone (Ramachandran plots) and sidechains (rotamers). These distributions are the most accurate and restrictive produced to date, with improved discrimination between rare-but-real conformations and artifactual ones. Second, I analyzed hundreds of alternate conformations in atomic resolution crystal structures, and discovered that dramatic conformational change in a protein sidechain is often coupled to a subtle but very common mode of conformational change in its backbone -- the backrub motion. Examination of other biophysical data further supports the ubiquity of this motion. Third, I applied a model of backrub motion to protein design calculations. Although experimental characterization of the designs showed them to be unstable and/or inactive, the computational results proved to be very sensitive to changes in the backbone.
Finally, I describe how MolProbity uses my conformational distributions together with all-atom contacts and other tools to validate protein structures, and how those quality metrics can be combined visually or analytically to provide "multi-criterion" validation summaries.
Item Restricted Multiscale conformational heterogeneity in staphylococcal protein A: possible determinant of functional plasticity. (Structure, 2014-10-07) Deis, Lindsay N; Pemble, Charles W; Qi, Yang; Hagarman, Andrew; Richardson, David C; Richardson, Jane S; Oas, Terrence G
The Staphylococcus aureus virulence factor staphylococcal protein A (SpA) is a major contributor to bacterial evasion of the host immune system, through high-affinity binding to host proteins such as antibodies. SpA includes five small three-helix-bundle domains (E-D-A-B-C) separated by conserved flexible linkers. Prior attempts to crystallize individual domains in the absence of a binding partner have apparently been unsuccessful. There have also been no previous structures of tandem domains. Here we report the high-resolution crystal structures of a single C domain, and of two B domains connected by the conserved linker. Both structures exhibit extensive multiscale conformational heterogeneity, which required novel modeling protocols. Comparison of domain structures shows that helix1 orientation is especially heterogeneous, coordinated with changes in side chain conformational networks and contacting protein interfaces. This represents the kind of structural plasticity that could enable SpA to bind multiple partners.
Item Open Access New tools provide a second look at HDV ribozyme structure, dynamics and cleavage. (Nucleic Acids Res, 2014-11-10) Kapral, Gary J; Jain, Swati; Noeske, Jonas; Doudna, Jennifer A; Richardson, David C; Richardson, Jane S
The hepatitis delta virus (HDV) ribozyme is a self-cleaving RNA enzyme essential for processing viral transcripts during rolling circle viral replication.
The first crystal structure of the cleaved ribozyme was solved in 1998, followed by structures of uncleaved, mutant-inhibited and ion-complexed forms. Recently, methods have been developed that make the task of modeling RNA structure and dynamics significantly easier and more reliable. We have used ERRASER and PHENIX to rebuild and re-refine the cleaved and cis-acting C75U-inhibited structures of the HDV ribozyme. The results correct local conformations and identify alternates for RNA residues, many in functionally important regions, leading to improved R values and model validation statistics for both structures. We compare the rebuilt structures to a higher resolution, trans-acting deoxy-inhibited structure of the ribozyme, and conclude that although both inhibited structures are consistent with the currently accepted hammerhead-like mechanism of cleavage, they do not add direct structural evidence to the biochemical and modeling data. However, the rebuilt structures (PDBs: 4PR6, 4PRF) provide a more robust starting point for research on the dynamics and catalytic mechanism of the HDV ribozyme and demonstrate the power of new techniques to make significant improvements in RNA structures that impact biologically relevant conclusions.
Item Open Access NMR Structure Improvement: A Structural Bioinformatics & Visualization Approach (2010) Block, Jeremy
The overall goal of this project is to enhance the physical accuracy of individual models in macromolecular NMR (Nuclear Magnetic Resonance) structures and the realism of variation within NMR ensembles of models, while improving agreement with the experimental data. A secondary overall goal is to combine synergistically the best aspects of NMR and crystallographic methodologies to better illuminate the underlying joint molecular reality.
This is accomplished by using the powerful method of all-atom contact analysis (describing detailed sterics between atoms, including hydrogens); new graphical representations and interactive tools in 3D and virtual reality; and structural bioinformatics approaches to the expanded and enhanced data now available.
The resulting better descriptions of macromolecular structure and its dynamic variation enhance the effectiveness of the many biomedical applications that depend on detailed molecular structure, such as mutational analysis, homology modeling, molecular simulations, protein design, and drug design.
Item Open Access Rare Sidechain Conformations in Proteins and DNA (2015) Hintze, Bradley Joel
Medical advances often come as a result of understanding the underlying mechanisms of life. Life, in this sense, happens at various scales. A very complex and interesting one is the molecular scale. Understanding life’s mechanistic details at this level will provide the most promising therapies to modern ailments. Because of structure and function’s close relationship, knowledge of macromolecular structure provides invaluable insight into molecular mechanism.
A major tool used to get structural information at the molecular scale is X-ray crystallography. Such experiments result in an electron density map from which a model is built. Building such a model is a difficult task, especially at low resolution, where detailed features in the electron density deteriorate, making it difficult to interpret. However, many advances in the field have greatly eased the model building task; in fact, at high resolutions it has become automated. Even so, human inspection is still required to get a correct solution.
The largest boon to model building has been the application of structural knowledge. A prominent example is bond and dihedral angles. We often know what is absolutely not allowed and often convince ourselves we know everything that is allowed. This work focuses on the fuzzy border between allowed and disallowed. The hypothesis is that rare structural conformations exist but one needs to take great care in modeling them.
This work has two major components: rotamers (protein sidechain conformation) and Hoogsteen base pairing in DNA. I first describe methods used to gain empirical knowledge about rotamers and how that knowledge is used in model validation. Part of this knowledge is rotamer-dependent bond angle deviations. I describe how the observation and quantitation of these deviations is used in a novel set of restraints in protein structure refinement. To provide structural context to rare rotamers, I describe where and why some occur.
My DNA work has focused on Hoogsteen base pairing. I describe a collaborative survey of existing Hoogsteen base pairs in the PDB. Lessons learned during the survey led to the other DNA topic, the detection and correction of mismodeled purines. I identified Hoogsteens in the PDB mismodeled as Watson-Crick base pairs. This work underscores that Hoogsteens are extremely rare but nonetheless do occur.
The fuzzy borderland between allowed and disallowed is a strange place filled with the most interesting structural features. My work here has focused on this area, bringing into view many rare conformations. Going forward we need to ensure that conformational frequency is taken into account during model building, refinement, and validation.
Item Open Access RNA Backbone Rotamers and Chiropraxis (2007-07-25) Murray, Laura Weston
RNA backbone is biologically important with many roles in reactions and interactions, but has historically been a challenge in structural determination. It has many atoms and torsions to place, and often there is less data on it than one might wish. This problem leads to both random and systematic error, producing noise in an already high-dimensional and complex distribution to further complicate data-driven analysis. With the advent of the ribosomal subunit structures published in 2000, large RNA structures at good resolution, it became possible to apply the Richardson laboratory's quality-filtering, visualization, and analysis techniques to RNA and develop new tools for RNA as well. A first set of 42 RNA backbone rotamers was identified, developed, and published in 2003; it has since been thoroughly overhauled in conjunction with the backbone group of the RNA Ontology Consortium to combine the strengths of different approaches, incorporate new data, and produce a consensus set of 46 conformers. Meanwhile, extensive work has taken place on developing validation and remodeling tools to correct and improve existing structures as well as to assist in initial fitting. The use of base-phosphate perpendicular distances to identify sugar pucker has proven very useful in both hand-refitting and the semi-automated process of using RNABC (RNA Backbone Correction), a program developed in conjunction with Dr. Jack Snoeyink's laboratory. The guanine riboswitch structure ur0039/1U8D, by Dr. Rob Batey's laboratory, has been collaboratively refit and rerefined as a successful test case of the utility of these tools and techniques.
Their testing and development will continue, and they are expected to help to improve RNA structure determination in both ease and quality.
Item Open Access RNA Backbone Validation, Correction, and Implications for RNA-Protein Interfaces (2013) Kapral, Gary Joseph
RNA is the molecular workhorse of nature, capable of doing many cellular tasks, from genetic data storage and regulation, to enzymatic synthesis--even to the point of self-catalyzing its own replication. While RNA can act as a catalyst on its own, as in the hammerhead ribozyme, the added efficiency of proteins is often a necessity; the ribosome--the large ribozyme responsible for peptide chain formation--is aided by proteins which ensure correct assembly and structural stability. These complexes of RNA and proteins feature in many essential cellular processes, including the RISC silencing complex and in the spliceosome. Despite its enormous utility, structural determination of RNA is notoriously difficult--particularly in the backbone, since a nucleotide standardly has 12 torsion angles (including χ) and 12 non-hydrogen atoms, compared to 4 torsions (including χ1) and 4 non-H atoms in a typical amino acid. The abundance of backbone atoms, their conformational flexibility, and experimental resolution limitations often result in systematic errors that can have a significant impact on the interpretation. False trails due to structural errors can lead to significant loss of time and effort, especially with such high-profile complexes as the ribosome and the RISC complex.
My research has focused on harnessing the recently discovered ribosome structures and the Richardsons' RNA dataset to find trends in RNA backbone conformations and motifs that were then used to develop structural validation techniques and provide improved diagnosis and correction techniques for RNA backbone. Methods for fixing RNA structure have been developed for both NMR and X-ray crystallography. For NMR structures, a method for assigning RNA backbone structure based on NOE data was developed, leading to improved identification and building of RNA backbone conformation in NMR ensembles. For crystallography, our method of diagnosing the correct ribose pucker from clear observables allows reliable assessment of pucker in validation or refinement. Observed differences in bond-lengths, bond-angles, and dihedrals have been categorized by sugar pucker in the PHENIX refinement package. I have shown that this improves the refinement behavior of both pucker and geometry.
There have also been improvements in identifying structural motifs. Many previously identified structural motifs have now been defined in terms of backbone suitestrings, a series of 2-character code divisions of RNA backbone that show the best clustering of dihedral angle correlations. Combined with a BLAST-like alignment program called SuiteAlign, these suitestrings were quickly and easily identified in a number of structures, eventually leading to the discovery of multiple instances of TψC-loop structures in the ribosome.
To facilitate error diagnosis and corrections in RNA-protein complexes, as well as to expand the knowledge base of the scientific community as a whole, a database of RNA-protein interaction motifs has been developed. This database is rooted in the quality-filtering, visualization, and analysis techniques of the Richardson lab, particularly those developed by Laura Murray specifically for RNA structures.
The consensus backbone conformers, pucker diagnosis, and all-atom contacts have been combined to develop first manual and then automated tools for RNA structure correction. I have applied all these techniques to improve the accuracy of a number of important RNA and RNA/protein complex structures.
Item Open Access The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues. (Protein science : a publication of the Protein Society, 2022-01) Williams, Christopher J; Richardson, David C; Richardson, Jane S
We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files were selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we illustrate both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
Item Open Access The Statistical Conformation of a Highly Flexible Protein: Small-Angle X-Ray Scattering of S. aureus Protein A (STRUCTURE, 2014-08-05) Capp, Jo A; Hagarman, Andrew; Richardson, David C; Oas, Terrence G
Item Open Access Using C-Alpha Geometry to Describe Protein Secondary Structure and Motifs (2015) Williams, Christopher Joseph
X-ray crystallography 3D atomic models are used in a variety of research areas to understand and manipulate protein structure. Research and application are dependent on the quality of the models. Low-resolution experimental data is a common problem in crystallography which makes solving structures and producing the reliable models that many scientists depend on difficult.
In this work, I develop new, automated tools for validation and correction of low-resolution structures. These tools are gathered under the name CaBLAM, for C-alpha Based Low-resolution Annotation Method. CaBLAM uses a unique, C-alpha-geometry-based parameter space to identify outliers in protein backbone geometry, and to identify secondary structure that may be masked by modeling errors.
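The abstract does not spell out the C-alpha parameter space, so the sketch below is only an illustration of the general idea, not CaBLAM's implementation. It assumes that one useful C-alpha-only coordinate is the pseudo-dihedral angle defined by four consecutive C-alpha positions; the function names `dihedral` and `ca_pseudo_dihedrals` are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    # atan2 form is numerically stable and returns a value in [-180, 180].
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ca_pseudo_dihedrals(ca_coords):
    """One pseudo-dihedral per window of four consecutive C-alpha atoms.

    ca_coords is an ordered list of (x, y, z) tuples for a single chain;
    this assumes no chain breaks (an illustrative simplification).
    """
    return [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
```

A chain of N C-alpha atoms yields N - 3 angles; an outlier test would then compare these against distributions drawn from high-quality reference structures, a step this sketch does not attempt.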
CaBLAM was developed in the Python programming language as part of the Phenix crystallography suite and the open CCTBX Project. It makes use of architecture and methods available in the CCTBX toolbox. Quality-filtered databases of high-resolution protein structures, especially the Top8000, were used to construct contours of expected protein behavior for CaBLAM. CaBLAM has also been integrated into the codebase for the Richardson Lab's online MolProbity validation service.
CaBLAM succeeds in providing useful validation feedback for protein structures in the 2.5-4.0 Å resolution range. This success demonstrates the relative reliability of the C-alpha trace of a protein in this resolution range. Full mainchain information can be extrapolated from the C-alpha trace, especially for regular secondary structure elements.
CaBLAM has also informed our approach to validation for low-resolution structures. Moderation of feedback, to reduce validation overload and to focus user attention on modeling errors that are both significant and correctable, is one of our goals. CaBLAM and the related methods that have grown around it demonstrate the progress towards this goal.
Item Open Access Using Protein-Likeness to Validate Conformational Alternatives (2012) Keedy, Daniel Austin
Proteins are among the most complex entities known to science. Composed of just 20 fundamental building blocks arranged in simple linear strings, they nonetheless fold into a dizzying array of architectures that carry out the machinations of life at the molecular level.
Despite this central role in biology, we cannot reliably predict the structure of a protein from its sequence, and therefore rely on time-consuming and expensive experimental techniques to determine their structures. Although these methods can reveal equilibrium structures with great accuracy, they unfortunately mask much of the inherent molecular flexibility that enables proteins to dynamically perform biochemical tasks. As a result, much of the field of structural biology is mired in a static perspective; indeed, most attempts to naively model increased structural flexibility still end in failure.
This document details my work to validate alternative protein conformations beyond the primary or equilibrium conformation. The underlying hypothesis is that more realistic modeling of flexibility will enhance our understanding of how natural proteins function, and thereby improve our ability to design new proteins that perform desired novel functions.
During the course of my work, I used structure validation techniques to validate conformational alternatives in a variety of settings. First, I extended previous work introducing the backrub, a local, sidechain-coupled backbone motion, by demonstrating that backrubs also accompany sequence changes and therefore are useful for modeling conformational changes associated with mutations in protein design. Second, I extensively studied a new local backbone motion, helix shear, by documenting its occurrence in both crystal and NMR structures and showing its suitability for expanding conformational search space in protein design. Third, I integrated many types of local alternate conformations in an ultra-high-resolution crystal structure and discovered the combinatorial complexity that arises when adjacent flexible segments combine into networks. Fourth, I used structural bioinformatics techniques to construct smoothed, multi-dimensional torsional distributions that can be used to validate trial conformations or to propose new ones. Fifth, I participated in judging a structure prediction competition by using validation of geometrical and all-atom contact criteria to help define correctness across thousands of submitted conformations. Sixth, using similar tools plus collation of multiple comparable structures from the public database, I determined that low-energy states identified by the popular structure modeling suite Rosetta sometimes are valid conformations likely to be populated in the cell, but more often are invalid conformations attributable to artifacts in the physical/statistical hybrid energy function.
Unified by the theme of validating conformational alternatives by reference to high-quality experimental structures, my cumulative work advances our fundamental understanding of protein structural variability, and will benefit future endeavors to design useful proteins for biomedicine or industrial chemistry.