Building Better Backbones: Visualizations, Analyses, and Tools for Higher Quality Macromolecular Structure Models
In this work, I develop computational and visual tools for analyzing and manipulating the backbone of macromolecules, and I demonstrate that these tools support building better structure models than current practice produces. These visualization and analysis tools belong to an "Intelligence Amplification" (IA) tradition, rather than aiming for complete Artificial Intelligence (AI) automation, and they empower users to improve structures.
Proteins and nucleic acids are among the most important molecules in biology, mediating the majority of biochemical processes that comprise a living organism. Therefore, these macromolecules are important targets, both for basic research to improve understanding of how life works, and for medical research as possible drug targets.
The function of these macromolecules is largely determined by their 3D structure. Although these macromolecules are chemically fairly simple, made up of linear sequences of a few possible subunits, they physically fold into complex, compact structures. Overall, structural biology aims to determine the general relationship between the sequence and the structure of these macromolecules by determining the positions of the atoms within individual macromolecules.
Because it is currently impossible to directly see the position of atoms in a molecule, all structural determination techniques, including X-ray crystallography, NMR, and homology modeling, result in an interpreted model of a structure. Nearly all of these models contain mistakes, in which atoms are fit in incorrect or impossible positions. These mistakes, especially at a functionally important location in a structure, can mislead both basic and medical research, making it critical for structural biologists to build the highest quality models possible.
This document details how my dissertation work enables the building of better macromolecular structure models. This work follows an iterative development cycle, in which visual analysis of models spurs development of better tools, which in turn improve the analysis. First, I describe how my analysis of protein loops from X-ray crystal structures reveals that the traditional definition of loop endpoints is too restrictive. Second, I create a protein backbone analysis and modeling tool, using a new peptide-centric division system. I show how this tool makes it easier to study protein loops, and also how it improves an algorithm for calculating core protein models from NMR residual dipolar coupling (RDC) data. Third, I describe how 3D visualization of RDCs in their structural context improves understanding of RDCs and validates NMR models in a novel way. Fourth, I describe how local quality analysis can diagnose problems in homology models. Fifth, I demonstrate that local quality analysis can be successfully used in conjunction with model rebuilding software to correct errors in low-resolution structures. The various tools and software packages I created during the course of my work are freely available and have already had a positive impact on structures being generated by the community.
Archive versions of several of these software packages (JiffiLoop, RDCvis, and KiNG) should be included with this document; current versions can be downloaded from http://kinemage.biochem.duke.edu.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations