RNA 3D Structure Analysis and Validation, and Design Algorithms for Proteins and RNA
RNA, or ribonucleic acid, is one of the three biological macromolecule types essential for all known life forms, and is a critical part of a variety of cellular processes. The well known functions of RNA molecules include acting as carriers of genetic information in the form of mRNAs, and then assisting in translation of that information to protein molecules as tRNAs and rRNAs. In recent years, many other kinds of non-coding RNAs have been found, like miRNAs and siRNAs, that are important for gene regulation. Some RNA molecules, called ribozymes, are also known to catalyze biochemical reactions. Functions carried out by these recently discovered RNAs, coupled with the traditionally known functions of tRNAs, mRNAs, and rRNAs make RNA molecules even more crucial and essential components in biology.
Most of the functions mentioned above are carried out by RNA molecules associ- ating themselves with proteins to form Ribonucleoprotein (RNP) complexes, e.g. the ribosome or the splicesosome. RNA molecules also bind a variety of small molecules, such as metabolites, and their binding can turn on or off gene expression. These RNP complexes and small molecule binding RNAs are increasingly being recognized as potential therapeutic targets for drug design. The technique of computational structure-based rational design has been successfully used for designing drugs and inhibitors for protein function, but its potential has not been tapped for design of RNA or RNP complexes. For the success of computational structure-based design, it is important to both understand the features of RNA three-dimensional structure and develop new and improved algorithms for protein and RNA design.
This document details my thesis work that covers both the above mentioned areas. The first part of my thesis work characterizes and analyzes RNA three-dimensional structure, in order to develop new methods for RNA validation and refinement, and new tools for correction of modeling errors in already solved RNA structures. I collaborated to assemble non-redundant and quality-conscious datasets of RNA crystal structures (RNA09 and RNA11), and I analyzed the range of values occupied by the RNA backbone and base dihedral angles to improve methods for RNA structure correction, validation, and refinement in MolProbity and PHENIX. I rebuilt and corrected the pre-cleaved structure of the HDV ribozyme and parts of the 50S ribosomal subunit to demonstrate the potential of new tools and techniques to improve RNA structures and help crystallographers to make correct biological interpretations. I also extended the previous work of characterizing RNA backbone conformers by the RNA Ontology Consortium (ROC) to define new conformers using the data from the larger RNA11 dataset, supplemented by ERRASER runs that optimize data points to add new conformers or improve cluster separation.
The second part of my thesis work develops novel algorithms for structure-based
protein redesign when interactions between distant residue pairs are neglected and the design problem is represented by a sparse residue interaction graph. I analyzed the sequence and energy differences caused by using sparse residue interaction graphs (using the protein redesign package OSPREY), and proposed a novel use of ensemble-based provable design algorithms to mitigate the effects caused by sparse residue interaction graphs. I collaborated to develop a novel branch-decomposition based dynamic programming algorithm, called BWM*, that returns the Global Minimum Energy Conformation (GMEC) for sparse residue interaction graphs much faster than the traditional A* search algorithm. As the final step, I used the results of my analysis of the RNA base dihedral angle and implemented the capability of RNA design and RNA structural flexibility in osprey. My work enables OSPREY to design not only RNA, but also simultaneously design both the RNA and the protein chains in a RNA-protein interface.
RNA 3D structure validation and correction
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations