Novel Algorithms for Protein Structure Determination from Sparse NMR Data
Nuclear magnetic resonance (NMR) spectroscopy is an established technique for macromolecular structure determination at atomic resolution. However, most current structure determination approaches require a large set of experiments and a large amount of data to elucidate the three-dimensional structure of a protein. While current protocols may perform well in data-rich settings, protein structure determination remains a difficult task in sparse-data settings. Sparse data arise in high-throughput settings and for larger proteins, membrane proteins, and symmetric protein complexes, and therefore call for novel algorithms that can compute structures with provable guarantees on solution quality and running time.
In this dissertation project we addressed key computational bottlenecks in NMR structural biology. Specifically, we improved and extended techniques recently developed by our laboratory, and developed novel algorithms and computational tools that enable protein structure determination from sparse NMR data. An underlying goal of the project was to minimize the number of NMR experiments, and hence the time and cost of performing them, while still determining protein structures accurately from a limited set of experimental data. The algorithms developed in this dissertation use global orientational restraints from residual dipolar coupling (RDC) and residual chemical shift anisotropy (RCSA) data from solution NMR, in addition to a sparse set of distance restraints from nuclear Overhauser effect (NOE) and paramagnetic relaxation enhancement (PRE) measurements. We used tools from algebraic geometry to derive analytic expressions for the bond vector and peptide plane orientations by exploiting the mathematical interplay between RDC- or RCSA-derived sphero-conics and protein kinematics. These expressions improve our understanding of the geometry of the restraints imposed by these experimental data, and our algorithms use them to compute protein structures with provable accuracy. Our algorithms, which determine the protein backbone global fold from sparse NMR data, were used in the high-resolution structure determination protocol developed in our laboratory to solve the solution NMR structures of the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein), which have been deposited into the Protein Data Bank. We have also developed a novel, sparse-data, RDC-based algorithm to compute ensembles of protein loop conformations in the presence of a moderate level of dynamics in the loop regions.
All the algorithms developed in this dissertation have been tested on experimental NMR data. The promising results suggest that they can be applied successfully to determine high-quality protein backbone structures from a limited amount of experimental NMR data, and hence will be useful in automated NOE assignment and high-resolution protein backbone structure determination from sparse NMR data. The algorithms and software tools developed during this project are available to the scientific community as free, open-source software.
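To make the RDC-derived sphero-conics mentioned above more concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes the standard RDC relation in the principal order frame, where a unit internuclear vector v = (x, y, z) has a normalized coupling r = Sxx*x^2 + Syy*y^2 + Szz*z^2 with Sxx + Syy + Szz = 0, so that the vectors consistent with one measured r lie on the intersection of the unit sphere with a quadric cone. The function names, the normalization (the dipolar constant is absorbed into the tensor), and the sampling scheme are all illustrative assumptions.

```python
import math


def rdc(v, sxx, syy, szz):
    """Normalized RDC of unit vector v in the principal order frame (assumed model)."""
    x, y, z = v
    return sxx * x * x + syy * y * y + szz * z * z


def sphero_conic_points(r, sxx, syy, szz, n=201):
    """Sample points with x >= 0 and y >= 0 on the curve where the unit sphere
    meets the RDC constraint. Hypothetical helper for illustration only.

    For a fixed z, two linear equations in x^2 and y^2 remain:
        x^2 + y^2             = 1 - z^2
        sxx * x^2 + syy * y^2 = r - szz * z^2
    """
    if sxx == syy:
        # Axially symmetric tensor: the curve is a circle of constant z,
        # which this simple parametrization does not handle.
        raise ValueError("sxx and syy must differ")
    points = []
    for i in range(n):
        z = -1.0 + 2.0 * i / (n - 1)
        x2 = (r - szz * z * z - syy * (1.0 - z * z)) / (sxx - syy)
        y2 = (1.0 - z * z) - x2
        if x2 < 0.0 or y2 < 0.0:
            continue  # no real orientation at this z
        points.append((math.sqrt(x2), math.sqrt(y2), z))
    return points


# Example with made-up, traceless tensor components and a made-up coupling.
curve = sphero_conic_points(0.2, -0.3, -0.7, 1.0)
```

Only the branch with non-negative x and y is returned; the remaining branches follow by flipping the signs of x and y. Each returned point is a bond vector orientation consistent with the single coupling value, which illustrates why additional restraints, such as a second RDC or backbone kinematics, are needed to narrow the curve down to a discrete set of orientations.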
Residual chemical shift anisotropy
Residual dipolar coupling
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations