Browsing by Author "Donald, Bruce Randall"
Item Open Access
A maximum entropy-based approach for the description of the conformational ensemble of calmodulin from paramagnetic NMR (2016-05-04)
Thelot, Francois
Characterizing protein dynamics is an essential step towards a better understanding of protein function. Experimentally, information about protein dynamics can be obtained from paramagnetic NMR data such as pseudocontact shifts, which integrate ensemble-averaged information about protein motion. In this report, we observe that the relative position of the two domains of calmodulin can be represented as the motion of one domain in the space of Euclidean motions, SE(3). From this perspective, we propose a maximum entropy-based approach for finding a probability distribution on SE(3) that satisfies the experimental NMR measurements. Although SE(3) is sampled here with the ensemble generator EOM, the proposed framework can be extended to uniform sampling of the space of Euclidean motions. We find that the most represented calmodulin conformations are those in which both domains are in close contact, even though these conformations differ substantially from one another. This representation agrees with the random coil linker model and differs sharply from the extended crystal structure of calmodulin.
Item Open Access
Combined Computational, Experimental, and Assay-Development Studies of Protein:Protein and Protein:Small Molecule Complexes, with Applications to the Inhibition of Enzymes and Protein:Protein Interactions (2019)
Frenkel, Marcel
Despite the best efforts of both academia and the pharmaceutical industry, most non-resectable cancers remain incurable and lethal. The World Health Organization (WHO) considers cancer the second leading cause of death worldwide, with roughly 9.6 million deaths in 2018.
Meanwhile, antimicrobial resistance (AMR), and the resulting "superbugs", is a growing medical crisis, with estimates as high as 700,000 deaths worldwide in 2018, a number that is rising rapidly. Although distinct, these unmet medical needs are closely related: both require better chemistry and intelligent drug design.
Both AMR and cancer could benefit from expanding the druggable proteome through the inhibition of protein-protein interactions (PPIs). PPIs drive both intra- and intercellular communication, so their inhibition is vital for disease modulation. Moreover, both AMR and cancer therapeutics suffer from the rapid emergence of drug resistance: even drugs that work perfectly at first frequently lose effectiveness a few months later.
Here, I discuss my contributions towards developing a PPI inhibitor to KRas, the most commonly activated oncogene in cancer. Through the use of OSPREY, a state-of-the-art computational protein and drug design (CPDD) software, and using KRas’ native ligand Raf-1 RBD as a starting point, we developed a super-binder with single-digit nanomolar affinity for KRas. The development and validation of this biologic inhibitor required the development of four novel biochemical assays to study binding to KRas and the inhibition of the KRas:Raf interaction.
I also discuss my contributions towards enhancing our ability to predict resistance mutations using OSPREY. This work focused on novel mechanisms of resistance in the dihydrofolate reductase of Staphylococcus aureus (SaDHFR). Specifically, we investigated the role of plasmid-borne resistance genes in S. aureus, as well as the mechanism of resistance conferred by the F98Y and V31L mutations. We discovered a potential new mechanism of resistance based on the formation of a tricyclic NADPH configuration, which we have named chiral evasion.
Finally, I discuss lessons learned from benchmarking OSPREY and share observations that can help drug designers who use CPDD tools enhance the accuracy and predictive potential of their results.
In conclusion, a combination of OSPREY and biochemical assays was used to address two of the largest limitations in drug development that directly affect global human health: the development of PPI inhibitors and overcoming drug resistance. We identified a novel hot-spot in the KRas:Raf interface that can be used to optimize the PPI and develop a biologic inhibitor of KRas. We generated models that explain how both V31L and F98Y confer resistance through chiral evasion via a tricyclic NADPH configuration, and we benchmarked OSPREY and identified features that can contribute to the predictive accuracy of CPDD tools.
Item Open Access
Computational Molecular Engineering Nucleic Acid Binding Proteins and Enzymes (2010)
Reza, Faisal
Interactions between nucleic acid substrates and the proteins and enzymes that bind and catalyze them are ubiquitous and essential for reading, writing, replicating, repairing, and regulating the genomic code by the proteomic machinery. In this dissertation, computational molecular engineering was used both to elucidate the spatial-temporal interactions of natural nucleic acid binding proteins and enzymes and to create synthetic counterparts whose structure-function interactions can be predicted proficiently. We examined spatial-temporal interactions to study how natural proteins process signals and substrates. The signals, propagated by spatial interactions between genes and proteins, can encode and decode information in the temporal domain. Natural proteins evolved by facilitating signaling, limiting crosstalk, and overcoming noise locally and globally. Our findings indicate that the fidelity and speed of frequency-encoded signal transmission in the presence of cellular noise are coordinated by a critical frequency, beyond which interactions may degrade or fail. The substrates, bound to their corresponding proteins, present structural information that is precisely recognized and acted upon in the spatial domain. Natural proteins evolved by coordinating substrate features with their own. These findings highlight the importance of accurate structural modeling. We explored structure-function interactions to study how synthetic proteins can form complexes with substrates. These complexes, composed of nucleic acid-containing substrates and amino acid-containing enzymes, can recognize and catalyze information in the spatial and temporal domains. Natural proteins evolved by balancing stability, solubility, substrate affinity, specificity, and catalytic activity.
Our accurate computational modeling of mutants with desirable properties toward nucleic acids, while maintaining this balance, extended existing molecular redesign approaches. The findings demonstrate that binding and catalytic proteins redesigned by single-conformation and multiple-conformation approaches maintained this balance and functioned, often as well as or better than their natural counterparts. We made computational molecular engineering of these interactions accessible through open-source practices. We examined the applications and issues of engineering nucleic acid binding proteins and enzymes for nanotechnology and therapeutics, as well as their ethical, legal, and social dimensions. Our findings suggest that this access and these applications can make engineering biology more widely adopted, easier, more effective, and safer.
Item Open Access
Computational Protein Design with Ensembles, Flexibility and Mathematical Guarantees, and its Application to Drug Resistance Prediction, and Antibody Design (2015-01-01)
Gainza Cirauqui, Pablo
Proteins are involved in all of life's processes and are also responsible for many diseases. Thus, engineering proteins to perform new tasks could revolutionize many areas of biomedical research. One promising technique for protein engineering is computational structure-based protein design (CSPD). CSPD algorithms search large protein conformational spaces to approximate biophysical quantities. In this dissertation we present new algorithms to realistically and accurately model how amino acid mutations change protein structure. These algorithms model continuous flexibility, protein ensembles and positive/negative design, while providing guarantees on the output. Using these algorithms and the OSPREY protein design program we design and apply protocols for three biomedically-relevant problems: (i) prediction of new drug resistance mutations in bacteria to a new preclinical antibiotic, (ii) the redesign of llama antibodies to potentially reduce their immunogenicity for use in preclinical monkey studies, and (iii) scaffold-based anti-HIV antibody design. Experimental validation performed by our collaborators confirmed the importance of the algorithms and protocols.
Item Open Access
Computational Protein Design with Non-proteinogenic Amino Acids and Small Molecule Ligands, with Applications to Protein-protein Interaction Inhibitors, Anti-microbial Enzyme Inhibitors, and Antibody Design (2021)
Wang, Siyu
Computational protein design is a leading-edge technology for designing novel proteins with novel functions, as well as for studying the structure and function of known proteins. Conventionally, most existing computational protein design methods and software packages model only proteinogenic amino acids. In reality, however, most biochemical systems are far more complicated. Many proteins contain not only proteinogenic amino acids but also non-natural amino acids or post-translational modifications. Some proteins can fulfill their function only through interactions with small-molecule ligands or cofactors, which are also beyond the scope of proteinogenic amino acids. To expand the capability of computational protein design methods, in this dissertation we incorporated the modeling of non-natural amino acids into OSPREY, a computational protein design software suite that is based on provable algorithms and was developed in our lab. Furthermore, three designs related to human health and involving non-natural amino acids or small-molecule ligands are presented in this dissertation: (1) design of novel cystic fibrosis therapeutics using non-natural amino acids, (2) redesign of HIV-1 broadly neutralizing antibodies for better potency and breadth, and (3) development of novel antibiotics against methicillin-resistant Staphylococcus aureus and the analysis of its resistance mechanism. Through extensive computational results and experimental data, we demonstrate the success of these designs.
Item Open Access
Developing an In Vivo Intracellular Neuronal Recording System for Freely Behaving Small Animals (2013)
Yoon, Inho
Electrophysiological intracellular recordings from freely behaving animals can provide information and insights that traditional recordings from restrained animals can only speculate about or cannot obtain at all. Intracellular recordings can reveal a neuron's intrinsic properties and its communication with other neurons. Applying this technology to an awake, socially behaving brain can take brain research one step further.
In this dissertation, a customized miniature electronics and microdrive assembly is introduced for intracellular recording from small behaving animals. This system has enabled in vivo intracellular recording from freely behaving zebra finches and mice. A new carbon nanotube probe is also presented that serves as both a surface-scanning tip and a neural electrode. With this probe, intracellular and extracellular neural signals were successfully recorded from mouse brains. Previously, carbon nanotubes had only been used as a coating material on a cell-culturing platform or on a metal-based neural electrode. This probe is the first pure carbon nanotube neural electrode without an underlying platform or wire, and the first to achieve intracellular and extracellular recordings from vertebrate cortical neurons.
Item Embargo
Discovery and Characterization of Novel Thanatin Orthologs Against Escherichia coli LptA and Pseudomonas aeruginosa LptH (2023)
Huynh, Kelly
Multidrug resistance (MDR) in bacteria is ever growing and complicates the treatment of infections, especially in patients who are critically ill and immunocompromised. Treatment often relies on a regimen of small-molecule drugs, but resistance against them develops after prolonged use. An alternative class of molecules, antimicrobial peptides (AMPs), has remained of interest because of their vast potential as pharmacological agents. AMPs, or host defense peptides, are naturally expressed in many organisms, including microbes, plants, and humans, as a defense mechanism that controls populations of bacteria, fungi, and viruses. Host genomes will therefore prove to be a valuable source of novel alternative drug molecules, and characterizing AMPs will lead to a better understanding of their mechanism of action and allow them to be applied to novel targets. In this dissertation, we apply these methods to thanatin, an AMP identified from the spined soldier bug (Podisus maculiventris) that was reported to regulate the gut microbiome population by targeting Gram-positive bacteria, Gram-negative bacteria, and fungi.
First, we mined genomic databases to discover novel thanatin orthologs. We generated these orthologs and characterized their binding to Escherichia coli LptA, a known target of thanatin, via bio-layer interferometry (BLI), and their antimicrobial activity against several E. coli strains via minimum inhibitory concentration (MIC) assays. We found a subset of thanatin sequences that target E. coli better than P. maculiventris thanatin, as shown by increased binding affinity, cell permeability, and overall potency. We crystallized and determined the structures of Chinavia ubica thanatin and Murgantia histrionica thanatin, the two most improved thanatin orthologs, in complex with E. coli LptA to better understand the interaction. Our mutagenesis studies show that thanatin residues A10 and M21 interact with the hydrophobic core of LptA, improving binding and synergistically improving cell permeability to increase antimicrobial activity against E. coli. We redesigned M. histrionica thanatin to truncate the sequence and remove the need for a disulfide bond. Our stapled peptide retained binding affinity to LptA, but its potency was reduced. Although antimicrobial activity did not improve, we present a novel scaffold for the next generation of thanatin-based AMPs.
Next, we characterized thanatin against Pseudomonas aeruginosa, a known but weaker target of thanatin. We confirmed binding of P. maculiventris thanatin to LptH, the P. aeruginosa homolog of E. coli LptA, via BLI and isothermal titration calorimetry (ITC), and showed inhibition of P. aeruginosa strain RP73 via MIC assays. Using homology modeling and an E. coli model system, we identified LptH Y51, at the predicted binding interface, as the resistance factor against thanatin. We attempted to overcome the hindrance posed by LptH Y51 by designing thanatin variants to accommodate it. Our designs worked in the E. coli model system, but they did not translate into improved binding to LptH. Interestingly, we discovered that thanatin Y10 is essential for binding LptH. Screening our small library of thanatin orthologs against LptH and P. aeruginosa did not reveal any sequences with improved binding or antimicrobial activity, and again highlighted the necessity of thanatin Y10 and the resistance factor LptH Y51. We also investigated whether C-terminal amidation (C-amidation) underlies the improved potency of thanatin against P. aeruginosa. Our E. coli model system shows a key rescued interaction between the C-amidated terminus of thanatin and LptA R76Q, a mutant that mimics LptH. We crystallized and determined the structure of C-amidated truncated thanatin in complex with LptA R76Q to gain insight into this interaction; however, we did not observe the hypothesized rescued interaction. When translating our findings to LptH, we observed improved binding due to C-amidation via ITC but not via BLI. These conflicting data about how thanatin interacts with LptH could be clarified with a high-resolution protein:peptide complex structure, but obtaining one experimentally has proven difficult. Overall, we provide some insight into how thanatin targets LptH in P. aeruginosa, but further studies will be needed to fully elucidate its mechanism of action.
Collectively, this dissertation provides an example of how natural sources can be mined to uncover novel AMPs that target bacteria as MDR is on the rise. We present insights into the mechanism of action of thanatin gained by characterizing thanatin and its novel orthologs against E. coli LptA and P. aeruginosa LptH. This characterization will allow improved AMPs to be designed in the next generation of thanatin peptides to target pathogens.
Item Open Access
Efficient New Computational Protein Design Algorithms, with Applications to Drug Resistance Prediction and HIV Antibody Design (2018)
Ojewole, Adegoke
Proteins are essential for myriad biological functions, including DNA replication, molecular transport, catalysis, and antigen recognition. Protein function is determined by three-dimensional structure, which is largely determined by amino acid composition. The functional diversity of known proteins suggests that nature can support a much larger set of proteins than is currently available. Protein design aims to explore the space of possible proteins in order to create new proteins with novel or improved biological functions. Two key challenges in protein design, however, are the astronomically large number of possible protein sequences and the vast conformation space spanned by each protein. Computational structure-based protein design (CPD) enables the prediction of proteins with desired biochemical properties. A practical CPD method must not only efficiently tackle large sequence and conformation spaces but also use a computationally tractable yet biophysically realistic model of protein plasticity. To this end, I have developed algorithms that accurately and more efficiently search large sequence and conformational spaces to compute proteins that satisfy binding affinity, specificity, and stability requirements. Crucially, my algorithms retain the state-of-the-art features of protein design, namely provable guarantees, continuous flexibility, and ensemble-based scoring. I applied my algorithms to two biomedically relevant problems: (i) prediction of drug resistance mutations that arise in response to four pre-clinical antibiotics, and (ii) the redesign of a monoclonal HIV antibody for improved potency and breadth of neutralization.
Item Open Access
Ensemble-based Computational Protein Design: Novel Algorithms and Applications to Energy Landscape Approximation, Antibiotic Resistance, and Antibody Design (2022)
Holt, Graham Thomas
Proteins are incredibly varied in their biological functions and are therefore attractive targets for scientists and engineers seeking to design new and improved functions. These functions are defined by protein structure, which can be viewed as a probability distribution over a large conformation space. Many successful protein design methods construct and evaluate models of protein structure and physics in silico to design proteins. We apply the concept of protein structure as a probability distribution to develop new protein design algorithms, study mechanisms of protein binding and antibiotic resistance, and design improved broadly neutralizing antibodies. This research highlights the utility of the distribution view of protein structure and suggests future research in this direction.
Item Open Access
Geometric Algorithms for Protein Structure Determination Using Measurements From Nuclear Magnetic Resonance Spectroscopy (2014)
Martin, Jeffrey W.
In an environment such as a cell, the three-dimensional structure of a protein entirely determines its function. Hence, to understand the mechanics of biochemical processes necessary to sustain life, it is crucial to study the structures of proteins at atomic detail. When life is threatened by viral and bacterial pathogens, structural characterization of the proteins at play yields insights about possible treatments and therapeutics. Measurements from nuclear magnetic resonance spectroscopy (NMR) reveal information about the structures of proteins, but building accurate atomic-resolution models from such measurements is an arduous task. The ambiguity and uncertainty of these measurements, and the challenges of obtaining a sufficient number of measurements to uniquely describe a structure, contribute to the difficulty of protein structure determination by NMR.
The widely used computational methods that determine structures from NMR measurements rely primarily on various incarnations of stochastic optimization. These techniques have been used to determine protein structures of excellent quality, but in the long term their reliability is dubious (and in some cases demonstrably inadequate), especially as we attempt to solve increasingly difficult structures. Because of its random nature, stochastic optimization may not always report the best solution: other, superior solutions may lie concealed in the landscape of the objective function and remain undiscovered. We therefore seek computational methods for structure determination that are imbued with guarantees about solution quality. In this dissertation, we present methods for protein structure determination by NMR that guarantee that structural solutions agree quantitatively with experimental measurements. Although the price of guaranteeing completeness in structure determination algorithms is often an exponential running time, for some methods we obtained polynomial running times in addition to guarantees of completeness.
Item Embargo
New Computational Methods to Predict Cancer Resistance Mutations and Design D-Peptide Therapeutics (2023)
Guerin, Nathan
Improving disease treatment relies on advancements in our understanding of disease etiology and evolution. Rational drug design seeks to exploit this understanding to improve human health through targeted molecular interventions. In this dissertation, we present computational methods that 1) predict disease evolution in the form of resistance mutations; and 2) generate de novo D-peptide therapeutics. First, we introduce the Resistor algorithm. Resistor uses Pareto optimization with multistate design and cancer-specific mutational probabilities to rank resistance mutations based on their ability to ablate binding to an inhibitor, retain native function, and occur in a specific cancer type. We apply Resistor to 8 inhibitors targeting the EGFR, BRAF, and ERK2 proteins, and provide experimental validation of Resistor-predicted resistance mutations. Second, we introduce DexDesign, a novel algorithm for computationally designing de novo D-peptide inhibitors. DexDesign leverages three novel techniques that are broadly applicable to computational protein design: the Minimum Flexible Set, K*-based Mutational Scan, and Inverse Alanine Scan. We apply these techniques and DexDesign to generate novel D-peptide inhibitors of two biomedically important PDZ domain targets: CALP and MAST2. Notably, the peptides we generated are predicted to bind their targets more tightly than the targets' endogenous ligands, supporting the peptides' potential as lead therapeutic candidates. We provide implementations of Resistor and DexDesign in the free and open-source computational protein design software OSPREY.
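Resistor ranks candidate mutations with Pareto optimization over several criteria at once. As a minimal, hypothetical sketch of Pareto ranking in general (not Resistor's implementation, which derives its criteria from multistate design and cancer-specific mutational probabilities), the Python below peels a set of candidate mutations into successive non-dominated fronts. The `Mutation` fields and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    """A candidate mutation scored on three criteria (hypothetical fields).

    Every criterion is oriented so that larger means "more likely to be a
    relevant resistance mutation".
    """
    name: str
    binding_loss: float   # assumed: predicted loss of inhibitor binding
    function_kept: float  # assumed: predicted retention of native function
    occurrence: float     # assumed: cancer-specific mutational probability

    def scores(self):
        return (self.binding_loss, self.function_kept, self.occurrence)


def dominates(a: Mutation, b: Mutation) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    sa, sb = a.scores(), b.scores()
    return all(x >= y for x, y in zip(sa, sb)) and any(x > y for x, y in zip(sa, sb))


def pareto_fronts(mutations):
    """Split mutations into successive non-dominated fronts; front 0 is the Pareto front."""
    remaining = list(mutations)
    fronts = []
    while remaining:
        # Dominance is irreflexive, so each round keeps at least one mutation.
        front = [m for m in remaining if not any(dominates(o, m) for o in remaining)]
        fronts.append(front)
        remaining = [m for m in remaining if m not in front]
    return fronts


candidates = [
    Mutation("A", 3.0, 0.90, 0.20),
    Mutation("B", 2.0, 0.80, 0.10),  # dominated by A on all three criteria
    Mutation("C", 1.0, 0.95, 0.50),  # trades binding loss for function and frequency
]
ranked = [[m.name for m in front] for front in pareto_fronts(candidates)]
print(ranked)  # [['A', 'C'], ['B']]
```

Neither A nor C dominates the other, so both sit on the first front; a Pareto ranking keeps such trade-offs visible instead of collapsing them into one weighted score.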
Item Open Access
Novel Algorithms and Tools for Computational Protein Design with Applications to Drug Resistance Prediction, Antibody Design, Peptide Inhibitor Design, and Protein Stability Prediction (2019)
Lowegard, Anna Ulrika
Proteins are biological macromolecules made up of amino acids. Proteins range from enzymes to antibodies and perform their functions through a variety of mechanisms, including protein-protein interactions (PPIs). Computational structure-based protein design (CSPD) seeks to design proteins toward some specific or novel function by changing the amino acid composition of a protein and modeling the effects. CSPD is a particularly challenging problem because the size of the search space grows exponentially with the number of amino acid positions included in each design. This challenge is most often encountered in large designs such as the redesign of a PPI. Herein, we discuss how to use CSPD to predict resistance mutations in the active site of the dihydrofolate reductase enzyme from methicillin-resistant Staphylococcus aureus, and we investigate the accuracy of an existing CSPD suite of algorithms, osprey. We have also developed novel algorithms and tools within osprey to predict the effects of mutations more efficiently and accurately. We apply these algorithms and tools to three systems toward a variety of goals: predicting the effect of mutations on the stability of staphylococcal protein A (SpA), redesigning HIV-1 broadly neutralizing antibody PG9-RSH toward improved potency, and designing a peptide inhibitor of KRas:effector PPIs.
Item Open Access
Novel Algorithms for Automated NMR Assignment and Protein Structure Determination (2011)
Zeng, Jianyang
High-throughput structure determination based on solution nuclear magnetic resonance (NMR) spectroscopy plays an important role in structural genomics. Unfortunately, current NMR structure determination is still limited by the lengthy time required to process and analyze the experimental data. A major bottleneck in protein structure determination via NMR is the interpretation of NMR data, including the assignment of chemical shifts and nuclear Overhauser effect (NOE) restraints from NMR spectra. The development of automated and efficient procedures for analyzing NMR data and assigning experimental restraints will thereby enable high-throughput protein structure determination and advance structural proteomics research. In this dissertation, we present the following novel algorithms for automating NMR assignment and protein structure determination. First, we develop a novel high-resolution structure determination algorithm that starts with a global fold calculated from the exact and analytic solutions to the residual dipolar coupling (RDC) equations. Our high-resolution structure determination protocol has been applied to solve the NMR structures of the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein), which have been deposited into the Protein Data Bank. Second, we propose an automated side-chain resonance and NOE assignment algorithm that does not require any explicit through-bond experiment to facilitate side-chain resonance assignment, such as HCCH-TOCSY. Third, we present a Bayesian approach to determine protein side-chain rotamer conformations by integrating the likelihood function derived from unassigned NOE data, with prior information (i.e., empirical molecular mechanics energies) about the protein structures.
Fourth, we develop a loop backbone structure determination algorithm that exploits global orientational restraints from sparse RDCs and computes an ensemble of loop conformations that not only close the gap between the two end residues but also satisfy the NMR restraints. Finally, to facilitate NMR structure determination for large proteins, we develop a novel algorithm for predicting Hα chemical shifts by exploiting the dependencies between the chemical shifts of different backbone atoms and integrating the attainable structural information. All the algorithms developed in this dissertation have been tested on experimental NMR data in collaboration with Dr. Pei Zhou's lab and our own. The promising results demonstrate that our algorithms can be successfully applied to high-quality protein structure determination. Because our algorithms reduce the time required for NMR assignment, they can accelerate the protein structure determination process.
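The Bayesian rotamer step combines a likelihood derived from unassigned NOE data with a prior from empirical molecular mechanics energies. A minimal sketch, assuming a Boltzmann-form prior proportional to exp(-E/kT) and taking per-rotamer log-likelihoods as precomputed inputs (the dissertation's actual NOE likelihood model is not reproduced here; the function name and numbers are hypothetical), normalizes the resulting posterior over one residue's candidate rotamers:

```python
import math

# Assumed units: energies in kcal/mol; kT at roughly 298 K.
KT_298 = 0.593


def rotamer_posterior(energies, log_likelihoods, kt=KT_298):
    """Posterior over candidate rotamers for one residue.

    P(r | NOE data) is proportional to L(NOE data | r) * P(r), with an
    assumed Boltzmann prior P(r) proportional to exp(-E_r / kT).
    """
    log_post = [ll - e / kt for e, ll in zip(energies, log_likelihoods)]
    top = max(log_post)  # subtract the max before exponentiating (log-sum-exp trick)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs for three rotamers of one side chain.
energies = [0.0, 1.0, 2.0]            # rotamer 0 has the lowest energy
log_likelihoods = [-5.0, -1.0, -4.0]  # rotamer 1 best explains the NOE data
post = rotamer_posterior(energies, log_likelihoods)
best = max(range(len(post)), key=post.__getitem__)
print(best)  # 1
```

Here the data outweigh the prior: rotamer 1 wins even though rotamer 0 has lower energy. Working in log space avoids underflow when likelihoods are very small.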
Item Open Access
Novel Algorithms for Computational Protein Design, with Applications to Enzyme Redesign and Small-Molecule Inhibitor Design (2009)
Georgiev, Ivelin Stefanov
Computational protein design aims at identifying protein mutations and conformations with desired target properties (such as increased protein stability, switch of substrate specificity, or novel function) from a vast combinatorial space of candidate solutions. The development of algorithms to efficiently and accurately solve problems in protein design has thus posed significant computational and modeling challenges. Despite the inherent hardness of protein design, a number of computational techniques have been previously developed and applied to a wide range of protein design problems. In many cases, however, the available computational protein design techniques are deficient both in computational power and modeling accuracy. Typical simplifying modeling assumptions for computational protein design are the rigidity of the protein backbone and the discretization of the protein side-chain conformations. Here, we present the derivation, proofs of correctness and complexity, implementation, and application of novel algorithms for computational protein design that, unlike previous approaches, have provably-accurate guarantees even when backbone or continuous side-chain flexibility are incorporated into the model. We also describe novel divide-and-conquer and dynamic programming algorithms for improved computational efficiency that are shown to result in speed-ups of up to several orders of magnitude as compared to previously-available techniques. Our novel algorithms are further incorporated as part of K*, a provably-accurate ensemble-based algorithm for protein-ligand binding prediction and protein design. The application of our suite of protein design algorithms to a variety of problems, including enzyme redesign and small-molecule inhibitor design, is described.
Experimental validation, performed by our collaborators, of a set of our computational predictions confirms the feasibility and usefulness of our novel algorithms for computational protein design.
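K* scores binding as a ratio of Boltzmann-weighted partition functions, K* = Z_PL / (Z_P · Z_L), which approximates the association constant. The real K* algorithm avoids full enumeration and computes provable ε-approximations of these sums; the sketch below, offered only as an illustration, simply enumerates short, hypothetical lists of conformational energies (in kcal/mol) for the complex, unbound protein, and unbound ligand. The function names and values are assumptions.

```python
import math

R = 1.9872e-3  # gas constant, kcal/(mol*K)
T = 298.15     # temperature, K


def log_partition(energies, rt=R * T):
    """ln Z with Z = sum_i exp(-E_i / RT), computed with the log-sum-exp trick."""
    scaled = [-e / rt for e in energies]
    top = max(scaled)
    return top + math.log(sum(math.exp(s - top) for s in scaled))


def log10_kstar(complex_energies, protein_energies, ligand_energies):
    """log10 of K* = Z_PL / (Z_P * Z_L), a partition-function estimate of K_a."""
    ln_k = (log_partition(complex_energies)
            - log_partition(protein_energies)
            - log_partition(ligand_energies))
    return ln_k / math.log(10)


# Hypothetical energies for two designs that share the same unbound ensembles.
protein = [0.0, 0.5, 1.2]
ligand = [0.0, 0.3]
wild_type = log10_kstar([-9.0, -8.4], protein, ligand)
mutant = log10_kstar([-10.0, -9.1, -8.8], protein, ligand)
print(mutant > wild_type)  # True
```

Because every low-energy conformation contributes to Z, an ensemble-based score can favor a design with many good bound states over one with a single best conformation, which is the motivation for ensembles over a single minimum-energy structure.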
Item Open Access
Novel Algorithms for Protein Structure Determination from Sparse NMR Data (2012)
Tripathy, Chittaranjan
Nuclear magnetic resonance (NMR) spectroscopy is an established technique for macromolecular structure determination at atomic resolution. However, most current structure determination approaches require a large set of experiments and large amounts of data to elucidate a three-dimensional protein structure. While current structure determination protocols may perform well in data-rich settings, protein structure determination remains difficult when data are sparse. Sparse data arise in high-throughput settings and for larger proteins, membrane proteins, and symmetric protein complexes; these settings require novel algorithms that can compute structures with provable guarantees on solution quality and running time.
In this dissertation project, we sought to address key computational bottlenecks in NMR structural biology. Specifically, we improved and extended techniques recently developed in our laboratory, and developed novel algorithms and computational tools that enable protein structure determination from sparse NMR data. An underlying goal of the project was to minimize the number of NMR experiments, and hence the time and cost of performing them, while still determining protein structures accurately from a limited set of experimental data. The algorithms developed in this dissertation use global orientational restraints from residual dipolar coupling (RDC) and residual chemical shift anisotropy (RCSA) data from solution NMR, in addition to a sparse set of distance restraints from nuclear Overhauser effect (NOE) and paramagnetic relaxation enhancement (PRE) measurements. We used tools from algebraic geometry to derive analytic expressions for bond vector and peptide plane orientations by exploiting the mathematical interplay between RDC- or RCSA-derived sphero-conics and protein kinematics. Besides improving our understanding of the geometry of the restraints from these data, these expressions are used by our algorithms to compute protein structures provably accurately. Our algorithms, which determine the protein backbone global fold from sparse NMR data, were used in the high-resolution structure determination protocol developed in our laboratory to solve the solution NMR structures of the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein), which have been deposited into the Protein Data Bank. We have also developed a novel sparse-data, RDC-based algorithm to compute ensembles of protein loop conformations in the presence of a moderate level of dynamics in the loop regions.
All the algorithms developed in this dissertation have been tested on experimental NMR data. The promising results suggest that these algorithms can be successfully applied to determine high-quality protein backbone structures from a limited amount of experimental NMR data, and hence will be useful for automated NOE assignment and high-resolution protein backbone structure determination from sparse NMR data. The algorithms and software tools developed during this project are freely available to the scientific community as open-source software.
Item Open Access Novel Computational Protein Design Algorithms with Applications to Cystic Fibrosis and HIV (2014) Roberts, Kyle Eugene
Proteins are essential components of cells and are crucial for catalyzing reactions, signaling, recognition, motility, recycling, and structural stability. This diversity of function suggests that nature is only scratching the surface of protein functional space. Protein function is determined by structure, which in turn is determined predominantly by amino acid sequence. Protein design aims to explore protein sequence and conformational space to design novel proteins with new or improved function. The vast number of possible protein sequences makes exploring the space a challenging problem.
Computational structure-based protein design (CSPD) allows for the rational design of proteins. Because of the large search space, CSPD methods must balance search accuracy and modeling simplifications. We have developed algorithms that allow for the accurate and efficient search of protein conformational space. Specifically, we focus on algorithms that maintain provability, account for protein flexibility, and use ensemble-based rankings. We present several novel algorithms for incorporating improved flexibility into CSPD with continuous rotamers. We applied these algorithms to two biomedically important design problems. We designed peptide inhibitors of the cystic fibrosis agonist CAL that were able to restore function of the vital cystic fibrosis protein CFTR. We also designed improved HIV antibodies and nanobodies to combat HIV infections.
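Provable pruning in CSPD is commonly built on dead-end elimination (DEE). As a hedged illustration of the kind of guarantee involved, the sketch below implements the classic Goldstein DEE criterion for discrete rotamers with pairwise energies: rotamer r at residue i is discarded if swapping it for some competitor t at the same residue lowers the energy in every possible context, so r cannot appear in the global minimum-energy conformation (GMEC). This is the discrete textbook criterion, not the continuous-rotamer extensions described above; the energy-table layout (E1, E2) is an assumption chosen for the example.

```python
def pair_energy(E2, i, r, j, s):
    """Pairwise energy between rotamer r at residue i and rotamer s at residue j."""
    if (i, j) in E2:
        return E2[(i, j)][r][s]
    if (j, i) in E2:
        return E2[(j, i)][s][r]
    return 0.0  # residues i and j do not interact

def goldstein_dee(E1, E2):
    """Iteratively prune rotamers that provably cannot appear in the GMEC.

    E1[i][r]: self energy of rotamer r at residue i.
    E2[(i, j)][r][s]: pairwise energy for the residue pair (i, j).
    Returns a list with the set of surviving rotamer indices for each residue.
    """
    n = len(E1)
    alive = [set(range(len(E1[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Lower bound on E(..., r, ...) - E(..., t, ...) over all contexts.
                    gap = E1[i][r] - E1[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_energy(E2, i, r, j, s) - pair_energy(E2, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # t always beats r, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive
```

With E1 = [[0.0, 5.0], [0.0, 0.0]] and all pairwise energies zero, rotamer 1 at residue 0 is pruned because rotamer 0 beats it by 5 in every context, while both rotamers at residue 1 survive.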
Item Open Access Novel Computational Protein Design Algorithms with Sparse Residue Interaction Graphs, Ensembles, and Mathematical Guarantees, and their Application to Antibody Design (2018) Jou, Jonathan Dragon
Computational structure-based protein design seeks to harness the incredible biological power of proteins by designing proteins with new structures and even new functions. In this dissertation, we present new algorithms to more efficiently search over two models of protein design: design with sparse residue interaction graphs, and design with conformational ensembles. These algorithms build upon existing provable algorithms: they retain all mathematical guarantees of preceding provable methods while providing both efficiency gains and novel theoretical results. Using provable algorithms and the OSPREY protein design software suite, we develop and apply protocols to redesign broadly neutralizing antibodies for improved potency and breadth against HIV. We retrospectively validate experimentally observed escape mutations to HIV gp120 that reduce binding affinity for the broadly neutralizing antibody CAP256-VRC26.25, and identify mutations predicted to improve both the potency and breadth of CAP256-VRC26.25 against HIV.
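Sparse residue interaction graphs matter because the cost of exact optimization depends on the structure of the graph. As a minimal, hypothetical illustration (not the algorithms in this dissertation, which build on existing provable search methods), the sketch below finds the GMEC exactly when the interaction graph is a simple chain, using dynamic programming in O(n·L²) time for n residues with L rotamers each instead of enumerating all Lⁿ conformations. The energy-table layout is an assumption.

```python
def chain_gmec(E1, E2):
    """Exact GMEC for residues 0..n-1 whose only interactions are between neighbors.

    E1[i][r]: self energy of rotamer r at residue i.
    E2[i][r][s]: energy between rotamer r at residue i and rotamer s at residue i + 1.
    Returns (minimum energy, list of rotamer indices).
    """
    best = list(E1[0])   # best[r]: min energy of residues 0..i with residue i in rotamer r
    back = []            # back[i-1][s]: best rotamer at residue i-1 given rotamer s at i
    for i in range(1, len(E1)):
        new_best, arg = [], []
        for s in range(len(E1[i])):
            cands = [best[r] + E2[i - 1][r][s] for r in range(len(best))]
            r_min = min(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[r_min] + E1[i][s])
            arg.append(r_min)
        best = new_best
        back.append(arg)
    s = min(range(len(best)), key=best.__getitem__)
    energy = best[s]
    labels = [s]
    for arg in reversed(back):   # trace the optimal choices back to residue 0
        s = arg[s]
        labels.append(s)
    return energy, labels[::-1]
```

On general sparse graphs the same idea extends to tree-like decompositions, whose cost grows with the width of the decomposition rather than with the total number of residues.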
Item Open Access On Provable Algorithms for Determination of Continuous Protein Interdomain Motions from Residual Dipolar Couplings (2016) Qi, Yang
Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis, and signaling. However, reconstructing biomolecular dynamics from experimental observables requires determining a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information available from experiments, making the problem ill-posed in the sense of Hadamard: it has no unique solution, and multiple or even infinitely many solutions may exist. To overcome this, the problem must be regularized by making assumptions, which inevitably introduce bias into the result.
Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to determining a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates the alignment tensors of adjacent domains, which serves as the foundation of both methods. In the first approach, the problem was regularized by introducing a continuous distribution model that imposes a smoothness assumption. To find the optimal distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. The method performed well when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing the maximum entropy principle. This 'model-free' approach delivers the least-biased result consistent with our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed, so the maximum entropy solution can be found efficiently by gradient descent. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
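The maximum entropy approach can be sketched on a discretized stand-in for the continuous rotation space. Assuming K sampled conformations (for example, interdomain orientations) with back-calculated observables F[k] (such as RDCs) and measured ensemble averages d, the maximum entropy weights take the exponential form p_k ∝ exp(-λ·F[k]), and the multipliers λ minimize the convex dual function log Z(λ) + λ·d, whose gradient is d - Σ_k p_k F[k]. The function name, learning rate, and discrete sampling are assumptions for illustration; the dissertation works with a continuous distribution.

```python
import numpy as np

def _weights(F, lam):
    """Normalized weights p_k proportional to exp(-lam . F[k])."""
    logits = -F @ lam
    logits -= logits.max()                # avoid overflow in exp
    p = np.exp(logits)
    return p / p.sum()

def maxent_weights(F, d, lr=0.5, n_iter=10000, tol=1e-10):
    """Maximum entropy weights over K sampled conformations.

    F: (K, M) array of back-calculated observables, one row per conformation.
    d: (M,) array of measured ensemble-averaged observables.
    Returns (p, lam) with p @ F close to d when d is attainable.
    """
    F = np.asarray(F, dtype=float)
    d = np.asarray(d, dtype=float)
    lam = np.zeros(F.shape[1])
    for _ in range(n_iter):
        p = _weights(F, lam)
        grad = d - p @ F                  # gradient of the dual log Z(lam) + lam . d
        if np.linalg.norm(grad) < tol:
            break
        lam = lam - lr * grad             # gradient descent step on the convex dual
    return _weights(F, lam), lam
```

With one observable taking values 0, 1, and 2 on three samples, a target mean of 1.0 returns uniform weights, while a target of 1.2 shifts weight monotonically toward the sample with value 2, as the exponential form predicts.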
Item Open Access Partition function estimation in computational protein design with continuous-label Markov random fields (2017-05-04) Mukund, Aditya
Proteins perform a variety of biological tasks and drive many of the dynamic processes that make life possible. Computational structure-based protein design (CSPD) involves computing optimal amino acid sequences with respect to particular backbones, or folds, in order to produce proteins with novel functions. In particular, accurately modeling protein-protein interfaces (PPIs) is crucial for realizing desired functionalities. Accurate modeling of PPIs raises two significant considerations. First, incorporating continuous side-chain flexibility into the design process has been shown to significantly improve the quality of designs. Second, because proteins exist as ensembles of structures, many of the properties we wish to design, including binding affinity, require computing ensemble properties rather than features of individual conformations. The bottleneck in many design algorithms that handle the ensemble nature of protein structure, including the Donald Lab's K* algorithm, is computing the partition function: the sum of the Boltzmann factors exp(-E/RT) over all conformational states of a protein or protein-ligand complex. Protein design can be formulated as an inference problem on Markov random fields (MRFs), where each residue to be designed is represented by a node and an edge is placed between nodes corresponding to interacting residues. The label set at each vertex corresponds to the flexibility allowed at that residue in the underlying design problem.
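To make these definitions concrete, the sketch below computes the partition function exactly by enumeration for a tiny pairwise MRF, summing the Boltzmann factor exp(-E/RT) over every joint label assignment, and forms a K*-style score as the ratio Z_PL / (Z_P · Z_L). Enumeration is exponential in the number of nodes, which is why estimation methods are needed for realistic designs. The energy-table layout, the function names, and the default RT of 0.5925 kcal/mol (about 298 K) are assumptions made for this example.

```python
import itertools
import math

def conformation_energy(conf, E1, E2):
    """Energy of one joint label assignment: node terms plus pairwise terms on MRF edges."""
    e = sum(E1[i][r] for i, r in enumerate(conf))
    for (i, j), table in E2.items():
        e += table[conf[i]][conf[j]]
    return e

def partition_function(E1, E2, rt=0.5925):
    """Exact Z = sum over all label assignments of exp(-E / RT), by enumeration.

    E1[i][r]: energy of label r at node i; E2[(i, j)][r][s]: energy on edge (i, j).
    rt: RT in the same energy units as E1 and E2.
    """
    z = 0.0
    for conf in itertools.product(*(range(len(labels)) for labels in E1)):
        z += math.exp(-conformation_energy(conf, E1, E2) / rt)
    return z

def kstar_score(z_complex, z_protein, z_ligand):
    """K*-style binding score: complex partition function over the unbound ones."""
    return z_complex / (z_protein * z_ligand)
```

For a single node with label energies 0 and 1 and RT = 1, the function returns 1 + e⁻¹ ≈ 1.368; with two nodes, two labels each, and all energies zero, it returns 4, the number of joint states.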
The aim of this work is to extend message-passing algorithms that estimate the partition function for MRFs with discrete label sets to MRFs with continuous label sets, in order to compute the partition function for PPIs with continuous flexibility and continuous entropy.
Item Open Access Protein and Drug Design Algorithms Using Improved Biophysical Modeling (2016) Hallen, Mark Andrew
This thesis focuses on the development of algorithms that allow protein design calculations to incorporate more realistic modeling assumptions. Protein design algorithms search large sequence spaces for protein sequences that are biologically and medically useful. Better modeling could improve the chance of success in designs and expand the range of problems to which these algorithms are applied. I have developed algorithms to improve modeling of backbone flexibility (DEEPer) and of more extensive continuous flexibility in general (EPIC and LUTE). I have also developed algorithms to perform multistate designs, which account for effects like specificity, with provable guarantees of accuracy (COMETS), and to accommodate a wider range of energy functions in design (EPIC and LUTE).
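As a loose illustration of representing conformational energies by a fitted expansion (in the spirit of, but much simpler than, LUTE), the sketch below fits a pairwise-additive model, with one coefficient per single-residue label and one per pair of labels, to sampled conformation energies by linear least squares using numpy.linalg.lstsq. The names and the purely pairwise expansion are assumptions; energies obtained after continuous minimization are generally not exactly pairwise additive, so in practice such a fit would be approximate.

```python
import itertools
import numpy as np

def build_term_index(n_labels):
    """Assign a column to every single-residue label and every pair of labels."""
    index = {}
    for i, n in enumerate(n_labels):
        for r in range(n):
            index[(i, r)] = len(index)
    for i, j in itertools.combinations(range(len(n_labels)), 2):
        for r in range(n_labels[i]):
            for s in range(n_labels[j]):
                index[(i, r, j, s)] = len(index)
    return index

def term_features(conf, index):
    """0/1 indicator vector of the expansion terms present in one conformation."""
    x = np.zeros(len(index))
    for i, r in enumerate(conf):
        x[index[(i, r)]] = 1.0
    for i, j in itertools.combinations(range(len(conf)), 2):
        x[index[(i, conf[i], j, conf[j])]] = 1.0
    return x

def fit_pairwise_expansion(confs, energies, n_labels):
    """Least-squares fit of per-term coefficients to sampled conformation energies."""
    index = build_term_index(n_labels)
    X = np.array([term_features(c, index) for c in confs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(energies, dtype=float), rcond=None)
    return index, coef

def predict_energy(conf, index, coef):
    """Energy predicted by the fitted expansion for one conformation."""
    return float(term_features(conf, index) @ coef)
```

The design matrix is rank-deficient (the indicator columns overlap), so the individual coefficients are not unique; lstsq returns the minimum-norm solution, and predicted energies, not coefficients, are the meaningful output.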