Discovery of RNA-Targeted Small Molecules by Quantitative Structure-Activity Relationship (QSAR) Study and Machine Learning
Date
2023
Abstract
RNA is a critical macromolecule in many biological processes because it encodes both structural and genetic information. It can serve as the physical template read by the ribosome during protein synthesis and as an intermediary that interferes with gene expression. For example, messenger RNA encodes specific gene sequences, microRNA regulates gene expression levels, riboswitches control translation and RNA splicing, and non-coding RNA provides molecular scaffolding for protein recruitment. Malfunction of cellular RNAs leads to multiple diseases, and targeting disease-related RNAs has emerged as a new strategy in many drug development campaigns. Indeed, ribosomal RNA has long been used as a drug target, and fruitful studies of naturally occurring and synthetic ligands have helped elucidate the mechanism of translation inhibition. The past two decades have witnessed growing research on using small-molecule probes to interrogate non-ribosomal RNAs in various disease pathways.
RNA molecules have chemical properties distinct from those of proteins, which makes the design of selective and potent chemical probes challenging. The limited chemical diversity of the four building blocks, the highly charged phosphate backbone, shallow and highly hydrophilic binding pockets, and dynamic conformations together leave the ligand space of RNA-targeted small molecules poorly understood and in need of further exploration. A deeper understanding of the privileged chemotypes and physicochemical properties of RNA-targeting ligands will benefit the broad development of novel chemical entities with desired RNA-interfering outcomes. In my thesis work, I first applied a computational approach, building a quantitative structure-activity relationship (QSAR) model to predict the binding profiles of a biased set of ligands sharing an amiloride core scaffold against HIV viral RNA elements.
The well-performing model predicted the binding parameters of a set of untested molecules, and the top-ranked one was selected during lead optimization. The study showed the potential of this computational tool for decision-making during the synthesis of RNA-targeted ligands. In the following study, we extended the scope of the QSAR study and adapted the workflow to accommodate structurally diverse ligands. We applied explicit algorithms to build baseline models that allow easy interpretation of the binding behaviors of structurally distinct ligands to HIV-1 TAR. The model demonstrated for the first time molecular factors that contribute to RNA:small molecule recognition, both kinetically and thermodynamically. The general workflow we described will serve as a powerful computational tool to efficiently assess underexplored chemical space and guide decision-making in synthesizing RNA-targeted chemical probes. We then bridged our QSAR approach with a generative deep learning model to pursue de novo ligand design targeting the SARS-CoV-2 frameshifting pseudoknot. The QSAR model, built on experimentally validated data, provided label annotations for the large training set of the deep learning model. A junction tree-based variational auto-encoder was trained to learn the molecular generation process. The annotated label of each training sample was encoded into the continuous latent space, into which molecules were projected at reduced dimensionality. Conditions were applied when sampling new entities from the latent space, yielding new compounds with desired binding properties. The method described here constitutes the first deep learning practice for automated chemical design against an RNA target and the first application of conditional molecular generation via a junction tree-based variational auto-encoder.
Overall, the work presented in this thesis explored the potential of data-driven methods such as QSAR studies and deep learning to accelerate ligand discovery for RNA targets. It is anticipated that these workflows will benefit a wide range of studies in understanding and pursuing RNA-centric drug development, although slight modifications might be needed to adapt them to larger data sets.
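As a rough illustration of the QSAR idea summarized in the abstract (fit a model on measured ligands, then rank untested candidates by predicted binding), the following minimal Python sketch may help. It is not the thesis workflow: the two descriptors, the linear model form, the candidate names, and every numeric value are hypothetical, and the training activities are generated from a known linear rule purely so the fit can be checked.

```python
# Toy QSAR sketch: multiple linear regression on made-up descriptor values.
# Every name and number here is hypothetical and NOT taken from the thesis.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, x):
    """Predicted activity for one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Hypothetical training ligands, each described by two numeric descriptors.
train_X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
# Synthetic "measured" activities, generated from a known rule for checking.
train_y = [0.5 + 1.0 * d1 + 0.25 * d2 for d1, d2 in train_X]

coef = fit_ols(train_X, train_y)

# Hypothetical untested candidates, ranked by predicted activity (high first).
candidates = {"cand_A": (2.5, 2.0), "cand_B": (4.5, 1.0), "cand_C": (1.5, 5.0)}
ranked = sorted(candidates, key=lambda name: predict(coef, candidates[name]),
                reverse=True)
print(ranked)
```

In a real study the descriptors would be computed from the molecular structures and the model would be validated on held-out compounds before any ranking is trusted.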
Citation
Cai, Zhengguo (2023). Discovery of RNA-Targeted Small Molecules by Quantitative Structure-Activity Relationship (QSAR) Study and Machine Learning. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30328.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.