Discovery of RNA-Targeted Small Molecules by Quantitative Structure-Activity Relationship (QSAR) Study and Machine Learning

dc.contributor.advisor

Hargrove, Amanda E

dc.contributor.author

Cai, Zhengguo

dc.date.accessioned

2024-03-07T18:39:34Z

dc.date.issued

2023

dc.department

Chemistry

dc.description.abstract

RNA is a critical macromolecule in many biological processes because it encodes both structural and genetic information. It can serve as the physical template read by the ribosome during protein synthesis and as an intermediary that interferes with gene expression. For example, messenger RNA encodes specific gene sequences, microRNA regulates gene expression levels, riboswitches control translation levels and RNA splicing, and non-coding RNA provides molecular scaffolding for protein recruitment. Malfunction of cellular RNAs leads to multiple diseases, and targeting disease-related RNAs has emerged as a new strategy in many drug development campaigns. Indeed, ribosomal RNA has long been used as a drug target, and fruitful studies of naturally occurring and synthetic ligands have elucidated the mechanism of translation inhibition. The past two decades have witnessed growing research on using small-molecule probes to interrogate non-ribosomal RNAs in various disease pathways.
RNA molecules have chemical properties distinct from those of proteins, which makes the design of selective and potent chemical probes challenging. The poor chemical diversity of the four building blocks, the highly charged phosphate backbone, shallow and highly hydrophilic binding pockets, and dynamic conformations together leave the ligand space of RNA-targeted small molecules poorly understood and in need of further exploration. A deep understanding of the privileged chemotypes or physicochemical properties of RNA-targeting ligands will benefit the broad development of novel chemical entities with desired RNA-interfering outcomes. In my thesis work, I first applied a computational approach, building a quantitative structure-activity relationship (QSAR) model to predict the binding profiles of a set of biased ligands built on an amiloride core scaffold against HIV viral RNA elements.
The well-performing model predicted the binding parameters of a set of untested molecules and was used to select the top-ranked one during lead optimization. The study showed the potential of this computational tool for decision-making during the synthesis of RNA-targeted ligands. In the following study, we extended the scope of the QSAR study and adapted the workflow to accommodate structurally diverse ligands. We applied explicit algorithms to build baseline models that allow easy interpretation of the binding behaviors of structurally distinct ligands to HIV-1 TAR. The model demonstrated for the first time molecular factors that contribute to RNA:small molecule recognition, both kinetically and thermodynamically. The general workflow we described will serve as a powerful computational tool to effectively assess underexplored chemical space and guide decision-making in synthesizing RNA-targeted chemical probes. We then bridged our QSAR approach with a generative deep learning model to pursue de novo ligand design targeting the SARS-CoV-2 frameshifting pseudoknot. The QSAR model, built on experimentally validated data, provided label annotations for the large training set of the deep learning model. A tree graph-based variational auto-encoder was trained to learn the molecular generation process. The annotated label of each training sample was encoded into the continuous latent space into which the molecules were projected at reduced dimensionality. Conditions were applied when sampling new entities from the latent space, leading to new compounds with desired binding properties. The method described here constitutes the first deep learning practice for automatic chemical design against an RNA target and the first application of conditional molecular generation via a junction tree-based variational auto-encoder.
Overall, the work presented in this thesis explored the possibility of using data-driven methods such as QSAR studies and deep learning to accelerate ligand discovery for RNA targets. It is anticipated that these workflows will benefit a wide range of studies in understanding and pursuing RNA-centric drug development, although slight modifications might be needed to adapt them to larger data sets.

dc.identifier.uri

https://hdl.handle.net/10161/30328

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Chemistry

dc.subject

Biochemistry

dc.subject

Organic chemistry

dc.subject

Cheminformatics

dc.subject

Deep learning

dc.subject

Machine learning

dc.subject

QSAR

dc.subject

RNA

dc.subject

Small molecule

dc.title

Discovery of RNA-Targeted Small Molecules by Quantitative Structure-Activity Relationship (QSAR) Study and Machine Learning

dc.type

Dissertation

duke.embargo.months

23

duke.embargo.release

2026-02-07T18:39:34Z
