Browsing by Subject "Cheminformatics"
Item Embargo
Discovery of RNA-Targeted Small Molecules by Quantitative Structure-Activity Relationship (QSAR) Study and Machine Learning (2023) Cai, Zhengguo
RNA is a critical macromolecule in many biological processes, encoding both structural and genetic information. It can serve as the physical template for ribosome read-through during protein synthesis and as an intermediary that interferes with gene expression. For example, messenger RNA encodes specific gene sequences, microRNA regulates gene expression levels, riboswitches control translation levels and RNA splicing, and non-coding RNA provides molecular scaffolding for protein recruitment. Malfunction of cellular RNAs leads to multiple diseases, and targeting disease-related RNAs has emerged as a new strategy in many drug development campaigns. Indeed, ribosomal RNA has long been used as a drug target, and fruitful studies of naturally occurring or synthetic ligands have elucidated the mechanism of translation inhibition. The past two decades have witnessed growing research on using small-molecule probes to interrogate non-ribosomal RNAs in various disease pathways.
RNA molecules have chemical properties distinct from those of proteins, which makes the design of selective and potent chemical probes challenging. The limited chemical diversity of the four building units, the heavily charged phosphate backbone, the shallow and highly hydrophilic binding pockets, and the dynamic conformations together leave the ligand space of RNA-targeted small molecules poorly understood and in need of further exploration. A deep understanding of the privileged chemotypes or physicochemical properties of RNA-targeting ligands will benefit the broad development of novel chemical entities with desired RNA-interfering outcomes.
In my thesis work, I first applied a computational approach, building a quantitative structure-activity relationship (QSAR) model to predict the binding profiles of a set of biased ligands built around an amiloride core structure against HIV viral RNA elements. The well-performing model predicted the binding parameters of a set of untested molecules, and the top-ranked one was selected during lead optimization. The study showed the potential of this computational tool for decision-making during the synthesis of RNA-targeted ligands. In the following study, we extended the scope of the QSAR study and adapted the workflow to handle structurally diverse ligands. We applied explicit algorithms to build baseline models that allow easy interpretation of the binding behavior of structurally distinct ligands to HIV-1 TAR. The model demonstrated for the first time the molecular factors that contribute to RNA:small molecule recognition, both kinetically and thermodynamically. The general workflow we described will serve as a powerful computational tool to effectively assess underexplored chemical space and guide decision-making for synthesizing RNA-targeted chemical probes. We then bridged our QSAR approach with a generative deep learning model to pursue de novo ligand design targeting the SARS-CoV-2 frameshifting pseudoknot. The QSAR model, built on experimentally validated data, provided label annotations for the large training set of the deep learning model. A tree graph-based variational auto-encoder was trained to learn the molecular generation process. The annotated label of each training sample was encoded into the continuous latent space, into which molecules were projected at reduced dimensionality. Conditions were applied when sampling new entities from the latent space, leading to new compounds with desired binding properties.
The method described here constitutes the first deep learning practice for automated chemical design against an RNA target and the first application of conditional molecular generation via a junction tree-based variational auto-encoder. Overall, the work presented in this thesis explored the possibility of using data-driven methods such as QSAR studies and deep learning to accelerate ligand discovery for RNA targets. It is anticipated that these workflows will benefit a wide range of studies in understanding and pursuing RNA-centric drug development, although slight modifications might be needed to adapt them to larger data sets.
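As a rough illustration of the QSAR idea in the abstract above (fit a model on measured ligands, predict untested ones, pick the top-ranked candidate), the following is a minimal sketch and not the thesis workflow. It assumes a single numeric descriptor and a one-variable least-squares fit. The function names, candidate names, and all numbers are hypothetical toy values invented for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def rank_candidates(model, candidates):
    """Predict activity for each untested candidate and sort best-first."""
    slope, intercept = model
    scored = [(name, slope * x + intercept) for name, x in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy training set (hypothetical): one descriptor value and one measured
# binding score per ligand. Real QSAR models use many descriptors.
train_descriptor = [1.0, 2.0, 3.0, 4.0]
train_activity = [2.1, 3.9, 6.2, 7.8]

model = fit_line(train_descriptor, train_activity)

# Untested molecules (hypothetical names and descriptor values).
untested = {"cand_A": 2.5, "cand_B": 5.0, "cand_C": 0.5}
ranking = rank_candidates(model, untested)

for name, predicted in ranking:
    print(f"{name}: predicted activity {predicted:.2f}")
```

A linear model is used here because its coefficients can be read directly, which is the sense in which simple baseline models are easy to interpret. Note that `cand_B` lies outside the descriptor range of the training data, so its prediction is an extrapolation and would deserve extra caution in practice.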
Item Open Access
Machine Learning to Estimate Exposure and Effects of Emerging Chemicals and Other Consumer Product Ingredients (2023) Thornton, Luka
Chemicals in consumer products can influence our risk for developing adverse health conditions. This research addresses knowledge gaps in our ability to evaluate chemical safety, particularly for emerging substances on the market. Acknowledging the need for more high-throughput exposure and hazard models to support risk assessment, computational frameworks leveraging machine learning strategies and "big data" from public databases and mass social data sources were tested.
First, to understand consumer exposure, we require a better understanding of ingredient concentrations in products. A computational framework was developed to estimate chemical weight fractions for consumer products containing emerging substances. Nanomaterial-enabled products were used as a case study to represent such substances with limited physicochemical property data. Feature variables included chemical properties, functional use categories (e.g., antimicrobial), the type of product and its matrix. Weight fractions were classified as low, medium or high using a random forest or nonlinear support vector classifier. Performance of machine learning models was qualitatively compared with that of models from a second framework trained on data-rich, bulk-scale organic chemical product data. Models could roughly stratify material-product observations into weight fraction bins with moderate success. The best model achieved an average balanced accuracy of 73% on nanomaterials product data. Chemical functional use features served as particularly insightful predictors, suggesting that functional use data may be useful in evaluating the safety and sustainability of emerging chemicals. Investment in chemical and product data collection could see continued improvement of such machine learning models.
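The balanced accuracy reported above is the mean of the per-class recalls, which keeps a frequent bin from dominating the score when low, medium, and high weight-fraction classes are unevenly populated. The sketch below computes it from scratch. The labels are hypothetical toy values, not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true, y_pred):
    """Fraction of all observations predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy, imbalanced weight-fraction bins (hypothetical labels).
y_true = ["low"] * 6 + ["medium"] * 3 + ["high"] * 1
y_pred = (
    ["low"] * 5 + ["medium"]      # one "low" item misclassified
    + ["medium"] * 2 + ["low"]    # one "medium" item misclassified
    + ["high"]                    # the single "high" item is correct
)

ba = balanced_accuracy(y_true, y_pred)
acc = plain_accuracy(y_true, y_pred)
print(f"balanced accuracy = {ba:.3f}, plain accuracy = {acc:.3f}")
```

In this toy case the two metrics differ because each class contributes equally to balanced accuracy regardless of how many observations it contains.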
Shifting focus to the impact of chemicals on consumers, data on personal care products, ingredients, and customer reviews from online retailers and databases was collected to see if certain chemicals might increase risk of adverse reactions to products. The study scope was narrowed to shampoo products for hypothesis testing. Processing steps in the data pipeline included informatics and machine learning methods, namely, natural language processing for interpreting product reviews, text extraction from images of product labels, and feature reduction using chemical structure and ingredient source data. Fifty-one ingredient clusters were identified as having a significant correlation with higher adverse reaction rates in consumers when present in shampoos. Among these, there were a few common plant-based ingredients and synthetic preservatives known for causing skin sensitivity or irritation. In comparison with other constituents, however, the positively correlated ingredient groups had a general lack of published structural, physicochemical property and toxicity data. Results suggest an urgent need for targeted, higher-throughput chemical evaluations to safeguard consumers.
Together, these proof-of-concept studies progress our ability to quantify exposure and hazard of emerging and data-poor substances in consumer products. The outcomes of the computational frameworks can help prioritize potentially problematic substances for additional study to characterize risk.