Browsing by Subject "quantitative models"

Item Open Access
Current State of and Future Opportunities for Prediction in Microbiome Research: Report from the Mid-Atlantic Microbiome Meet-up in Baltimore on 9 January 2019. (mSystems, 2019-10)
Sakowski, Eric; Uritskiy, Gherman; Cooper, Rachel; Gomes, Maya; McLaren, Michael R; Meisel, Jacquelyn S; Mickol, Rebecca L; Mintz, C David; Mongodin, Emmanuel F; Pop, Mihai; Rahman, Mohammad Arifur; Sanchez, Alvaro; Timp, Winston; Vela, Jeseth Delgado; Wolz, Carly Muletz; Zackular, Joseph P; Chopyk, Jessica; Commichaux, Seth; Davis, Meghan; Dluzen, Douglas; Ganesan, Sukirth M; Haruna, Muyideen; Nasko, Dan; Regan, Mary J; Sarria, Saul; Shah, Nidhi; Stacy, Brook; Taylor, Dylan; DiRuggiero, Jocelyne; Preheim, Sarah P
Accurate predictions across multiple fields of microbiome research have far-reaching benefits to society, but there are few widely accepted quantitative tools to make accurate predictions about microbial communities and their functions. More discussion is needed about the current state of microbiome analysis and the tools required to overcome the hurdles preventing development and implementation of predictive analyses. We summarize the ideas generated by participants of the Mid-Atlantic Microbiome Meet-up in January 2019. While it was clear from the presentations that most fields have advanced beyond simple associative and descriptive analyses, most fields lack essential elements needed for the development and application of accurate microbiome predictions. Participants stressed the need for standardization, reproducibility, and accessibility of quantitative tools as key to advancing predictions in microbiome analysis.
We highlight hurdles that participants identified and propose directions for future efforts that will advance the use of prediction in microbiome research.

Item Open Access
Developing Quantitative Models in Analyzing High-throughput Sequencing Data (2021)
Kim, Young-Sook
Diverse functional genomics assays have been developed and have helped to investigate complex gene regulation in various biological conditions. For example, RNA-seq has been used to capture gene expression in diverse human tissues, helping to study tissue-common and tissue-specific gene regulation. ChIP-seq has been used to identify the genomic regions bound by numerous transcription factors, thus helping to identify collaborative and competitive binding mechanisms of the transcription factors. Despite this huge increase in the amount and accessibility of genomic data, several challenges remain in analyzing these data with proper statistical methods. Some assays, such as STARR-seq, do not have a proper statistical model that detects both activated and repressed regulatory elements, making researchers depend on statistical models developed for other assays. Other assays, such as ChIP-seq and RNA-seq, have few joint analysis models that are flexible and computationally scalable, resulting in limited statistical power in identifying the genomic regions or genes shared by multiple biological conditions. To solve these challenges in analyzing high-throughput assays, we first developed a statistical model called correcting reads and analysis of differential active elements, or CRADLE, to analyze STARR-seq data. CRADLE removes technical biases that can confound quantification of regulatory activity and then detects both activated and repressed regulatory elements. We observed that the corrected read counts improved the visualization of regulatory activity, allowing for more accurate detection of regulatory elements.
Indeed, through a simulation study, we showed that CRADLE significantly improved precision and recall in detecting regulatory elements compared to previous statistical approaches, and that the improvement was especially prominent in identifying repressed regulatory elements. Building on this work, we adapted the statistical framework of CRADLE and developed a joint analysis model of multiple data for biology, or JAMMY, that can be applied to diverse high-throughput sequencing data. JAMMY is a flexible statistical model that jointly analyzes multiple conditions, identifies condition-shared and condition-specific genomic regions, and then quantifies the preferential activity of a subset of biological conditions for each genomic region. We applied JAMMY to STARR-seq, ChIP-seq, and RNA-seq data, and observed that JAMMY overall improved precision and recall in identifying condition-shared activity compared to the traditional condition-by-condition analysis. This gain in statistical power from the joint analysis led us to find a novel co-binding of two transcription factors in our study. These results show the substantial advantages of using a joint analysis model in integrating genomic data from multiple biological conditions.