Browsing by Subject "Cancer genomics"
Item Open Access Bayesian meta-analysis models for heterogeneous genomics data (2013) Zheng, Lingling
The accumulation of high-throughput data from vast sources has drawn considerable attention to developing methods for extracting meaningful information from massive data. Further questions arise about how to combine disparate information, a problem that goes beyond modeling sparsity and dimension reduction. This dissertation focuses on innovations in heterogeneous data integration.
Chapter 1 contextualizes this dissertation by introducing different aspects of meta-analysis and model frameworks for high-dimensional genomic data.
Chapter 2 introduces a novel technique, a joint Bayesian sparse factor analysis model, to vertically integrate multi-dimensional genomic data from different platforms.
Chapter 3 extends this model to a nonparametric Bayesian formulation that infers the number of factors directly through a model-based approach.
Chapter 4, by contrast, deals with horizontal integration of diverse gene expression data; the model infers pathway activities across various experimental conditions.
All the methods mentioned above are demonstrated in both simulation studies and real data applications in chapters 2-4.
Finally, chapter 5 summarizes the dissertation and discusses future directions.
Item Open Access Graph-based Approaches for Cancer Genomics, with Applications to Cancer Signaling and Dependencies (2018) Cakir, Merve
The era of widely applicable sequencing and genomic technologies led to the generation of many large-scale datasets exploring genomes, transcriptomes, or epigenomes of tumors. Availability of a wide range of datasets necessitated the development of new computational analysis approaches to generate novel insights from these datasets and improve our understanding of tumor development and progression.
This work focuses on a variety of graph-based approaches to evaluate their use in cancer genomics. We first focused on a graph-based semi-supervised learning approach called label propagation as a method to generate signaling networks from a gene set of interest. A distance metric based on the concept of maximal common subgraph was then established to quantify the degree of similarity observed across different networks. These two approaches were then combined to examine two separate cancer genomics datasets. Our first application focused on genes frequently altered across patients to build signaling networks that represent genes and pathways that are transcriptionally altered as a result of these mutations. These networks revealed the range of molecular events affected by each mutation, and conserved changes observed across networks highlighted the critical signaling pathways tumors dysregulate through distinct alterations. The other area of focus for label propagation was the analysis of a set of melanoma samples resistant to BRAF inhibitors. Evaluation of networks of individual resistant samples revealed signaling changes shared across samples that have similar resistance mechanisms or originated from the same patient. Finally, drug response profiles of a large set of drugs were examined across cell lines belonging to eighteen different tumor types, by building bipartite graphs representing sensitivity patterns of drugs. These bipartite graphs were used to generate drug similarity graphs that revealed shared response profiles of drugs targeting distinct processes, which provided opportunities to refine the annotations of drug targets. Degree distributions of bipartite graphs also revealed drugs connected to exceptional responder cell lines, whose unique genomic profiles nominated potential markers of drug response. Collectively, the studies discussed here emphasize a variety of use cases for graph-based approaches in cancer genomics.
Item Open Access Methods for Systematic Exploratory Analysis of Gene Expression Data with Applications to Cancer Genomics (2017) Wagner, Florian
Advances in technologies for gene expression profiling have resulted in an unprecedented abundance of gene expression data. However, computational methods available for the exploratory analysis of such data are limited in their ability to generate an interpretable overview of biologically relevant similarities and differences among samples. This work first introduces the XL-mHG test, a sensitive and specific hypothesis test for detecting gene set enrichment, and discusses its algorithmic and statistical properties. It further introduces GO-PCA, a method for exploratory analysis of gene expression data using prior knowledge. The XL-mHG test serves as a building block for GO-PCA. The output of GO-PCA consists of functional expression signatures, designed to provide an interpretable representation of biologically meaningful variation in the data. The power and versatility of the method are demonstrated on heterogeneous human and mouse expression data. Finally, applications of the proposed methods to carcinoma and lymphoma expression data aim to demonstrate their clinical relevance. The effective utilization of prior knowledge in the exploratory analysis of gene expression data through carefully designed computational methods is essential for successfully harnessing the power of current and future platforms for gene expression profiling, with the aim of generating clinically relevant insights into complex diseases such as cancer.
Item Open Access Profiling Blood Cancer Drivers through Large-Scale Genomics (2022) Kositsky, Rachel
There are 140,000 new cases of blood cancers each year in the US and even more worldwide. Understanding the molecular and genomic origins of blood cancers can refine diagnosis, predict survival, and identify appropriate treatment. Large-scale projects profiling cancer genomes prioritized common cancer types, while other blood cancer genomics studies have been completed in an ad-hoc fashion. In this thesis, I will describe advances made to systemic large-scale molecular profiling of blood cancers.
To identify cancer drivers in both protein-coding and noncoding regions of the genome, I designed two novel capture panels. In my second chapter, I describe bioinformatics approaches I developed to identify accidental sample switches, refine alignment methods, and prioritize genes as potential cancer drivers.
Translocations are a major class of blood cancer drivers, which occur when two chromosomes break and repair incorrectly by fusing to each other. Previous studies using sequencing data to identify blood cancer-related translocations had only moderate sensitivity for several of the translocations compared to the clinical test. In my third chapter, I describe the development of a new translocation caller that is more sensitive to translocations in hypermutated regions that may have poor alignment to the reference genome, which is common in B-cell lymphomas.
Many patients with relapsed/refractory large B-cell lymphoma (R/R LBCL) have had success with chimeric antigen receptor T-cell (CAR-T) products approved by the FDA in 2017. However, a significant proportion of patients fail to respond to this highly expensive therapy and suffer from severe side effects while destined for poor survival. In my fourth chapter, I apply the genomic methods described earlier to identify predictors of resistance to CAR-T cell therapy in R/R LBCL. We found that complete response and survival were associated with clinical and molecular factors in the pre-treatment tumor.