Dave, Sandeep S
Wagner, Florian
2017-05-16
2019-04-26
2017
https://hdl.handle.net/10161/14375

Advances in technologies for gene expression profiling have resulted in an unprecedented abundance of gene expression data. However, computational methods available for the exploratory analysis of such data are limited in their ability to generate an interpretable overview of biologically relevant similarities and differences among samples. This work first introduces the XL-mHG test, a sensitive and specific hypothesis test for detecting gene set enrichment, and discusses its algorithmic and statistical properties. It further introduces GO-PCA, a method for exploratory analysis of gene expression data using prior knowledge. The XL-mHG test serves as a building block for GO-PCA. The output of GO-PCA consists of functional expression signatures, designed to provide an interpretable representation of biologically meaningful variation in the data. The power and versatility of the method are demonstrated on heterogeneous human and mouse expression data. Finally, applications of the proposed methods to carcinoma and lymphoma expression data aim to demonstrate their clinical relevance. The effective utilization of prior knowledge in the exploratory analysis of gene expression data through carefully designed computational methods is essential for successfully harnessing the power of current and future platforms for gene expression profiling, with the aim of generating clinically relevant insights into complex diseases such as cancer.

Biology; Computer science; Bioinformatics; Algorithms; Cancer genomics; exploratory data analysis; Gene expression; Nonparametric statistics; Transcriptomics
Methods for Systematic Exploratory Analysis of Gene Expression Data with Applications to Cancer Genomics
Dissertation