dc.contributor.advisor Lucas, Joseph E en_US
dc.contributor.author Mayrink, Vinicius Diniz en_US
dc.date.accessioned 2011-05-20T19:35:37Z
dc.date.available 2011-05-20T19:35:37Z
dc.date.issued 2011 en_US
dc.identifier.uri http://hdl.handle.net/10161/3865
dc.description Dissertation en_US
dc.description.abstract An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. For some microarray platforms there are many relatively short probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted transcript, but rather a different gene with a similar region, a phenomenon called cross-hybridization. Similarly, the incorrect mapping of short nucleotide sequences to a target gene is a common issue related to the young technology producing RNA-Seq data. The expression pattern across samples is a valuable source of information, which can be used to address distinct problems through the application of factor models. Our first study is focused on the identification of the presence/absence status of a gene in a sample. We compare our factor model to state-of-the-art detection methods; the results suggest superior performance of the factor analysis for detecting transcripts. In the second study, we apply factor models to investigate gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. Copy number alteration is detected for a group of genes in breast cancer; our goal is to examine this abnormality in the same chromosomal region for other types of tumors (ovarian, lung, and brain). In the third application, the expression pattern related to RNA-Seq count data is evaluated through a factor model based on the Poisson distribution. Here, the presence/absence of coherent patterns is closely associated with the number of incorrect read mappings.
The final study of this dissertation is dedicated to the analysis of multi-factor models with linear and non-linear structure of interactions between latent factors. The interaction terms can have important implications in the model; they represent relationships between genes which cannot be captured in an ordinary analysis. en_US
dc.subject Statistics en_US
dc.subject Bayesian analysis en_US
dc.subject Coherent pattern en_US
dc.subject Factor Model en_US
dc.subject Gene expression en_US
dc.subject Interactions en_US
dc.subject Sparsity prior en_US
dc.title Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data en_US
dc.type Dissertation en_US
dc.department Statistical Science en_US