Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data

Mayrink, Vinicius Diniz

Date

2011

Authors

Mayrink, Vinicius Diniz

Advisors

Lucas, Joseph E

Repository Usage Stats

532 views, 241 downloads

Abstract

An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. For some microarray platforms there are many relatively short probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted transcript, but rather a different gene with a similar region (a problem called cross-hybridization). Similarly, the incorrect mapping of short nucleotide sequences to a target gene is a common issue with the young technology that produces RNA-Seq data. The expression pattern across samples is a valuable source of information, which can be used to address distinct problems through the application of factor models. Our first study focuses on identifying the presence/absence status of a gene in a sample. We compare our factor model to state-of-the-art detection methods; the results suggest superior performance of the factor analysis for detecting transcripts. In the second study, we apply factor models to investigate gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. Copy number alteration is detected for a group of genes in breast cancer; our goal is to examine this abnormality in the same chromosomal region for other types of tumors (ovarian, lung and brain). In the third application, the expression pattern related to RNA-Seq count data is evaluated through a factor model based on the Poisson distribution. Here, the presence/absence of coherent patterns is closely associated with the number of incorrect read mappings.
The final study of this dissertation is dedicated to the analysis of multi-factor models with linear and non-linear structure of interactions between latent factors. The interaction terms can have important implications in the model; they represent relationships between genes that cannot be captured in an ordinary analysis.
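
For readers unfamiliar with the model family named in the abstract, the following is a minimal sketch in generic textbook notation. It is not taken from the dissertation: every symbol here (X, alpha, lambda, h, omega, theta, f) is an assumption chosen for illustration, and the dissertation's actual parameterization and priors may differ.

```latex
% Illustrative notation only (assumed, not the dissertation's own):
%   X       : m x n matrix of expression values (m features, n samples)
%   alpha   : m x L matrix of factor loadings
%   lambda  : L x n matrix of latent factor scores
\begin{align}
  % Linear Gaussian factor model
  x_{ij} &= \sum_{l=1}^{L} \alpha_{il}\,\lambda_{lj} + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim N(0, \sigma_i^2), \\
  % One common form of sparsity prior (spike-and-slab mixture) on the loadings
  \alpha_{il} &\sim (1 - h_{il})\,\delta_0 + h_{il}\,N(0, \omega),
  & h_{il} &\in [0, 1], \\
  % Poisson analogue for count data such as RNA-Seq reads (log link assumed)
  y_{ij} &\sim \mathrm{Poisson}(\mu_{ij}),
  & \log \mu_{ij} &= \sum_{l=1}^{L} \alpha_{il}\,\lambda_{lj}, \\
  % Two-factor model with a possibly non-linear interaction term f
  x_{ij} &= \alpha_{i1}\lambda_{1j} + \alpha_{i2}\lambda_{2j}
            + \theta_i\, f(\lambda_{1j}, \lambda_{2j}) + \varepsilon_{ij}.
\end{align}
```

In this sketch, a feature whose loadings are all zero carries no shared pattern across samples, which is one way a presence/absence or coherence question can be phrased in terms of the loadings. Choosing f as the product of the two factor scores gives a linear interaction, while other choices of f give non-linear ones.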

Type

Dissertation

Department

Statistical Science

Subjects

Statistics, Bayesian analysis, Coherent pattern, Factor model, Gene expression, Interactions, Sparsity prior

Permalink

https://hdl.handle.net/10161/3865

Citation

Mayrink, Vinicius Diniz (2011). Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/3865.

Collections

Dissertations

Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.
