Gene selection using iterative feature elimination random forests for survival outcomes.

Pang, Herbert; George, Stephen L; Hui, Ken; Tong, Tiejun

Date

2012-09

Abstract

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of gene selection approaches for classification are single-gene based. Second, many gene selection procedures are not embedded within the algorithm itself. Random forests have been found to perform well in high-dimensional data settings with survival outcomes, and they have an embedded mechanism for identifying important variables. They are therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.
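
The abstract does not spell out the elimination procedure, so the following is only a minimal sketch of what a generic iterative feature elimination loop might look like, not the authors' algorithm. It assumes a caller-supplied function (here the hypothetical `fit_and_rank`) that fits a model on the current gene set and returns a prediction error together with a per-gene importance score; in practice this role could be played by a random survival forest reporting its out-of-bag error and variable importance. The drop fraction of 20% per round, the tie-breaking rule, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

RankFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def iterative_elimination(
    features: List[str],
    fit_and_rank: RankFn,
    drop_fraction: float = 0.2,  # assumed value, not taken from the paper
    min_features: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Repeatedly fit, rank by importance, and drop the weakest features.

    Returns the feature subset with the lowest error seen, that error,
    and the (subset size, error) pair recorded at every round.
    """
    current = list(features)
    best_features, best_error = list(current), float("inf")
    history: List[Tuple[int, float]] = []
    while True:
        error, importance = fit_and_rank(current)
        history.append((len(current), error))
        # "<=" prefers the smaller subset when two rounds tie on error.
        if error <= best_error:
            best_features, best_error = list(current), error
        if len(current) <= min_features:
            break
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_features, len(current) - n_drop)
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)
        current = ranked[:n_keep]
    return best_features, best_error, history


# Toy stand-in for a fitted model (hypothetical): two informative genes
# and four noise genes with fixed, made-up importance scores.
TOY_IMPORTANCE = {"g1": 0.9, "g2": 0.7, "g3": 0.05,
                  "g4": 0.03, "g5": 0.02, "g6": 0.01}
INFORMATIVE = {"g1", "g2"}


def toy_fit_and_rank(features: List[str]) -> Tuple[float, Dict[str, float]]:
    missing = len(INFORMATIVE - set(features))  # informative genes dropped
    noise = len(set(features) - INFORMATIVE)    # noise genes still present
    error = 0.30 + 0.20 * missing + 0.01 * noise
    return error, {f: TOY_IMPORTANCE[f] for f in features}


selected, err, history = iterative_elimination(list(TOY_IMPORTANCE),
                                               toy_fit_and_rank)
print(selected, err)
```

Because importance is recomputed on each reduced gene set, a gene's rank can change as correlated genes are removed, which is one way such a loop can account for multivariate structure that single-gene ranking ignores.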

Type

Journal article

Subjects

Algorithms, Artificial Intelligence, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Pattern Recognition, Automated

Permalink

https://hdl.handle.net/10161/9228

Published Version (Please cite this version)

10.1109/TCBB.2012.63

Publication Info

Pang, Herbert, Stephen L George, Ken Hui and Tiejun Tong (2012). Gene selection using iterative feature elimination random forests for survival outcomes. IEEE/ACM Trans Comput Biol Bioinform, 9(5). pp. 1422–1431. 10.1109/TCBB.2012.63 Retrieved from https://hdl.handle.net/10161/9228.

This citation is constructed from limited available data and may be imprecise. To cite this article, please review and use the official citation provided by the journal.

Collections

Scholarly Articles

Scholars@Duke

Herbert Pang

Adjunct Assistant Professor in the Department of Biostatistics & Bioinformatics

Classification and Predictive Models
Design and Analysis of Biomarker Clinical Trials
Genomics
Pathway Analysis

Stephen L. George

Professor Emeritus of Biostatistics & Bioinformatics

Statistical issues related to the design, conduct, and analysis of clinical trials and related biomedical studies including sample size and study length determinations, sequential procedures, and the analysis of prognostic or predictive factors in clinical trials.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.
