Gene selection using iterative feature elimination random forests for survival outcomes.
Abstract
Although many feature selection methods for classification have been developed, there
is a need to identify genes in high-dimensional data with censored survival outcomes.
Traditional methods for gene selection in classification problems have several drawbacks.
First, the majority of the gene selection approaches for classification are single-gene
based. Second, many of the gene selection procedures are not embedded within the algorithm
itself. The technique of random forests has been found to perform well in high-dimensional
data settings with survival outcomes. It also has an embedded feature to identify
variables of importance. Therefore, it is an ideal candidate for gene selection in
high-dimensional data with survival outcomes. In this paper, we develop a novel method
based on random forests to identify a set of prognostic genes. We compare our
method with several machine learning methods and various node split criteria using
several real data sets. Our method performed well in both simulations and real data
analysis. Additionally, we have shown the advantages of our approach over single-gene-based
approaches. Our method incorporates multivariate correlations in microarray data for
survival outcomes. The described method allows us to better utilize the information
available from microarray data with survival outcomes.
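The abstract says that random forests have an embedded measure of variable importance and, per the title, that the method eliminates features iteratively, but it gives no algorithmic details. The following is a minimal Python sketch of generic iterative backward feature elimination for survival outcomes; it is not the authors' implementation. Assumptions: `importance_fn` and `score_fn` are hypothetical callables where a random survival forest's variable importance and a validation score would plug in, and the 20% drop fraction, the `min_features` stopping rule, and the use of Harrell's concordance index as the score are illustrative choices not taken from the paper. The toy demo uses made-up importances, not gene data.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    t_i < t_j; it is concordant when subject i (who failed first) has the
    higher predicted risk. Tied risks count as half-concordant.
    Tied event times are skipped in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def iterative_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    drop_fraction: float = 0.2,
    min_features: int = 2,
) -> Tuple[List[str], List[Tuple[int, float]]]:
    """Backward elimination driven by a model's variable importance.

    importance_fn: (hypothetical) refits a model on the current features and
        returns one importance value per feature, e.g. random survival forest
        variable importance.
    score_fn: (hypothetical) validation score for a model on the given
        features, higher is better, e.g. a held-out C-index.
    Returns the best-scoring feature set and the (size, score) path.
    """
    if not 0.0 < drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in (0, 1)")
    current = list(features)
    best_set, best_score = list(current), score_fn(current)
    path = [(len(current), best_score)]
    while len(current) > min_features:
        importance = importance_fn(current)
        # Drop a fixed fraction (at least one) of the least important
        # features, never going below min_features.
        n_drop = min(max(1, int(len(current) * drop_fraction)),
                     len(current) - min_features)
        ranked = sorted(current, key=lambda f: importance[f])  # least important first
        least_important = set(ranked[:n_drop])
        current = [f for f in current if f not in least_important]
        score = score_fn(current)
        path.append((len(current), score))
        if score >= best_score:  # ties favor the smaller feature set
            best_set, best_score = list(current), score
    return best_set, path


# Toy demo: hard-coded, hypothetical importances and a made-up score that
# rewards retained signal and penalizes set size (not real gene data).
toy_weights = {"g1": 3.0, "g2": 2.0, "g3": 0.5, "g4": 0.1, "g5": 0.0}


def toy_importance(current: List[str]) -> Dict[str, float]:
    return {g: toy_weights[g] for g in current}


def toy_score(current: List[str]) -> float:
    return sum(toy_weights[g] for g in current) - 0.3 * len(current)


best_set, path = iterative_feature_elimination(list(toy_weights), toy_importance, toy_score)
print(best_set)  # ['g1', 'g2', 'g3']
print(harrell_c_index([1, 2, 3], [1, 1, 1], [3, 2, 1]))  # 1.0
```

Dropping a fixed fraction of the remaining features per round, rather than one at a time, keeps the number of model refits manageable when starting from thousands of genes. In a real analysis the score would ideally come from held-out or out-of-bag data, so that the selected set is not judged on the same data used to rank it.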
Type
Journal article
Subject
Algorithms
Artificial Intelligence
Gene Expression Profiling
Oligonucleotide Array Sequence Analysis
Pattern Recognition, Automated
Permalink
https://hdl.handle.net/10161/9228
Published Version (Please cite this version)
10.1109/TCBB.2012.63
Publication Info
Pang, Herbert; George, Stephen L; Hui, Ken; & Tong, Tiejun (2012). Gene selection using iterative feature elimination random forests for survival outcomes. IEEE/ACM Trans Comput Biol Bioinform, 9(5). pp. 1422-1431. 10.1109/TCBB.2012.63. Retrieved from https://hdl.handle.net/10161/9228.
This is constructed from limited available data and may be imprecise. To cite this article, please review and use the official citation provided by the journal.
Scholars@Duke
Stephen L. George
Professor Emeritus of Biostatistics & Bioinformatics
Statistical issues related to the design, conduct, and analysis of clinical trials
and related biomedical studies including sample size and study length determinations,
sequential procedures, and the analysis of prognostic or predictive factors in clinical
trials.
Herbert Pang
Adjunct Assistant Professor in the Department of Biostatistics & Bioinformatics
Classification and Predictive Models
Design and Analysis of Biomarker Clinical Trials
Genomics Pathway Analysis