Gene selection using iterative feature elimination random forests for survival outcomes.

dc.contributor.author

Pang, Herbert

dc.contributor.author

George, Stephen L

dc.contributor.author

Hui, Ken

dc.contributor.author

Tong, Tiejun

dc.coverage.spatial

United States

dc.date.accessioned

2014-11-07T20:03:27Z

dc.date.issued

2012-09

dc.description.abstract

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.

dc.identifier

http://www.ncbi.nlm.nih.gov/pubmed/22547432

dc.identifier.eissn

1557-9964

dc.identifier.uri

https://hdl.handle.net/10161/9228

dc.language

eng

dc.publisher

Institute of Electrical and Electronics Engineers (IEEE)

dc.relation.ispartof

IEEE/ACM Trans Comput Biol Bioinform

dc.relation.isversionof

10.1109/TCBB.2012.63

dc.subject

Algorithms

dc.subject

Artificial Intelligence

dc.subject

Gene Expression Profiling

dc.subject

Oligonucleotide Array Sequence Analysis

dc.subject

Pattern Recognition, Automated

dc.title

Gene selection using iterative feature elimination random forests for survival outcomes.

dc.type

Journal article

duke.contributor.orcid

George, Stephen L|0000-0002-3625-5852

pubs.author-url

http://www.ncbi.nlm.nih.gov/pubmed/22547432

pubs.begin-page

1422

pubs.end-page

1431

pubs.issue

5

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Duke

pubs.organisational-group

School of Medicine

pubs.publication-status

Published

pubs.volume

9

Files

Original bundle

Name:
Pang et al 2012.pdf
Size:
1.73 MB
Format:
Adobe Portable Document Format