Reconsider Machine Learning Method for Variable Selection and Validation with High Dimensional Data

dc.contributor.advisor

Jung, Sin-Ho

dc.contributor.author

Liu, Lu

dc.date.accessioned

2025-01-08T17:44:58Z

dc.date.available

2025-01-08T17:44:58Z

dc.date.issued

2024

dc.department

Biostatistics and Bioinformatics Doctor of Philosophy

dc.description.abstract

The trend toward big data influences how people think and inspires new research directions. Recent achievements of machine learning have attracted broad attention because of its strong performance in big data analysis, including text analysis and image processing. Machine learning is also a popular topic in clinical medicine, where it is used to analyze electronic health records and medical image data, for which traditional statistical models are not adequate. However, machine learning is not a panacea, and its drawbacks, such as loss of interpretability and excess variable selection, may restrict its application. We must also recognize that for many clinical prediction analyses, a simpler approach, the generalized linear model, is sufficient for our needs.

In this dissertation, we propose using standard regression methods without any penalization, combined with a stepwise variable selection procedure, to overcome the over-selection issue of popular machine learning methods. For model validation, we propose a permutation approach to evaluate the performance of various validation methods. Finally, we propose a repeated sieving approach, which extends the standard regression methods with stepwise variable selection, to handle high-dimensional modeling.
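The abstract does not spell out the permutation procedure. One plausible reading, assumed here, is that the outcome labels are permuted so that the predictors carry no information (the true AUC is 0.5), and each validation method is judged by how far its reported AUC drifts above 0.5. The Python sketch below illustrates that idea on simulated noise data. The one-variable selection rule (`select_feature`), the two validation methods compared (resubstitution and a stratified split-sample), and all sample sizes are hypothetical choices standing in for the dissertation's stepwise regression; this is not the author's procedure.

```python
import random
import statistics


def auc(scores, labels):
    """Mann-Whitney estimate of ROC AUC: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_feature(X, y):
    """Hypothetical selection rule: the column with the largest AUC on (X, y)."""
    return max(range(len(X[0])), key=lambda j: auc([row[j] for row in X], y))


def permutation_check(X, y, n_perm=20, seed=0):
    """Mean AUC reported by two validation methods on outcome-permuted data.

    Permuting y breaks any X-y association, so the true AUC is 0.5 and any
    excess above 0.5 is optimism introduced by the validation method.
    """
    rng = random.Random(seed)
    resub, split = [], []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        # Resubstitution: select and evaluate on the same (permuted) data.
        j = select_feature(X, y_perm)
        resub.append(auc([row[j] for row in X], y_perm))
        # Stratified split-sample: select on one half, evaluate on the other.
        pos = [i for i, v in enumerate(y_perm) if v == 1]
        neg = [i for i, v in enumerate(y_perm) if v == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        train = pos[: len(pos) // 2] + neg[: len(neg) // 2]
        test = pos[len(pos) // 2 :] + neg[len(neg) // 2 :]
        j = select_feature([X[i] for i in train], [y_perm[i] for i in train])
        split.append(auc([X[i][j] for i in test], [y_perm[i] for i in test]))
    return statistics.mean(resub), statistics.mean(split)


if __name__ == "__main__":
    # Simulated stand-in for real data: 80 subjects, 50 pure-noise predictors.
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(80)]
    y = [1] * 40 + [0] * 40
    resub_auc, split_auc = permutation_check(X, y)
    print(f"resubstitution AUC: {resub_auc:.3f}, split-sample AUC: {split_auc:.3f}")
```

Because the selection step sees the evaluation data under resubstitution, its mean AUC typically lands well above 0.5 even though the permuted outcome is pure noise, while the split-sample estimate stays close to 0.5.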

dc.identifier.uri

https://hdl.handle.net/10161/31967

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Biostatistics

dc.subject

logistic regression

dc.subject

machine learning

dc.subject

permutation

dc.subject

prediction model

dc.subject

ROC curve

dc.subject

variable selection

dc.title

Reconsider Machine Learning Method for Variable Selection and Validation with High Dimensional Data

dc.type

Dissertation

Files

Name: Liu_duke_0066D_18185.pdf
Size: 563.56 KB
Format: Adobe Portable Document Format
