Variable and threshold selection to control predictive accuracy in logistic regression

dc.contributor.author

Kuk, AYC

dc.contributor.author

Li, J

dc.contributor.author

John Rush, A

dc.date.accessioned

2022-04-14T00:10:29Z

dc.date.available

2022-04-14T00:10:29Z

dc.date.issued

2014-01-01

dc.date.updated

2022-04-14T00:10:29Z

dc.description.abstract

Summary: Using data collected from the 'Sequenced treatment alternatives to relieve depression' study, we use logistic regression to predict whether a patient will respond to treatment on the basis of early symptom change and patient characteristics. Model selection criteria such as the Akaike information criterion (AIC) and the mean-squared error of prediction (MSEP) may not be appropriate if the aim is to predict with a high degree of certainty who will or will not respond to treatment. Towards this aim, we generalize the definition of the positive and negative predictive value curves to the case of multiple predictors. We point out that it is the ordering, rather than the precise values, of the response probabilities that matters, and we arrive at a unified approach to model selection via two-sample rank tests. To avoid overfitting, we define a cross-validated version of the positive and negative predictive value curves and compare these curves, after smoothing, for various models. Applied to the study data, this approach yields a ranking of models that differs from the rankings based on AIC and MSEP, as well as from those produced by a tree-based method and by regularized logistic regression using a lasso penalty. Our selected model performs consistently well for both 4-week-ahead and 7-week-ahead predictions. © 2014 Royal Statistical Society.
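Two ideas in the abstract can be illustrated with a short Python sketch. First, positive and negative predictive value (PPV/NPV) curves can be built from any score that ranks patients. Second, only the ordering of the fitted response probabilities matters. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each patient is scored by an out-of-fold (5-fold) probability from an unpenalised logistic fit, that the top-k patients by score are classified as responders, and it uses the Wilcoxon-Mann-Whitney statistic as one example of a two-sample rank statistic. The abstract does not say which rank test the paper uses. The helper names (`ppv_npv_curves`, `mann_whitney_auc`, `fit_logistic`, `cross_validated_scores`) and the simulated data are hypothetical, and the paper's smoothing step is omitted.

```python
import numpy as np


def ppv_npv_curves(scores, y):
    """Empirical PPV and NPV when the top-k patients by score are classified
    as responders, for k = 1, ..., n-1. Returns (fraction classified positive,
    PPV, NPV). Assumes untied scores (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    n = len(y)
    y_sorted = y[np.argsort(-scores)]      # outcomes, highest score first
    k = np.arange(1, n)                    # number classified as responders
    tp = np.cumsum(y_sorted)[:-1]          # responders among the top k
    fn = y_sorted.sum() - tp               # responders left in the bottom n-k
    ppv = tp / k                           # P(respond | classified responder)
    npv = (n - k - fn) / (n - k)           # P(not respond | classified non-responder)
    return k / n, ppv, npv


def mann_whitney_auc(scores, y):
    """Wilcoxon-Mann-Whitney statistic scaled to [0, 1]: share of
    (responder, non-responder) pairs where the responder scores higher,
    with ties counted as 1/2. Depends on the scores only through ranks."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=int)
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def fit_logistic(X, y, n_iter=25):
    """Unpenalised logistic regression with intercept, fitted by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hess = Xd.T @ (Xd * (mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, Xd.T @ (y - mu))
    return beta


def cross_validated_scores(X, y, n_folds=5, seed=0):
    """Out-of-fold response probabilities: each patient is scored by a model
    fitted without the fold that contains that patient."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds   # balanced random fold labels
    scores = np.empty(len(y))
    for f in range(n_folds):
        train, test = fold != f, fold == f
        beta = fit_logistic(X[train], y[train])
        scores[test] = 1.0 / (1.0 + np.exp(-(beta[0] + X[test] @ beta[1:])))
    return scores


# Simulated data (hypothetical; not the STAR*D study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]))))

p_cv = cross_validated_scores(X, y)
q, ppv, npv = ppv_npv_curves(p_cv, y)
# A strictly increasing transform of the scores (here the logit) leaves the curves unchanged.
_, ppv_logit, npv_logit = ppv_npv_curves(np.log(p_cv) - np.log1p(-p_cv), y)
print(np.allclose(ppv, ppv_logit) and np.allclose(npv, npv_logit))
print(round(mann_whitney_auc(p_cv, y), 3))
```

The PPV/NPV curves and the rank statistic both depend only on how the scores order the patients. Two models that rank patients identically therefore cannot be told apart under these criteria, even if their fitted probabilities differ. Computing the curves from out-of-fold scores, rather than in-sample fits, is one simple way to mimic the cross-validated curves the abstract describes.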

dc.identifier.issn

0035-9254

dc.identifier.issn

1467-9876

dc.identifier.uri

https://hdl.handle.net/10161/24819

dc.language

en

dc.publisher

Wiley

dc.relation.ispartof

Journal of the Royal Statistical Society. Series C: Applied Statistics

dc.relation.isversionof

10.1111/rssc.12058

dc.subject

Antidepression trials

dc.subject

Cross-validation

dc.subject

Indeterminate class

dc.subject

Logistic regression

dc.subject

Model selection

dc.subject

Negative predictive value

dc.subject

Positive predictive value

dc.title

Variable and threshold selection to control predictive accuracy in logistic regression

dc.type

Journal article

pubs.begin-page

657

pubs.end-page

672

pubs.issue

4

pubs.organisational-group

Duke

pubs.organisational-group

School of Medicine

pubs.publication-status

Published

pubs.volume

63

Files

Original bundle

Name: Kuk_Rush-2014_Variable and threshold selection to control predictive accuracy in logistic regression.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format