Comparison of regression imputation methods of baseline covariates that predict survival outcomes.


Missing data are inevitable in medical research, and handling them appropriately is critical for statistical estimation and inference. Imputation is often used to maximize the amount of data available for statistical analysis and is preferred over complete case analysis, which typically yields biased results. This article examines several types of regression imputation of missing covariates in the prediction of time-to-event outcomes subject to right censoring.
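As a rough illustration of what regression imputation of a covariate involves, the sketch below fits a regression on the complete cases and uses it to predict the missing values. It is a minimal sketch under stated assumptions: a single-predictor least-squares line stands in for the imputation models compared in the article, and the variable names (`age`, `biomarker`) and toy values are hypothetical, not taken from the study.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_impute(predictor, covariate):
    """Replace None entries of `covariate` with predictions from `predictor`.

    The regression is fitted on complete cases only; observed values
    are returned unchanged.
    """
    observed = [(p, c) for p, c in zip(predictor, covariate) if c is not None]
    intercept, slope = fit_simple_ols(
        [p for p, _ in observed], [c for _, c in observed]
    )
    return [
        c if c is not None else intercept + slope * p
        for p, c in zip(predictor, covariate)
    ]


# Hypothetical toy data: `age` is fully observed, `biomarker` has one gap.
age = [50.0, 60.0, 70.0, 80.0, 65.0]
biomarker = [1.0, 2.0, 3.0, 4.0, None]
completed = regression_impute(age, biomarker)
print(completed)
```

The completed covariate would then enter the survival model in place of the partially observed one. Swapping `fit_simple_ols` for a penalized or nonparametric learner changes only the prediction step, not the overall flow.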


We evaluated the performance of five regression methods for imputing missing covariates in the proportional hazards model using summary statistics, including proportional bias and proportional mean squared error. The primary objective was to determine which of the parametric methods, generalized linear models (GLMs) and the least absolute shrinkage and selection operator (LASSO), or the nonparametric methods, multivariate adaptive regression splines (MARS), support vector machine (SVM), and random forest (RF), provides the "best" imputation model for missing baseline covariates in predicting a survival outcome.
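For reference, the proportional hazards model named above is conventionally written as follows. The symbols ($h_0$ for the baseline hazard, $\boldsymbol{\beta}$ for the coefficient vector, $\mathbf{x}$ for the baseline covariates) are notation introduced here for illustration and are not taken from the article.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)
```

In this notation, imputation fills in the missing entries of $\mathbf{x}$ before $\boldsymbol{\beta}$ is estimated, so the quality of the imputation model feeds directly into the estimated coefficients.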


On average, LASSO yielded the smallest bias, mean squared error, mean squared prediction error, and median absolute deviation (MAD) of the final analysis model's parameters among the five methods considered. SVM performed second best, while GLM and MARS exhibited the lowest relative performance.
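The summary statistics named above can be sketched for a single model parameter as follows. The definitions are assumptions, since the abstract does not spell them out: proportional bias is taken to be the mean error divided by the true value, proportional mean squared error is the mean squared error divided by the squared true value, and MAD is the median absolute deviation about the median. The estimates and the true value are hypothetical numbers, not results from the study.

```python
from statistics import fmean, median


def proportional_bias(estimates, truth):
    """Mean error relative to the true parameter value (assumed definition)."""
    return fmean([e - truth for e in estimates]) / truth


def proportional_mse(estimates, truth):
    """Mean squared error relative to the squared true value (assumed definition)."""
    return fmean([(e - truth) ** 2 for e in estimates]) / truth ** 2


def median_absolute_deviation(estimates):
    """Median of absolute deviations from the median estimate."""
    center = median(estimates)
    return median([abs(e - center) for e in estimates])


# Hypothetical coefficient estimates from four simulated data sets,
# with a hypothetical true coefficient of 0.5.
true_beta = 0.5
beta_hats = [0.4, 0.5, 0.6, 0.7]

pbias = proportional_bias(beta_hats, true_beta)
pmse = proportional_mse(beta_hats, true_beta)
mad = median_absolute_deviation(beta_hats)
print(pbias, pmse, mad)
```

Computing these per imputation method and then comparing them is one way a ranking like the one reported above could be formed. Dividing by the true value puts parameters of different magnitudes on a common scale.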


LASSO and SVM outperform GLM, MARS, and RF in the context of regression imputation for prediction of a time-to-event outcome.







Publication Info

Solomon, Nicole, Yuliya Lokhnygina, and Susan Halabi (2020). Comparison of regression imputation methods of baseline covariates that predict survival outcomes. Journal of Clinical and Translational Science, 5(1), e40. doi:10.1017/cts.2020.533 Retrieved from

This citation is constructed from limited available data and may be imprecise. To cite this article, please review and use the official citation provided by the journal.



Nicole Solomon

Biostatistician, Senior

Yuliya Vladimirovna Lokhnygina

Associate Professor of Biostatistics & Bioinformatics

Statistical methods in clinical trials, survival analysis, adaptive designs, adaptive treatment strategies, causal inference in observational studies, semiparametric inference


Susan Halabi

James B. Duke Distinguished Professor

Design and analysis of clinical trials, statistical analysis of biomarker and high dimensional data, development and validation of prognostic and predictive models.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.