Cross-validation for supervised learning with tuning parameters
Date
2022
Authors
Winderbaum, L.
Koch, I.
Type
Journal article
Citation
Journal of Statistics and Computer Science, 2022; 1(1):55-76
Abstract
Recent advances in machine learning and data science have led to widespread adoption of complex predictive modelling. Increasing awareness of the ‘reproducibility crisis’ has led to calls for improved transparency and accountability in scientific reporting. One important aspect of veridical data science is the robust estimation of prediction error. The availability of computational resources has made cross-validation (CV) a main tool for such estimation. We consider CV estimation in supervised learning for high-dimensional data, and focus on linear regression and discriminant analysis approaches based on variable selection with direct dimension reduction as well as lasso-type sparsity criteria. We highlight how the same description of a method could in fact apply to any one of several different cross-validation implementations. We outline key principles underpinning good cross-validation practice, several ‘pitfall’ implementations which subtly violate these principles in different ways, and a more complex and computationally intensive implementation which does not. We demonstrate the differences in the estimated error resulting from these different implementations with real data relating to endometrial cancer, in the context of high-stakes decision making where accurate and robust estimation of prediction error is critical. We use simulated data to illustrate how these different implementations result in estimators for prediction error with very different properties and relationships to the true prediction error. We call for increased detail in method-reporting, present principles for good practice in the implementation of cross-validation, and make recommendations to guide cross-validation implementation.
Rights
Copyright 2022 ARF India. ARF India provides online open access to all its journals. (https://www.arfjournals.com/ethics-policy)