Cross-validation for supervised learning with tuning parameters

Date

2022

Authors

Winderbaum, L.
Koch, I.

Type

Journal article

Citation

Journal of Statistics and Computer Science, 2022; 1(1):55-76

Abstract

Recent advances in machine learning and data science have led to widespread adoption of complex predictive modelling. Increasing awareness of the ‘reproducibility crisis’ has led to calls for improved transparency and accountability in scientific reporting. One important aspect of veridical data science is the robust estimation of prediction error. The availability of computational resources has made cross-validation (CV) a main tool for such estimation. We consider CV estimation in supervised learning for high-dimensional data, and focus on linear regression and discriminant analysis approaches based on variable selection with direct dimension reduction as well as lasso-type sparsity criteria. We highlight how the same description of a method could in fact apply to any one of several different cross-validation implementations. We outline key principles underpinning good cross-validation practice, several ‘pitfall’ implementations that subtly violate these principles in different ways, and a more complex and computationally intensive implementation that does not. We demonstrate the differences in the estimated error resulting from these implementations with real data relating to endometrial cancer, in the context of high-stakes decision making where accurate and robust estimation of prediction error is critical. We use simulated data to illustrate how these different implementations result in estimators of prediction error with very different properties and relationships to the true prediction error. We call for increased detail in method reporting, present principles for good practice in the implementation of cross-validation, and make recommendations to guide cross-validation implementation.
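The abstract does not spell out the specific pitfall implementations, so the following is only a minimal sketch of one well-known violation of this kind, not the authors' code or method: screening variables on the full data set before cross-validating, compared with redoing the screening inside each training fold. It assumes a simple mean-difference screen and a nearest-centroid classifier as stand-ins for the paper's variable selection and discriminant analysis, and all function names (`simulate_null_data`, `select_top_k`, `cv_error`, and so on) are hypothetical. Because the labels are generated independently of every feature, the true prediction error of any classifier is 0.5.

```python
import random
import statistics


def simulate_null_data(n, p, seed):
    """Hypothetical simulation: labels independent of all features (true error 0.5)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]  # balanced 0/1 labels
    rng.shuffle(y)
    return X, y


def select_top_k(X, y, rows, k):
    """Stand-in screen: k features with largest |class mean difference| on `rows` only."""
    scores = []
    for j in range(len(X[0])):
        g0 = [X[i][j] for i in rows if y[i] == 0]
        g1 = [X[i][j] for i in rows if y[i] == 1]
        scores.append((abs(statistics.fmean(g1) - statistics.fmean(g0)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def centroid(X, rows, feats):
    return [statistics.fmean(X[i][j] for i in rows) for j in feats]


def nearest_centroid_errors(X, y, train, test, feats):
    """Fit class centroids on `train`, count misclassified rows in `test`."""
    c0 = centroid(X, [i for i in train if y[i] == 0], feats)
    c1 = centroid(X, [i for i in train if y[i] == 1], feats)
    errors = 0
    for i in test:
        x = [X[i][j] for j in feats]
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        pred = 1 if d1 < d0 else 0
        errors += pred != y[i]
    return errors


def cv_error(X, y, k, n_folds, select_inside):
    """K-fold CV error; select_inside=False screens on ALL rows first (the pitfall)."""
    n = len(y)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    fixed = None if select_inside else select_top_k(X, y, all_rows, k)
    errors = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in all_rows if i not in test_set]
        # Proper CV repeats the screening using the training rows only.
        feats = select_top_k(X, y, train, k) if select_inside else fixed
        errors += nearest_centroid_errors(X, y, train, test, feats)
    return errors / n


X, y = simulate_null_data(n=40, p=2000, seed=1)
pitfall = cv_error(X, y, k=20, n_folds=5, select_inside=False)
proper = cv_error(X, y, k=20, n_folds=5, select_inside=True)
print(f"screening outside CV: {pitfall:.2f}, screening inside CV: {proper:.2f}")
```

With many more variables than samples, screening on the full data lets the held-out rows influence which variables are kept, so the first estimate typically falls well below the true error of 0.5, while the second typically stays near it. The sizes (n = 40, p = 2000, k = 20, five folds) are arbitrary illustrative choices.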

Rights

Copyright 2022 ARF India. ARF India provides online open access to all its journals. (https://www.arfjournals.com/ethics-policy)