On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization

Maier, H.R.; Zheng, F.; Gupta, H.; Chen, J.; Mai, J.; Savic, D.; Loritz, R.; Wu, W.; Guo, D.; Bennett, A.; Jakeman, A.; Razavi, S.; Zhao, J.

doi:10.1016/j.envsoft.2023.105779

Date

2023

Authors

Maier, H.R.

Zheng, F.

Gupta, H.

Chen, J.

Mai, J.

Savic, D.

Loritz, R.

Wu, W.

Guo, D.

Bennett, A.

Jakeman, A.

Razavi, S.

Zhao, J.

Type

Journal article

Citation

Environmental Modelling and Software, 2023; 167:105779-1–105779-8

Statement of Responsibility

Holger R. Maier, Feifei Zheng, Hoshin Gupta, Junyi Chen, Juliane Mai, Dragan Savic, Ralf Loritz, Wenyan Wu, Danlu Guo, Andrew Bennett, Anthony Jakeman, Saman Razavi, Jianshi Zhao

DOI

10.1016/j.envsoft.2023.105779

Abstract

Models play a pivotal role in advancing our understanding of Earth’s physical nature and environmental systems, and in supporting the efficient planning and management of these systems. The accuracy and reliability of these models rely heavily on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, the way this partitioning is done is often not justified, even though it determines which model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we highlight the paramount importance of carefully considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy for confronting this “elephant in the room”, leading to improved model generalization capabilities.

Grant ID

http://purl.org/au-research/grants/arc/DE210100117

Published Version

https://doi.org/10.1016/j.envsoft.2023.105779

Persistent link to this record

https://hdl.handle.net/2440/139464
