A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks

Wu, W.; May, R.; Maier, H.; Dandy, G.

doi:10.1002/2012WR012713

dc.contributor.author	Wu, W.
dc.contributor.author	May, R.
dc.contributor.author	Maier, H.
dc.contributor.author	Dandy, G.
dc.date.issued	2013
dc.description.abstract	Data splitting is an important step in the artificial neural network (ANN) development process, whereby the available data are divided into training, testing, and validation subsets to ensure good generalization ability of the model. Considering that only one split of the data is typically used when developing ANN models, data splitting has a significant impact on model performance, depending on which data are allocated to the three subsets. Therefore, it is important to find a data splitting method that consistently results in predictive validation errors that are representative of the predictive errors obtained over the full range of the available data. This paper addresses this issue by introducing a benchmarking approach for comparing different data splitting methods in terms of (1) bias, which is the difference between the expected validation performance over the entire data set and that obtained using a particular data splitting method and (2) variability, which is the spread of the validation errors obtained by repeated implementation of that method. The utility of the proposed approach is assessed on a number of well‐known data splitting methods in the context of four water resources ANN modelling problems. The results obtained indicate that the proposed approach for comparing data splitting methods is more representative than the previous approach where a value of zero is used as the predictive performance benchmark, as it can avoid the selection of an over‐optimistic data splitting method that under‐represents extreme data in the validation set.
dc.description.statementofresponsibility	Wenyan Wu, Robert J. May, Holger R. Maier, and Graeme C. Dandy
dc.identifier.citation	Water Resources Research, 2013; 49(11):7598-7614
dc.identifier.doi	10.1002/2012WR012713
dc.identifier.issn	0043-1397
dc.identifier.issn	1944-7973
dc.identifier.orcid	Wu, W. [0000-0003-3907-1570]
dc.identifier.orcid	Maier, H. [0000-0002-0277-6887]
dc.identifier.orcid	Dandy, G. [0000-0001-5846-7365]
dc.identifier.uri	http://hdl.handle.net/2440/82055
dc.language.iso	en
dc.publisher	American Geophysical Union
dc.rights	©2013. American Geophysical Union. All Rights Reserved.
dc.source.uri	https://doi.org/10.1002/2012wr012713
dc.subject	artificial neural networks
dc.subject	data variability
dc.subject	data splitting methods
dc.subject	benchmark
dc.subject	validation
dc.title	A benchmarking approach for comparing data splitting methods for modeling water resources parameters using artificial neural networks
dc.type	Journal article
pubs.publication-status	Published
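
The two comparison criteria named in the abstract, bias and variability, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `bias_and_variability`, the use of the mean over repeated splits, the sign convention for bias, and the sample standard deviation as the measure of spread are all assumptions made for illustration. The benchmark value (the expected validation performance over the entire data set) is taken as a given input, since the abstract does not state how it is estimated.

```python
from statistics import mean, stdev


def bias_and_variability(validation_errors, benchmark_error):
    """Summarize one data splitting method from repeated runs.

    validation_errors: validation error from each repeated application
        of the splitting method (hypothetical input).
    benchmark_error: expected validation error over the entire data set
        (hypothetical input; its estimation is not covered here).
    """
    if len(validation_errors) < 2:
        raise ValueError("need at least two repeats to measure spread")
    # Assumed sign convention: a negative bias means the method reports
    # errors below the benchmark, i.e. it looks over-optimistic.
    bias = mean(validation_errors) - benchmark_error
    # Assumed spread measure: sample standard deviation across repeats.
    variability = stdev(validation_errors)
    return bias, variability


# Illustrative numbers only, not results from the paper.
bias, variability = bias_and_variability([1.0, 2.0, 3.0], benchmark_error=1.5)
print(bias, variability)
```

Under this reading, comparing against a benchmark of zero would favor whichever method reports the smallest validation error, whereas comparing against the expected error flags a method whose validation error falls well below the benchmark as over-optimistic.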

Collections

Civil and Environmental Engineering publications
