Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/77208
Type: Conference paper
Title: A method for comparing data splitting approaches for developing hydrological ANN models
Author: Wu, W.
May, R.
Dandy, G.
Maier, H.
Citation: Proceedings of the International Congress on Environmental Modelling and Software: Managing Resources of a Limited Planet, 6th Biennial Meeting, held in Leipzig, Germany, 1-5 July 2012 / R. Seppelt, A.A. Voinov, S. Lange and D. Bankamp (eds.): pp.1-8
Publisher: International Environmental Modelling & Software Society
Publisher Place: online
Issue Date: 2012
ISBN: 9788890357428
Conference Name: International Congress on Environmental Modelling and Software (6th : 2012 : Leipzig, Germany)
Statement of Responsibility: Wenyan Wu, Robert May, Graeme C. Dandy and Holger R. Maier
Abstract: Data splitting is an important step in the artificial neural network (ANN) development process, whereby data are divided into training, test and validation subsets to ensure good generalization ability of the model. Because only one split of the data is typically used when developing ANN models, data splitting has a significant impact on the performance of the final model by potentially introducing bias and variance into the model development process. Therefore, it is important to find a robust data splitting method that results in an ANN model representing the underlying data generation process of a given dataset. In practice, ANN models developed using different data splitting methods are often assessed based on validation results. Previous research, however, has found that validation results alone are not adequate for assessing the performance of ANN models. Data splitting methods have the potential to bias the validation results by allocating extreme observations to the training set, so that the test and validation sets contain fewer patterns than the training set. Consequently, the generalization ability of the model may be compromised and the trained model cannot be adequately validated. This paper introduces a method for fairly comparing different data splitting methods for developing ANN models. The methodology is applied to compare a number of well-known data splitting techniques in the context of several hydrological ANN modeling problems.
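Illustrative note: the splitting step described in the abstract can be sketched as follows. This is a minimal sketch and not the authors' method. It assumes a plain random split with 60/20/20 proportions and a synthetic, skewed series standing in for hydrological data; the names `random_split`, `subset_ranges` and `flows` are hypothetical. Repeating the split with different seeds shows how the range of values covered by each subset can change from one split to the next, which is the kind of variability the abstract refers to.

```python
import random


def random_split(values, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly divide observations into training, test and validation subsets."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    rng.shuffle(indices)
    n_train = round(fractions[0] * len(values))
    n_test = round(fractions[1] * len(values))
    train = [values[i] for i in indices[:n_train]]
    test = [values[i] for i in indices[n_train:n_train + n_test]]
    # Whatever remains after training and test goes to validation.
    validation = [values[i] for i in indices[n_train + n_test:]]
    return train, test, validation


def subset_ranges(subsets):
    """Return the (min, max) covered by each named subset."""
    return {name: (min(vals), max(vals)) for name, vals in subsets.items()}


# Assumption: a synthetic, right-skewed series used only for illustration.
data_rng = random.Random(42)
flows = [data_rng.lognormvariate(0.0, 1.0) for _ in range(200)]

# Repeat the split with several seeds and compare the ranges of each subset.
for seed in range(5):
    train, test, validation = random_split(flows, seed=seed)
    ranges = subset_ranges(
        {"training": train, "test": test, "validation": validation}
    )
    print(seed, ranges)
```

If the largest observation falls in the training subset for a given seed, the test and validation subsets for that seed never exercise the model at that extreme.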
Keywords: Artificial neural networks
data splitting
Rights: The copyright of all papers is an exclusive right of the authors. No work can be reproduced without written permission of the authors.
Description (link): http://www.iemss.org/sites/iemss2012/proceedings.html
Appears in Collections: Aurora harvest 4
Civil and Environmental Engineering publications
Environment Institute publications


