Online active learning in data stream regression using uncertainty sampling based on evolving generalized fuzzy models

Lughofer, E.; Pratama, M.

doi:10.1109/TFUZZ.2017.2654504

dc.contributor.author	Lughofer, E.
dc.contributor.author	Pratama, M.
dc.date.issued	2018
dc.description.abstract	In this paper, we propose three criteria for efficient sample selection in data stream regression problems within an online active learning context. Sample selection becomes important whenever the target values, which guide the update of the regressors as well as the implicit model structures, are costly or time-consuming to measure, and also when very fast model updates are required to cope with the real-time demands of stream mining. Reducing the number of selected samples as much as possible while keeping the predictive accuracy of the models at a high level is thus a central challenge. Ideally, this should be achieved in an unsupervised and single-pass manner. Our selection criteria rely on three aspects: 1) the extrapolation degree combined with the model's nonlinearity degree, which is measured in terms of a new specific homogeneity criterion among adjacent local approximators; 2) the uncertainty in model outputs, which can be measured in terms of confidence intervals using so-called adaptive local error bars (we integrate a weighted localization of an incremental noise level estimator and propose formulas for online merging of local error bars); 3) the uncertainty in model parameters, which is estimated by the so-called A-optimality criterion, which relies on the Fisher information matrix. The selection criteria are developed in combination with evolving generalized Takagi-Sugeno (TS) fuzzy models (containing rules in arbitrarily rotated position), as previous publications have shown that these outperform conventional evolving TS models (containing axis-parallel rules). The results based on three high-dimensional real-world streaming problems show that a model update based on only 10%-20% selected samples can still achieve accumulated model errors over time similar to those of a full model update on all samples. This can be achieved with negligible sensitivity to the size of the active learning latency buffer. Random sampling with the same percentages of samples selected, however, achieved much higher error rates. Hence, the intelligence in our sample selection concept leads to an economic balance between model accuracy and measurement as well as computational costs for model updates.
dc.identifier.citation	IEEE transactions on fuzzy systems, 2018; 26(1, article no. 7820039):292-309
dc.identifier.doi	10.1109/TFUZZ.2017.2654504
dc.identifier.issn	1063-6706
dc.identifier.issn	1941-0034
dc.identifier.uri	https://hdl.handle.net/11541.2/27222
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation.funding	The federal state of Upper Austria
dc.relation.funding	Austrian federal government
dc.rights	Copyright 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission (http://www.ieee.org/publications_standards/publications/rights/index.html)
dc.source.uri	https://doi.org/10.1109/TFUZZ.2017.2654504
dc.subject	active learning latency buffer (ALLB)
dc.subject	data stream regression
dc.subject	evolving generalized Takagi–Sugeno (TS) fuzzy systems
dc.subject	extrapolation degree
dc.subject	nonlinearity degree
dc.subject	online active learning
dc.subject	single-pass uncertainty-based sampling
dc.subject	uncertainty in model outputs and parameters
dc.title	Online active learning in data stream regression using uncertainty sampling based on evolving generalized fuzzy models
dc.type	Journal article
pubs.publication-status	Published
ror.mmsid	9916608451301831
