Heterogeneous univariate outlier ensembles in multidimensional data

Pang, G.; Cao, L.

Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/128808


Full metadata record

DC Field	Value	Language
dc.contributor.author	Pang, G.	-
dc.contributor.author	Cao, L.	-
dc.date.issued	2020	-
dc.identifier.citation	ACM Transactions on Knowledge Discovery from Data, 2020; 14(6):68-1-68-27	-
dc.identifier.issn	1556-4681	-
dc.identifier.issn	1556-472X	-
dc.identifier.uri	http://hdl.handle.net/2440/128808	-
dc.description.abstract	In outlier detection, recent major research has shifted from developing univariate methods to multivariate methods due to the rapid growth of multidimensional data. However, one typical issue with this paradigm shift is that multidimensional data often contain mainly univariate outliers, with many features being irrelevant. In such cases, multivariate methods are ineffective at identifying these outliers because of the potential biases and the curse of dimensionality introduced by irrelevant features. Such univariate outliers may be detected well by applying univariate outlier detectors to the individually relevant features. However, it is very challenging to choose the right univariate detector for each individual feature, since different features may follow very different probability distributions. To address this challenge, we introduce a novel Heterogeneous Univariate Outlier Ensembles (HUOE) framework and its instance ZDD, which synthesize a set of heterogeneous univariate outlier detectors as base learners to build heterogeneous ensembles that are optimized for each individual feature. Extensive results on 19 real-world datasets and a collection of synthetic datasets show that ZDD obtains 5%–14% average AUC improvement over four state-of-the-art multivariate ensembles and performs substantially more robustly w.r.t. irrelevant features.	-
dc.description.statementofresponsibility	Guansong Pang, Longbing Cao	-
dc.language.iso	en	-
dc.publisher	Association for Computing Machinery	-
dc.rights	© 2020 Association for Computing Machinery. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.	-
dc.source.uri	http://dx.doi.org/10.1145/3403934	-
dc.subject	Outlier detection; outlier ensemble; anomaly detection; univariate outlier; multidimensional data; heterogeneous data	-
dc.title	Heterogeneous univariate outlier ensembles in multidimensional data	-
dc.type	Journal article	-
dc.identifier.doi	10.1145/3403934	-
dc.relation.grant	http://purl.org/au-research/grants/arc/DP190101079	-
pubs.publication-status	Published	-
dc.identifier.orcid	Pang, G. [0000-0002-9877-2716]	-
Appears in Collections:	Aurora harvest 4 Australian Institute for Machine Learning publications
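
Illustrative note: the abstract describes scoring each feature with several different univariate detectors and combining them per feature. The sketch below is not the HUOE framework or ZDD, whose details are not given in this record. It is a minimal illustration of the general idea, assuming three hypothetical detectors (mean/standard deviation, median/MAD, median/IQR), a plain average in place of the per-feature optimization the abstract mentions, and a maximum over features as the row score. All function names are made up for this example.

```python
from statistics import mean, median, pstdev, quantiles


def _scaled_deviation(values, centre, scale):
    """Absolute deviation from `centre` in units of `scale` (zeros if scale is 0)."""
    if scale == 0:
        return [0.0] * len(values)
    return [abs(v - centre) / scale for v in values]


def zscore_detector(values):
    # Classic z-score: sensitive to the outliers it is trying to find.
    return _scaled_deviation(values, mean(values), pstdev(values))


def mad_detector(values):
    # Median / MAD score; 1.4826 makes the MAD comparable to a standard
    # deviation under a normal distribution.
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return _scaled_deviation(values, med, 1.4826 * mad)


def iqr_detector(values):
    # Median / IQR score; IQR / 1.349 approximates a standard deviation
    # under a normal distribution.
    q1, _, q3 = quantiles(values, n=4)
    return _scaled_deviation(values, median(values), (q3 - q1) / 1.349)


# Assumption: a fixed, unweighted set of base detectors for every feature.
DETECTORS = (zscore_detector, mad_detector, iqr_detector)


def feature_scores(values):
    """Average the base detectors' scores for one feature."""
    per_detector = [detector(values) for detector in DETECTORS]
    return [mean(scores) for scores in zip(*per_detector)]


def row_scores(rows):
    """Score each row by its most outlying single feature."""
    columns = [list(column) for column in zip(*rows)]
    per_feature = [feature_scores(column) for column in columns]
    return [max(scores) for scores in zip(*per_feature)]
```

Calling `row_scores` on a list of equal-length numeric rows returns one score per row, and a row with an extreme value in any single feature receives a high score regardless of how many other features are uninformative. Taking the maximum over features is only one possible design choice for this illustration; it is not claimed to match the paper.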



Adelaide Research & Scholarship