Adaptive performance anomaly detection in distributed systems using online SVMs

dc.contributor.author: Alvarez Cid-Fuentes, J.
dc.contributor.author: Szabo, C.
dc.contributor.author: Falkner, K.
dc.date.issued: 2020
dc.description.abstract: Performance anomaly detection is crucial for long running, large scale distributed systems. However, existing works focus on the detection of specific types of anomalies, rely on historical failure data, and cannot adapt to changes in system behavior at run time. In this work, we propose an adaptive framework for the detection and identification of complex anomalous behaviors, such as deadlocks and livelocks, in distributed systems without historical failure data. Our framework employs a two-step process involving two online SVM classifiers on periodically collected system metrics to identify at run time normal and anomalous behaviors such as deadlock, livelock, unwanted synchronization, and memory leaks. Our approach achieves over 0.70 F-score in detecting previously unseen anomalies and 0.78 F-score in identifying the type of known anomalies with a short delay after the anomalies appear, and with minimal expert intervention. Our experimental analysis uses system execution traces from our in-house distributed system with varied behaviors and a dataset by Yahoo!, and shows the benefits of our approach as well as future research challenges.
dc.description.statementofresponsibility: Javier Alvarez Cid-Fuentes, Claudia Szabo, Katrina Falkner
dc.identifier.citation: IEEE Transactions on Dependable and Secure Computing, 2020; 17(5):928-941
dc.identifier.doi: 10.1109/TDSC.2018.2821693
dc.identifier.issn: 1545-5971
dc.identifier.issn: 1941-0018
dc.identifier.orcid: Szabo, C. [0000-0003-2501-1155]
dc.identifier.orcid: Falkner, K. [0000-0003-0309-4332]
dc.identifier.uri: http://hdl.handle.net/2440/116460
dc.language.iso: en
dc.publisher: IEEE
dc.rights: © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
dc.source.uri: https://doi.org/10.1109/tdsc.2018.2821693
dc.subject: Measurement; anomaly detection; detectors; correlation; cloud computing; system analysis and design; adaptation models
dc.title: Adaptive performance anomaly detection in distributed systems using online SVMs
dc.type: Journal article
pubs.publication-status: Published
