Features engineering to differentiate between malware and legitimate software

Daeef, A.Y.; Al Naji, A.; Nahar, A.K.; Chahl, J.

doi:10.3390/app13031972

dc.contributor.author	Daeef, A.Y.
dc.contributor.author	Al Naji, A.
dc.contributor.author	Nahar, A.K.
dc.contributor.author	Chahl, J.
dc.date.issued	2023
dc.description.abstract	Malware is the primary attack vector against the modern enterprise. Therefore, it is crucial for businesses to keep malware out of their computer systems. The most responsive solution to this issue would operate in real time at the edge of the IT system using artificial intelligence. However, a lightweight solution is essential at the edge, because edge devices have limited memory and processing power. Application programming interface (API) calls are the best candidate for such a solution. However, engineering API call features that offer a high malware detection rate with fast execution is a significant challenge. To accomplish this goal, this work uses visualisation analysis and Jaccard similarity to uncover the hidden patterns produced by different API calls. This study also compared neural networks that use long sequences of API calls with shallow machine learning classifiers. Three classifiers are used: support vector machine (SVM), k-nearest neighbours (KNN), and random forest (RF). The benchmark data set comprises 43,876 examples of API call sequences, divided into two categories: malware and legitimate. The results showed that RF performed similarly to long short-term memory (LSTM) networks and deep graph convolutional neural networks (DGCNNs). The results also suggest the potential for performing inference on edge devices in a real-time setting.
dc.identifier.citation	Applied Sciences, 2023; 13(3, article no. 1972):1-13
dc.identifier.doi	10.3390/app13031972
dc.identifier.issn	2076-3417
dc.identifier.uri	https://hdl.handle.net/11541.2/32567
dc.language.iso	en
dc.publisher	MDPI AG
dc.rights	Copyright 2023 The author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/)
dc.source.uri	https://doi.org/10.3390/app13031972
dc.subject	machine learning
dc.subject	Jaccard similarity
dc.subject	malware classification
dc.subject	API call sequence
dc.title	Features engineering to differentiate between malware and legitimate software
dc.type	Journal article
pubs.publication-status	Published
ror.fileinfo	12257839410001831 13257839400001831 applsci-13-01972
ror.mmsid	9916716431101831
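
The abstract mentions Jaccard similarity as a tool for comparing the API calls made by different programs. The sketch below is a generic illustration, assuming each program is represented by the set of distinct API call names in its trace. This record does not say how the authors built their features, so this is not their exact method. The function name `jaccard_similarity` and the example traces are hypothetical.

```python
def jaccard_similarity(calls_a, calls_b):
    """Jaccard similarity |A & B| / |A | B| of the distinct API calls in two traces.

    Assumption: order and repetition are ignored; each trace is reduced to a set.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty traces count as identical
    return len(set_a & set_b) / len(union)


# Hypothetical traces for illustration only; not taken from the benchmark data set.
trace_1 = ["NtOpenFile", "NtReadFile", "NtClose", "NtReadFile"]
trace_2 = ["NtOpenFile", "NtWriteFile", "NtClose"]
print(jaccard_similarity(trace_1, trace_2))  # 2 shared calls / 4 distinct calls = 0.5
```

Reducing traces to sets discards ordering, which keeps the comparison cheap. That trade-off is in the spirit of the lightweight, edge-oriented features the abstract describes, but sequence models such as LSTMs would instead use the full call order.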

Files

Original bundle

Name:: applsci-13-01972.pdf
Size:: 6.13 MB
Format:: Adobe Portable Document Format
Description:: Published version