Publications

Data augmentation and transfer learning to classify malware images in a deep learning context (2021)

Authors:
Marastoni, N; Giacobazzi, R; Dalla Preda, M
Title:
Data augmentation and transfer learning to classify malware images in a deep learning context
Year:
2021
Type of item:
Journal article
ANVUR type:
Journal article
Language:
English
Referee:
No
Name of journal:
JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES
ISSN of journal:
2274-2042
Page numbers:
1-19
Keyword:
Deep learning; Binaries; Malware
Short description of contents:
In the past few years, malware classification techniques have shifted from shallow traditional machine learning models to deeper neural network architectures. The main benefit of some of these architectures is their ability to work with raw data, thanks to their automatic feature extraction capabilities. As a result, less technical expertise is needed to build the models, and fewer resources are spent on initial pre-processing. This advantage comes with a drawback, however: deep learning models require huge quantities of data to produce a model that generalizes well, and the amount of data required to train a deep network without overfitting is often unobtainable for malware analysts. We take inspiration from image-based data augmentation techniques and apply a sequence of semantics-preserving syntactic code transformations (obfuscations) to a small dataset of programs to generate a larger dataset. We then design two learning models, a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network, and train them on images extracted from the compiled binaries of the newly generated dataset. Through transfer learning, we then take the features learned from the obfuscated binaries and train the models on two state-of-the-art malware datasets, each containing around 10 000 samples. Our models achieve up to 98.5% accuracy on the test set, which is on par with or better than current state-of-the-art approaches, thus validating the approach.
Product ID:
122512
Handle IRIS:
11562/1049357
Last Modified:
February 12, 2025
Bibliographic citation:
Marastoni, N; Giacobazzi, R; Dalla Preda, M, Data augmentation and transfer learning to classify malware images in a deep learning context, «JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES», 2021, pp. 1-19

See the full record in IRIS, the university's institutional research repository.
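
The abstract mentions training on images extracted from compiled binaries but does not describe the extraction step. A common approach in the malware-image literature reads a binary as a raw byte stream and reshapes it into a grayscale image, one byte per pixel. The Python sketch below illustrates only that general idea. It is an assumption and not necessarily the authors' procedure, and the function name `bytes_to_grayscale` and the fixed row `width` are hypothetical.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` pixels with values 0-255.

    Hypothetical helper: each byte becomes one grayscale pixel, and the
    last row is zero-padded so that every row has exactly `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Number of rows needed to hold every byte (0 for empty input).
    height = math.ceil(len(data) / width)
    # Pad with zero bytes up to a full rectangle.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    # First five bytes of a typical PE header ("MZ" signature), width 4.
    print(bytes_to_grayscale(b"\x4d\x5a\x90\x00\x03", width=4))
    # [[77, 90, 144, 0], [3, 0, 0, 0]]
```

In practice the resulting pixel matrix would be resized or cropped to the fixed input shape expected by the network; the choice of width trades image height against how much byte-level locality each row preserves.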