Multi-Labeling of Complex, Multi-Behavioral Malware Samples
Identifiers
URI: http://hdl.handle.net/10481/76559
Subject
Malware Dataset Android
Date
2022-10
Bibliographic reference
García Teodoro, P., Gómez Hernández, J. A., Abellán Galera, A., Multi-Labeling of Complex, Multi-Behavioral Malware Samples, Computers & Security, Volume 121, 102845
Abstract
Testing cyber security solutions usually requires malware samples. The correct typology of those samples matters for properly estimating the performance of the tools under evaluation. Although several malware datasets are publicly available at present, most of them are not labeled or, if they are, only one class or tag is assigned to each malware sample. We argue that a single label is not enough to represent the complex behavior exhibited by most current malware. With this hypothesis in mind, and based on the varied classifications that automatic detection engines generally provide per sample, we introduce a simple multi-labeling approach to automatically tag the multiple behaviors of malware samples. In the paper, we first analyze the coherence between the behaviors exhibited by a number of well-known malware samples dissected in the literature and the multiple tags that our labeling proposal assigns to them. The automatic multi-labeling scheme is then run over four public Android malware datasets, and the results and statistics obtained regarding their composition and representativeness are discussed. The multi-labeling tool developed is shared in a GitHub repository for public use.
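The idea of deriving several tags from the per-engine classifications of one sample can be illustrated with a minimal sketch. It is not the authors' tool: the keyword vocabulary `BEHAVIOR_KEYWORDS`, the function `multi_label`, the `min_votes` threshold, and the engine names and label strings are all hypothetical, chosen only to show how multiple behaviors might be collected instead of a single majority class.

```python
from collections import Counter

# Hypothetical keyword-to-behavior vocabulary (an assumption, not from the paper).
BEHAVIOR_KEYWORDS = {
    "trojan": "trojan",
    "banker": "banker",
    "spy": "spyware",
    "adware": "adware",
    "ransom": "ransomware",
    "sms": "sms",
}


def multi_label(engine_labels, min_votes=2):
    """Return every behavior tag mentioned by at least `min_votes` engines."""
    votes = Counter()
    for label in engine_labels.values():
        if not label:
            continue  # this engine did not flag the sample
        lowered = label.lower()
        # A set ensures each engine votes at most once per tag.
        tags = {tag for kw, tag in BEHAVIOR_KEYWORDS.items() if kw in lowered}
        votes.update(tags)
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Made-up detection results for one sample (engine name -> label string).
sample = {
    "EngineA": "Android.Trojan.Banker.XY",
    "EngineB": "Trojan-Spy.AndroidOS.Agent",
    "EngineC": "Android/Banker.Spy",
    "EngineD": None,
}

print(multi_label(sample))  # ['banker', 'spyware', 'trojan']
```

A single-label scheme would have to pick one of these three tags; keeping all tags above the vote threshold preserves the multi-behavior view that the abstract argues for.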