Microsoft and Intel have devised a new approach to malware detection, dubbed STAMINA, that involves deep learning and the representation of malware as images.
STAtic Malware-as-Image Network Analysis (STAMINA) is a new approach to malware detection proposed by Microsoft and Intel. The study builds on earlier work by Intel researchers on static malware classification through deep transfer learning, and applies it to a real-world dataset provided by Microsoft in order to evaluate its effectiveness.
“We studied the practical benefits of applying deep transfer learning from computer vision to static malware classification. Recall that in the transfer learning scheme, we borrowed knowledge from natural images or objects and applied it to the target domain of static malware detection. The training time of deep neural networks is accelerated while high classification performance is still maintained.” reads the research paper on STAMINA. “In this paper, Intel Labs and the Microsoft Threat Intelligence Team have demonstrated the effectiveness of this approach on a real-world user dataset and have shown that transfer learning from computer vision for malware classification can achieve highly desirable classification performance. For this collaboration, we called this approach STAtic Malware-as-Image Network Analysis (STAMINA)”
The STAMINA approach is composed of four steps: preprocessing (image conversion), transfer learning, evaluation, and interpretation.
The approach relies on a technique that converts malware samples into grayscale images; detection then works by scanning those images for textural and structural patterns associated with malware.
“The approach was motivated by visual inspection of application binaries plotted as grey-scale images: there are textural and structural similarities among malware from the same family and dissimilarities between malware and benign software as well as across different malware families.” continues the report.
Experts pointed out the limits of the classic signature-based approach to malware detection, and noted that static and dynamic analysis may be neither accurate nor time-efficient as malicious code keeps evolving.
The researchers describe each of the four steps in turn.
Preprocessing consists of three operations: creating a pixel stream by mapping every byte to a value between 0 and 255 that corresponds to a pixel intensity, reshaping the pixel stream into two dimensions, and resizing the resulting image (“to 224 or 299 so that the image models trained on ImageNet can be used for fine tuning on the images”).
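The three preprocessing operations can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names, the fixed row width, the zero-padding of the last row, and the nearest-neighbour resizing are all assumptions made for illustration, and the input is stand-in data rather than a real binary.

```python
import math


def bytes_to_image(data: bytes, width: int) -> list[list[int]]:
    """Turn a byte stream into rows of `width` pixels, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    pixels = list(data)  # each byte is already an intensity in 0..255
    height = math.ceil(len(pixels) / width)
    # Assumption: pad the final row with zeros (black) so the image is rectangular.
    pixels += [0] * (height * width - len(pixels))
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def resize_nearest(image: list[list[int]], size: int) -> list[list[int]]:
    """Resize to size x size using nearest-neighbour sampling (assumed method)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // size][x * src_w // size] for x in range(size)]
        for y in range(size)
    ]


# Stand-in input: 16 KiB of synthetic bytes, not a real malware sample.
sample = bytes(range(256)) * 64
image = bytes_to_image(sample, width=128)   # pixel stream reshaped to 2-D
resized = resize_nearest(image, 224)        # 224 is one of the sizes quoted above
print(len(image), len(image[0]), len(resized), len(resized[0]))
```

A production pipeline would more likely use an image library for the resize step; plain lists are used here only to keep the sketch self-contained.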
The approach then uses transfer learning to train a classifier for static malware classification. The models are trained on the malware and benign images produced in the preprocessing step.
“What has been done in the computer vision space is that, for specific tasks, models pretrained on a large number of images are used, and transfer learning is conducted on target tasks. Major transfer learning schemes include using as a feature extractor and fine-tuning the network.” the researchers note.
To evaluate the STAMINA approach, the experts considered accuracy, false positive rate, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve.
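For readers unfamiliar with these metrics, the sketch below shows how most of them follow from the four confusion-matrix counts of a binary classifier. The function name and the counts are hypothetical and are not figures from the study. Area under the ROC curve is left out because it needs per-sample scores rather than counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: malware correctly flagged / missed.
    tn/fp: benign files correctly passed / wrongly flagged.
    Assumes every denominator below is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```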
The researchers used a Microsoft dataset composed of 2.2 million malware binary hashes, along with 10 columns of associated data.
“In particular, per feedback from malware analysis practitioners, we also reported recall at 0.1%–10% false positive rate via ROC.” continues the paper.
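Recall at a fixed false positive rate is read off the ROC curve: the decision threshold is lowered until the false positive budget is used up, and the recall reached at that point is reported. The following is a rough sketch of that idea, with a hypothetical function name and made-up scores and labels; it is not taken from the paper.

```python
def recall_at_fpr(labels: list[int], scores: list[float], max_fpr: float) -> float:
    """Best recall achievable while the false positive rate stays <= max_fpr.

    labels: 1 = malware, 0 = benign. scores: higher means more likely malware.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malware and one benign sample")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score must be accepted or rejected together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # lowering the threshold further only adds false positives
        best = max(best, tp / positives)
    return best


# Made-up scores for ten samples (4 malware, 6 benign), for illustration only.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
print(recall_at_fpr(labels, scores, max_fpr=0.2))
```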
The researchers split the data into training, validation, and testing sets in a 60:20:20 ratio, segmented by the time each benign or malicious sample was first seen.
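A time-based 60:20:20 split of this kind might look like the sketch below. The record fields (`sample_id`, `first_seen`, `label`) and the function name are hypothetical, and the sketch assumes a single chronological ordering across both classes, with the oldest samples used for training; the article does not say whether the researchers ordered each class separately.

```python
from datetime import date


def chronological_split(samples: list[dict], ratios: tuple[int, int, int] = (60, 20, 20)):
    """Split samples into train/validation/test by first-seen date, oldest first."""
    ordered = sorted(samples, key=lambda s: s["first_seen"])
    total = sum(ratios)
    n = len(ordered)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Ten placeholder records, deliberately listed newest first.
samples = [
    {"sample_id": f"sample-{i}", "first_seen": date(2020, 1, 1 + i), "label": i % 2}
    for i in reversed(range(10))
]
train, val, test = chronological_split(samples)
print(len(train), len(val), len(test))
```

Splitting by time rather than at random means the model is evaluated on samples seen later than anything in its training data, which is closer to how a detector is used in practice.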
The test results confirmed that STAMINA can achieve 99.07% accuracy with a false positive rate of 2.58%. Precision is 99.09% and recall is 99.66%, for an F1 score of 0.9937.
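The reported F1 score is consistent with the reported precision and recall, since F1 is their harmonic mean. A quick check in Python, using only the figures quoted above:

```python
precision = 0.9909  # 99.09%, as reported
recall = 0.9966     # 99.66%, as reported

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9937
```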
Experts highlighted that the approach is effective when applied to small applications, while it is less effective for larger software due to the difficulty of converting “billions of pixels into JPEG images” and then resizing them.
“For future work, we would like to evaluate hybrid models of using intermediate representations of the binaries and information extracted from binaries with deep learning approaches –these datasets are expected to be bigger but may provide higher accuracy.” the researchers conclude. “We also will continue to explore platform acceleration optimizations for our deep learning models so we can deploy such detection techniques with minimal power and performance impact to the end-user.”