What is Genetic Malware Analysis?

By Roy Halevi

At Intezer, we view malware analysis as a key component of responding to security incidents properly and effectively. We have introduced a new approach that automates malware analysis and provides clear insights into any suspicious file. We call this approach Genetic Malware Analysis. In this piece, I want to define Genetic Malware Analysis and explain how our technology empowers security teams to improve and accelerate every stage of their incident response, from the initial alert to the final step of remediation.

Examining Code Similarities

Software is evolutionary; all software, whether legitimate or malicious, is composed of previously written code. Researchers have leveraged this principle in the past, searching for reused pieces of code in order to obtain unique and invaluable insights about malware samples. However, this process is typically done manually and requires a high level of understanding and expertise.

Examining malware code and looking for similarities is a proven technique that has been used many times in the past few years to perform advanced analysis of threats. A well-known example is the WannaCry ransomware, which contained fragments of code that had previously been seen only in malware samples associated with the Lazarus threat actor group. That piece of information was key to the attribution of the malware to North Korea. Earlier this year, in a joint research initiative with McAfee, our teams looked into 10 years of malware associated with North Korea and revealed links among many of the malware families.

Genetic Malware Analysis

While reverse engineering and searching for code similarities was once a manual process that required time and expertise, and could only be applied to a limited number of files, Genetic Malware Analysis is an automated and scalable process that compares files against a huge database of both trusted and malicious software within seconds. Our research and engineering teams work hard to implement this approach and refine each step in the process in order to provide a Genetic Malware Analysis technology that is accessible to the masses.

At the first stage of the process, we parse files and disassemble them into assembly or intermediate language code using our own optimized disassemblers. Then, the code is transformed into searchable tokens, or “genes”. This process is similar to the tokenization used by search engines. A search engine breaks a document into words, removes stop words (like “the”), and lowercases the tokens so that matching tolerates the different variations in which the text can appear. Our gene extraction algorithm does the same to code. It breaks code into small fragments, ignores variable names, and removes whatever is not essential for finding matches between the different variations in which code can appear.
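The tokenization idea can be illustrated with a minimal Python sketch. It is not Intezer's gene extraction algorithm, whose details are not described here. The function names `normalize` and `extract_genes`, the `IMM` placeholder, the window size of three instructions, and the truncated SHA-256 hash are all assumptions chosen for illustration. The sketch lowercases each instruction, replaces literal numbers with a placeholder (a stand-in for ignoring details such as variable names), and hashes each small sliding window of instructions into a searchable token.

```python
import hashlib
import re

# Hypothetical normalization rule: treat hex and decimal literals as noise.
_NUMBER = re.compile(r"\b(0x[0-9a-f]+|\d+)\b")


def normalize(instruction: str) -> str:
    """Lowercase, collapse whitespace, and replace literal numbers with IMM."""
    text = " ".join(instruction.lower().split())
    return _NUMBER.sub("IMM", text)


def extract_genes(instructions, size=3):
    """Return one short hash per sliding window of `size` normalized instructions."""
    normalized = [normalize(i) for i in instructions]
    genes = set()
    for start in range(len(normalized) - size + 1):
        fragment = "\n".join(normalized[start:start + size])
        digest = hashlib.sha256(fragment.encode("utf-8")).hexdigest()
        genes.add(digest[:16])  # truncated only to keep the tokens short
    return genes


# Two variants that differ only in constants, casing, and spacing.
variant_a = ["MOV EAX, 0x10", "add eax, [ebp-8]", "ret"]
variant_b = ["mov eax, 0x20", "ADD  eax, [ebp-4]", "ret"]
same = extract_genes(variant_a) == extract_genes(variant_b)
```

In this toy example both variants normalize to the same fragment, so they share the same gene even though their raw bytes would hash differently.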

Once the genes are extracted, we compare them against the code genome database and identify reused code. This database contains a massive amount of cataloged malware and trusted software, and it is constantly growing and being updated. When new malware is added to the code genome database, any subsequently analyzed file that shares its malicious code is flagged, thereby preventing the attacker from reusing the code without being detected.
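The lookup step can be pictured as an inverted index from gene to the software families in which that gene has been seen. The following Python sketch rests on that assumption. The names `add_sample` and `classify`, the in-memory dictionary, and the family labels are hypothetical and say nothing about how the actual code genome database is built.

```python
from collections import Counter


def add_sample(genome_db, genes, family):
    """Record that every gene in `genes` was seen in `family`."""
    for gene in genes:
        genome_db.setdefault(gene, set()).add(family)


def classify(genes, genome_db):
    """Count how many of a file's genes were previously seen in each family."""
    counts = Counter()
    for gene in genes:
        for family in genome_db.get(gene, ()):
            counts[family] += 1
    return counts


# Hypothetical database contents; the gene strings are placeholders.
db = {}
add_sample(db, {"g1", "g2"}, "FamilyA")
add_sample(db, {"g3"}, "TrustedLib")

# A new file that reuses two genes from FamilyA and one from a trusted library.
report = classify({"g1", "g2", "g3", "g9"}, db)
```

Here `report` attributes two genes to `FamilyA` and one to `TrustedLib`, while `g9` stays unmatched because it has never been cataloged. Adding a newly discovered sample with `add_sample` is what makes later files that share its code show up in the counts.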

While focused on code, Genetic Malware Analysis is also used to explore similarities within other features of executables, like strings, metadata, resources, and more.

Intezer Analyze

About one year ago, we launched our malware analysis platform, Intezer Analyze, which is based on the Genetic Malware Analysis approach. During that time, we also released our free community edition, making Genetic Malware Analysis available to every security analyst and researcher in the world. The platform dives deeply into the code level but is designed with simplicity in mind, which makes it accessible to analysts of all skill levels. Intezer Analyze is built for automation and offers a powerful API and a variety of integrations with SIEM, SOAR, and other security solutions.

Detecting malware and finding links and connections between attacks are not the only features Genetic Malware Analysis technology offers. After a threat has been analyzed, our technology can highlight the malicious and unique code and automatically create vaccines, which are YARA signatures based on the code itself. Code-based signatures can identify different and future variants of the malware and are also effective in memory. They are therefore more powerful than hash-based signatures and can help identify infected machines and remediate the threat. In addition, the vaccines can be used to hunt for new, undetected malware samples.
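To show what a code-based signature looks like compared with a file hash, here is a minimal Python sketch that renders byte sequences as a YARA rule. It does not reflect how Intezer generates vaccines. The function `build_yara_rule`, the rule name, the example bytes, and the simple `any of them` condition are assumptions made for illustration.

```python
def build_yara_rule(rule_name, code_fragments):
    """Render a YARA rule that matches any of the given byte sequences."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for index, fragment in enumerate(code_fragments):
        hex_bytes = " ".join(f"{byte:02X}" for byte in fragment)
        lines.append(f"        $c{index} = {{ {hex_bytes} }}")
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


# Hypothetical byte sequences standing in for unique malicious code.
fragments = [bytes.fromhex("558bec83ec10"), bytes.fromhex("e8000000005b")]
rule_text = build_yara_rule("Example_Vaccine", fragments)
```

Because the rule matches byte patterns rather than a whole-file hash, it can still match when other parts of the file change, which is the property that lets a code-based signature cover variants and memory contents.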

Genetic Malware Analysis is an approach that applies wherever code is being executed. Recently, we demonstrated how Intezer Analyze was able to detect new IoT malware based on string reuse.

Looking ahead, Intezer is committed to expanding its platform by supporting executables from more operating systems and devices and by adding more features and capabilities to help our customers. We want to continue our mission to help as many enterprises as possible improve and accelerate incident response, identify and accurately classify malware at scale, and reduce the constant risk they face from cyber attacks.

February 20, 2019
