MITRE ATT&CK vulnerability spotlight: Obfuscated files or information
Introduction
MITRE is a non-profit organization that supports the U.S. government’s R&D efforts by operating Federally Funded Research and Development Centers (FFRDCs), where it serves as a trusted development and testing organization. One of the fields in which MITRE performs research is cybersecurity.
The MITRE ATT&CK Matrix is probably the most famous product created by MITRE in the cybersecurity domain. This tool breaks down the cyberattack life cycle into a series of stages and describes the various means by which the attacker can achieve the goals of each stage.
One stage of the attack life cycle used by MITRE is Defense Evasion, where the attacker attempts to bypass or defeat protections or detection tools put in place by the defender. And one means of accomplishing some of the goals of defense evasion is through the use of obfuscated files or information.
What are obfuscated files or information?
Many cybersecurity detection products (antivirus, IDS and so on) are designed to work based on signatures of malware. Once a particular malware variant has been identified in the wild, unique features of the malware are extracted and used to detect and identify it in future infections. Every piece of data passing through the network perimeter or being downloaded to a host is compared to these signatures. If a match is found, action is taken (deletion, quarantine, alerting and so on).
The goal of obfuscation is to defeat these signature-based detection systems and to increase the difficulty of performing a forensic analysis of a malware sample. If the data or code that a signature is based upon is obfuscated in some way, detection engines looking for the plaintext signature will be unable to find a match.
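To make the signature-matching problem concrete, here is a minimal sketch of a naive scanner that looks for a fixed byte string. The marker `EVIL_PAYLOAD_MARKER` and the function `matches_signature` are hypothetical names invented for illustration; real products use far richer signatures. Base64 and zlib stand in here for whatever obfuscation a sample might apply.

```python
import base64
import zlib

# Hypothetical signature: a byte string a scanner might search for.
SIGNATURE = b"EVIL_PAYLOAD_MARKER"


def matches_signature(data: bytes) -> bool:
    """Naive signature check: is the signature a substring of the data?"""
    return SIGNATURE in data


# Stand-in for a malicious file that contains the signature in plaintext.
sample = b"header..." + SIGNATURE * 4 + b"...trailer"

encoded = base64.b64encode(sample)   # same content, Base64-encoded
compressed = zlib.compress(sample)   # same content, compressed

print(matches_signature(sample))      # the plaintext sample is flagged
print(matches_signature(encoded))     # the encoded copy is not
print(matches_signature(compressed))  # neither is the compressed copy
print(matches_signature(zlib.decompress(compressed)))  # flagged once reversed
```

The content is identical in all three cases; only its byte-level representation changes, which is all it takes to miss a plaintext signature.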
Many obfuscation algorithms exist, but most of them fall into a few categories. Some of the main forms of obfuscation include:
- Compression: While compression algorithms like ZIP are designed to shrink the size of a file or data, they also do a great job of obfuscating it. Malware authors will use compression to slip malicious code or data past systems incapable of decompressing it.
- Encoding: Encoding algorithms are another means of obfuscating code or data. Algorithms like Base64 or URL encoding are designed to make arbitrary data fit the rules of a certain protocol (alphanumeric, no reserved characters and so on), but they also serve to defeat signature matching.
- Encryption: Encryption algorithms are designed to ensure that data looks random to anyone without the decryption key. While malware must have access to the key (often embedded in the sample itself) to perform decryption, most scanning systems don’t have the ability to find the key and decrypt the data for signature matching.
- Steganography: Steganography involves hiding data within another file. For example, the least significant bit of each byte of pixel data in a picture file can be used to encode binary data. The change is visually undetectable to a human and can sneak data past many systems.
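The least-significant-bit approach to steganography described above can be sketched as follows. This is an illustration under simplifying assumptions: the cover is a plain byte string standing in for raw pixel data, and `embed_lsb` and `extract_lsb` are hypothetical helper names, not part of any library.

```python
def embed_lsb(cover: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of successive cover bytes."""
    # Flatten the secret into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Recover `length` bytes from the least significant bits of `stego`."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))  # stand-in for raw pixel data (an assumption)
secret = b"hidden data"
stego = embed_lsb(cover, secret)

recovered = extract_lsb(stego, len(secret))
largest_change = max(abs(a - b) for a, b in zip(cover, stego))
print(recovered, largest_change)
```

Each cover byte changes by at most 1, which is why the modification is hard to see in an image, yet the receiver can rebuild the hidden bytes exactly.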
Using these methods, malware authors can conceal a variety of different types of files and data. For example, malware may be designed to hide its malicious code using obfuscation, leaving only code designed to unwrap the obfuscated code exposed. Or a malware variant may encrypt its configuration files, making it more difficult for a malware analyst to understand how it functions. These (and a variety of other options) are ways that malware authors actively use obfuscation for defense evasion.
Examples of obfuscated files or information
Obfuscation is commonly used to disguise easily identifiable code or data within a malware sample. For example, malware may encode its Command and Control (C2) traffic to hide the data that it is exfiltrating from a compromised machine.
A simple obfuscation strategy in common use is XOR encryption. The malware will use a repeated key to encrypt the entirety of the data. For example, it’s possible to encrypt the phrase XORENCRYPTION with the secret key OBFUSCATION (in practice OBFUSCATIONOB, since the 11-character key is shorter than the 13-character plaintext and must be repeated to cover it).
The first characters of the plaintext and key are X and O, which are 1011000 and 1001111 in 7-bit ASCII-encoded binary. Performing the XOR operation produces a ciphertext of 0010111 (0x17), which is a non-printable control character. This process is repeated for each character in the plaintext.
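The arithmetic for this first character pair can be checked with a few lines of Python. The variable names are chosen only for illustration.

```python
plain_char, key_char = "X", "O"

# XOR the ASCII code points of the two characters.
cipher_value = ord(plain_char) ^ ord(key_char)

print(format(ord(plain_char), "07b"))  # 1011000
print(format(ord(key_char), "07b"))    # 1001111
print(format(cipher_value, "07b"))     # 0010111
```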
Since this operation produces non-printable characters, malware will commonly follow XOR encryption with Base64 encoding. The purpose of Base64 encoding is to produce output that consists solely of printable ASCII characters (letters, digits, "+" and "/", with "=" used for padding), which can be transmitted over a variety of different protocols. At the other end, the data is Base64-decoded and the same secret key is used to reverse the XOR encryption and retrieve the original plaintext.
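The full round trip for this example might look like the sketch below. The helper `xor_bytes` is a hypothetical name; it repeats the key with `itertools.cycle`, which matches the repeated-key scheme described in the text.

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = b"XORENCRYPTION"
key = b"OBFUSCATION"

ciphertext = xor_bytes(plaintext, key)   # raw bytes, mostly control characters
encoded = base64.b64encode(ciphertext)   # printable form suitable for transport

# Receiving side: undo the Base64 layer first, then XOR with the same key.
decoded = xor_bytes(base64.b64decode(encoded), key)

print(ciphertext.hex())
print(encoded.decode("ascii"))
print(decoded)
```

Because XOR is its own inverse, the same function serves for both encryption and decryption. Note that XOR-ing two uppercase ASCII letters always yields a value below 0x20, so every ciphertext byte in this example lands in the control-character range.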
Detection and mitigation
Detection of obfuscated data can be performed in a couple of different ways. The first is to look for the unwrapper code that the malware needs to decode the code or data into a usable form. The decoding routine for each algorithm can be recognizable, and its presence indicates that the remaining data contained within the file requires decoding. However, in some cases, one file downloads and decodes a second, fully obfuscated file, making it necessary to correlate the two to identify and decode the obfuscated data. Some detection tools can identify and reverse obfuscation on files crossing the network boundary.
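A tool that reverses obfuscation before matching could work roughly as follows. This is a simplified sketch, not how any particular product is implemented: it tries only one layer of Base64 or zlib, and `SIGNATURE`, `candidate_views` and `scan` are hypothetical names.

```python
import base64
import binascii
import zlib

SIGNATURE = b"EVIL_PAYLOAD_MARKER"  # hypothetical signature


def candidate_views(data: bytes):
    """Yield the raw data plus any views recovered by reversing one layer."""
    yield data
    try:
        # validate=True rejects input containing non-Base64 characters.
        yield base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        pass
    try:
        yield zlib.decompress(data)
    except zlib.error:
        pass


def scan(data: bytes) -> bool:
    """Flag the data if any recovered view contains the signature."""
    return any(SIGNATURE in view for view in candidate_views(data))


sample = b"header..." + SIGNATURE + b"...trailer"
print(scan(sample))                    # plaintext
print(scan(base64.b64encode(sample)))  # Base64 layer reversed, then matched
print(scan(zlib.compress(sample)))     # zlib layer reversed, then matched
print(scan(b"benign data"))            # nothing to find
```

Nested layers (for example, compression followed by Base64) would require applying the same idea recursively, with a depth limit to keep the scan bounded.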
For file obfuscation, detection can be performed based upon the actions taken when the file is “dropped” onto the disk. After the file is deobfuscated by the malware itself, the malware will use file handles to create a usable file on disk. Scanning for these newly created files and matching them against signatures can help with detection of the malware.
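Scanning dropped files can be approximated with a simple directory sweep. The sketch below is an assumption-laden illustration: real endpoint tools typically hook file creation events rather than polling, and `SIGNATURE` and `scan_new_files` are hypothetical names.

```python
import tempfile
from pathlib import Path

SIGNATURE = b"EVIL_PAYLOAD_MARKER"  # hypothetical signature


def scan_new_files(directory: Path, since: float) -> list:
    """Return files modified at or after `since` that contain the signature."""
    hits = []
    for path in sorted(directory.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime < since:
            continue  # skip files older than the cutoff timestamp
        if SIGNATURE in path.read_bytes():
            hits.append(path)
    return hits


# Demo in a throwaway directory: one benign file, one "dropped" payload.
with tempfile.TemporaryDirectory() as tmp:
    watch_dir = Path(tmp)
    (watch_dir / "notes.txt").write_bytes(b"nothing to see here")
    (watch_dir / "dropped.bin").write_bytes(b"\x00\x01" + SIGNATURE)
    flagged = [p.name for p in scan_new_files(watch_dir, since=0.0)]

print(flagged)  # ['dropped.bin']
```

The point is that the file on disk is already deobfuscated, so a plain signature match works again at this stage.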
Protecting against obfuscated data
Data obfuscation is designed to defeat traditional detection mechanisms based upon matching data to known malware signatures. However, the malware that uses the obfuscated data also has to have the capability to deobfuscate the data before using it. If the obfuscated data can be detected, along with any associated keys, it is still possible to view and analyze the hidden data or code.