YARA: Simple and effective way of dissecting malware
In this article, we will learn about YARA, a tool that offers a simple and highly effective way of identifying and classifying malware. Reverse engineering is the recommended method for performing a complete post-mortem of malicious files, but it is expensive: analyzing malicious files in depth costs both time and money. A cheaper practice is to observe malware behavior, capture the common traits of a family in a signature, and then use that signature to identify related infected files. To cut the cost of reverse engineering and identify malware families based on signatures, we can use an open source tool known as YARA.
YARA
YARA is a popular tool that provides a robust rule language, including regular expressions that largely follow Perl-compatible syntax. It is used to examine suspected files or directories and to match the strings defined in YARA rules against them.
Syntax of YARA rules
At its most basic, the following is the syntax of a YARA rule set:
rule RuleName
{
    strings:
        $test_string1 = "Testing"
        $test_string2 = { E1 D2 C3 B4 }
    condition:
        $test_string1 or $test_string2
}
Note: RuleName is the identifier of the rule. Identifiers follow the same lexical conventions as the C programming language: they can contain any alphanumeric character and the underscore character, but the first character cannot be a digit. Rule identifiers are case sensitive and cannot exceed 128 characters.
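The identifier rules above can be checked mechanically. The following is a minimal Python sketch, assuming only the constraints stated in this article (C-style identifier, at most 128 characters). The function name is_valid_rule_name is hypothetical, and the sketch does not reject YARA reserved keywords such as rule or condition.

```python
import re

# Assumption: a letter or underscore first, then up to 127 more
# letters, digits, or underscores (128 characters in total).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_valid_rule_name(name: str) -> bool:
    """Return True if name satisfies the identifier rules described above."""
    return _IDENTIFIER.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("RuleName", "1Rule", "bad-name"):
        print(candidate, is_valid_rule_name(candidate))
```

A check like this can be useful when rule names are generated from other data, such as signature names from another tool.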
As the rule above shows, the main body of a YARA rule contains two sections:
Strings
This section contains the strings, patterns, or signatures that we need to match against a file. The strings section is optional and can be left out if the condition does not refer to any string. YARA has three types of strings:
- Hexadecimal Strings: Hexadecimal strings match raw byte sequences in the scanned file. They allow three special constructions: wildcards, jumps, and alternatives.
Wildcard: This is represented by a '?' and indicates that some bytes in the pattern are unknown and should match anything. For example: $hex_example = { B1 B2 ?? B8 }
Jumps: When we know the values in the pattern but the length of the gap between them varies, we can use a jump. For example: $jump_example = { F1 F2 [2-3] 24 }; this indicates that any arbitrary sequence of 2 to 3 bytes can occupy the gap.
- Text Strings: Text strings are sequences of ASCII text that are searched for in the file and then referenced in the condition. They can take further modifiers:
- Case Sensitive Strings (the default): Example: $text_case_example = "test"
- Case Insensitive Strings: Example: $text_nocase_example = "test" nocase
- Wide Character Strings: Example: $text_wide_example = "test" wide
- Regular Expressions: Starting from v2.0, YARA has its own regular expression engine, which mostly resembles PCRE. A regular expression can be followed by any of the text string modifiers mentioned above, such as nocase or wide.
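To make the wildcard and jump semantics concrete, here is a minimal Python sketch that translates a simplified hex string into a regular expression over bytes. It is an illustration only and not how YARA itself is implemented. The helper names hex_pattern_to_regex and text_to_bytes are hypothetical, and alternatives and half-byte wildcards are deliberately not handled.

```python
import re


def hex_pattern_to_regex(pattern: str):
    """Translate a simplified YARA-style hex string into a compiled bytes regex.

    Supported tokens: full bytes (B1), whole-byte wildcards (??),
    and jumps written as [n] or [n-m].
    """
    parts = []
    for token in pattern.strip("{} ").split():
        if token == "??":
            parts.append(b".")                      # any single byte
        elif token.startswith("[") and token.endswith("]"):
            low, _, high = token[1:-1].partition("-")
            high = high or low                      # [n] means exactly n bytes
            parts.append(b".{%d,%d}" % (int(low), int(high)))
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets "." match every byte value, including 0x0A
    return re.compile(b"".join(parts), re.DOTALL)


def text_to_bytes(text: str, wide: bool = False) -> bytes:
    """Encode an ASCII text string; with wide=True each character takes two bytes."""
    return text.encode("utf-16-le") if wide else text.encode("ascii")


if __name__ == "__main__":
    sample = bytes.fromhex("00 B1 B2 FF B8 F1 F2 AA BB 24")
    print(bool(hex_pattern_to_regex("{ B1 B2 ?? B8 }").search(sample)))
    print(bool(hex_pattern_to_regex("{ F1 F2 [2-3] 24 }").search(sample)))
    print(text_to_bytes("test", wide=True))
```

The wide helper reflects the fact that, for plain ASCII text, a wide string is the same characters interleaved with zero bytes.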
Conditions
This section contains a Boolean expression that decides whether the rule matches. In the main example above, the condition is true when either $test_string1 or $test_string2 is found. In the condition section, we can:
- Count the string presence: #test_string1 == 2 and #test_string2 < 10
- String Offsets: These are used to find out whether a particular string is present at a specified offset of the file, or at a specified virtual address when scanning a running process. This is achieved with the following keywords:
- at: $test_string1 at 200 and $test_string2 at 500; this checks whether test_string1 is located at offset 200 and test_string2 at offset 500 of the file (or at those virtual addresses in a running process).
- in: this is used to define the range of offsets in which to search for the string. For example, $test_string1 in (100..200) looks for test_string1 between offsets 100 and 200.
- Check file size: Example: filesize > 10000 (the size is in bytes)
- Set of strings: Example: 2 of ($test_string1, $test_string2, $test_string3); this says that at least two of the enclosed strings must match the file.
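The condition operators listed above can be mimicked on an in-memory buffer to see what each one tests. The following Python sketch is only an approximation of the semantics for plain byte strings. The helper find_offsets and the sample buffer are hypothetical, and the range check assumes both ends of the range are included.

```python
def find_offsets(data: bytes, needle: bytes) -> list:
    """Return every offset at which needle occurs in data (overlaps included)."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical sample buffer: "Testing" is placed at offsets 100 and 200.
data = b"A" * 100 + b"Testing" + b"B" * 93 + b"Testing" + b"C" * 50
hits = find_offsets(data, b"Testing")

count = len(hits)                                # like  #test_string1
at_200 = 200 in hits                             # like  $test_string1 at 200
in_range = any(100 <= h <= 200 for h in hits)    # like  $test_string1 in (100..200)
big_file = len(data) > 10000                     # like  filesize > 10000

# like  2 of ($a, $b, $c): count how many distinct strings occur at all
found = [bool(find_offsets(data, s)) for s in (b"Testing", b"BBB", b"ZZZ")]
two_of = sum(found) >= 2

if __name__ == "__main__":
    print(hits, count, at_200, in_range, big_file, two_of)
```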
Use cases
YARA has many use cases. The sections below highlight some of the most important and well-known ones.
YARA with ClamAV rules
YARA can be integrated with the ClamAV rule database. Perform the steps below to integrate ClamAV rules with YARA:
- Download the ClamAV to YARA Python script here: https://code.google.com/p/malwarecookbook/source/browse/trunk/3/3/clamav_to_yara.py
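- Download and unpack the ClamAV db: http://database.clamav.net/main.cvd (for example with sigtool -u main.cvd, which extracts main.ndb among other files)
- Run the ClamAV to YARA Python script on the unpacked signature file:
- python clamav_to_yara.py -f main.ndb -o testing_clamav.yara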
- Now test the converted rules with YARA as shown below:
- yara -r testing_clamav.yara /directory/to/check
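To give a feel for what such a conversion involves, here is a minimal Python sketch that turns one ClamAV body signature line into YARA rule text. It is not the clamav_to_yara.py script referenced above. It assumes the Name:TargetType:Offset:HexSignature layout of .ndb lines, ignores the target type and offset fields, and supports only plain bytes, ?? wildcards, * gaps, and {n-m} style gaps. The function name ndb_line_to_yara and the sample signature are hypothetical.

```python
import re


def ndb_line_to_yara(line: str) -> str:
    """Convert one simplified ClamAV .ndb line into the text of a YARA rule."""
    name, _target, _offset, hexsig = line.strip().split(":")[:4]
    # YARA identifiers allow only letters, digits, and underscores
    rule_name = "clamav_" + re.sub(r"\W", "_", name)
    tokens = []
    i = 0
    while i < len(hexsig):
        if hexsig[i] == "*":                      # any number of bytes
            tokens.append("[-]")
            i += 1
        elif hexsig[i] == "{":                    # {n}, {n-m}, {n-} or {-n}
            end = hexsig.index("}", i)
            gap = hexsig[i + 1:end]
            if gap.startswith("-"):               # {-n} means 0 to n bytes
                gap = "0" + gap
            tokens.append("[" + gap + "]")
            i = end + 1
        elif re.fullmatch(r"[0-9A-Fa-f?]{2}", hexsig[i:i + 2]):
            tokens.append(hexsig[i:i + 2])        # one byte, possibly wildcarded
            i += 2
        else:
            raise ValueError("unsupported signature syntax at position %d" % i)
    return (
        "rule %s\n{\n    strings:\n        $a = { %s }\n"
        "    condition:\n        $a\n}\n" % (rule_name, " ".join(tokens))
    )


if __name__ == "__main__":
    # Hypothetical signature line, not taken from the real database
    print(ndb_line_to_yara("Trojan.Test-1:0:*:aabb??cc*dd{2-4}ee"))
```

Anything the sketch does not understand raises ValueError rather than producing a rule that might not compile.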
YARA with PEiD
YARA can also be integrated with PEiD signatures to check which packer was used to pack the malicious or suspected executable. To integrate PEiD with YARA, perform the following steps:
- Download the PEiD to YARA script from here: https://code.google.com/p/malwarecookbook/source/browse/trunk/3/4/peid_to_yara.py
- Download the PEiD signatures directly from http://research.pandasecurity.com/blogs/images/userdb.txt
- Run the script:
- python peid_to_yara.py -f userdb.txt -o testing_peid.yara
- Run the YARA rules over the malicious executable:
- yara testing_peid.yara malware_testing.exe; this will tell you which packer the executable was packed with.
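The shape of this conversion can be sketched in a few lines of Python. This is not the peid_to_yara.py script itself. It assumes that each userdb.txt entry consists of a bracketed name followed by signature and ep_only lines, and it anchors entry-point-only signatures with pe.entry_point from YARA's pe module. The function name peid_to_rules, the rule naming scheme, and the sample entry are hypothetical.

```python
import re

# One entry: [Name], then "signature = ...", then "ep_only = true|false"
ENTRY = re.compile(
    r"\[(?P<name>[^\]]+)\]\s*"
    r"signature\s*=\s*(?P<sig>[0-9A-Fa-f? ]+?)\s*"
    r"ep_only\s*=\s*(?P<ep>true|false)",
    re.IGNORECASE,
)


def peid_to_rules(text: str) -> str:
    """Convert PEiD-style signature entries into the text of YARA rules."""
    rules = ['import "pe"', ""]
    for index, match in enumerate(ENTRY.finditer(text)):
        # Prefix with an index so names stay unique and never start with a digit
        safe = re.sub(r"\W", "_", match.group("name").strip())
        name = "peid_%d_%s" % (index, safe)
        if match.group("ep").lower() == "true":
            condition = "$a at pe.entry_point"    # signature must sit at the entry point
        else:
            condition = "$a"                      # signature may appear anywhere
        rules.append(
            "rule %s\n{\n    strings:\n        $a = { %s }\n"
            "    condition:\n        %s\n}\n"
            % (name, match.group("sig").strip(), condition)
        )
    return "\n".join(rules)


if __name__ == "__main__":
    # Hypothetical entry, not copied from the real userdb.txt
    sample = "[Sample Packer v1]\nsignature = 60 BE ?? ?? ?? ?? 8D BE\nep_only = true\n"
    print(peid_to_rules(sample))
```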
YARA with PE
Starting with version 3.0, YARA can parse Portable Executable (PE) files. For example, the following rule parses the PE file and checks its import table in addition to searching for a string:
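import "pe"
rule PE_Parse_Check
{
    strings:
        $string_pe = "abc" nocase
    condition:
        pe.imports("kernel32.dll", "CreateProcessA") and
        pe.imports("wininet.dll", "HttpSendRequestA") and
        $string_pe
}
This rule, PE_Parse_Check, matches when the file contains the string "abc" (case insensitive) and its PE import table references both a process creation function (CreateProcessA from kernel32.dll) and an HTTP send request function (HttpSendRequestA from wininet.dll). The ANSI variants are used here because the import table stores the exact exported names; add the corresponding W variants if Unicode builds should match as well.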
YARA with WMI
When scanning running processes, YARA takes a single process ID, which means that only one process can be checked at a time. To overcome this limitation, YARA's Python bindings can be combined with the Python wmi module, with the help of which all running processes can be enumerated and scanned in one go. For example:
import os
import wmi
import yara

wmi_client = wmi.WMI()  # Connect to the local WMI provider
own_pid = os.getpid()  # PID of this script, which is skipped during the scan
rules = yara.compile(filepath="File_Path")  # Compile the YARA rules stored at File_Path

for process in wmi_client.Win32_Process():  # Iterate over all running processes
    process_id = process.ProcessId
    process_name = process.Name
    process_path = process.ExecutablePath
    if process_id == own_pid:
        continue
    try:
        rule_match = rules.match(pid=process_id)  # Scan the memory of this process
    except yara.Error:
        print("Error matching for PID: %d" % process_id)
        continue
    if rule_match:
        print("%s matched %s [PID:%s]" % (rule_match, process_path, process_id))
    else:
        print("No matches for %s [PID:%s]" % (process_name, process_id))
This script can easily be modified to detect rogue processes by their process ID as well.
Limitations of YARA
While reading this, many of you might have felt that, considering how sophisticated malware has become these days, YARA detection can be easily bypassed, since YARA only does pattern, string, and signature matching while a more effective method of detecting malware is available, namely behavior analysis. I agree that YARA has this limitation, but with all its other features and support, YARA remains a necessary tool for analyzing malicious files.
Conclusion
In this article, we have learned about YARA, its structure, and famous use cases like YARA with WMI, PE, PEiD and ClamAV, as well as some limitations.