
YARA: Simple and effective way of dissecting malware

March 13, 2018 by Security Ninja

In this article, we will learn about the YARA tool, which gives a very simple and highly effective way of identifying and classifying malware. We all know that reverse engineering is the highly recommended method for performing a complete post-mortem of malicious files, but it is very expensive: analyzing malicious files in depth takes time and money. It is considered good practice to observe malware behavior, group the observed traits into a signature, and then identify the related infected files. To cut the cost of reverse engineering and identify malware families based on signatures, we can use an open source tool known as YARA.


YARA is a popular tool that provides a robust rule language with support for regular expressions similar to Perl-compatible regular expressions. It examines suspected files or directories and matches their contents against the strings defined in YARA rules.

Syntax of YARA rules

At its most basic, the following is the syntax of a YARA rule set:

rule RuleName
{
    strings:
        $test_string1 = "Testing"
        $test_string2 = {E1 D2 C3 B4}

    condition:
        $test_string1 or $test_string2
}


RuleName is the identifier of the rule. Identifiers follow the same lexical conventions as the C programming language: they can contain any alphanumeric character and the underscore character, but the first character cannot be a digit. Rule identifiers are case sensitive and cannot exceed 128 characters.
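The identifier constraints above can be checked with a short Python sketch. The function name is_valid_rule_name is made up for illustration, and the check covers only the rules stated here; it does not reject YARA's reserved keywords.

```python
import re

# Letters, digits and underscores only; the first character must not be a
# digit; at most 128 characters in total (1 leading + up to 127 more).
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_valid_rule_name(name: str) -> bool:
    """Return True if `name` satisfies the identifier rules described above.

    Hypothetical helper: it does not check for reserved YARA keywords.
    """
    return _IDENTIFIER_RE.fullmatch(name) is not None
```

For instance, is_valid_rule_name("RuleName") passes, while a name starting with a digit or containing a hyphen does not.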

As you can see, the main body of a YARA rule contains two sections: strings and condition.

  • Strings: This section contains the strings, patterns, or signatures that we need to match against a file. The strings section is optional and can be left out if it is not needed. YARA has three types of strings:

  • Hexadecimal Strings: Hexadecimal strings match raw sequences of bytes in the scanned file. They allow three special constructs: wildcards, jumps, and alternatives.
    • Wildcard: Represented by '?', a wildcard indicates that part of the pattern is unknown and should match anything. Each '?' stands for one hexadecimal digit (half a byte), so '??' matches any whole byte. For example: $hex_example = {B1 B2 ?? B8}
    • Jumps: When we know the values in the pattern but the length of the gap between them varies, we can use a jump. For example: $jump_example = {F1 F2 [2-3] 24}; this indicates that any arbitrary sequence of 2 to 3 bytes can sit between F2 and 24.
    • Alternatives: When a position can hold one of several known byte sequences, the options are listed in parentheses separated by '|'. For example: $alt_example = {F1 ( A2 | B3 ) 24}

  • Text Strings: Text strings are ASCII text that is searched for in the file. Modifiers change how they match:
    • Case Sensitive Strings (the default): Example: $text_case_example = "test"
    • Case Insensitive Strings: Example: $text_nocase_example = "test" nocase
    • Wide Character Strings (two bytes per character): Example: $text_wide_example = "test" wide


  • Regular Expressions: Starting from v2.0, YARA has its own regular expression engine, which mostly resembles PCRE. A YARA regular expression can be followed by any of the text string modifiers mentioned above, such as nocase or wide.
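To make the string types above more concrete, here is a rough Python sketch that approximates them with byte-level regular expressions. The helper names hex_pattern_to_regex and text_string_to_regex are hypothetical, and this is not how YARA itself is implemented. The hex helper handles only whole bytes, whole-byte wildcards (??), and jumps; nibble wildcards and alternatives are left out.

```python
import re


def hex_pattern_to_regex(pattern: str) -> "re.Pattern[bytes]":
    """Translate a simplified YARA-style hex pattern into a bytes regex.

    Supported tokens: whole bytes (F1), whole-byte wildcards (??) and
    jumps ([n] or [n-m]). Tokens must be separated by spaces.
    """
    parts = []
    for token in pattern.strip("{} ").split():
        if token == "??":
            parts.append(b".")  # any single byte
        elif token.startswith("[") and token.endswith("]"):
            low, _, high = token[1:-1].partition("-")
            high = high or low  # "[4]" means exactly 4 bytes
            parts.append(b".{%d,%d}" % (int(low), int(high)))
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL so that "." also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def text_string_to_regex(text: str, nocase: bool = False,
                         wide: bool = False) -> "re.Pattern[bytes]":
    """Approximate a YARA text string and its nocase/wide modifiers."""
    # "wide" strings store each ASCII character followed by a zero byte,
    # which is what UTF-16 little-endian encoding produces for ASCII text.
    raw = text.encode("utf-16-le") if wide else text.encode("ascii")
    flags = re.IGNORECASE if nocase else 0
    return re.compile(re.escape(raw), flags)
```

A call such as hex_pattern_to_regex("{F1 F2 [2-3] 24}").search(data) then returns a match object only if F1 F2 is followed by two or three arbitrary bytes and then 24.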


  • Conditions: The condition section evaluates a Boolean expression. For example, in the rule above, the condition is true when either $test_string1 or $test_string2 is found. Within a condition, we can:
    • Count string occurrences: #test_string1 == 2 and #test_string2 < 10
    • String offsets: These check whether a particular string is present at a specified offset in the file, or at a specified virtual address when scanning a running process. This is achieved with the following keywords:
      • at: $test_string1 at 200 and $test_string2 at 500; this checks whether test_string1 is located at offset 200 and test_string2 at offset 500.
      • in: This is used to define a range of offsets in which to search for the string. For example, $test_string1 in (100..200) checks whether test_string1 is found between offsets 100 and 200.
    • Check file size: Example: filesize > 10000
    • Set of strings: Example: 2 of ($test_string1, $test_string2, $test_string3); this means that at least two of the enclosed strings must match in the file.
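The condition operators listed above can be mimicked in plain Python to see what each one checks. This is only a sketch: find_offsets and the sample buffer are made up for illustration, the numbers are chosen to fit that small buffer, and YARA's own engine works differently.

```python
def find_offsets(data: bytes, needle: bytes) -> list:
    """Return every offset at which `needle` occurs in `data`
    (overlapping occurrences included)."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical sample buffer: "Testing" at offsets 100 and 154,
# the hex pattern E1 D2 C3 B4 at offset 150.
data = (b"A" * 100 + b"Testing" + b"B" * 43
        + bytes.fromhex("E1D2C3B4") + b"Testing")

s1 = find_offsets(data, b"Testing")                  # $test_string1
s2 = find_offsets(data, bytes.fromhex("E1D2C3B4"))   # $test_string2
s3 = find_offsets(data, b"NotPresent")               # $test_string3 (no hits)

# #test_string1 == 2 and #test_string2 < 10
count_ok = len(s1) == 2 and len(s2) < 10
# $test_string1 at 100
at_ok = 100 in s1
# $test_string2 in (100..200)
in_ok = any(100 <= offset <= 200 for offset in s2)
# filesize > 100
size_ok = len(data) > 100
# 2 of ($test_string1, $test_string2, $test_string3)
two_of_ok = sum(1 for hits in (s1, s2, s3) if hits) >= 2
```

Each Boolean corresponds to one condition from the list, so combining them with and/or in Python mirrors how a rule's condition section combines them.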


Use cases

YARA has many use cases. The sections below highlight some of the most important and well-known ones.

YARA with ClamAV rules

YARA can be integrated with the ClamAV rule database. Perform the following steps to integrate ClamAV rules with YARA:

YARA with PEiD

YARA can also be integrated with PEiD to check which packer was used on the malicious or suspected executable. To integrate PEiD with YARA, perform the following steps:

  • Download the PEiD to YARA script from here:
  • Download the PEiD signatures directly from
  • Run the script:
    • python -f useddb.txt -o testing_peid.yara
  • Run the YARA rule over the malicious executable:
    • yara testing_peid.yara malware_testing.exe; this reports which packer the executable was packed with.
YARA with PE

Starting with version 3.0, YARA can parse Portable Executable (PE) files. For example, the following rule parses the PE file and checks its imports along with a string:

import "pe"

rule PE_Parse_Check
{
    strings:
        $string_pe = "abc" nocase

    condition:
        pe.imports("Kernel32.dll", "CreateProcess") and
        pe.imports("wininet.dll", "httpsendrequest") and
        $string_pe
}


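This rule, PE_Parse_Check, matches when the file contains the string "abc" (in any letter case) and its PE imports include both a process-creation function and an HTTP send-request function. Note that the names actually exported by Windows DLLs usually carry an A or W suffix, such as CreateProcessA or HttpSendRequestA, so the function names may need adjusting for real samples.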


YARA with WMI

When scanning process memory, YARA has to be given a process ID, which means that only one process can be checked at a time. To overcome this limitation, YARA's Python bindings can be combined with WMI, with the help of which all running processes can be scanned in one go. For example:

import os

import wmi
import yara

wmi_conn = wmi.WMI()    # Connect to WMI on the local machine
test_pid = os.getpid()  # PID of this script, so that we can skip scanning ourselves

# Compile the YARA rules stored at the location given in File_Path
rules = yara.compile(filepath="File_Path")

for process in wmi_conn.Win32_Process():  # Iterate over all running processes
    process_id = process.ProcessId
    process_name = process.Name
    process_path = process.ExecutablePath

    if process_id != test_pid:
        try:
            # Match the compiled rules against the memory of this process
            rule_match = rules.match(pid=process_id)
        except yara.Error:
            # Typically raised when the process cannot be opened (e.g. access denied)
            print('Error Matching for PID: %d' % process_id)
            continue

        if rule_match:
            print("%s matched %s [PID:%s]" % (rule_match, process_path, process_id))
        else:
            print("No matches for %s [PID:%s]" % (process_name, process_id))

This script can easily be modified to detect rogue processes as well.

Limitations of YARA

While reading this, most of you might have felt that, considering how sophisticated malware has become these days, YARA detection can be easily bypassed, since YARA only does pattern, string, and signature matching while a more effective method of detecting malware is available, i.e. behavior analysis. I totally agree that YARA has this limitation, but with all its other features and support, YARA remains a necessary tool for analyzing malicious files.


In this article, we have learned about YARA, the structure of its rules, and well-known use cases such as YARA with WMI, PE, PEiD and ClamAV, as well as some of its limitations.
