
Reversing Malware Loaders - The Matsnu-A Case

August 7, 2012 by Kyriakos Economou

The AV industry is growing every day, along with the underground industry that produces all types of malware, from simple file infectors to more sophisticated Trojans that gather sensitive information and send it to the bad guys.

The fight between AV companies and malware authors intensifies every single day. Both good and bad guys dedicate a lot of time to researching and implementing ways to detect malware or ways to avoid detection (depending on which side they are on).

Most malware research concentrates on the infection mechanisms of the malware and on the techniques it uses to communicate with its creator, often skipping entirely the anti-virus evasion techniques that the malware uses in the first place.

This article aims to dig inside the loader used by the Matsnu malware family in order to deploy itself and avoid detection by AV products. Fortunately, at this point the variant is already detected by most AV vendors.

In my job as a malware analyst, I very often hear this kind of AV evasion technique described as a "packer". In a very abstract way, this might be true, but in a technical way, it really isn't.

From my experience with packers and manual unpacking, I expect that a packer will incorporate some compression algorithm and most probably an encryption algorithm (custom or not). Furthermore, the behaviour of a packer is usually a lot different. A packer will usually decompress and decrypt the code of the original executable and then will jump to its original entry point (OEP).

On the other hand, I prefer to call the "packers" used by more and more malware authors loaders, because of how they work technically. These loaders will usually launch a child process in suspended mode, overwrite its memory with the decrypted code of the malware, and then resume its main thread. Instead of overwriting the child's memory, some of them allocate extra memory in the child process and place the decrypted viral code there. Such a loader might then inject a thread into the child process, with its start address at the beginning of the allocated memory where the viral code is placed. Others might overwrite themselves through a code stub written into an extra chunk of allocated memory and then jump back to the PE image address space.

In addition, very often the malware authors will choose to first compress the original viral code using a common packer (such as UPX, PECompact etc.) and then encrypt it and incorporate it inside the loader.

From a technical point of view, it is quite fair to distinguish these two types of mechanisms, and even if we keep calling them all "packers" for simplicity, it is necessary to understand the differences between them.

The final goal of this article is to manage to isolate a fully working executable of the original malware under the various anti-AV protection layers.

Self-Decryption Stage I

A big part of the loader's code is decrypted at run time through a "slow" decryption algorithm that performs a lot of operations in each loop, decrypting the code dword by dword.

The outer loop:

00401752 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]

00401755 83C1 01 ADD ECX, 1

00401758 894D F0 MOV DWORD PTR SS:[EBP-10], ECX

0040175B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter

00401762 7D 5E JGE SHORT 004017C2 ← exit the loop once finished

…more code here

0040178F E8 D7040000 CALL 00401C6B ← call to the decryption routine

…more code here

004017C0 EB 90 JMP SHORT 00401752 ← jump up to loop start

Inside the Decryption Routine:

Some additional loops take place here, but the important instruction is the one that, on every iteration, writes the result (a dword held in the ECX register) to the memory location pointed to by the EAX register:

00401ED8 8908 MOV DWORD PTR DS:[EAX], ECX ← Initial value in EAX is 00408584. It is incremented by a dword in each iteration.

Self-Decryption Stage II

When the outer loop mentioned above has finished, there is another one taking place a few instructions later.

004017DE 8B4D E0 MOV ECX, DWORD PTR SS:[EBP-20]

004017E1 83C1 05 ADD ECX, 5

004017E4 894D E0 MOV DWORD PTR SS:[EBP-20], ECX

004017E7 817D E0 DF0C0 CMP DWORD PTR SS:[EBP-20], 0CDF ← check counter

004017EE 7D 77 JGE SHORT 00401867 ← exit the loop

…more code here

00401862 E9 77FFFFFF JMP 004017DE ← jump up to loop start
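Both counted loops shown so far have the shape of a compiled for-loop: the counter is incremented at the top, compared against a limit, and the loop exits through JGE. A minimal Python sketch of that shape follows. The listings do not show the initial counter values, so the start value of 0 is an assumption, and `loop_iterations` is a hypothetical helper, not something recovered from the sample.

```python
# Limits and steps are taken from the CMP and ADD instructions in the listings.
STAGE1_LIMIT = 0x28E68  # CMP DWORD PTR SS:[EBP-10], 28E68
STAGE1_STEP = 1         # ADD ECX, 1
STAGE2_LIMIT = 0x0CDF   # CMP DWORD PTR SS:[EBP-20], 0CDF
STAGE2_STEP = 5         # ADD ECX, 5


def loop_iterations(limit, step, start=0):
    """Count the passes of a loop that exits via JGE once counter >= limit.

    The start value is an assumption; the article does not show it.
    """
    passes = 0
    counter = start
    while counter < limit:  # JGE leaves the loop when counter >= limit
        passes += 1         # the decryption work of one pass happens here
        counter += step     # the ADD at the top of the next pass
    return passes


stage1_passes = loop_iterations(STAGE1_LIMIT, STAGE1_STEP)
stage2_passes = loop_iterations(STAGE2_LIMIT, STAGE2_STEP)
print(stage1_passes, stage2_passes)
```

If the counters really start at zero, the number of passes gives a rough idea of why the author describes the first algorithm as "slow".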

Self-Decryption Stage III

There is one more loop coming next during the self-decryption stage.

0040187E BA 01000000 MOV EDX, 1

00401883 85D2 TEST EDX, EDX

00401885 0F84 D2000000 JE 0040195D

The three instructions above create a fake conditional branch (an opaque predicate). Since the value 1 is always moved into the EDX register, the TEST instruction on that register always clears the zero flag, so the conditional JE jump that follows is never taken and has no effect on the execution flow.
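To see why the branch can never be taken, the flag logic can be modelled in a few lines of Python. This is only a sketch of the documented behaviour of TEST and JE on x86; the function names are hypothetical.

```python
def zero_flag_after_test(value):
    """TEST r, r computes r AND r and sets ZF only when the result is zero."""
    return ((value & value) & 0xFFFFFFFF) == 0


def je_taken(zero_flag):
    """JE/JZ is taken only when ZF is set."""
    return zero_flag


edx = 1  # MOV EDX, 1
branch_taken = je_taken(zero_flag_after_test(edx))
print(branch_taken)
```

Because EDX is loaded with a non-zero constant immediately before the TEST, the result does not depend on any run-time state.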

0040188B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter

00401892 0F85 A1000000 JNZ 00401939 ← if not equal jump to increase_counter

Some more code is presented below:

increase_counter:

00401939 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]

0040193C 83C1 01 ADD ECX, 1

0040193F 894D F0 MOV DWORD PTR SS:[EBP-10], ECX

enter_next_decryption_routine:

00401942 68 F7480700 PUSH 748F7

00401947 68 18194F00 PUSH 4F1918

0040194C 8B55 F4 MOV EDX, DWORD PTR SS:[EBP-C]

0040194F 52 PUSH EDX

00401950 E8 4A000000 CALL 0040199F ← call decryption routine

00401955 83C4 0C ADD ESP, 0C

00401958 E9 21FFFFFF JMP 0040187E ← jump to loop start

Inside the Decryption Routine:

Some more loops take place here, but the important instruction is the one that, on every iteration, writes the result (a dword held in the ECX register) to the memory location pointed to by the EAX register:

00401B70 8908 MOV DWORD PTR DS:[EAX], ECX ← Initial value in EAX is 00408584. It is incremented by a dword in each iteration.

Self-Decryption Stage IV

Going back to the loop outside the decryption function, we saw the condition which would normally signal the end of the looping process. It is fake, and we need to examine it more carefully in order to locate the next step.

Indeed, when the conditions are correct, the execution will reach a CALL instruction:

0040191D E8 8FF8FFFF CALL 004011B1

The CALL to the beginning of the previously encrypted code is located inside this function:

004013B6 FF15 108B4000 CALL NEAR DWORD PTR DS:[408B10] ← value stored in this address is 00408584

Once we enter the function at address 00408584 we see the following:

00408584 E8 07000000 CALL 00408590

00408589 75 3A JNZ SHORT 004085C5

Note the obfuscation trick in the first instruction, which confuses the disassembler. The CALL actually transfers execution to address 00408590, which is the last byte of the instruction that the disassembler shows starting at address 0040858B. This means that all the bytes between the CALL and that address are junk bytes in this case.

0040858B 03A0 21D64F5B ADD ESP, DWORD PTR DS:[EAX+5B4FD621]

00408591 81EB 05103A00 SUB EBX, 3A1005

00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E]

0040859D B9 8B020000 MOV ECX, 28B

004085A2 66BF 7592 MOV DI, 9275

004085A6 66313E XOR WORD PTR DS:[ESI], DI

004085A9 6683C7 02 ADD DI, 2

004085AD 83C6 02 ADD ESI, 2

004085B0 E2 F4 LOOPD SHORT 004085A6

004085B2 FC CLD

004085B3 7E 2A JLE SHORT 004085DF

004085B5 1B95 CFF6215C SBB EDX, DWORD PTR SS:[EBP+5C21F6CF]

004085BB 8745 92 XCHG DWORD PTR SS:[EBP-6E], EAX

004085BE D7 XLAT BYTE PTR DS:[EBX+AL]

004085BF 1F POP DS

004085C0 30D5 XOR CH, DL

004085C2 94 XCHG EAX, ESP
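The misaligned disassembly can be checked with simple arithmetic on the bytes shown in the listing. The sketch below decodes the relative offset of the CALL and looks up the byte at its real target; the variable names are illustrative only.

```python
CALL_ADDRESS = 0x00408584
CALL_BYTES = bytes.fromhex("E807000000")     # CALL rel32, as listed
SHOWN_ADDRESS = 0x0040858B
SHOWN_BYTES = bytes.fromhex("03A021D64F5B")  # bytes of the bogus "ADD ESP, ..."

# A near CALL jumps to (address of next instruction) + signed 32-bit offset.
rel32 = int.from_bytes(CALL_BYTES[1:5], "little", signed=True)
return_address = CALL_ADDRESS + len(CALL_BYTES)
call_target = return_address + rel32

# The target falls inside the instruction the disassembler printed.
offset_in_shown = call_target - SHOWN_ADDRESS
byte_at_target = SHOWN_BYTES[offset_in_shown]

print(hex(return_address), hex(call_target), hex(byte_at_target))
```

The byte at the target is 5B, the one-byte encoding of POP EBX, so the code pops its own return address (00408589) to learn where it is running in memory.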

This is what we see once we execute the CALL instruction:

00408590 5B POP EBX

00408591 81EB 05103A00 SUB EBX, 3A1005

00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E] ← starts from address 004085B2

0040859D B9 8B020000 MOV ECX, 28B ← loop counter

004085A2 66BF 7592 MOV DI, 9275 ← initial decryption key

004085A6 66313E XOR WORD PTR DS:[ESI], DI ← decrypt by XORing with the key in DI (initially 9275, increased by 2 after each word), one word in each iteration.

004085A9 6683C7 02 ADD DI, 2

004085AD 83C6 02 ADD ESI, 2

004085B0 E2 F4 LOOPD SHORT 004085A6

The above decryption algorithm will decrypt an extra portion of code starting from the instruction located immediately after the LOOPD.
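The loop above is simple enough to reproduce offline. The following Python sketch recomputes the start address from the popped return address and applies the same rolling XOR to a buffer. The constants come from the listing; `xor_words` and the sample buffer are hypothetical, and real use would require the 0x28B words dumped from address 004085B2.

```python
import struct

RETURN_ADDRESS = 0x00408589                       # popped into EBX by POP EBX
ebx = (RETURN_ADDRESS - 0x003A1005) & 0xFFFFFFFF  # SUB EBX, 3A1005
esi = (ebx + 0x003A102E) & 0xFFFFFFFF             # LEA ESI, [EBX+3A102E]

WORD_COUNT = 0x28B    # MOV ECX, 28B
INITIAL_KEY = 0x9275  # MOV DI, 9275


def xor_words(data, key=INITIAL_KEY, count=WORD_COUNT):
    """XOR `count` little-endian words with a key that grows by 2 each step."""
    out = bytearray(data)
    for i in range(count):
        (word,) = struct.unpack_from("<H", out, 2 * i)
        struct.pack_into("<H", out, 2 * i, word ^ key)  # XOR WORD PTR [ESI], DI
        key = (key + 2) & 0xFFFF                        # ADD DI, 2 (16-bit wrap)
    return bytes(out)


# Placeholder data only: XOR is its own inverse, so two passes restore the input.
sample = bytes(range(256)) * 6
assert xor_words(xor_words(sample)) == sample
print(hex(esi))
```

Since the same routine both encrypts and decrypts, it can also be used to re-encrypt a patched stub when experimenting in a lab.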

So, at this point we saw the various steps used by this loader to decrypt the next parts of the code. Now it's time to continue with the rest of its mechanisms.

Dynamic Imports Resolving & PEB Loader Data Structure

Normally, malware authors retrieve the VAs (virtual addresses) of the APIs they need by using two Windows APIs, LoadLibrary and GetProcAddress, so that those APIs are not listed in the imports table, where they could be used for detection. However, in this case the author of the loader has decided to go through the PEB (Process Environment Block) loader data structure, PEB_LDR_DATA, in order to retrieve the necessary information, which is a stealthier way to retrieve the VAs of the necessary APIs.

The pointer to this structure is located at PEB + 0x0C.

Back to where we stopped, immediately after the end of the decryption loop we locate a CALL at address 004085CD and by entering this function we see another CALL at address 004086EF, and inside that function is where the loader of the malware will access the PEB_LDR_DATA structure.

0040870E 64FF35 3000000 PUSH DWORD PTR FS:[30]

00408715 58 POP EAX

In the two instructions above, we notice another obfuscation attempt. In fact, instead of pushing the address of PEB onto the stack and then popping that value back to EAX, we could just do MOV EAX, DWORD PTR FS:[30].

00408716 8B40 0C MOV EAX, DWORD PTR DS:[EAX+C] ← move to EAX the pointer to the PEB_LDR_DATA

00408719 8B48 0C MOV ECX, DWORD PTR DS:[EAX+C] ← move to ECX the pointer to the LDR_MODULE structure of the first module loaded by the Windows loader

0040871C 8B11 MOV EDX, DWORD PTR DS:[ECX] ← save to EDX the pointer to the LDR_MODULE structure of the next module loaded by the Windows loader

0040871E 8B41 30 MOV EAX, DWORD PTR DS:[ECX+30] ← move to EAX the pointer to the name of the first module loaded by the Windows loader.
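The pointer chase in the listing can be illustrated with a small Python model. The offsets are the 32-bit ones used by the instructions above (0x0C, 0x0C, 0x00, 0x30, plus 0x18 for the image base that is read later). Everything else is an assumption made for the sake of a runnable example: the fake memory, the addresses, the module list and the helper names are all hypothetical.

```python
PEB_LDR = 0x0C              # PEB -> pointer to PEB_LDR_DATA
LDR_IN_LOAD_ORDER = 0x0C    # PEB_LDR_DATA -> head of the load-order list
ENTRY_NEXT = 0x00           # LDR_MODULE -> next entry in load order
ENTRY_DLL_BASE = 0x18       # LDR_MODULE -> image base of the module
ENTRY_BASE_NAME_PTR = 0x30  # LDR_MODULE -> pointer to the module name


def walk_modules(read_dword, read_wstring, peb):
    """Yield (name, image_base) for each module, in load order."""
    ldr = read_dword(peb + PEB_LDR)
    head = ldr + LDR_IN_LOAD_ORDER
    entry = read_dword(head)
    while entry != head:  # the list is circular and ends back at the head
        name = read_wstring(read_dword(entry + ENTRY_BASE_NAME_PTR))
        yield name, read_dword(entry + ENTRY_DLL_BASE)
        entry = read_dword(entry + ENTRY_NEXT)


# Build a fake address space; all addresses and image bases are made up.
dwords, strings = {}, {}
PEB, LDR = 0x1000, 0x2000
modules = [("sample.exe", 0x00400000),
           ("ntdll.dll", 0x77000000),
           ("kernel32.dll", 0x76000000)]
head = LDR + LDR_IN_LOAD_ORDER
entries = [0x3000 + 0x100 * i for i in range(len(modules))]
dwords[PEB + PEB_LDR] = LDR
dwords[head] = entries[0]
for i, (name, base) in enumerate(modules):
    entry = entries[i]
    name_ptr = 0x5000 + 0x100 * i
    dwords[entry + ENTRY_NEXT] = entries[i + 1] if i + 1 < len(entries) else head
    dwords[entry + ENTRY_DLL_BASE] = base
    dwords[entry + ENTRY_BASE_NAME_PTR] = name_ptr
    strings[name_ptr] = name

found = list(walk_modules(dwords.__getitem__, strings.__getitem__, PEB))
print(found)
```

Passing the two reader callbacks in keeps the walk itself independent of where the memory comes from, so the same function could be pointed at a process dump instead of the toy dictionaries.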

Then follows another CALL at address 00408728, to a function that calculates a magic dword from the name of the module currently being examined. If the dword matches a predefined constant, the loader knows it has found the loaded module it needs to continue.

Calculation Algorithm:

00408797 8A10 MOV DL, BYTE PTR DS:[EAX] ← go through all chars one by one

00408799 80CA 60 OR DL, 60 ← start dword calculation

0040879C 01D3 ADD EBX, EDX

0040879E D1E3 SHL EBX, 1 ← end dword calculation

004087A0 0345 10 ADD EAX, DWORD PTR SS:[EBP+10] ← increase pointer to string name by 2, because it's stored as Unicode

004087A3 8A08 MOV CL, BYTE PTR DS:[EAX] ← move next char value to CL

004087A5 84C9 TEST CL, CL ← check if it's zero, which means we reached the end of the string

004087A7 E0 EE LOOPDNE SHORT 00408797 ← if it's not, jump up to process the next char

004087A9 31C0 XOR EAX, EAX ← zero out EAX

004087AB 8B4D 0C MOV ECX, DWORD PTR SS:[EBP+C] ← move to ECX magic dword

004087AE 39CB CMP EBX, ECX ← check if calculated dword = magic dword

004087B0 74 01 JE SHORT 004087B3 ← if it is, module located

004087B2 40 INC EAX

004087B3 5A POP EDX

004087B4 5B POP EBX

004087B5 59 POP ECX

004087B6 89EC MOV ESP, EBP

004087B8 5D POP EBP

004087B9 C2 0C00 RET 0C
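The calculation is short enough to re-implement for offline use, for example to find out which module name produces a given magic dword. The Python sketch below follows the listed instructions under several assumptions that the listing does not confirm: EBX starts at zero, the upper bytes of EDX are zero, the loop ends at the terminating zero character, and the name is not empty. The function name is hypothetical.

```python
def module_name_hash(name):
    """Re-implementation sketch of the OR 60 / ADD / SHL 1 name hash."""
    ebx = 0  # assumed initial value
    for ch in name:
        dl = (ord(ch) & 0xFF) | 0x60   # MOV DL, [EAX] then OR DL, 60
        ebx = (ebx + dl) & 0xFFFFFFFF  # ADD EBX, EDX (upper EDX assumed zero)
        ebx = (ebx << 1) & 0xFFFFFFFF  # SHL EBX, 1
    return ebx


print(hex(module_name_hash("kernel32.dll")))
```

One practical consequence of the OR with 0x60 is that upper-case and lower-case letters produce the same byte, so the comparison does not depend on how the loader data stores the case of the module name.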

The figure that follows demonstrates the condition in which the two values match when checking the kernel32.dll loaded module.

Figure 1 - Kernel32.dll module located

Once the necessary module is located, we will reach the next part of the code that will attempt to find the VAs of specific exported functions from the kernel32.dll after exiting from the previous function.

00408735 8B41 18 MOV EAX, DWORD PTR DS:[ECX+18] ← get the image base of kernel32.dll from LDR_MODULE structure

00408738 50 PUSH EAX

00408739 8B58 3C MOV EBX, DWORD PTR DS:[EAX+3C] ← get the offset of its PE Header

0040873C 01D8 ADD EAX, EBX

0040873E 8B58 78 MOV EBX, DWORD PTR DS:[EAX+78] ← get the RVA of its Export Table
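The two header reads above rely on fixed offsets of the 32-bit PE format: the dword at 0x3C of the image is the offset of the PE header, and 0x78 bytes into that header sits the RVA of the export directory. A minimal Python sketch of the same lookup follows. The synthetic image, its image base and the helper name are assumptions made so the example can run on its own; they do not come from the sample.

```python
import struct

E_LFANEW_OFFSET = 0x3C        # MOV EBX, [EAX+3C]
EXPORT_DIR_RVA_OFFSET = 0x78  # MOV EBX, [EAX+78], PE32 layout only


def export_directory_rva(image):
    """Return the export directory RVA of a 32-bit PE image held in memory."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    (rva,) = struct.unpack_from("<I", image, e_lfanew + EXPORT_DIR_RVA_OFFSET)
    return rva


# Synthetic, mostly empty image: only the two fields read above are filled in.
IMAGE_BASE = 0x10000000  # hypothetical load address
image = bytearray(0x200)
struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x80)
struct.pack_into("<I", image, 0x80 + EXPORT_DIR_RVA_OFFSET, 0x1234)

rva = export_directory_rva(image)
export_directory_va = IMAGE_BASE + rva  # an RVA is relative to the image base
print(hex(rva), hex(export_directory_va))
```

The 0x78 offset only holds for 32-bit images; a 64-bit image has a larger optional header, so the same field sits further into the PE header.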

Once the loader of the malware locates the export table of kernel32.dll, it uses it to retrieve the VAs of a few APIs, four in total, that are necessary to proceed.

Here is the table that is created at this stage:

00408AA5 760CBC8B kernel32.LoadLibraryExA

00408AA9 760D05F4 kernel32.VirtualAlloc

00408AAD 760C50AB kernel32.VirtualProtect

00408AB1 760D1837 kernel32.GetProcAddress

In the next instalment, I will begin by showing how to Locate and Isolate the Embedded Decrypted Executable.

Have fun!

Kyriakos Economou

Kyriakos has a BSc in Computer Science and an MSc in Information Security. Deeply interested in the analysis and reverse engineering of commercial/custom packers and software protections for the last 6 years, he was a speaker at Athcon 2011 and is also the author of the Reversing Challenges for the 2011 and 2012 Athcon conferences.

Besides taking protections apart, he also has a deep interest in studying anti-reversing tricks and creating his own custom tricks, which range from simple debugger detection to custom obfuscation and other anti-analysis mechanisms, for the sake of imagination and creativity. More recently, he has moved into the area of malware research/analysis at Sophos as a Threat Researcher. Kyriakos spends some of his free time developing his own reversing tools, as well as experimenting with C++ and Asm.