Reversing Malware Loaders - The Matsnu-A Case
The AV industry is growing every day along with the underground industry that produces all types of malware from simple file infectors to more sophisticated Trojan types that are able to gather and send sensitive information to the bad guys.
The fight between AV companies and malware authors intensifies every day. Both sides dedicate a lot of time to researching and implementing ways to detect malware or to avoid detection, depending on which side they are on.
Most malware research concentrates on the infection mechanisms of the malware. Other points of focus include the techniques the malware uses to communicate with its creator and getting past the anti-virus evasion techniques used by the malware in the first place.
This article aims to dig inside the loader used by the Matsnu malware family in order to deploy itself and avoid detection by AV products. Fortunately, at this point the variant is already detected by most AV vendors.
In my job as a malware analyst, I very often hear this kind of AV evasion technique described as a "packer". In a very abstract way, this might be true, but in a technical way, it really isn't.
From my experience with packers and manual unpacking, I expect that a packer will incorporate some compression algorithm and most probably an encryption algorithm (custom or not). Furthermore, the behaviour of a packer is usually a lot different. A packer will usually decompress and decrypt the code of the original executable and then will jump to its original entry point (OEP).
On the other hand, I prefer to call the "packers" used by more and more malware authors loaders, because of the technical details. These loaders will usually launch a child process in suspended mode, overwrite its memory with the decrypted code of the malware, and then resume its main thread. Some of them, instead of overwriting the memory of the child process, allocate extra memory in it and place the decrypted viral code there. In that case the loader might inject a thread into the child process whose start address is the beginning of the allocated memory holding the viral code. Others overwrite themselves through a code stub written into an extra chunk of allocated memory and then jump back to the PE image address space.
In addition, very often the malware authors will choose to first compress the original viral code using a common packer (such as UPX, PECompact etc.) and then encrypt it and incorporate it inside the loader.
From a technical point of view, it is quite fair to distinguish these two types of mechanisms, and even if we keep calling them all "packers" for simplicity, it is necessary to understand the differences between them.
The final goal of this article is to manage to isolate a fully working executable of the original malware under the various anti-AV protection layers.
Self-Decryption Stage I
A big part of the loader's code is decrypted at run time through a "slow" decryption algorithm that performs many operations in each iteration, decrypting the code dword by dword.
The outer loop:
00401752 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]
00401755 83C1 01 ADD ECX, 1
00401758 894D F0 MOV DWORD PTR SS:[EBP-10], ECX
0040175B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter
00401762 7D 5E JGE SHORT 004017C2 ← exit the loop once finished
…more code here
0040178F E8 D7040000 CALL 00401C6B ← call to the decryption routine
…more code here
004017C0 EB 90 JMP SHORT 00401752 ← jump up to loop start
Inside the Decryption Routine:
Some additional loops take place here, but the important instruction is the one that, on every iteration, writes the resulting dword held in the ECX register to the memory location pointed to by the EAX register:
00401ED8 8908 MOV DWORD PTR DS:[EAX], ECX ← Initial value in EAX is 00408584. It is incremented by a dword in each iteration.
Self-Decryption Stage II
When the outer loop mentioned above has finished, there is another one taking place a few instructions later.
004017DE 8B4D E0 MOV ECX, DWORD PTR SS:[EBP-20]
004017E1 83C1 05 ADD ECX, 5
004017E4 894D E0 MOV DWORD PTR SS:[EBP-20], ECX
004017E7 817D E0 DF0C0 CMP DWORD PTR SS:[EBP-20], 0CDF ← check counter
004017EE 7D 77 JGE SHORT 00401867 ← exit the loop
…more code here
00401862 E9 77FFFFFF JMP 004017DE ← jump up to loop start
Self-Decryption Stage III
There is one more loop coming next during the self-decryption stage.
0040187E BA 01000000 MOV EDX, 1
00401883 85D2 TEST EDX, EDX
00401885 0F84 D2000000 JE 0040195D
The three instructions above create a fake execution flow redirection. In fact, since the value 1 is always passed to the EDX register, after performing the TEST instruction on the same register, the conditional JE jump that follows will never have any effect on the execution flow.
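The reasoning above can be checked with a minimal Python model of the zero flag. This is only a sketch: the helper names `zf_after_test` and `je_taken` are illustrative and do not come from the sample.

```python
def zf_after_test(a: int, b: int) -> bool:
    """Model x86 TEST: ZF is set when the 32-bit AND of the operands is zero."""
    return ((a & b) & 0xFFFFFFFF) == 0


def je_taken(zf: bool) -> bool:
    """Model JE/JZ: the jump is taken only when ZF is set."""
    return zf


edx = 1                       # MOV EDX, 1
zf = zf_after_test(edx, edx)  # TEST EDX, EDX
print(je_taken(zf))           # JE 0040195D
```

Because EDX is loaded with a non-zero constant immediately before the TEST, the model always reports that the jump is not taken, which is what makes the branch an opaque predicate.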
0040188B 817D F0 688E0 CMP DWORD PTR SS:[EBP-10], 28E68 ← check counter
00401892 0F85 A1000000 JNZ 00401939 ← if not equal jump to increase_counter
Some more code is presented below:
increase_counter:
00401939 8B4D F0 MOV ECX, DWORD PTR SS:[EBP-10]
0040193C 83C1 01 ADD ECX, 1
0040193F 894D F0 MOV DWORD PTR SS:[EBP-10], ECX
enter_next_decryption_routine:
00401942 68 F7480700 PUSH 748F7
00401947 68 18194F00 PUSH 4F1918
0040194C 8B55 F4 MOV EDX, DWORD PTR SS:[EBP-C]
0040194F 52 PUSH EDX
00401950 E8 4A000000 CALL 0040199F ← call decryption routine
00401955 83C4 0C ADD ESP, 0C
00401958 E9 21FFFFFF JMP 0040187E ← jump to loop start
Inside the Decryption Routine:
Some more loops take place here, but the important instruction is again the one that, on every iteration, writes the resulting dword held in the ECX register to the memory location pointed to by the EAX register:
00401B70 8908 MOV DWORD PTR DS:[EAX], ECX ← Initial value in EAX is 00408584. It is incremented by a dword in each iteration.
Self-Decryption Stage IV
Going back to the loop outside the decryption function, we saw the condition which would normally signal the end of the looping process. It is fake, and we need to examine it more carefully in order to locate the next step.
Indeed, when the conditions are correct, the execution will reach a CALL instruction:
0040191D E8 8FF8FFFF CALL 004011B1
The CALL to the beginning of the previously encrypted code is located inside this function:
004013B6 FF15 108B4000 CALL NEAR DWORD PTR DS:[408B10] ← value stored in this address is 00408584
Once we enter the function at address 00408584 we see the following:
00408584 E8 07000000 CALL 00408590
00408589 75 3A JNZ SHORT 004085C5
Note the obfuscation trick in the first instruction, which confuses the disassembling engine. The CALL actually transfers execution to 00408590, the last byte of what the disassembler shows as the instruction starting at address 0040858B, which means that the bytes in between (00408589 to 0040858F) are junk bytes in this case.
0040858B 03A0 21D64F5B ADD ESP, DWORD PTR DS:[EAX+5B4FD621]
00408591 81EB 05103A00 SUB EBX, 3A1005
00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E]
0040859D B9 8B020000 MOV ECX, 28B
004085A2 66BF 7592 MOV DI, 9275
004085A6 66313E XOR WORD PTR DS:[ESI], DI
004085A9 6683C7 02 ADD DI, 2
004085AD 83C6 02 ADD ESI, 2
004085B0 E2 F4 LOOPD SHORT 004085A6
004085B2 FC CLD
004085B3 7E 2A JLE SHORT 004085DF
004085B5 1B95 CFF6215C SBB EDX, DWORD PTR SS:[EBP+5C21F6CF]
004085BB 8745 92 XCHG DWORD PTR SS:[EBP-6E], EAX
004085BE D7 XLAT BYTE PTR DS:[EBX+AL]
004085BF 1F POP DS
004085C0 30D5 XOR CH, DL
004085C2 94 XCHG EAX, ESP
This is what we see once we execute the CALL instruction:
00408590 5B POP EBX
00408591 81EB 05103A00 SUB EBX, 3A1005
00408597 8DB3 2E103A00 LEA ESI, DWORD PTR DS:[EBX+3A102E] ← starts from address 004085B2
0040859D B9 8B020000 MOV ECX, 28B ← loop counter
004085A2 66BF 7592 MOV DI, 9275 ← initial decryption key
004085A6 66313E XOR WORD PTR DS:[ESI], DI ← decrypt by XORing with the key in DI, one word in each iteration
004085A9 6683C7 02 ADD DI, 2 ← the key is increased by 2 for the next word
004085AD 83C6 02 ADD ESI, 2
004085B0 E2 F4 LOOPD SHORT 004085A6
The above decryption algorithm will decrypt an extra portion of code starting from the instruction located immediately after the LOOPD.
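The listing can be mirrored in a few lines of Python, both to confirm where the decryption starts and to decrypt a dumped buffer offline. This is a minimal sketch: the constants are taken from the listing, while the function names are hypothetical and the caller is assumed to supply the encrypted bytes from a memory dump.

```python
import struct

CALL_RETURN_ADDR = 0x00408589  # pushed by CALL 00408590, popped into EBX
DELTA = 0x003A1005             # SUB EBX, 3A1005
START_OFFSET = 0x003A102E      # LEA ESI, [EBX+3A102E]
WORD_COUNT = 0x28B             # MOV ECX, 28B
INITIAL_KEY = 0x9275           # MOV DI, 9275


def first_encrypted_address() -> int:
    """Repeat the POP/SUB/LEA arithmetic on 32 bits."""
    ebx = (CALL_RETURN_ADDR - DELTA) & 0xFFFFFFFF
    return (ebx + START_OFFSET) & 0xFFFFFFFF


def rolling_xor(data: bytes, key: int = INITIAL_KEY) -> bytes:
    """XOR each little-endian 16-bit word with the key, adding 2 to the key per word."""
    if len(data) % 2:
        raise ValueError("buffer length must be a multiple of 2")
    out = bytearray()
    for (word,) in struct.iter_unpack("<H", data):
        out += struct.pack("<H", word ^ key)
        key = (key + 2) & 0xFFFF  # ADD DI, 2 wraps at 16 bits
    return bytes(out)


print(hex(first_encrypted_address()))
print(hex(WORD_COUNT * 2))  # number of bytes covered by the loop
```

Since XOR is its own inverse and the key sequence is deterministic, running `rolling_xor` over already decrypted bytes encrypts them again, which is handy when patching a dump.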
So, at this point we saw the various steps used by this loader to decrypt the next parts of the code. Now it's time to continue with the rest of its mechanisms.
Dynamic Imports Resolving & PEB Loader Data Structure
Normally, malware authors retrieve the VAs of the APIs by using two Windows APIs, LoadLibrary and GetProcAddress, so that the functions they need are not listed inside the imports table and detection through the imports is avoided. However, in this case the author of the loader has decided to go through the PEB (Process Environment Block) loader data structure, PEB_LDR_DATA, in order to retrieve the necessary information, which is a stealthier way to retrieve the VAs of the necessary APIs.
The pointer to this structure is located at PEB + 0x0C.
Back to where we stopped: immediately after the end of the decryption loop we locate a CALL at address 004085CD. Entering this function, we see another CALL at address 004086EF, and it is inside that function that the loader of the malware accesses the PEB_LDR_DATA structure.
0040870E 64FF35 3000000 PUSH DWORD PTR FS:[30]
00408715 58 POP EAX
In the two instructions above, we notice another obfuscation attempt. In fact, instead of pushing the address of PEB onto the stack and then popping that value back to EAX, we could just do MOV EAX, DWORD PTR FS:[30].
00408716 8B40 0C MOV EAX, DWORD PTR DS:[EAX+C] ← move to EAX the pointer to PEB_LDR_DATA
00408719 8B48 0C MOV ECX, DWORD PTR DS:[EAX+C] ← move to ECX the pointer to the LDR_MODULE structure of the first module loaded by the Windows loader
0040871C 8B11 MOV EDX, DWORD PTR DS:[ECX] ← save to EDX the pointer to the LDR_MODULE structure of the next module loaded by the Windows loader
0040871E 8B41 30 MOV EAX, DWORD PTR DS:[ECX+30] ← move to EAX the pointer to the name of the first module loaded by the Windows loader
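The same walk can be modelled in Python against a fake memory map, which makes the role of each offset easier to see. This is a sketch under stated assumptions: the offsets are the 32-bit ones used in the listing (plus the +18 image base offset used later), while every address, base and module name in the fake memory is invented for illustration.

```python
PEB_LDR_OFFSET = 0x0C          # PEB -> pointer to PEB_LDR_DATA
LDR_LOAD_ORDER_FLINK = 0x0C    # PEB_LDR_DATA.InLoadOrderModuleList.Flink
ENTRY_FLINK = 0x00             # LDR_MODULE: next entry in load order
ENTRY_DLL_BASE = 0x18          # LDR_MODULE: image base
ENTRY_BASE_NAME_BUFFER = 0x30  # LDR_MODULE: BaseDllName.Buffer


def walk_load_order(read_dword, read_name, peb):
    """Yield (module name, image base) for each entry of the circular list."""
    ldr = read_dword(peb + PEB_LDR_OFFSET)
    head = ldr + LDR_LOAD_ORDER_FLINK
    entry = read_dword(head)
    while entry != head:  # the last entry links back to the list head
        name = read_name(read_dword(entry + ENTRY_BASE_NAME_BUFFER))
        base = read_dword(entry + ENTRY_DLL_BASE)
        yield name, base
        entry = read_dword(entry + ENTRY_FLINK)


# Hypothetical memory contents, not taken from the sample.
FAKE_PEB = 0x7FFD0000
FAKE_LDR = 0x00250000
dwords = {
    FAKE_PEB + 0x0C: FAKE_LDR,
    FAKE_LDR + 0x0C: 0x00251000,         # first entry
    0x00251000 + 0x00: 0x00252000,       # link to second entry
    0x00251000 + 0x18: 0x00400000,
    0x00251000 + 0x30: 0x00260000,
    0x00252000 + 0x00: FAKE_LDR + 0x0C,  # back to the list head
    0x00252000 + 0x18: 0x76000000,
    0x00252000 + 0x30: 0x00260100,
}
names = {0x00260000: "sample.exe", 0x00260100: "kernel32.dll"}

modules = list(walk_load_order(dwords.__getitem__, names.__getitem__, FAKE_PEB))
print(modules)
```

Passing the reader functions in as parameters keeps the walk itself independent of where the memory comes from, so the same generator could be pointed at a process dump.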
Next comes another CALL at address 00408728, to a function that calculates a magic dword from the name of the module currently being examined. If the dword matches a predefined constant, the loader knows it has found the loaded module it needs in order to continue.
Calculation Algorithm:
00408797 8A10 MOV DL, BYTE PTR DS:[EAX] ← go through all chars one by one
00408799 80CA 60 OR DL, 60 ← start dword calculation
0040879C 01D3 ADD EBX, EDX
0040879E D1E3 SHL EBX, 1 ← end dword calculation
004087A0 0345 10 ADD EAX, DWORD PTR SS:[EBP+10] ← increase pointer to string name by 2, because it's stored as Unicode
004087A3 8A08 MOV CL, BYTE PTR DS:[EAX] ← mov next char value to CL
004087A5 84C9 TEST CL, CL ← check if it's zero, which means we reached the end of the string
004087A7 E0 EE LOOPDNE SHORT 00408797 ← if it's not, jump up to loop for the next char
004087A9 31C0 XOR EAX, EAX ← zero out EAX
004087AB 8B4D 0C MOV ECX, DWORD PTR SS:[EBP+C] ← move to ECX magic dword
004087AE 39CB CMP EBX, ECX ← check if calculated dword = magic dword
004087B0 74 01 JE SHORT 004087B3 ← if it is, module located
004087B2 40 INC EAX
004087B3 5A POP EDX
004087B4 5B POP EBX
004087B5 59 POP ECX
004087B6 89EC MOV ESP, EBP
004087B8 5D POP EBP
004087B9 C2 0C00 RET 0C
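The calculation loop is short enough to re-implement in Python, which is useful for testing candidate module names against a magic dword. This is a minimal sketch with several assumptions that the listing does not show: EBX is taken to start at zero, the upper 24 bits of EDX are taken to be zero, the name is non-empty, and the ECX countdown performed by LOOPDNE is ignored. The function name is hypothetical.

```python
def module_name_hash(name: str) -> int:
    """Re-implement the OR/ADD/SHL loop over the low byte of each character."""
    ebx = 0  # assumption: accumulator starts at zero
    for ch in name:
        if ch == "\0":  # TEST CL, CL: stop at the string terminator
            break
        dl = (ord(ch) & 0xFF) | 0x60   # MOV DL, [EAX] / OR DL, 60
        ebx = (ebx + dl) & 0xFFFFFFFF  # ADD EBX, EDX (EDX assumed to equal DL)
        ebx = (ebx << 1) & 0xFFFFFFFF  # SHL EBX, 1
    return ebx


print(hex(module_name_hash("kernel32.dll")))
print(module_name_hash("KERNEL32.DLL") == module_name_hash("kernel32.dll"))
```

The OR with 60 sets the bit that distinguishes upper-case from lower-case ASCII letters, so the result does not depend on the letter case of the module name.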
The figure that follows demonstrates the condition in which the two values match when checking the kernel32.dll loaded module.
Figure 1 - Kernel32.dll module located
Once the necessary module is located, we will reach the next part of the code that will attempt to find the VAs of specific exported functions from the kernel32.dll after exiting from the previous function.
00408735 8B41 18 MOV EAX, DWORD PTR DS:[ECX+18] ← get the image base of kernel32.dll from LDR_MODULE structure
00408738 50 PUSH EAX
00408739 8B58 3C MOV EBX, DWORD PTR DS:[EAX+3C] ← get the offset of its PE Header
0040873C 01D8 ADD EAX, EBX
0040873E 8B58 78 MOV EBX, DWORD PTR DS:[EAX+78] ← get the RVA of its Export Table
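The two header lookups above translate directly into Python when working on a dumped 32-bit image. This is a sketch only: the function name is hypothetical, the PE signature check is an addition that the loader itself does not perform, and the demo buffer is synthetic rather than a real kernel32.dll.

```python
import struct


def export_directory_rva(image: bytes) -> int:
    """Return the export table RVA of a 32-bit PE image mapped at offset 0."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)  # MOV EBX, [EAX+3C]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 0x78)  # MOV EBX, [EAX+78]
    return rva


# Synthetic buffer with only the fields the function reads (values invented).
image = bytearray(0x200)
struct.pack_into("<I", image, 0x3C, 0x80)           # offset of the PE header
image[0x80:0x84] = b"PE\0\0"                        # PE signature
struct.pack_into("<I", image, 0x80 + 0x78, 0x1234)  # export table RVA
print(hex(export_directory_rva(bytes(image))))
```

The +78 offset only holds for 32-bit (PE32) images, where the first data directory entry sits 0x78 bytes after the PE signature.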
Once the loader of the malware locates the export table of kernel32.dll, it uses it to retrieve the VAs of a few APIs, four in total, that it needs in order to proceed.
Here is the table that is created at this stage:
00408AA5 760CBC8B kernel32.LoadLibraryExA
00408AA9 760D05F4 kernel32.VirtualAlloc
00408AAD 760C50AB kernel32.VirtualProtect
00408AB1 760D1837 kernel32.GetProcAddress
In the next instalment, I will begin by showing how to Locate and Isolate the Embedded Decrypted Executable.
Have fun!