The Import Directory: Part 1

April 24, 2013 by Dejan Lukan

We know that when the operating system loads an executable, it scans the executable's Import Address Table (IAT) to locate the DLLs and functions the executable uses. This is done because the OS must map the required DLLs into the executable's address space.

To be more precise, the IAT maps the names of imported functions to their virtual addresses; those functions are exported by other loaded modules. Each executable or DLL contains a PE header, which holds all the information the operating system needs to load and start the executable, including the location of the IAT. To understand where the IAT is located, we must first talk about the PE header.

Now we're ready to explore the actual IAT of the process. Let's first present the program we'll be using to do that:

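[cpp]
#include "stdafx.h"
#include <stdio.h>
#include <tchar.h>
#include <Windows.h>

int _tmain(int argc, _TCHAR* argv[]) {
    // Create C:\temp.txt for writing; CREATE_NEW fails if the file already exists.
    HANDLE hFile = CreateFile(L"C:\\temp.txt", GENERIC_WRITE, 0, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        printf("Unable to open file.");
    }
    else {
        printf("File successfully opened/created.");
        // Close the handle only if CreateFile succeeded.
        CloseHandle(hFile);
    }

    // Keep the process alive so that a debugger can attach to it.
    getchar();
    return 0;
}
[/cpp]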

When we compile the program, an executable named createfilee.exe will be created. We can start createfilee.exe and let it run. It will stop at the getchar() function call, which waits until we press a key. After that, start the WinDbg debugger and attach it to the running process.

Now we'll use the !dh command to print the PE header elements that we need. The !dh command accepts several options: if we pass the -a parameter, it prints everything to the console window; if we use the -f parameter, it prints only the file headers; and with -s it prints only the section headers.

In our case, we'll use the -f parameter because we need to dump the file headers. The output below was generated by the "!dh 00400000 -f" command:

[plain]
0:002> !dh 00400000 -f

File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (i386)
7 number of sections
515EBA3E time date stamp Fri Apr 05 13:49:18 2013
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
    Executable
    32 bit word machine

OPTIONAL HEADER VALUES
10B magic #
10.00 linker version
3800 size of code
3A00 size of initialized data
0 size of uninitialized data
11078 address of entry point
1000 base of code
----- new -----
00400000 image base
1000 section alignment
200 file alignment
3 subsystem (Windows CUI)
5.01 operating system version
0.00 image version
5.01 subsystem version
1B000 size of image
400 size of headers
0 checksum
00100000 size of stack reserve
00001000 size of stack commit
00100000 size of heap reserve
00001000 size of heap commit
8140 DLL characteristics
    Dynamic base
    NX compatible
    Terminal server aware
0 [ 0] address [size] of Export Directory
18000 [ 3C] address [size] of Import Directory
19000 [ 459] address [size] of Resource Directory
0 [ 0] address [size] of Exception Directory
0 [ 0] address [size] of Security Directory
1A000 [ 2EC] address [size] of Base Relocation Directory
15720 [ 1C] address [size] of Debug Directory
0 [ 0] address [size] of Description Directory
0 [ 0] address [size] of Special Directory
0 [ 0] address [size] of Thread Storage Directory
0 [ 0] address [size] of Load Configuration Directory
0 [ 0] address [size] of Bound Import Directory
181BC [ 180] address [size] of Import Address Table Directory
0 [ 0] address [size] of Delay Import Directory
0 [ 0] address [size] of COR20 Header Directory
0 [ 0] address [size] of Reserved Directory
[/plain]

At the bottom of the output we can see the data directories that we're after. The data directory that we want to read is the Import Address Table Directory, which is located at RVA 0x181BC and is 0x180 bytes in size. Now we know the exact address of the IAT in memory: if the base address of the executable is 0x00400000 and the RVA of the IAT is 0x181BC, then the full address of the IAT in memory is 0x00400000 + 0x181BC = 0x004181BC. The IAT is 0x180 bytes long and each entry is 4 bytes in size, so the table holds 0x180 / 4 = 0x60 (96) entries, which is the count we pass to dps after the L. This is why the whole command should be as follows:

[plain]
> dps 004181bc L180/4
[/plain]
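The address arithmetic above can be double-checked with a few lines of C++. This is only a sketch: the constants are copied from the !dh output, while the helper names rva_to_va, entry_count and slot_va are made up for this illustration and are not part of any Windows API.

[cpp]
#include <cassert>
#include <cstdint>

// Values copied from the !dh output of createfilee.exe.
constexpr std::uint32_t kImageBase = 0x00400000;
constexpr std::uint32_t kIatRva    = 0x181BC;
constexpr std::uint32_t kIatSize   = 0x180;  // size of the IAT in bytes
constexpr std::uint32_t kEntrySize = 4;      // one 32-bit pointer per entry

// Convert an RVA into a virtual address for a module loaded at `base`.
constexpr std::uint32_t rva_to_va(std::uint32_t base, std::uint32_t rva) {
    return base + rva;
}

// Number of fixed-size entries that fit into a table of `size` bytes.
constexpr std::uint32_t entry_count(std::uint32_t size, std::uint32_t entry_size) {
    return size / entry_size;
}

// Virtual address of the slot with the given index; the index must lie inside the table.
inline std::uint32_t slot_va(std::uint32_t table_va, std::uint32_t index) {
    assert(index < entry_count(kIatSize, kEntrySize));
    return table_va + index * kEntrySize;
}

// Compile-time checks of the numbers used in the text.
static_assert(rva_to_va(kImageBase, kIatRva) == 0x004181BC, "start of the IAT");
static_assert(entry_count(kIatSize, kEntrySize) == 0x60, "96 entries, i.e. L180/4");
[/cpp]

For example, slot_va(0x004181BC, 30) evaluates to 0x00418234, which is the slot that the dps listing in this article shows for kernel32!CreateFileW.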

The output of the dps command is shown below. The whole table is presented, even though it is rather long, so that we can observe all of its entries:

[plain]
0:002> dps 004181bc L180/4
004181bc 7c809be7 kernel32!CloseHandle
004181c0 7c864042 kernel32!UnhandledExceptionFilter
004181c4 7c80de95 kernel32!GetCurrentProcess
004181c8 7c801e1a kernel32!TerminateProcess
004181cc 7c80ac7e kernel32!FreeLibrary
004181d0 7c80e4dd kernel32!GetModuleHandleW
004181d4 7c80ba71 kernel32!VirtualQuery
004181d8 7c80b475 kernel32!GetModuleFileNameW
004181dc 7c80ac61 kernel32!GetProcessHeap
004181e0 7c9100c4 ntdll!RtlAllocateHeap
004181e4 7c90ff2d ntdll!RtlFreeHeap
004181e8 7c8017e9 kernel32!GetSystemTimeAsFileTime
004181ec 7c8099c0 kernel32!GetCurrentProcessId
004181f0 7c8097d0 kernel32!GetCurrentThreadId
004181f4 7c80934a kernel32!GetTickCount
004181f8 7c80a4c7 kernel32!QueryPerformanceCounter
004181fc 7c9132ff ntdll!RtlDecodePointer
00418200 7c8449cd kernel32!SetUnhandledExceptionFilter
00418204 7c80aeeb kernel32!LoadLibraryW
00418208 7c80ae40 kernel32!GetProcAddress
0041820c 7c80be56 kernel32!lstrlenA
00418210 7c812f81 kernel32!RaiseException
00418214 7c809c98 kernel32!MultiByteToWideChar
00418218 7c81f424 kernel32!IsDebuggerPresent
0041821c 7c80a174 kernel32!WideCharToMultiByte
00418220 7c839471 kernel32!HeapSetInformation
00418224 7c809842 kernel32!InterlockedCompareExchange
00418228 7c802446 kernel32!Sleep
0041822c 7c80982e kernel32!InterlockedExchange
00418230 7c9132d9 ntdll!RtlEncodePointer
00418234 7c810cd9 kernel32!CreateFileW
00418238 00000000
0041823c 00000000
00418240 00000000
00418244 00000000
00418248 00000000
0041824c 00000000
00418250 00000000
00418254 00000000
00418258 00000000
0041825c 00000000
00418260 00000000
00418264 00000000
00418268 00000000
0041826c 00000000
00418270 00000000
00418274 00000000
00418278 00000000
0041827c 10322e30 MSVCR100D!_crt_debugger_hook
00418280 10327ce0 MSVCR100D!_wsplitpath_s
00418284 10274390 MSVCR100D!wcscpy_s
00418288 10326190 MSVCR100D!_wmakepath_s
0041828c 10323040 MSVCR100D!_except_handler4_common
00418290 10319d40 MSVCR100D!_onexit
00418294 102496d0 MSVCR100D!_lock
00418298 10319fa0 MSVCR100D!__dllonexit
0041829c 10249720 MSVCR100D!_unlock
004182a0 10316310 MSVCR100D!_invoke_watson
004182a4 103329b0 MSVCR100D!_controlfp_s
004182a8 102fb0c0 MSVCR100D!terminate
004182ac 10248680 MSVCR100D!_initterm_e
004182b0 10248650 MSVCR100D!_initterm
004182b4 103151e0 MSVCR100D!_CrtDbgReportW
004182b8 10319ac0 MSVCR100D!_CrtSetCheckCount
004182bc 10362730 MSVCR100D!__winitenv
004182c0 10248080 MSVCR100D!exit
004182c4 102480c0 MSVCR100D!_cexit
004182c8 1031d090 MSVCR100D!_XcptFilter
004182cc 102480a0 MSVCR100D!_exit
004182d0 10248ce0 MSVCR100D!__wgetmainargs
004182d4 10248100 MSVCR100D!_amsg_exit
004182d8 10245130 MSVCR100D!__set_app_type
004182dc 103635f8 MSVCR100D!_fmode
004182e0 103632fc MSVCR100D!_commode
004182e4 10247580 MSVCR100D!__setusermatherr
004182e8 1031ecd0 MSVCR100D!_configthreadlocale
004182ec 10321270 MSVCR100D!_CRT_RTC_INITW
004182f0 10267ee0 MSVCR100D!printf
004182f4 1025f660 MSVCR100D!getchar
004182f8 00000000
004182fc 00000000
00418300 00000000
00418304 00000000
00418308 00000000
0041830c 00000000
00418310 00000000
00418314 00000000
00418318 00000000
0041831c 00000000
00418320 00000000
00418324 00000000
00418328 00000000
0041832c 00000000
00418330 00000000
00418334 00000000
00418338 00000000
[/plain]

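First, we can see a number of entries from the kernel32.dll library (a few of them resolve to ntdll.dll addresses, because kernel32.dll forwards those exports to ntdll.dll), and later there are entries from the msvcr100d.dll library. The entries that we're directly using in our C++ code are CloseHandle, CreateFileW, printf and getchar.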

We've just figured out the library names used by the executable, and all the function names plus their virtual addresses in memory. Let's also print all the loaded modules with the lmi command. The output of that command can be seen below:

[plain]
0:002> lmi
start end module name
00400000 0041b000 createfilee C (no symbols)
00940000 00949000 Normaliz (export symbols) C:\WINDOWS\system32\Normaliz.dll
10200000 10373000 MSVCR100D (pdb symbols) C:\WINDOWS\system32\MSVCR100D.dll
3d930000 3da16000 WININET (pdb symbols) C:\WINDOWS\system32\WININET.dll
3dfd0000 3e1bc000 iertutil (pdb symbols) C:\WINDOWS\system32\iertutil.dll
5b860000 5b8b5000 NETAPI32 (pdb symbols) C:\WINDOWS\system32\NETAPI32.dll
5d090000 5d12a000 comctl32_5d090000 (pdb symbols) C:\WINDOWS\system32\comctl32.dll
71aa0000 71aa8000 WS2HELP (pdb symbols) C:\WINDOWS\system32\WS2HELP.dll
71ab0000 71ac7000 WS2_32 (pdb symbols) C:\WINDOWS\system32\WS2_32.dll
76390000 763ad000 IMM32 (pdb symbols) C:\WINDOWS\system32\IMM32.DLL
76b40000 76b6d000 WINMM (pdb symbols) C:\WINDOWS\system32\WINMM.dll
77120000 771ab000 OLEAUT32 (pdb symbols) C:\WINDOWS\system32\OLEAUT32.dll
773d0000 774d3000 comctl32 (pdb symbols) C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_6595b64144ccf1df_6.0.2600.6028_x-ww_61e65202\comctl32.dll
774e0000 7761e000 ole32 (pdb symbols) C:\WINDOWS\system32\ole32.dll
77a80000 77b15000 CRYPT32 (pdb symbols) C:\WINDOWS\system32\CRYPT32.dll
77b20000 77b32000 MSASN1 (pdb symbols) C:\WINDOWS\system32\MSASN1.dll
77c00000 77c08000 VERSION (pdb symbols) C:\WINDOWS\system32\VERSION.dll
77c10000 77c68000 msvcrt (pdb symbols) C:\WINDOWS\system32\msvcrt.dll
77dd0000 77e6b000 ADVAPI32 (pdb symbols) C:\WINDOWS\system32\ADVAPI32.dll
77e70000 77f03000 RPCRT4 (pdb symbols) C:\WINDOWS\system32\RPCRT4.dll
77f10000 77f59000 GDI32 (pdb symbols) C:\WINDOWS\system32\GDI32.dll
77f60000 77fd6000 SHLWAPI (pdb symbols) C:\WINDOWS\system32\SHLWAPI.dll
77fe0000 77ff1000 Secur32 (pdb symbols) C:\WINDOWS\system32\Secur32.dll
78130000 78263000 urlmon (private pdb symbols) C:\WINDOWS\system32\urlmon.dll
7c800000 7c8f6000 kernel32 (pdb symbols) C:\WINDOWS\system32\kernel32.dll
7c900000 7c9b2000 ntdll (pdb symbols) C:\WINDOWS\system32\ntdll.dll
7c9c0000 7d1d7000 SHELL32 (pdb symbols) C:\WINDOWS\system32\SHELL32.dll
7e410000 7e4a1000 USER32 (pdb symbols) C:\WINDOWS\system32\USER32.dll
[/plain]

The two libraries of interest in this listing are kernel32.dll and msvcr100d.dll. Notice that their base addresses are 0x7c800000 and 0x10200000 respectively, which correlates with the function addresses in the IAT: the kernel32 entries fall between 0x7c800000 and 0x7c8f6000, and the MSVCR100D entries fall between 0x10200000 and 0x10373000. All those function pointers are correct, because the OS filled the IAT with the resolved function pointers when the executable was loaded.
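The correlation between IAT entries and module ranges can also be checked mechanically. The sketch below assumes a handful of rows copied from the lmi listing; ModuleRange, sample_modules and module_for are hypothetical names used only for this illustration.

[cpp]
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the lmi listing; `end` is the first address past the module.
struct ModuleRange {
    std::uint32_t start;
    std::uint32_t end;
    std::string name;
};

// A few rows copied from the lmi output.
inline std::vector<ModuleRange> sample_modules() {
    return {
        {0x00400000, 0x0041b000, "createfilee"},
        {0x10200000, 0x10373000, "MSVCR100D"},
        {0x7c800000, 0x7c8f6000, "kernel32"},
        {0x7c900000, 0x7c9b2000, "ntdll"},
    };
}

// Return the name of the module whose range contains `address`,
// or an empty string if no listed module contains it.
inline std::string module_for(const std::vector<ModuleRange>& modules, std::uint32_t address) {
    for (const ModuleRange& m : modules) {
        assert(m.start < m.end);
        if (address >= m.start && address < m.end) {
            return m.name;
        }
    }
    return "";
}
[/cpp]

Calling module_for(sample_modules(), 0x7c810cd9) with the CreateFileW pointer from the IAT returns "kernel32", and the printf pointer 0x10267ee0 maps to "MSVCR100D".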

The Import Directory

An imported function is a function that's not located in the current module, but is imported from some other module; an executable usually imports functions from several modules. The information about each imported function must be kept in the import directory of the current module, because when the operating system loads the executable into memory and starts it, it must also load all the dependent libraries into the process's memory space, so that the program can call those functions.

The import table contains an array of IMAGE_IMPORT_DESCRIPTOR structures. Each structure starts with a union of the Characteristics and OriginalFirstThunk members, followed by the TimeDateStamp, ForwarderChain, Name and FirstThunk members.

Each IMAGE_IMPORT_DESCRIPTOR structure in the import directory contains information about one DLL that the current module needs in order to reference its symbols and call its functions. The array always ends with a terminating structure whose members are all initialized to zero.

At the beginning of the IMAGE_IMPORT_DESCRIPTOR, a union data structure is used. The members of a union occupy the same memory and are normally used to specify that a certain variable can have different types. In our case, both members, Characteristics and OriginalFirstThunk, have the same type DWORD, so the union declaration is only used to make an alias for those two members.

Remember that the union declaration occupies only 4 bytes in our case (which is the size of the DWORD type) and not 8 bytes; this is how union declarations work. Because of this, the size of the IMAGE_IMPORT_DESCRIPTOR data structure is 20 bytes: 4 bytes for the union declaration and 16 bytes for the TimeDateStamp, ForwarderChain, Name and FirstThunk elements.
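The layout described above can be mirrored in a small, portable sketch. The ImportDescriptor type below is a hypothetical stand-in that follows the member order given in the text; the real IMAGE_IMPORT_DESCRIPTOR is declared in winnt.h. The helpers is_terminator and count_descriptors are illustrative only and are not Windows functions.

[cpp]
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of IMAGE_IMPORT_DESCRIPTOR, using fixed-width types
// instead of DWORD so that it compiles without Windows.h.
struct ImportDescriptor {
    union {
        std::uint32_t Characteristics;     // zero in the terminating descriptor
        std::uint32_t OriginalFirstThunk;  // RVA of the lookup (name) array
    };
    std::uint32_t TimeDateStamp;
    std::uint32_t ForwarderChain;
    std::uint32_t Name;        // RVA of the DLL name string
    std::uint32_t FirstThunk;  // RVA of the address array (IAT)
};

// The union takes 4 bytes, not 8, so the whole structure is 20 bytes.
static_assert(offsetof(ImportDescriptor, TimeDateStamp) == 4, "union occupies 4 bytes");
static_assert(sizeof(ImportDescriptor) == 20, "4 + 4 * 4 bytes");

// True for the all-zero descriptor that ends the array.
inline bool is_terminator(const ImportDescriptor& d) {
    return d.OriginalFirstThunk == 0 && d.TimeDateStamp == 0 &&
           d.ForwarderChain == 0 && d.Name == 0 && d.FirstThunk == 0;
}

// Count the descriptors that precede the terminating one.
inline std::size_t count_descriptors(const ImportDescriptor* first) {
    assert(first != nullptr);
    std::size_t count = 0;
    while (!is_terminator(first[count])) {
        ++count;
    }
    return count;
}
[/cpp]

As a sanity check, the Import Directory in the !dh output is 0x3C (60) bytes long, which would fit exactly three 20-byte descriptors: presumably one for each of the two imported DLLs plus the terminating one.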

We haven't yet described what the elements of the structure actually mean, so let's go through them:

  • OriginalFirstThunk: this element contains the RVA of an array of IMAGE_THUNK_DATA structures, terminated by a zero entry. IMAGE_THUNK_DATA is a union that is 4 bytes in size in a 32-bit image; its members include Ordinal, AddressOfData and Function. When we come to this structure, we must remember that a function can be imported by name or by ordinal. In the latter case, the Ordinal field of the union has its most significant bit set to 1, and the ordinal number can be extracted from the low 16 bits. Otherwise, the entry holds the RVA of an IMAGE_IMPORT_BY_NAME structure, which has two elements:

    • Hint: an index into the exporting DLL's table of names that is tried first during lookup; this field is not of particular importance here.
    • Name: contains the name of the imported function; the field is actually a variable-length, null-terminated string stored inside the structure.

    Keep in mind that the OriginalFirstThunk array contains as many elements as there are imported functions for a particular library. Each imported function represents one element in the array.

  • TimeDateStamp
  • ForwarderChain
  • Name: contains the RVA where the name of the library is saved.
  • FirstThunk: contains the RVA of an array of IMAGE_THUNK_DATA structures, like the OriginalFirstThunk above. Both arrays contain the same number of elements. The OriginalFirstThunk array holds the names of the imported functions and is also called the ILT. The FirstThunk array holds the addresses of the imported functions and is also called the IAT.

    The OriginalFirstThunk array uses the AddressOfData element of the IMAGE_THUNK_DATA structure, which points to the IMAGE_IMPORT_BY_NAME structure that contains the name of the imported function. The FirstThunk array uses the Function element of the IMAGE_THUNK_DATA structure, which holds the address of the imported function. When the executable is loaded, the loader must traverse the OriginalFirstThunk array to find all the imported function names the executable is using. Then it must calculate the addresses of the functions and populate the FirstThunk array, so that the functions can be called whenever needed.
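The traversal described above can be sketched on a tiny synthetic image held in a byte vector, so that it runs anywhere without a real PE file. Everything in it is an assumption made for illustration: the RVAs 0x10, 0x20 and 0x30, the hint value 0x42, the ordinal 5, and the helper names (read_u32, write_u32, walk_ilt, bind_iat, make_sample_image). Only the constant 0x80000000 corresponds to the 32-bit ordinal flag (IMAGE_ORDINAL_FLAG32 in winnt.h). The real loader does considerably more work, such as mapping the DLL and searching its export table.

[cpp]
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Image;  // stand-in for a mapped module, indexed by RVA

const std::uint32_t kOrdinalFlag32 = 0x80000000u;  // most significant bit of a 32-bit thunk

// Little-endian accessors with bounds checks.
inline std::uint32_t read_u32(const Image& img, std::uint32_t rva) {
    assert(static_cast<std::size_t>(rva) + 4 <= img.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(img[rva + i]) << (8 * i);
    return v;
}
inline void write_u32(Image& img, std::uint32_t rva, std::uint32_t v) {
    assert(static_cast<std::size_t>(rva) + 4 <= img.size());
    for (int i = 0; i < 4; ++i) img[rva + i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// One decoded entry of the lookup array (ILT).
struct ImportRef {
    bool by_ordinal;
    std::uint16_t ordinal;  // meaningful when by_ordinal is true
    std::uint16_t hint;     // meaningful when by_ordinal is false
    std::string name;       // meaningful when by_ordinal is false
};

// Walk the zero-terminated thunk array at ilt_rva and decode each entry.
inline std::vector<ImportRef> walk_ilt(const Image& img, std::uint32_t ilt_rva) {
    std::vector<ImportRef> refs;
    for (std::uint32_t rva = ilt_rva; ; rva += 4) {
        const std::uint32_t thunk = read_u32(img, rva);
        if (thunk == 0) break;  // terminating entry
        ImportRef ref = ImportRef();
        if (thunk & kOrdinalFlag32) {
            ref.by_ordinal = true;
            ref.ordinal = static_cast<std::uint16_t>(thunk & 0xFFFF);
        } else {
            // thunk is the RVA of a by-name record:
            // a 16-bit Hint followed by a null-terminated name.
            assert(static_cast<std::size_t>(thunk) + 2 <= img.size());
            ref.hint = static_cast<std::uint16_t>(img[thunk] | (img[thunk + 1] << 8));
            for (std::size_t p = thunk + 2; p < img.size() && img[p] != 0; ++p)
                ref.name.push_back(static_cast<char>(img[p]));
        }
        refs.push_back(ref);
    }
    return refs;
}

// Binding step: store the address chosen by `resolve` in the IAT slot
// that has the same index as the ILT entry.
template <typename Resolver>
void bind_iat(Image& img, std::uint32_t ilt_rva, std::uint32_t iat_rva, Resolver resolve) {
    const std::vector<ImportRef> refs = walk_ilt(img, ilt_rva);
    for (std::size_t i = 0; i < refs.size(); ++i)
        write_u32(img, iat_rva + static_cast<std::uint32_t>(4 * i), resolve(refs[i]));
}

// Build the synthetic image: ILT at 0x10, by-name record at 0x20, IAT at 0x30.
inline Image make_sample_image() {
    Image img(0x40, 0);
    write_u32(img, 0x10, 0x20);         // ILT[0]: import by name
    write_u32(img, 0x14, 0x80000005u);  // ILT[1]: import by ordinal 5
    img[0x20] = 0x42;                   // Hint, low byte (made-up value)
    const char name[] = "CreateFileW";
    std::memcpy(&img[0x22], name, sizeof(name));  // name including its terminator
    write_u32(img, 0x30, 0x20);         // before binding, the IAT mirrors the ILT
    write_u32(img, 0x34, 0x80000005u);
    return img;
}
[/cpp]

Calling walk_ilt(make_sample_image(), 0x10) yields two entries: the by-name import CreateFileW and the by-ordinal import 5. If bind_iat is then given a resolver that returns 0x7c810cd9 for CreateFileW (the address shown in the dps listing), that value ends up in the first IAT slot at RVA 0x30, which mirrors what the loader did for the real table.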

Conclusion

To conclude, the Import Table contains one entry for each DLL we're importing from. Each entry contains Import Lookup Table (ILT) and Import Address Table (IAT) pointers [7]. If we would like to go over the whole PE file structure, there's a great picture available at https://www.openrce.org/reference_library/files/reference/PE%20Format.pdf, which was provided by the OpenRCE team.

References:

[1] Import Address Table, accessible at http://en.wikipedia.org/wiki/Import_Address_Table#Import_Table.

[2] Dynamic-link library, accessible at http://en.wikipedia.org/wiki/Dynamic-link_library#Symbol_resolution_and_binding.

[3] CreateFile function, accessible at http://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx.

[4] Linker Options, accessible at http://msdn.microsoft.com/en-us/library/y0zzbyt4.aspx.

[5] PE File Structure, accessible at http://www.thehackademy.net/madchat/vxdevl/papers/winsys/pefile/pefile.htm.

[6] Tutorial 6: Import Table, accessible at http://win32assembly.programminghorizon.com/pe-tut6.html.

[7] What's the difference between "Import Table address" and "Import Address Table address" in Date Directories of PE?, accessible at http://stackoverflow.com/questions/3801571/whats-the-difference-between-import-table-address-and-import-address-table-a.

Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and penetration tester from Slovenia. He is very interested in finding new bugs in real world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. His passion is also Antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.