Complete Tour of PE and ELF: Data Structures
We covered some important sections in Part 1 of this series. In this part, we will cover some more complex data structures, along with some important concepts behind how binaries are loaded. Here is what we are looking at:
If you recall, the Optional Header has a field, IMAGE_DATA_DIRECTORY DataDirectory[16], that we have not discussed yet. This article covers the data directory structures that this field points to. First we will discuss the data directory IMAGE_DIRECTORY_ENTRY_IMPORT, which describes all the DLL imports present in the binary.
To get to the import data directory, look at the import entry in the Optional Header.
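As a rough illustration of how the DataDirectory array is laid out, here is a minimal Python sketch. It assumes the 16 entries have already been cut out of the Optional Header as raw bytes; each entry is a pair of little-endian DWORDs (VirtualAddress, Size). The helper name `parse_data_directories` and the sample values are made up for this example; the index constants are the ones from winnt.h.

```python
import struct

# Index constants as defined in winnt.h
IMAGE_DIRECTORY_ENTRY_IMPORT = 1
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT = 11
IMAGE_DIRECTORY_ENTRY_IAT = 12
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT = 13

def parse_data_directories(raw, count=16):
    """Split raw bytes into (VirtualAddress, Size) pairs, 8 bytes each."""
    return [struct.unpack_from("<II", raw, i * 8) for i in range(count)]

# Synthetic input (not taken from the exe in this article):
# every entry is empty except the import directory.
raw = bytearray(16 * 8)
struct.pack_into("<II", raw, IMAGE_DIRECTORY_ENTRY_IMPORT * 8, 0x2000, 0x28)

dirs = parse_data_directories(bytes(raw))
import_rva, import_size = dirs[IMAGE_DIRECTORY_ENTRY_IMPORT]
print(hex(import_rva), hex(import_size))
```

An entry whose VirtualAddress and Size are both zero simply means that directory is not present in the binary.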
The Import RVA here is A048. This value falls within the range of the .text section, so that is the section we move to.
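Finding the section that an RVA falls into, and turning that RVA into a file offset, can be sketched as below. This is only an illustration: the `Section` tuple, the helper names, and the section values are hypothetical and are not the ones from the exe shown in this article.

```python
from typing import NamedTuple

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA where the section starts in memory
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset where the section starts on disk

def find_section(sections, rva):
    """Return the section whose virtual range contains rva, or None."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s
    return None

def rva_to_offset(sections, rva):
    """Translate an RVA into a file offset using the section table."""
    s = find_section(sections, rva)
    if s is None:
        raise ValueError(f"RVA {rva:#x} is not inside any section")
    return rva - s.virtual_address + s.pointer_to_raw_data

# Hypothetical section table, for illustration only.
sections = [
    Section(".text", 0x1000, 0x2000, 0x400),
    Section(".data", 0x3000, 0x1000, 0x2400),
]
offset = rva_to_offset(sections, 0x1A48)
print(find_section(sections, 0x1A48).name, hex(offset))
```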
This data directory entry contains the RVA of the DLL import structure it points to. Its Size field holds the size of the DLL import data, and its VirtualAddress field points to the actual import structure, which looks like below:
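For readers without the screenshot, the import directory is an array of IMAGE_IMPORT_DESCRIPTOR records, 20 bytes each (five DWORDs), terminated by an all-zero record. The sketch below walks such an array. The helper name and the sample RVAs are invented for the example; the field order follows winnt.h.

```python
import struct
from typing import NamedTuple

class ImportDescriptor(NamedTuple):
    original_first_thunk: int  # OFT: RVA of the Import Name Table (INT)
    time_date_stamp: int       # 0 if not bound, 0xFFFFFFFF (-1) if bound
    forwarder_chain: int
    name: int                  # RVA of the DLL name string
    first_thunk: int           # FT: RVA of the Import Address Table (IAT)

DESCRIPTOR = struct.Struct("<IIIII")  # five little-endian DWORDs = 20 bytes

def parse_import_descriptors(data, offset=0):
    """Read descriptors until the all-zero terminator record.

    Raises struct.error if the data ends before a terminator is found.
    """
    result = []
    while True:
        fields = DESCRIPTOR.unpack_from(data, offset)
        if not any(fields):
            break
        result.append(ImportDescriptor(*fields))
        offset += DESCRIPTOR.size
    return result

# Two made-up descriptors followed by the terminator.
data = (
    DESCRIPTOR.pack(0x2050, 0, 0, 0x2100, 0x2000)
    + DESCRIPTOR.pack(0x2070, 0xFFFFFFFF, 0, 0x2110, 0x2020)
    + DESCRIPTOR.pack(0, 0, 0, 0, 0)
)
descriptors = parse_import_descriptors(data)
for d in descriptors:
    print(hex(d.name), hex(d.original_first_thunk), hex(d.first_thunk))
```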
Here the important fields that we care about are:
Here is the IAT for this exe
Here is the INT for this exe
As discussed above, OFT and FT are address pointers that point to further structures, as described below:
OFT and FT each point to an array of IMAGE_THUNK_DATA entries. The array that OFT points to is the Import Name Table (INT), and the array that FT points to is the Import Address Table (IAT). Each IMAGE_THUNK_DATA can in turn point to another structure, _IMAGE_IMPORT_BY_NAME, which contains the Hint and the Name, the latter being the actual name of the imported function. For an INT entry, IMAGE_THUNK_DATA holds the RVA of this structure; the Hint is a suggested index into the DLL's export name table that the loader tries first when looking up the name. For an IAT entry, once the binary is loaded, IMAGE_THUNK_DATA holds the address of the function imported from the DLL. Note that before the OS loader loads the binary, the IAT and the INT point to the same things, and both are interpreted through the union member u1.AddressOfData. After loading, the IAT holds the actual memory addresses of the functions and is interpreted through the union member u1.Function.
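The chain from a thunk table to the Hint and Name can be sketched as follows for a 32-bit image. To keep the example self-contained it assumes a tiny synthetic buffer in which an RVA equals the buffer offset, which is not true of a real file on disk. It also handles one case not covered above: when the high bit of a thunk (IMAGE_ORDINAL_FLAG32) is set, the import is by ordinal and there is no IMAGE_IMPORT_BY_NAME to follow. The function names and sample data are made up.

```python
import struct

IMAGE_ORDINAL_FLAG32 = 0x80000000  # high bit set: import by ordinal

def read_hint_name(image, offset):
    """Read an IMAGE_IMPORT_BY_NAME: WORD Hint, then a NUL-terminated name."""
    (hint,) = struct.unpack_from("<H", image, offset)
    end = image.index(b"\x00", offset + 2)
    return hint, image[offset + 2:end].decode("ascii")

def walk_thunks(image, table_offset):
    """Walk a zero-terminated array of 32-bit IMAGE_THUNK_DATA entries."""
    entries = []
    while True:
        (thunk,) = struct.unpack_from("<I", image, table_offset)
        if thunk == 0:
            break
        if thunk & IMAGE_ORDINAL_FLAG32:
            entries.append(("ordinal", thunk & 0xFFFF))
        else:
            # Assumption for this sketch: RVA == offset into the buffer.
            entries.append(("name",) + read_hint_name(image, thunk))
        table_offset += 4
    return entries

# Synthetic image: an INT at offset 0 and one hint/name record at 0x20.
buf = bytearray(0x40)
struct.pack_into("<III", buf, 0, 0x20, IMAGE_ORDINAL_FLAG32 | 0x10, 0)
struct.pack_into("<H", buf, 0x20, 0x1A3)
buf[0x22:0x22 + 12] = b"CreateFileA\x00"
image = bytes(buf)

entries = walk_thunks(image, 0)
print(entries)
```

On disk the same walk works for both the INT and the IAT; after loading, only the INT can still be walked this way, because the IAT entries have been overwritten with function addresses.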
The IAT also has a data directory entry of its own, at index 12, named IMAGE_DIRECTORY_ENTRY_IAT, which is defined as follows:
It contains the RVA of the IAT in the binary and its size.
There is also a concept of Bound Imports, which is a speed optimization. The imports are resolved ahead of time, before the binary is run, against a specific version of each DLL, and the resulting function addresses are written into the IAT. If the DLL changes, the IAT is filled again with the new addresses of the functions in memory. As you might have guessed, with ASLR the DLL's load address changes, so the precomputed addresses of a bound import no longer match.
Below are the import HINT/NAMES entries pointed to by the INT.
As you can see above, if the TimeDateStamp field has a value of -1, the binding information is kept in a separate data directory entry, at index 11, named IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT, which is defined as follows:
This entry has an RVA to another structure named IMAGE_BOUND_IMPORT_DESCRIPTOR, which contains an important field, TimeDateStamp, holding the build timestamp of the DLL that the binary was bound against. The loader uses this entry to check whether the DLL on the system is that same version. If it is not, the corresponding IAT is updated.
Here you can see the TimeDateStamp field, which identifies the version of this DLL. Also, since the NumberOfModuleForwarderRefs field is 0 in this case, these DLLs have no forwarder references for this exe, that is, none of the bound functions are forwarded on to another DLL.
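The bound import check can be sketched roughly as below. Each IMAGE_BOUND_IMPORT_DESCRIPTOR is 8 bytes: a DWORD TimeDateStamp, a WORD OffsetModuleName (relative to the start of the bound import directory), and a WORD NumberOfModuleForwarderRefs. The helper names, the timestamps, and the sample directory are invented for illustration, and `binding_is_valid` is a simplification of what a loader does.

```python
import struct
from typing import NamedTuple

class BoundImport(NamedTuple):
    time_date_stamp: int
    module_name: str
    forwarder_refs: int

ENTRY = struct.Struct("<IHH")  # DWORD, WORD, WORD = 8 bytes

def parse_bound_imports(directory):
    """Parse descriptors until the all-zero terminator."""
    result = []
    offset = 0
    while True:
        stamp, name_off, n_refs = ENTRY.unpack_from(directory, offset)
        if stamp == 0 and name_off == 0 and n_refs == 0:
            break
        end = directory.index(b"\x00", name_off)
        name = directory[name_off:end].decode("ascii")
        result.append(BoundImport(stamp, name, n_refs))
        # Forwarder refs (also 8 bytes each) follow the descriptor; skip them.
        offset += ENTRY.size * (1 + n_refs)
    return result

def binding_is_valid(bound, actual_dll_timestamp):
    """Simplified check: the binding holds only if the timestamps match."""
    return bound.time_date_stamp == actual_dll_timestamp

# Made-up directory: two descriptors, a terminator, then the name strings.
names = b"KERNEL32.dll\x00USER32.dll\x00"
directory = (
    ENTRY.pack(0x4A5BDB3B, 24, 0)
    + ENTRY.pack(0x4A5BDB3C, 24 + 13, 0)
    + ENTRY.pack(0, 0, 0)
    + names
)
bound = parse_bound_imports(directory)
for b in bound:
    print(b.module_name, hex(b.time_date_stamp), b.forwarder_refs)
```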
Besides bound imports, there is another type of import called Delay Loaded DLLs. These DLLs are not loaded into memory until one of their functions is called. They also have a specific data directory entry, at index 13, named IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT, which looks like below:
This has an RVA to another structure named IMAGE_DELAY_IMPORT_DESCRIPTOR.
The important fields here are:
szName: Name of the DLL
pIAT: This points to a separate IAT, which gets filled up when needed
pINT: This points to the names table, which holds the names of the functions to be resolved as needed
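To show where these fields sit, here is a small sketch that unpacks a delay-load descriptor as eight consecutive DWORDs. It assumes the 32-bit layout of ImgDelayDescr from older versions of delayimp.h (grAttrs, szName, phmod, pIAT, pINT, pBoundIAT, pUnloadIAT, dwTimeStamp); the values are made up and the helper name is hypothetical.

```python
import struct
from typing import NamedTuple

class DelayImportDescriptor(NamedTuple):
    grAttrs: int
    szName: int       # points to the DLL name
    phmod: int        # points to where the module handle is stored
    pIAT: int         # points to the delay-load IAT
    pINT: int         # points to the delay-load INT
    pBoundIAT: int
    pUnloadIAT: int
    dwTimeStamp: int

DELAY = struct.Struct("<8I")  # eight little-endian DWORDs = 32 bytes

def parse_delay_descriptor(data, offset=0):
    """Unpack one delay-load descriptor from raw bytes."""
    return DelayImportDescriptor(*DELAY.unpack_from(data, offset))

# Made-up values, for illustration only.
raw_descriptor = DELAY.pack(1, 0x5000, 0x6000, 0x6010, 0x5020, 0, 0, 0)
desc = parse_delay_descriptor(raw_descriptor)
print(hex(desc.szName), hex(desc.pIAT), hex(desc.pINT))
```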
Below is the DELAY_IMPORT_DESCRIPTOR for this DLL. It has its own IAT and INT, and its HINT/NAMES entries are pointed to by the INT.
The delay-load IAT works like this: initially, each entry points to some stub code within the binary. When a function from the DLL is first called, the stub loads the DLL, writes the function's address into the IAT, and calls it. The next time the function is invoked, the call goes directly through the address in the IAT.
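The stub-then-patch behaviour can be modelled with a toy simulation. This is not how the real delay-load helper is implemented; it only mimics the idea with a Python dictionary standing in for the IAT and a `resolver` callable standing in for loading the DLL and looking up the function. All names here are invented.

```python
def build_delay_iat(resolver, names):
    """Return a fake IAT whose slots start out as resolving stubs."""
    iat = {}
    stats = {"resolved": 0}

    def make_stub(name):
        def stub(*args):
            stats["resolved"] += 1
            real = resolver(name)  # stands in for loading the DLL + lookup
            iat[name] = real       # patch the IAT slot with the real address
            return real(*args)     # then forward this first call
        return stub

    for name in names:
        iat[name] = make_stub(name)
    return iat, stats

# Pretend DLL exporting a single function.
REAL_FUNCTIONS = {"add": lambda a, b: a + b}

iat, stats = build_delay_iat(REAL_FUNCTIONS.__getitem__, ["add"])
first = iat["add"](2, 3)   # goes through the stub, which patches the slot
second = iat["add"](4, 5)  # goes straight to the real function
print(first, second, stats["resolved"])
```

The point of the design is that the cost of loading the DLL is paid only once, and only if the function is actually used.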
Here is the DELAY_IMPORT NAME TABLE
You can map these RVAs to entries in DELAY IMPORT HINTS/NAMES
So here is where we stand now. We have covered the most complex structure (IMPORT) in this article.
I think this is enough for this part, as it involved a lot of nested structures that take a while to understand. In the next part of this series we will cover the remaining sections of PE and start with ELF.