Handling Memory in Protected Mode
Introduction
In the past, systems such as MS-DOS ran in real mode, which offered no protection against access to arbitrary memory addresses. Programs could therefore access even the system data stored in memory. It is easy to imagine how bad this was when a program did something careless, such as overwriting a system data structure, and crashed the whole operating system.
Nowadays, protected mode is used most often. It provides a virtual address space to programs, which also ensures that each program can access only its own part of physical memory. Other memory protection mechanisms are available in protected mode as well.
When the computer is powered on, it starts in real mode, but only to start the operating system, which switches to protected mode by setting the Protection Enable (PE) bit in the control register CR0 [1]. Control registers change how the processor behaves; they can control interrupts, switch between real and protected mode, and so on [2].
Protected mode keeps the registers of real mode to maintain backward compatibility. The register names changed, for example from AX to EAX, where the letter E was prepended to indicate that the register was 'extended' from 16 bits to 32 bits. The general-purpose registers in IA-32 are 32 bits wide, while the segment registers remained 16 bits wide. Besides those registers, the processor needs additional values that must be available at all times in protected mode, such as the addresses of the interrupt descriptor table and the global descriptor table. It therefore has additional registers, so that those frequently used values are held as close to the processor as possible.
All of the new registers are used to tell the processor where the special data structures are located in memory, because they are needed in protected mode for the computer to function correctly.
Why were the segment registers left at 16 bits instead of being enlarged to 32 bits?
Let's describe this in detail. Real mode uses 20-bit physical addresses, which are constructed from a 16-bit segment register and a 16-bit offset (the logical address). Since each of these holds only 16 bits, neither can express a 20-bit physical address on its own. Instead, the segment value is shifted left by 4 bits, which appends one hexadecimal zero and makes it 20 bits wide, and the offset is then added to it.
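To make the calculation concrete, here is a minimal Python sketch of the real-mode address computation. The function name real_mode_physical is made up for illustration, and the final 20-bit mask is an assumption that models a processor with exactly 20 address lines, where the sum wraps around at 1MB.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits appends one hexadecimal zero to the segment value.
    # Assumption: the result is truncated to 20 bits, as on a CPU with 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
```

Note that different segment:offset pairs can name the same physical address; for example 0x1234:0x0010 and 0x1235:0x0000 both give 0x12350.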
Real mode uses the segment registers to build one part of the physical address, which is then used to access a value in physical memory. The logical address supplies the other part and is still 16 bits wide, so one segment register value can reach at most 64KB of physical memory. In protected mode, the segment registers no longer contribute to the physical address directly; their contents are used differently, as discussed later.
First, let's talk about protected mode in general. Protected mode was introduced with the 80286 processor, whose segment registers were 16 bits wide.
Let's present the amount of memory that various processors could address [3, 4]:
The first column lists an example processor. The second column shows how many bits wide the registers that hold an address are. The third column shows the number of address pins the processor has, which directly determines the maximum amount of memory that can be addressed. The last column gives the maximum amount of memory the processor could address.
One question remains: how can we address 2^36 bytes of memory if we only have 32-bit registers? The answer is Physical Address Extension (PAE) [5], which lets the processor access more physical memory than a 32-bit address can cover. In IA-32 there are 32 address lines, which can address at most 4GB of memory. PAE adds 4 address lines, for 36 in total, so 2^36 bytes = 64GB of memory can be addressed.
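The arithmetic behind these limits is simple enough to check in a few lines of Python. This is only an illustration: the helper name addressable_bytes is hypothetical, and GB is used here in the same sense as in this article (2^30 bytes).

```python
GB = 2 ** 30  # "GB" as used in this article: 2^30 bytes


def addressable_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given number of address lines."""
    return 2 ** address_lines


print(addressable_bytes(32) // GB)  # 4
print(addressable_bytes(36) // GB)  # 64
```

Each extra address line doubles the addressable memory, so 4 extra lines multiply it by 16.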
But what happens with virtual memory addresses? They are still 32 bits wide, and since operating systems use the flat memory model, each process can address 4GB of memory. The OS uses paging to map that virtual address space onto the 64GB physical address space. This makes sense: programs can share the 64GB of memory even though a single process can't address that much at once.
Let's schematically present how physical memory addresses are calculated in real mode. We can see this in the picture below, which was taken from [6]:
The picture above is really clear and gives us all the information we need.
Protected Mode with Segmentation
We've seen that in real mode the processor can address only 1MB of memory, even if the computer has an address bus wider than 20 bits. In protected mode, however, the whole address space can be addressed. The other difference is that real mode has segments 64KB long, while protected mode can have segments of variable size, limited not to 64KB but to 4GB.
Moreover, in protected mode the segment register value is not shifted and used as the base of the physical address; it is used as an index into a descriptor table, which contains a descriptor for the segment of memory we're trying to access. The segment registers CS, DS, SS, ES, FS, and GS are still 16 bits wide, but in protected mode their bits are used as follows:
- 0-1 : RPL (Requested Privilege Level)
- 2-2 : TI (Table Indicator)
- 3-15 : Index (selects the segment descriptor)
Bits 3-15 select the appropriate descriptor in the descriptor table, which means that only 13 bits are used to select the segment. This further implies that a descriptor table can hold up to 8192 segment descriptors (2^13 = 8192). But why are the two additional fields, RPL and TI, present in the segment register value?
The TI field holds the value 0 or 1 and indicates whether the segment descriptor is stored in the global (0) or the local (1) descriptor table. The RPL field specifies the requested privilege level for the access; since the field is only 2 bits long, its value can only be 00, 01, 10 or 11. The value 00 is the most privileged level and corresponds to kernel mode, while 11 is the least privileged and corresponds to user mode. For access to a data segment to be granted, the requested privilege must be at least as high as the one in the DPL field of the segment descriptor, that is, the RPL must be numerically less than or equal to the DPL (the same check applies to the current privilege level of the running code).
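The three fields can be pulled out of a 16-bit segment register value with a few shifts and masks. The sketch below is only an illustration; the names Selector and decode_selector are made up for this example.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # bits 3-15: position of the descriptor in the table
    ti: int     # bit 2: 0 = global table, 1 = local table
    rpl: int    # bits 0-1: requested privilege level (0 = most privileged)


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment register value into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a segment register holds 16 bits")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 0b11)


print(decode_selector(0x0023))  # Selector(index=4, ti=0, rpl=3)
```

Because the index is 13 bits wide, the largest value it can take is 8191, which matches the 8192-descriptor limit computed above.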
The index is then used to fetch the right segment descriptor from the descriptor table, either the GDT or the LDT, which the processor locates through the GDTR and LDTR registers. Let's now present the format of the descriptors stored in the GDT and LDT. The picture below shows the format and was taken from [6]:
The picture is really good and presents everything we need to know (the author of the picture did a really good job on this: thank you). From it we can see that the segment descriptor is 64 bits long. It contains quite a few fields, but the most important one is Base[0..31], the base address that points to the start of the memory segment. Because the address is 32 bits long, the segment can begin at any location in the 4GB address space.
Limit[0..19] specifies the size of the segment. Only 20 bits are available for the segment size, which seems to imply that a segment can be at most 1MB. But that can't be right, since we said that segments can be of any size up to 4GB. The trick is that the G (Granularity) flag specifies whether Limit[0..19] counts bytes or 4KB regions of memory. Both are possible:
- 1B : 2^20 = 1MB
- 4KB : 2^20 * 4KB = 2^32 = 4GB
This explains how the segments can be any size up to the 4GB limit. There are also other flags in the descriptors described below:
- D: are we operating in a 16-bit environment such as MS-DOS or in a full-fledged 32-bit environment such as Windows XP?
- X: does the segment contain executable code?
- R/W: is the segment readable or writeable?
- C/D: the segment growth direction.
- S: system or application segment.
- P: is the segment present in memory or has it been swapped out?
- DPL: the privilege level of the segment; can contain the values 00, 01, 10 or 11 (00 is used by segments that belong to kernel mode, while 11 is used by user-mode segments).
- G: should the size of the segment be multiplied by 4KB or not?
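As a rough illustration of how these fields fit together, the Python sketch below decodes a raw 64-bit descriptor. The picture is not reproduced here, so the bit positions are an assumption based on the standard IA-32 layout (limit in bits 0-15 and 48-51, base in bits 16-39 and 56-63, DPL in bits 45-46, P in bit 47, G in bit 55). The function name decode_descriptor and the sample value are made up for the example.

```python
def decode_descriptor(raw: int) -> dict:
    """Extract the main fields of a 64-bit segment descriptor (assumed IA-32 layout)."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)            # Limit[0..19]
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # Base[0..31]
    granular = (raw >> 55) & 1                                      # G flag
    # The limit is the last valid unit, so the segment spans limit + 1 units
    # of either 1 byte (G = 0) or 4KB (G = 1).
    size = (limit + 1) * (4096 if granular else 1)
    return {
        "base": base,
        "limit": limit,
        "g": granular,
        "dpl": (raw >> 45) & 0b11,
        "present": (raw >> 47) & 1,
        "size": size,
    }


# A descriptor commonly used for a flat 4GB kernel code segment.
flat = decode_descriptor(0x00CF9A000000FFFF)
print(hex(flat["base"]), hex(flat["limit"]), flat["size"] == 2 ** 32)  # 0x0 0xfffff True
```

With G set and all 20 limit bits set, the segment covers 2^20 * 4KB = 4GB, which is the flat memory model case.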
Let's now talk about how a logical (virtual) address is translated to a physical address. We know that programs operate with virtual addresses and don't know anything about physical addresses; it's the processor's job to access the right physical address. First, the segment register is used as an index into the GDT or LDT to get the segment descriptor; then Base[0..31] plus the offset gives the linear address.
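The translation step can be sketched in Python with a toy descriptor table. Everything here is a simplification and an assumption for illustration: the table is a plain list of (base, size in bytes) pairs instead of real 64-bit descriptors, only a single table is modelled, and the names logical_to_linear and SegmentLimitError are hypothetical.

```python
class SegmentLimitError(Exception):
    """Raised when an offset falls outside its segment (a stand-in for a CPU fault)."""


def logical_to_linear(table: list, selector: int, offset: int) -> int:
    """Translate selector:offset into a linear address using a toy descriptor table."""
    index = selector >> 3            # bits 3-15 of the selector pick the descriptor
    base, size = table[index]        # toy descriptor: (base address, size in bytes)
    if offset >= size:
        raise SegmentLimitError(f"offset {offset:#x} is outside the segment")
    return (base + offset) & 0xFFFFFFFF  # linear addresses are 32 bits wide


# Entry 0 is unused; entry 1 describes a 64KB segment starting at 0x00400000.
toy_table = [(0, 0), (0x00400000, 0x10000)]
print(hex(logical_to_linear(toy_table, 0x08, 0x1234)))  # 0x401234
```

In a flat memory model every base is 0, so the linear address is simply equal to the offset.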
There are also additional registers that are available in protected mode:
- CR0-CR4 : control registers
- GDTR : 48-bit global descriptor table register
- LDTR : local descriptor table register (holds a 16-bit selector that refers to the LDT's descriptor in the GDT)
- IDTR : interrupt descriptor table register
- PDBR : page directory base register
- TR : task register
We can also explain why the computer must first boot into real mode before it can switch to protected mode. One of the reasons is that code running in real mode must set up the IDT and GDT (and any LDT) in memory; otherwise nothing would work. Each segment register also has a hidden cache of the descriptor it refers to, so the processor doesn't need to access memory every time the segment register is used. The GDTR and IDTR are 48-bit registers that contain two fields: the first 16 bits hold the limit of the table (its size in bytes minus one), while the remaining 32 bits hold the linear address of the table. The LDTR, by contrast, holds a 16-bit selector that refers to the LDT's descriptor in the GDT.
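To illustrate the 48-bit layout, the Python sketch below builds the 6-byte value that would be loaded into the GDTR: a 16-bit field followed by a 32-bit base address, both little-endian. By convention the 16-bit field holds the table size in bytes minus one. The helper name pack_gdtr and the sample address are assumptions made for this example.

```python
import struct

DESCRIPTOR_SIZE = 8  # each segment descriptor is 64 bits long


def pack_gdtr(base: int, entry_count: int) -> bytes:
    """Build the 6-byte GDTR image: 16-bit limit, then 32-bit base (little-endian)."""
    if not 1 <= entry_count <= 8192:
        raise ValueError("a descriptor table holds between 1 and 8192 entries")
    limit = entry_count * DESCRIPTOR_SIZE - 1  # size in bytes minus one
    return struct.pack("<HI", limit, base)


image = pack_gdtr(0x00001000, 3)
print(len(image) * 8, image.hex())  # 48 170000100000
```

The 8192-entry maximum follows from the 16-bit limit: 8192 descriptors of 8 bytes each occupy exactly 65536 bytes.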
The task register TR points to the segment descriptor that represents a program. This is the register that enables context and task switching in multitasking systems. The TR register is necessary if we want to have multiple processes running at the same time. Actually, on a single processor only one process runs at any given moment, but it feels as if several are running, because context switches happen so fast that we don't notice them.
Protected Mode with Paging
In protected mode paging is optional, so segmentation can be used on its own. Let's take a look at the picture below, taken from [6]:
We can see here that applications use the logical or virtual addresses that are sent to the segmentation unit to get the linear address. If paging is not being used, the linear addresses are the same as the physical addresses, and with a linear address we can directly access the physical memory. If paging is enabled, we must send the linear address through the paging unit to further translate the linear address into the physical address that can be used to address a specific value in the physical memory. Paging is enabled when the PG bit is set to 1 in the control register CR0.
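The two CR0 bits mentioned in this article, PE and PG, can be summarized with a small Python sketch. On IA-32, PE is bit 0 and PG is bit 31 of CR0. The function name describe_mode is hypothetical, and the sketch only interprets a CR0 value; it does not touch real hardware.

```python
CR0_PE = 1 << 0   # Protection Enable: switches the processor into protected mode
CR0_PG = 1 << 31  # Paging: enables translation of linear to physical addresses


def describe_mode(cr0: int) -> str:
    """Describe the addressing mode implied by the PE and PG bits of a CR0 value."""
    protected = bool(cr0 & CR0_PE)
    paging = bool(cr0 & CR0_PG)
    if paging and not protected:
        # Paging builds on protected mode, so this combination is not valid.
        raise ValueError("PG cannot be set while PE is clear")
    if not protected:
        return "real mode"
    return "protected mode, paging enabled" if paging else "protected mode, paging disabled"


print(describe_mode(CR0_PE | CR0_PG))  # protected mode, paging enabled
```

With only PE set, linear addresses are used directly as physical addresses; with both bits set, every linear address goes through the paging unit.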
When paging is enabled, the linear address space is divided into pages of a predefined size, usually 4KB, although other sizes are possible. Physical memory is likewise divided into page frames. At any given time, a page can be located in a page frame in memory or on disk. If a program accesses a page that is on disk, a page fault occurs and the operating system must transfer the page from disk into a page frame. When paging is enabled, the linear address is not a physical address; its bits are used as follows:
- 0-11 : offset within a page frame (this is actually a part of the physical address)
- 12-21 : offset into page table (this value selects the right entry in the page table)
- 22-31 : offset into page directory (this value selects the right entry in the page directory)
When paging is enabled, we must also use the PDBR register (CR3), which holds the physical base address of the page directory. Ten bits of the linear address select the entry in the page directory, which means that the page directory can hold at most 2^10 = 1024 page directory entries (PDEs). The format of each PDE is presented below (again, the picture is taken from [6]):
We can see that each PDE contains the physical base address of a page table. The right page table entry (PTE) within that page table is selected by bits 12-21 of the linear address. Since the index uses 10 bits, there can be at most 1024 PTEs in each page table. Each PTE in turn contains the physical base address of a page in memory. To select the appropriate byte within that page, bits 0-11 of the linear address are used.
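The two-level lookup can be simulated in Python. This is a sketch under several assumptions: physical memory is modelled as a dictionary from physical address to 32-bit entry, each PDE and PTE is 4 bytes with the frame base in bits 12-31, and bit 0 of an entry is taken to be the "present" flag. The names split_linear and translate and all sample addresses are made up for the example.

```python
def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, page offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(memory: dict, pdbr: int, linear: int) -> int:
    """Walk the page directory and page table to find the physical address."""
    dir_index, table_index, offset = split_linear(linear)
    pde = memory.get(pdbr + 4 * dir_index, 0)       # entries are 4 bytes each
    if not pde & 1:                                 # assumed present flag in bit 0
        raise LookupError("page fault: page table not present")
    pte = memory.get((pde & 0xFFFFF000) + 4 * table_index, 0)
    if not pte & 1:
        raise LookupError("page fault: page not present")
    return (pte & 0xFFFFF000) | offset              # frame base plus offset in page


# Toy setup: page directory at 0x1000, one page table at 0x2000, one frame at 0x9A000.
memory = {0x1004: 0x2000 | 1, 0x200C: 0x9A000 | 1}
print(hex(translate(memory, 0x1000, 0x00403025)))  # 0x9a025
```

Here the linear address 0x00403025 splits into directory index 1, table index 3 and offset 0x025, so the walk reads the PDE at 0x1004 and the PTE at 0x200C.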
Conclusion
Keep in mind that in protected mode you have to use segmentation, while paging is optional. Thus, an operating system can use segmentation by itself or segmentation with paging. The translation of a virtual address into a physical address can be quite complicated, especially if paging is enabled, but if you really want to understand what's happening behind the curtains, you should take the time to fully understand it.
References:
[1] Protected mode, accessible at http://en.wikipedia.org/wiki/Protected_mode.
[2] Control register, accessible at http://en.wikipedia.org/wiki/Control_register.
[3] x86, accessible at http://en.wikipedia.org/wiki/X86.
[4] RAM Limit, accessible at http://en.wikipedia.org/wiki/RAM_limit.
[5] Physical Address Extension, accessible at http://en.wikipedia.org/wiki/Physical_Address_Extension.
[6] x86 memory management and linux kernel, accessible at http://manavar.blogspot.com/2011/05/x86-memory-management-and-linux-kernel.html.