
Format String Bug Exploration

May 19, 2015 by AJ Kumar


The format string vulnerability came to prominence in 2000, when remote attackers gained root access on hosts running an FTP daemon that allowed anonymous authentication. It was an entirely new tactic for exploiting common programming mistakes, and the threat is now widespread because programmers still inadvertently write code that is open to format string attacks. The vulnerability stems from the way functions with variable arguments, printf in particular, interpret the stack. This article demonstrates this subtle bug in the context of C programming on the Windows operating system. However, like buffer overflows, this class of bug is not operating system-specific: vulnerable programs can be found on Mac OS, Linux, and BSD as well. The article looks at what format strings are, how they operate relative to the stack, and how they can be manipulated in the C programming language.


To follow the format string bug explained in this article, you will need a rudimentary knowledge of the C family of programming languages, as well as a basic knowledge of IA32 assembly on the Windows operating system, using the Visual Studio development environment. Familiarity with buffer overflow exploitation is an added advantage.


The format string bug was first explained in June 2000 in a renowned journal. This notorious exploitation tactic enables an attacker to subvert stack protections and to alter arbitrary memory locations by writing to them. The sole cause is user-supplied input that is not handled or validated properly: blindly trusting user-supplied arguments eventually leads to disaster. When an attacker controls the format argument of the printf function, the way the variable argument list is processed enables him to read or overwrite arbitrary data. The format string bug differs from a buffer overrun in that the stack is not smashed and no large region of data is corrupted. Attackers often use it to disclose sensitive information from the stack, for instance passwords or private cryptographic keys.

Now the question is how exactly attackers perform this attack. Consider a program that prints a string such as "kmaraj" to the screen using the C library printf function:

Program: sample code

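#include <stdio.h>

int main(void)
{
    /* C has no built-in string type, so the text is held through a char pointer */
    const char *sVal = "kmaraj";

    /* %s is replaced by the contents of the null-terminated string sVal */
    printf("My name is %s", sVal);

    return 0;
}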

The above program is quite simple. In the printf call, the first parameter is a format string containing the %s format specifier, which causes printf to replace %s with the contents of a null-terminated string buffer. Everything looks fine at first glance, but what if the programmer omits the format string and passes the data directly as the first argument of printf? The program will still compile, run without errors, and produce the desired output, but this time it is exploitable by a format string attack.


The prior section introduced the coding mistake behind the format string bug. This section explores how the bug causes the stack to be misinterpreted. Suppose that in the earlier program you also accomplish the task in the most obvious, direct way, without using a format specifier in the printf call, as follows:


printf("My name is %s",sVal);

printf(sVal);


For a harmless string, both of the preceding printf calls print the contents of sVal. However, which one is more secure? Analyzing printf's stack usage on IA32 shows that the arguments are placed on the stack from last to first. The first call (with the specifier) translates roughly to the following assembly:


push address of "sVal"

push address of "%s"

call printf


Once the parameters are pushed onto the stack and the call instruction is executed, the stack looks like the following.


Caller Local Variables

Address of sVal

Address of "%s"

Return Address

Printf() Variable

The printf function uses its first parameter to interpret what it sees on the stack. Therefore, the content referenced by a single stack parameter dictates the number of parameters and whether each parameter is interpreted as a value or as a reference. For the second call, without a separate format string, the stack looks as follows:


Caller Local Variables

Address of sVal

Return Address

Printf() Variable

Here, the sVal buffer is the first parameter to the printf function, so the function interprets it as the format string. When untrusted input contains format specifiers, they become part of the sVal buffer even though the programmer never anticipated them. The attacker can therefore walk the stack at will, using suitable format specifiers to consume values from the stack until the desired data is reached. If the attacker knows the offset at which something interesting sits on the stack, he can compute the number of percent signs and other format specifiers to inject so that the referenced value appears in the output.
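The way a format function lets its first parameter decide how many arguments to fetch can be sketched with a toy formatter. This is only an illustration, not how any real C runtime implements printf: mini_format is a hypothetical helper that understands just %s, %d, and %%, and it writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a tiny printf-like formatter (only %s, %d, %%).
 * The format string alone decides how many arguments are fetched and
 * whether each one is treated as a pointer or as a value. */
static int mini_format(char *dst, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    assert(dst != NULL && fmt != NULL && cap > 0);
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < cap; ++p) {
        if (*p != '%') {
            dst[used++] = *p;                 /* ordinary character: copy it */
            continue;
        }
        ++p;                                  /* inspect the conversion letter */
        if (*p == 's') {
            /* the next argument is interpreted as a reference (pointer) */
            const char *s = va_arg(ap, const char *);
            size_t n = strlen(s);
            if (n > cap - 1 - used)
                n = cap - 1 - used;           /* truncate to the space left */
            memcpy(dst + used, s, n);
            used += n;
        } else if (*p == 'd') {
            /* the next argument is interpreted as a value */
            int v = va_arg(ap, int);
            int w = snprintf(dst + used, cap - used, "%d", v);
            if (w < 0)
                break;
            used = ((size_t)w >= cap - used) ? cap - 1 : used + (size_t)w;
        } else if (*p == '%') {
            dst[used++] = '%';                /* "%%" yields a literal percent sign */
        } else {
            break;                            /* unsupported or truncated token */
        }
    }
    va_end(ap);
    dst[used] = '\0';
    return (int)used;
}
```

Nothing in mini_format checks that the caller really passed an argument for each token. If the format string came from an attacker, every extra %s or %d would make va_arg fetch whatever happens to follow on the stack, which is exactly the misinterpretation described above.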


C and C++ are the languages most prone to format string attacks, whereas in other modern languages such as C# and Java a malformed format string typically won't allow the execution of arbitrary code. C/C++'s design makes format string problems harder to detect, and its format language includes some especially dangerous directives (notably %n) that do not exist in some other languages' format string languages. A successful attack can lead to information disclosure and even to the execution of arbitrary code. The format string is actually written in a limited data processing language that streamlines output formatting. Unfortunately, many programmers make a subtle mistake while coding: they use data from an untrusted source as the format string, and intruders supply crafted format strings to cause serious harm.

The core issue in C/C++ in the context of this bug is that a function can be declared to take a variable number of arguments by specifying an ellipsis (…) as its last parameter, and such a function is called without knowing, even at run time, how many arguments were actually passed. With that in mind, have a look at the following simple C++ code snippet, which displays a string value accepted from the command line.

Program: FSA.CPP

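#include <stdio.h>

int main(int argc, char* argv[])
{
    if( argc != 2 )
    {
        printf("Enter the command Argument\n");
    }

    if(argc > 1)
    {
        /* vulnerable: the user-supplied argument is used as the format string */
        printf(argv[1]);
    }

    return 0;
}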

At first glance, the code above looks fairly simple, but the programmer isn't aware of the inherent format string bug in it: the printf call has no format specifier of its own and treats the command-line argument as the format string. Let's observe the vulnerable program in action. In the following run we supply format specifiers on the command line, but no corresponding arguments exist to be substituted for them. Interestingly, printf doesn't fail; instead it produces bizarre output when invoked like this:


D:\temp>fsa.exe Hello%x%x



If you inspect this FSA.EXE with a tool such as WinHex, you will observe the same memory sequence that the prior sample yields using %x. Moreover, by using multiple %x specifiers we can reach deeper into the stack, and the result can be the exposure of sensitive data in memory, including passwords, encryption keys, etc.


D:\temp>fsa.exe "%x %x %x %x"

350 ffce 158 2


Another, more interesting specifier is %n, which writes the number of characters output so far to the address supplied as its argument, and can therefore modify values in memory to change the behavior of the program. For example, a program might store a password for some administrative feature in memory. If an attacker can aim a %n write at that password and zero out its first byte, the administrative feature could be accessed with a blank password.
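The legitimate behavior of %n can be seen in a small, well-defined example where a valid int pointer is supplied for it. This is a sketch only: chars_before_marker is a hypothetical name, and some C runtimes (Microsoft's, for example) disable %n by default for security reasons, so the call may be rejected there.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: shows that %n prints nothing and instead stores the
 * number of characters produced so far through the matching int pointer. */
static int chars_before_marker(void)
{
    char buf[32];
    int written = -1;

    /* "Hello" is 5 characters long, so %n stores 5 into `written` */
    int rc = snprintf(buf, sizeof buf, "Hello%n, world", &written);
    assert(rc >= 0);
    return written;
}
```

In an attack, the pointer consumed by %n is not a variable the programmer chose but a value the attacker has arranged to be on the stack, which turns this bookkeeping feature into a memory write.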

Format string attacks typically use the %x and %n specifiers in particular. However, the others can be used to crash the program or to advance through the stack. What is truly happening in the preceding demonstration is that, when the number of actual arguments does not correspond to the number of tokens in the format string, the output includes various bits of the stack. The following figure shows four values being read from the stack and printed:

Figure: Stack after exploitation

The format string bug can also be used for a denial of service attack, in which a malicious user forces the process to crash. It is relatively easy to crash a program with malicious format specifiers, especially %s (null-terminated string) and %n. Both treat the corresponding stack value as a pointer: if a malicious format string contains either of these specifiers and the stack slot where the corresponding argument should be does not hold a valid memory address, the process faults while attempting to dereference it, which causes a denial of service.


D:temp>fsa.exe %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s


Similarly, if a very large number of %d specifiers is supplied, printf may eventually read beyond the mapped stack region into unmapped addresses, which again results in a denial of service.

Moreover, the %s format parameter can be used to read memory: it takes an address from the stack and prints the string stored at that address. The %x and %d specifiers retrieve a double word from the stack and print it in hexadecimal or decimal notation, respectively. Finally, the %c specifier takes the next double word from the stack, converts it to a single byte of type character, and displays it as a character.
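How %x, %d, and %c each interpret the same underlying value can be shown with correctly matched arguments. The helper name describe_value is hypothetical, and the sketch assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one value three ways. Each specifier has a
 * matching argument, so this is the safe, intended use of the tokens. */
static int describe_value(char *dst, size_t cap, unsigned int v)
{
    assert(dst != NULL && cap > 0);
    /* %x: hexadecimal, %d: signed decimal, %c: low byte shown as a character */
    return snprintf(dst, cap, "%x %d %c", v, (int)v, (int)v);
}
```

For example, calling describe_value with the value 65 fills the buffer with "41 65 A". In a format string attack the same conversions are applied to stack contents that were never meant to be arguments.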


There are several approaches to detecting the format string loophole, which is typically bound to the format family of functions. Format functions are used to specify the format of output; they perform conversions so that C data types are turned into printable form. Perhaps the most effective approach is a rigorous code review in which the reviewer examines every place where format strings are used. First, there is a list of functions in C/C++ that do not have a fixed list of arguments. Instead, they use the standard ANSI C mechanism to access arguments on the stack, no matter how many arguments are present.

Table 1: The printf() Family of Functions

printf       fprintf      fwprintf     vprintf      _cprintf
scanf        fscanf       fwscanf      vfprintf     _cscanf
wprintf      vwprintf     sscanf       vfwprintf    _cwprintf
wscanf       vwscanf      swscanf      vsprintf     _cwscanf
_sctprintf   _snprintf    _tprintf     _sntprintf   _ftscanf
_scwprintf   _snscanf     _tscanf      _sntscanf    _scprintf
_snwprintf   _snwscanf    _vstprintf   _vsnwprintf  _vftprintf

The functions in the preceding table generate a string based on a format string and a variable number of arguments. Thus, a format string can be regarded as a blueprint that holds the basic structure of the string, plus tokens that determine what kind of variable data goes where and how it should be formatted.
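The standard C mechanism these functions rely on is the <stdarg.h> interface. A minimal sketch of a variadic function follows; sum_ints is a hypothetical example, not part of any library. Note that the function cannot discover how many arguments were passed, so the caller has to tell it.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments. The callee has no
 * way to verify `count`; it simply trusts it, just as printf trusts the
 * tokens in its format string. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);   /* fetching more than were passed is undefined behavior */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 12) returns 42. In the printf family the role of count is played by the format string itself, which is why an attacker-controlled format string gives control over how many arguments are fetched.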

Second, we have different format specifiers at our disposal for the various types of arguments to be displayed via the printf function; each of them can also take additional modifiers and field-width definitions. The following table lists a few of them that are especially relevant to format string attacks:

Table 2: Format Token

Token      To Be Displayed                                                  Argument Category
%x         Value of the argument in hex notation as an unsigned integer     unsigned int, char/short
%d or %i   Value of the argument in decimal notation as a signed integer    int, char/short
%s         Character string pointed to by the argument                      char *, char[]
%u         Value of the argument in decimal notation as an unsigned integer unsigned int, char/short
%p         Value of the pointer, printed in hex notation                    void *
%n         Nothing is displayed; the count of characters written so far     int *
           is stored through the argument

If the program produces strange output when supplied with format-string input, it may be deemed vulnerable to this attack. For instance, feeding a program malicious sequences such as %x%x%x%x%x%x%x…, %n%n%n%n%n…, or %s%s%s%s%s… may make it crash or output data from the stack.


Once an attacker has detected this loophole, he can exploit it to perform multiple malicious operations: read memory from the target process using the %s specifier, write the number of characters output so far to an arbitrary address using the %n specifier, and control the number of characters output so far using width modifiers.
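The interaction between width modifiers and %n can be demonstrated safely with a valid destination pointer. The sketch below uses a hypothetical helper, count_with_width, and assumes a C runtime on which %n is enabled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: pads a one-digit number to `width` characters and
 * lets %n report how many characters have been produced at that point. */
static int count_with_width(int width)
{
    char buf[128];
    int written = -1;

    assert(width >= 1 && width < (int)sizeof buf);
    /* "%*d" takes the field width from the argument list; %n then stores
     * the running character count, which equals the chosen width here. */
    int rc = snprintf(buf, sizeof buf, "%*d%n", width, 7, &written);
    assert(rc >= 0);
    return written;
}
```

Because the stored count follows the width, an attacker who controls the width in a format string effectively chooses the value that %n writes.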

So, the rule of thumb for preventing format string bugs from being exploited is never to pass a non-constant string as the format string argument to any of the printf family of functions. The vulnerable and correct usages of printf are as follows:


printf(user_supplied_data); // Vulnerable

printf("%s", user_supplied_data); // Correct usage

fprintf(stderr, user_supplied_data); // Vulnerable

fprintf(stderr, "%s", user_supplied_data); // Correct usage
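The effect of the correct usage can be checked with a small sketch. The helper copy_as_data is hypothetical; it applies the same constant "%s" format through snprintf so that the result can be inspected in a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: treats user input strictly as data by routing it
 * through a constant "%s" format. Returns false if dst is too small. */
static bool copy_as_data(char *dst, size_t cap, const char *user_supplied_data)
{
    assert(dst != NULL && user_supplied_data != NULL);
    if (strlen(user_supplied_data) >= cap)
        return false;                       /* would not fit with the NUL byte */
    /* any % tokens inside the input are copied verbatim, never interpreted */
    return snprintf(dst, cap, "%s", user_supplied_data) >= 0;
}
```

Given the input Hello%x%x%n, the buffer ends up containing exactly those eleven characters: the tokens are never expanded and no stack values are read or written.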


Besides, if circumstances truly demand that a format string include input read from outside the program, then perform rigorous input validation on any externally supplied values that are included in the format string. Moreover, check the code regularly with third-party tools such as Flawfinder, RATS, and ITS4 in pursuit of this vulnerability.
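One possible form of such validation is sketched below, under the assumption that external input should never contribute conversion tokens of its own. Both helpers, has_percent and escape_percent, are hypothetical names: the first lets the program reject suspicious input outright, and the second neutralizes it by doubling every percent sign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the input contains any '%' character. */
static bool has_percent(const char *input)
{
    assert(input != NULL);
    return strchr(input, '%') != NULL;
}

/* Hypothetical helper: copies src to dst, writing every '%' as "%%" so a
 * format function prints it literally. Returns false if dst is too small. */
static bool escape_percent(char *dst, size_t cap, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL);
    for (; *src != '\0'; ++src) {
        size_t need = (*src == '%') ? 2 : 1;
        if (used + need + 1 > cap)          /* +1 keeps room for the NUL byte */
            return false;
        if (*src == '%')
            dst[used++] = '%';              /* first half of the "%%" pair */
        dst[used++] = *src;
    }
    if (used + 1 > cap)
        return false;
    dst[used] = '\0';
    return true;
}
```

Rejecting the input is the simpler and stricter policy; escaping is only appropriate when percent signs are legitimate in the data and the result really must be embedded in a format string.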


The format string bug is the consequence of a poor programming practice: allowing externally supplied, unsanitized data into the format string argument, which results in exploitable format string vulnerabilities. It is an excellent example of what can happen when functions rely on untrusted input to decide how data is laid out and interpreted. This article introduced the bug and explained its root cause in the context of C code. It also explained in depth how to detect it and walked through the details of how exactly this vulnerability works.






AJ Kumar is a cyber security evangelist with a great passion for open source programming, IT security, bug detection, penetration testing, and assembly language on diverse platforms including Windows and Linux. He can be reached via ajkumarhv[at]gmail[dot]com.