Every minute of production downtime usually costs a company money. If your application has a serious problem causing the stack to break, you…
What’s Stack Smashing?
Working as a quality assurance engineer, you will sooner or later encounter the term stack smashing. As a developer, you are even more likely to run into it, especially if you have introduced a bug that smashes the stack. It is relatively easy (as in ‘somewhat easy’) for a developer to make a mistake that introduces stack smashing. As a user, by the time you learn about stack smashing, the damage has probably already been done.
Stack smashing can happen unintentionally, as when a developer introduces a bug that causes it, or maliciously, when an attacker tries to overrun or corrupt a program’s stack.
Stack smashing is a somewhat loosely defined term that can point to different problems and come from different sources. The two most prominent causes are: 1) writing or over-allocating too much data in one part of the stack, thereby overwriting another part of the stack, and 2) an external source (malicious or not) overwriting another program’s stack, although this is much less common.
So what is a stack? This is also a broadly defined term. In general, a stack refers to a program’s processing stack: the stack of functions, as defined in a particular program’s code, that are currently being called.
Start by imagining a stack of bathroom tiles, piled up and ready to be used by a tiler. This is a pretty good representation of a computer stack, with a few tweaks. If each tile were slightly offset from the previous one, it would be an even better image, and we’ll see why soon.
Imagine that each stacked tile is a function in the computer program. The most basic function is at the bottom and can, for example, be the main() function in a C or C++ program. C and C++ are two programming languages that use the stack extensively.
Each of these functions in the C/C++ program has a name and probably a set of incoming and outgoing variables. In simplified terms, imagine that one of those variables was 10 characters long and another function accidentally wrote 100 characters to that variable. This can damage the entire stack.
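To make the 10-versus-100-characters scenario concrete without actually crashing anything, here is a minimal sketch in C. It models a single stack frame as a plain byte array: a 10-byte buffer followed by a saved return address. The layout and the names (SimFrame, unchecked_write, checked_write) are assumptions made for illustration; real frame layouts are chosen by the compiler and differ per platform. The oversized write is kept to 18 bytes so that the simulation stays inside its own array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   10                      /* the "10 characters" variable   */
#define RET_SIZE   sizeof(uint64_t)        /* simulated saved return address */
#define FRAME_SIZE (BUF_SIZE + RET_SIZE)

/* Hypothetical model of one frame: [buffer][return address]. */
typedef struct {
    unsigned char bytes[FRAME_SIZE];
} SimFrame;

/* Zero the buffer and store a return address right behind it. */
void frame_init(SimFrame *f, uint64_t return_address) {
    memset(f->bytes, 0, FRAME_SIZE);
    memcpy(f->bytes + BUF_SIZE, &return_address, RET_SIZE);
}

/* Read back the simulated return address. */
uint64_t frame_return_address(const SimFrame *f) {
    uint64_t addr;
    memcpy(&addr, f->bytes + BUF_SIZE, RET_SIZE);
    return addr;
}

/* The bug: copies n bytes into the buffer without checking its size. */
void unchecked_write(SimFrame *f, const char *data, size_t n) {
    assert(n <= FRAME_SIZE);  /* keeps the simulation inside its own array */
    memcpy(f->bytes, data, n);
}

/* The fix: never copy more than the buffer can hold. */
void checked_write(SimFrame *f, const char *data, size_t n) {
    if (n > BUF_SIZE) {
        n = BUF_SIZE;
    }
    memcpy(f->bytes, data, n);
}
```

After frame_init(&f, 0x401136), calling checked_write with 18 bytes of 'A' leaves frame_return_address(&f) unchanged, while unchecked_write with the same input replaces it with 0x4141414141414141 (eight 'A' bytes). In a real program, returning through such an overwritten address is what sends execution somewhere it was never meant to go.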
In terms of the tile example above, imagine someone hitting the first tile a little too hard with a hammer, destroying all the other tiles. Et voilà: stack smashing.
The analogy works because, just as all the tiles are now broken in our mental image, a smashed stack results in ‘broken functions’, if you will. Each tile offset represents a more deeply nested function; more on broken functions in the next section.
Debugging Broken Stack(s)
While a reference to ‘broken functions’ may not be technically correct (there is probably only one broken function, and there may be none at all in the case of an external attack or a misbehaving program), it is a great way to think about a smashed stack.
Suddenly, variable and function names can be corrupted, and a backtrace (the chain of function calls the computer followed to reach the function that crashed and, in our example, smashed the stack) no longer makes sense.
In general, a backtrace shows a clear flow of the functions that were called. Although a crashing program cannot really be called ‘healthy’, a ‘healthy’ backtrace, in terms of backtracing/debugging, looks like this:
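As a rough illustration of what a ‘clear flow of functions’ means, the sketch below keeps its own list of the functions currently being executed and prints it innermost first, in the "#N name" style a debugger typically uses. This is a hand-rolled stand-in, not how a debugger reads the real stack. The function name do_the_maths is taken from this article; calculate, run_program, and the enter/leave helpers are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16

/* A hand-maintained list of the functions currently executing. */
static const char *call_stack[MAX_DEPTH];
static int depth = 0;

static void enter(const char *name) {
    assert(depth < MAX_DEPTH);
    call_stack[depth++] = name;
}

static void leave(void) {
    assert(depth > 0);
    depth--;
}

/* Write the active frames, innermost first, one "#N name" per line. */
void format_backtrace(char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    size_t used = 0;
    int frame_no = 0;
    out[0] = '\0';
    for (int i = depth - 1; i >= 0 && used < out_size; i--) {
        int n = snprintf(out + used, out_size - used, "#%d %s\n",
                         frame_no++, call_stack[i]);
        if (n < 0) {
            break;
        }
        used += (size_t)n;
    }
}

/* Innermost function: captures the trace at the deepest point. */
void do_the_maths(char *trace, size_t trace_size) {
    enter("do_the_maths");
    format_backtrace(trace, trace_size);
    leave();
}

void calculate(char *trace, size_t trace_size) {
    enter("calculate");
    do_the_maths(trace, trace_size);
    leave();
}

void run_program(char *trace, size_t trace_size) {
    enter("run_program");
    calculate(trace, trace_size);
    leave();
}
```

Calling run_program with a 128-byte buffer fills it with three lines: "#0 do_the_maths", "#1 calculate", and "#2 run_program". Each line is one frame, and reading from top to bottom takes you from the innermost function back out to where the program started.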
However, when a stack is damaged, debugging becomes much more difficult. The stack might look like this:
This is an example of a stack smashing issue that occurred in MySQL, the database server, in 2008 (see the log.txt attachment to MySQL Bug 37815 for the full output), causing the database server daemon (mysqld) to terminate.
While the operating system library libc.so.6 seems to have handled the smashing of the stack pretty well in this case (using some fortification functionality in the __fortify_fail function), the problem existed somewhere in the code and has since been fixed.
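How can a library notice that the stack has been smashed at all? One widely used idea, offered for example by GCC’s -fstack-protector options, is to place a guard value (often called a canary) between a function’s local buffers and its saved return address, and to check that value before the function returns. The sketch below is a simplified model of that idea only; it is not glibc’s or GCC’s actual implementation. The fixed CANARY_VALUE and all names are assumptions made for illustration (real guard values are not fixed constants like this).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GUARD_BUF_SIZE 10
#define CANARY_VALUE   0xDEADC0DEu  /* hypothetical fixed guard value */
#define CANARY_OFFSET  GUARD_BUF_SIZE
#define RET_OFFSET     (CANARY_OFFSET + sizeof(uint32_t))
#define GUARDED_SIZE   (RET_OFFSET + sizeof(uint64_t))

/* Hypothetical frame model: [buffer][canary][return address]. */
typedef struct {
    unsigned char bytes[GUARDED_SIZE];
} GuardedFrame;

/* Set up an empty buffer, the guard value, and a return address. */
void guarded_init(GuardedFrame *f, uint64_t return_address) {
    uint32_t canary = CANARY_VALUE;
    memset(f->bytes, 0, GUARDED_SIZE);
    memcpy(f->bytes + CANARY_OFFSET, &canary, sizeof canary);
    memcpy(f->bytes + RET_OFFSET, &return_address, sizeof return_address);
}

/* Unchecked write into the buffer, standing in for the buggy code. */
void guarded_write(GuardedFrame *f, const char *data, size_t n) {
    assert(n <= GUARDED_SIZE);  /* keeps the simulation inside its array */
    memcpy(f->bytes, data, n);
}

/* The check performed before "returning": 1 if the guard is untouched. */
int canary_intact(const GuardedFrame *f) {
    uint32_t canary;
    memcpy(&canary, f->bytes + CANARY_OFFSET, sizeof canary);
    return canary == CANARY_VALUE;
}
```

A 5-byte write leaves canary_intact returning 1, while a 16-byte write runs over the guard and makes it return 0. A real runtime would abort the program at that point rather than return through a possibly corrupted address, which is why the crash shows up inside the C library rather than in the function that contains the actual bug.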
Also note that in this case we don’t see resolved function names; we only get the binary name, mysql, along with the memory address of the function (interestingly, the problem seems to have been in the client, mysql, causing the server, mysqld, to terminate):
Normally, when we use debug symbols (see the GDB article mentioned below, which explains debug symbols in detail), we would see function names with variables, and even with some levels of binary optimization/minification we would at least see function names, just as in the first ‘healthy’ backtrace above.
However, in the case of a smashed stack, the output of function names, variable names, or values is never guaranteed and is often complete mumbo-jumbo 🙂 We may even see different function names, or a completely mangled stack (another piece of jargon often used by IT folks) of various function names that don’t make much sense (and are probably fictitious because the stack has been overwritten in some way).
This makes things harder for both the test engineer (who may see many different outcomes for a single bug, complicating known-bug filtering mechanisms) and the developer (who may need step-by-step tracing or a reverse-execution debugger like RR to discover the bug at hand).
What to Do When Faced with Stack Smashing?
When you come across stack smashing, the first thing you want to do is understand the problem and the environment a little better so you can identify the source. If you run a popular server on the internet with many gamers trying to win a tournament while the server is also mining Bitcoin, you will want to give more weight to the possibility of an attack and find out whether someone is tampering with the server.
However, in most cases, the problem is just an application error. While I say ‘just’, the problem can be very significant, can lead to service outages, can be costly, and may ultimately prove unresolvable. For example, a database server may crash persistently on startup because the data is in a certain state that triggers a code flaw or limitation.
If such a situation is exacerbated by stack smashing, or in other words, by not being able to generate a clean backtrace of the problem, debugging will be more complicated and sometimes nearly impossible. However, do not despair: the same basic debugging steps that apply to any bug or application error/crash still apply.
Carefully read all log files from before, during, and after the time the problem occurred. Make some backups and try the operation again. Does it go wrong again or not? Examine the errors: parts of the stack, and even individual frames (i.e. the individual functions shown in the stack, like the do_the_maths function in our original ‘healthy’ stack trace), can be pasted into your favorite search engine.
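When there are many log lines to read, a first pass can be automated. The following sketch counts the lines that mention stack smashing. It assumes a glibc-based system, where the abort message contains the text "stack smashing detected"; other platforms may word it differently, and the function name count_smashing_reports is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the log lines that contain the stack smashing abort message. */
size_t count_smashing_reports(const char *const lines[], size_t line_count) {
    assert(lines != NULL || line_count == 0);
    size_t hits = 0;
    for (size_t i = 0; i < line_count; i++) {
        /* Assumption: the message text as printed by glibc-based systems. */
        if (lines[i] != NULL &&
            strstr(lines[i], "stack smashing detected") != NULL) {
            hits++;
        }
    }
    return hits;
}
```

A count above zero tells you which log to read more closely; the lines just before each hit are usually the most interesting ones.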
By joining (with a space) the most selective (topmost) crashing frames and searching for the result online, you will often find an existing bug report for the problem you are facing. However, in the case of stack smashing, it is likely that these frames (function names) have been corrupted and are therefore no longer usable in the same way. If you see an assertion message of any kind (an assert a developer placed in the code), search for that as well.
Always log a new bug report if the problem doesn’t seem to be reported online yet (you may help others who see the same issue!) and provide as much information about the problem as you can find. Thousands of bug reports against as many applications are logged online every day. Hopefully the support team for the application that is smashing the stack will be on hand to help quickly.
You might also like to read our article Debugging with GDB: Getting Started, as it builds on how C and C++ programs (and others) can be debugged with the GDB debugger. It also explains the concept of a stack in detail.
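The frame-joining step is simple enough to script. Here is a minimal sketch, assuming the frame names have already been extracted from the backtrace, topmost first; the function name build_search_query and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join the topmost `top` frames with single spaces into `out`.
 * Returns 0 on success, or -1 if the result does not fit in `out`. */
int build_search_query(const char *const frames[], size_t frame_count,
                       size_t top, char *out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    size_t used = 0;
    out[0] = '\0';
    if (top > frame_count) {
        top = frame_count;  /* never read past the frames we were given */
    }
    for (size_t i = 0; i < top; i++) {
        int n = snprintf(out + used, out_size - used, "%s%s",
                         i == 0 ? "" : " ", frames[i]);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;      /* output was truncated or formatting failed */
        }
        used += (size_t)n;
    }
    return 0;
}
```

Given the frames do_the_maths, calculate, and run_program with top set to 2, the resulting query is "do_the_maths calculate". Using only the top few frames keeps the query selective, since the outermost frames tend to be shared by many unrelated crashes.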