Memory corruption and malloc tools

December 10, 2002

Memory corruption happens when a piece of code writes data into the wrong location in memory. At best you’ll try writing into memory you don’t have access to and will crash right away. At worst you’ll slightly corrupt some data structure, and the damage won’t show up as an error until millions of instructions later.

The most common kinds of memory errors in C are buffer overruns and dangling pointers.

A buffer overrun happens when you think you have a certain amount of memory at your disposal but have actually allocated less than that. A classic example is forgetting to account for the trailing zero byte for C string termination.

For instance:

char *stringCopy = malloc (strlen(mystring));
strcpy (stringCopy, mystring);

You’ve just written one byte past the end of your allocated block of memory. To correct this, you need to account for that extra byte:

char *stringCopy = malloc (strlen(mystring) + 1);
strcpy (stringCopy, mystring);
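
One way to avoid repeating this mistake is to do the length calculation in exactly one place. The helper below is only a sketch: copyString is a made-up name for illustration, not a library call, and it assumes the caller will free() the result.

```c
#include <stdlib.h>
#include <string.h>

// copyString is a hypothetical helper (not part of any library).
// Returns a freshly malloc'd copy of source, or NULL if malloc fails.
// The caller owns the result and must free() it.
char *copyString (const char *source)
{
    size_t length = strlen (source) + 1;  // +1 for the trailing zero byte
    char *copy = malloc (length);

    if (copy != NULL) {
        memcpy (copy, source, length);    // copies the terminator too
    }
    return (copy);
} // copyString
```

Because the "+ 1" lives in one function, the rest of the program never has to remember it.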

Off-by-one errors (also called “obiwans” or “fence-post errors”) can also cause a buffer overrun. For instance:

void myFunction ()
{
    int i;
    ListNode mynodes[20];
    char stringBuffer[2048];

    // ... other data on the stack

    for (i = 0; i <= 20; i++) {
        mynodes[i].stuff = i;
    }

} // myFunction

Note that the loop runs from 0 through 20, which is 21 times through the loop. The last time through the loop is indexing past the end of the mynodes array and (most likely) has just trashed the beginning of the stringBuffer array.
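
A corrected version of the loop might look like the sketch below. The ListNode definition is a stand-in (the real one isn't shown here), and fillNodes is a hypothetical name. The point is the "<" comparison and the fact that the count is passed in rather than typed twice.

```c
#include <stddef.h>

// Stand-in for the real ListNode, which is assumed to have a 'stuff' field.
typedef struct ListNode {
    int stuff;
} ListNode;

// fillNodes is a hypothetical helper: writes to nodes[0] .. nodes[count - 1]
// and returns the number of nodes it touched.
size_t fillNodes (ListNode *nodes, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {   // '<', not '<=': the last valid index is count - 1
        nodes[i].stuff = (int) i;
    }
    return (i);
} // fillNodes
```

Called as fillNodes (mynodes, sizeof(mynodes) / sizeof(mynodes[0])), the loop bound is computed from the array declaration itself, so the two can't drift apart.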

What can make overruns of malloc'd buffers so nasty is that malloc typically stores bookkeeping information in memory immediately before the pointer it gives you. If you run off the end of one buffer, you can smash the malloc() information of the next one. When you go to free that second piece of memory, you might crash inside of free(), and then spend a while on a wild goose chase wondering why some piece of good code just failed.

Another nasty side effect of buffer overruns like this is that malicious data could clobber the stack in such a way that when the function returns, program control will jump to an unexpected place. Many Windows platform exploits work like this.

Dangling pointers are addresses stored in pointer variables that no longer correspond to the memory they are supposed to point to. Uninitialized pointers can cause this, as can forgetting to assign the return value of realloc(), as well as not propagating the new address when memory moves or is changed. For instance:

char *g_username;

const char *getUserName ()
{
    return (g_username);
} // getUserName

void setUserName (const char *newName)
{
    free (g_username);
    g_username = strdup (newName); // performs a malloc
} // setUserName

Now consider this scenario:

const char *name = getUserName ();  // say it's address 0x1000, "markd"
setUserName ("bork");  // the memory at address 0x1000 has been freed
printf ("%s\n", name);   // using a dangling pointer now
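
One possible way out of this trap is to hand callers their own copy of the string instead of a pointer into storage the setter can free. This is a sketch under that assumption, not the only fix: copyUserName is a hypothetical replacement for getUserName, and the caller takes on the job of calling free().

```c
#include <stdlib.h>
#include <string.h>

static char *g_username = NULL;

void setUserName (const char *newName)
{
    size_t length = strlen (newName) + 1;   // include the trailing zero byte
    char *copy = malloc (length);

    if (copy == NULL) {
        return;                     // keep the old name if allocation fails
    }
    memcpy (copy, newName, length);
    free (g_username);              // free(NULL) is a harmless no-op
    g_username = copy;
} // setUserName

// Hypothetical replacement for getUserName: returns a private copy
// (or NULL) that stays valid no matter what setUserName does later.
// The caller must free() it.
char *copyUserName (void)
{
    size_t length;
    char *copy;

    if (g_username == NULL) {
        return (NULL);
    }
    length = strlen (g_username) + 1;
    copy = malloc (length);
    if (copy != NULL) {
        memcpy (copy, g_username, length);
    }
    return (copy);
} // copyUserName
```

The trade-off is an extra allocation per call, in exchange for a pointer that can't be pulled out from under the caller.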

The OS X malloc() libraries have some built-in tools to help track down some of these conditions. You control them by setting environment variables and then running your program. (If you’re debugging a GUI app, you can run it from the command line by doing “open /path/to/your/AppBundle.app”.)

% setenv MallocHelp 1

... will display help. Here’s what some of the environment variables do when set (the names are case-sensitive):

MallocGuardEdges

... for large blocks, puts a 4K page with no permissions before and after the allocation. This will catch buffer overruns before and after the allocated block. The size of a “large block” is undefined, but experimentally, 12K and larger seem to be considered large blocks.

Here’s a little program to show it in action:

mallocguard.m
// mallocguard.m – exercise MallocGuardEdges.
/* compile with
cc -g -o mallocguard mallocguard.m
 */

#import <stdlib.h>

int main (int argc, char *argv[])
{
    unsigned char *memory = malloc (1024 * 16);
    unsigned char *dummy = malloc (1024 * 16);

    unsigned char *offTheEnd = memory + (1024 * 16) + 1;

    *offTheEnd = 'x';

    return (0);

} // main

The first malloc() gets us 16K of memory. The second one is there to give us some more pages of memory that we can clobber with the bad assignment to offTheEnd.

Running it normally gives us this:

% ./mallocguard
%

... like nothing happened. Let’s turn on the guard:

% setenv MallocGuardEdges 1
% ./mallocguard
malloc[20622]: protecting edges
Bus error

... a program error was found for us. We can use gdb to see exactly where the error happened.

MallocScribble

This overwrites freed blocks with a known value (0x55), which helps catch attempts to re-use freed memory. Read as a pointer, a run of 0x55 bytes is a bad value (an odd address), which will cause addressing errors if we try to dereference it. Judging from experiments, free() on its own always clears the first 8 or so bytes to zero, which will catch some errors, but not all.
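
If you want the same effect in code you control, a debugging wrapper can do its own scribbling. This is only a sketch of the idea, not what the system free() does internally: scribble and scribbleAndFree are hypothetical names, and the caller has to pass the block size because malloc doesn't expose it portably.

```c
#include <stdlib.h>
#include <string.h>

#define SCRIBBLE_BYTE 0x55   // same fill value that MallocScribble uses

// Hypothetical helper: overwrite 'size' bytes of a block with the fill value.
void scribble (void *block, size_t size)
{
    memset (block, SCRIBBLE_BYTE, size);
} // scribble

// Hypothetical helper: scribble over a block, then hand it back to free().
void scribbleAndFree (void *block, size_t size)
{
    if (block != NULL) {
        scribble (block, size);
        free (block);
    }
} // scribbleAndFree
```

An optimizing compiler is allowed to drop a store into memory that is about to be freed, so a debugging aid like this is best built without optimization.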

Here’s a little example:

mallocscribble.m
// mallocscribble.m – exercise MallocScribble
// run this, then run after doing a 'setenv MallocScribble 1' in the shell

/* compile with
cc -g -o mallocscribble mallocscribble.m
*/

#import <stdio.h>   // printf
#import <stdlib.h>  // malloc, free
#import <string.h>  // strcpy

typedef struct Thingie {
    char blah[16];
    char string[30];
} Thingie;

int main (int argc, char *argv[])
{
    Thingie *thing = malloc (sizeof(Thingie));
    
    strcpy (thing->string, "hello there");
    printf ("before free: %s\n", thing->string);
    free (thing);
    printf ("after free: %s\n", thing->string);  // deliberate use after free

    return (0);

} // main

(The 16 character blah entry is to work around free()’s zeroing of the data so we can show what it’s doing with MallocScribble enabled.)

Here’s the run without anything set in the environment:

% ./mallocscribble
before free: hello there
after free: hello there

and after:

% setenv MallocScribble 1
% ./mallocscribble
malloc[20701]: enabling scribbling to detect mods to free blocks
before free: hello there
after free: UUUUUUUUUUUUUUUUUUUUUUUUUUUU
MallocStackLogging
MallocStackLoggingNoCompact

... records stacks on memory management calls for later use by tools like malloc_history.

MallocCheckHeapStart n

... starts performing sanity checks of the malloc() data structures for any signs of corruption after n dynamic memory operations.

malloccheckstart.m
// malloccheckstart.m – play with MallocCheckHeapStart
// run this, then run after doing a 'setenv MallocCheckHeapStart 100' in the shell

/* compile with
cc -g -o malloccheckstart malloccheckstart.m
*/

#import <stdlib.h>  // malloc
#import <string.h>  // memset

int main (int argc, char *argv[])
{
    int i;
    unsigned char *memory;

    for (i = 0; i < 10000; i++) {
        memory = malloc (10);

        if (i == 3783) {
            // smash some memory
            memset (memory-16, 0x55, 26);
        }
    }
    return (0);
} // main

If you just run this, it seems to work okay:

% ./malloccheckstart
%

But do a setenv MallocCheckHeapStart 100 and you get a lot of information:

% ./malloccheckstart
malloc[20765]: checks heap after 100th operation and each 1000 operations
MallocCheckHeap: PASSED check at 100th operation
MallocCheckHeap: PASSED check at 1100th operation
MallocCheckHeap: PASSED check at 2100th operation
MallocCheckHeap: PASSED check at 3100th operation
*** malloc[20765]: invariant broken for 0x52f20 (prev_free=0) this msize=21845
*** malloc[20765]: Region 0 incorrect szone_check_all() counter=5
*** malloc[20765]: error: Check: region incorrect
*** MallocCheckHeap: FAILED check at 4100th operation
Stack for last operation where the malloc check succeeded: 0x70056c80 0x700042b0 ...
(Use 'atos' for a symbolic stack)
*** Recommend using 'setenv MallocCheckHeapStart 3100; setenv MallocCheckHeapEach 100' to narrow down failure
*** Sleeping for 100 seconds to leave time to attach

We can then (if we wish) use gdb to attach to the running program and poke around to see what’s going on.
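
The same kind of sanity check can be sketched in your own code by surrounding each block with guard bytes and verifying them later. This is a simplified illustration and not the algorithm the system malloc uses: guardedAlloc, guardedCheck and guardedFree are hypothetical names, and the guard pattern 0xA5 is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16      // stored size followed by front guard bytes
#define GUARD_SIZE  8       // guard bytes after the user data
#define GUARD_BYTE  0xA5    // arbitrary pattern chosen for this sketch

// Block layout: [size][front guard ...][user bytes][rear guard]
// (No overflow check on the size arithmetic; this is a sketch.)
void *guardedAlloc (size_t size)
{
    unsigned char *base = malloc (HEADER_SIZE + size + GUARD_SIZE);

    if (base == NULL) {
        return (NULL);
    }
    memset (base, GUARD_BYTE, HEADER_SIZE);                      // front guard
    memcpy (base, &size, sizeof(size));                          // remember the size
    memset (base + HEADER_SIZE + size, GUARD_BYTE, GUARD_SIZE);  // rear guard
    return (base + HEADER_SIZE);
} // guardedAlloc

// Returns 1 if both guards are intact, 0 if something wrote outside the block.
int guardedCheck (const void *block)
{
    const unsigned char *base = (const unsigned char *) block - HEADER_SIZE;
    size_t size, i;

    // Check the front guard before trusting the stored size.
    for (i = sizeof(size); i < HEADER_SIZE; i++) {
        if (base[i] != GUARD_BYTE) return (0);
    }
    memcpy (&size, base, sizeof(size));
    for (i = 0; i < GUARD_SIZE; i++) {
        if (base[HEADER_SIZE + size + i] != GUARD_BYTE) return (0);
    }
    return (1);
} // guardedCheck

void guardedFree (void *block)
{
    if (block != NULL) {
        free ((unsigned char *) block - HEADER_SIZE);
    }
} // guardedFree
```

Calling guardedCheck() at intervals narrows a smash down to the window between the last passing check and the first failing one, which is the same trade-off MallocCheckHeapEach gives you.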

These are really handy utilities. They don’t pinpoint exactly what went wrong, but they’re useful for narrowing down the error possibilities.

Mark Dalrymple (markd@borkware.com) has been wrangling Mac and Unix systems for entirely too many years. In addition to random consulting and custom app development at Borkware, he also teaches the Core Mac OS X and Unix Programming class for the Big Nerd Ranch.
