Tuesday, October 26, 2010

A Pointer on Pointers Part 1

People often express uncertainty about how pointers, structures, etc. work.  They may not be my readers, but hearing their plight moves me to write nonetheless.  Others have noticed this too, judging again by StackOverflow.  Everything in this post is written from a C-centric point of view, unless otherwise noted.

There are two separate items to understand about pointers: first, where they come from, and second, what they point to.  We will work roughly from the following picture of memory usage, but understand that it is so simplified that it is almost wrong.


Memory can be allocated from two places: stack and heap.  Both are generalizations of the actual mechanics.  Stack allocations are explicitly named variables in the source, declared either globally or local to a function.  Global variables use space delineated in the binary image rather than in the stack pages, but a programmer should treat the two identically in that an allocation is only good within its scope: the whole run of the program for a global, a single call of the function for a local.  This leads to my first example:

struct foo* newFoo(void)
{
    struct foo F;   /* F lives in newFoo()'s stack frame */
    return &F;      /* BUG: this address is stale once newFoo() returns */
}

Managed language programmers often attempt the above example.  newFoo() creates a foo and returns a reference; however, the referenced foo is on the stack, so it stops being valid the moment the function returns.  In case there is uncertainty, let's see the assembly for function newFoo():

; Using 32bit x86 assembly, as 64bit has calling convention differences
  push    ebp
  mov     ebp, esp
  sub     esp, 0x10  ; Allocate 16 bytes for a "foo"
  mov     eax, esp   ; Put the address into the return value
  mov     esp, ebp   ; Cleanup
  pop     ebp
  ret

Esp points to the top of the stack (the lowest address in use, since the x86 stack grows downward), so the subtract instruction moves it and thereby sets aside the space required for a foo.  That space is only available while the function is running: when the cleanup section is reached, esp is changed back and the space is "reclaimed", free to be overwritten by the next call.  (The funny thing is, an optimizing compiler may inline newFoo() and thereby change where and for how long this stack space exists, but using the returned pointer is undefined behavior either way.)  So to make an allocation persist beyond its allocation scope, we need the "heap".
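Note that a pointer to stack space is not wrong in itself; it only has to stop being used before the owning function returns.  The sketch below shows the safe direction, where a caller hands the address of its own local down to a callee.  The names initFoo() and useFoo() and the four int members of struct foo are made up for illustration (the members were picked to match the 16 bytes in the listing above, assuming a 4-byte int).

```c
/* Hypothetical layout: the post never defines struct foo. */
struct foo {
    int a, b, c, d;
};

/* Fills in a foo that somebody else owns. */
void initFoo(struct foo *F)
{
    F->a = 1;
    F->b = 2;
    F->c = 3;
    F->d = 4;
}

int useFoo(void)
{
    struct foo F;   /* stack space that belongs to useFoo() */
    initFoo(&F);    /* safe: useFoo() is still running, so F still exists */
    return F.a + F.b + F.c + F.d;
}
```

The pointer travels down the call chain and never back up, so it never outlives the frame that holds F.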

Heap memory is memory that won't be reclaimed until the program explicitly requests it (or terminates).  While malloc() is the most common method for heap allocations, it is not the only one.  Each of these methods sets aside a section of memory for the program's use.  To again simplify our picture of computing, a Java implementation of newFoo would disassemble to something like the following:

  push    ebp
  mov     ebp, esp
  sub     esp, 0x4           ; Space for arguments
  mov     dword [esp], 0x10  ; "Push argument": 16 bytes for a "foo"
  call    malloc             ; Address of the new space comes back in eax
  mov     esp, ebp           ; Cleanup
  pop     ebp
  ret

A language like Java might prohibit allocating a foo object on the stack and therefore force the allocation to be made from the heap.  It does so not by using "sub esp, 0x10" to make the allocation but by calling out to malloc() to set aside the necessary space.  Now, when the function returns, the space at the returned address is not reclaimed.
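Written in C itself, a heap-backed newFoo() might look like the sketch below.  The layout of struct foo (four ints, 16 bytes where int is 4 bytes) and the deleteFoo() helper are assumptions made for illustration; neither appears in the example above.  calloc() is used in place of malloc() only so that the new foo starts out zeroed.

```c
#include <stdlib.h>

/* Hypothetical layout: the post never defines struct foo. */
struct foo {
    int a, b, c, d;
};

/* Heap-backed newFoo(): the storage outlives this call. */
struct foo *newFoo(void)
{
    /* calloc() zero-fills the block and returns NULL on failure. */
    struct foo *F = calloc(1, sizeof *F);
    return F;
}

/* Hypothetical counterpart: heap space stays reserved until it is
 * explicitly released (or the program terminates). */
void deleteFoo(struct foo *F)
{
    free(F);   /* free(NULL) is a no-op */
}
```

A caller would write struct foo *p = newFoo();, check p against NULL before using it, and eventually call deleteFoo(p).  Forgetting that last step leaves the space reserved until the program exits.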

An entire series of courses would be required to fully explain the trade-offs and reasoning, and since this post is growing long, I will conclude here and discuss the usage of this memory in a later post.
