Thursday, December 30, 2010

A Pointer on Pointers Part 2

In the second part of the series on pointers, we'll cover what pointers point to and how that relates to reality.  Let's consider four kinds: void*, char[], struct*, and void (*f)(int) (i.e., a function pointer).  With each one, we'll learn something further about C, assembly, and how a processor interacts with memory.

To begin, there is the all-purpose void* pointer (and yes, void* means pointer to void, but I'll keep writing void* pointer for emphasis).  It is a C type that means a pointer to nothing, or to anything.  This pointer is important for allocating space and for casting (changing the type of) that space.  Casting is something that only has representation in the programming language; in assembly, no pointer carries type information.  So by casting, the programmer tells the compiler that this block of memory is now to be treated differently.  Therefore, void* pointers are used when the type is not known (e.g., pthread_create(..., void* arg) or CreateThread(..., LPVOID lpParameter, ...)) or to discard existing type information.
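
As a minimal sketch of that last point, here is a worker written in the style of a thread entry point.  The names struct job and run_job are hypothetical, made up for illustration; they are not part of pthreads or the Windows API.  The function receives an untyped pointer and has to restore the type before it can touch the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload type, invented for this example. */
struct job {
    int id;
    int result;
};

/* Same shape as a thread start routine: void* in, void* out. */
void *run_job(void *arg)
{
    assert(arg != NULL);

    /* The caller discarded the type; we restore it here.
       In C, void* converts to any object pointer without a cast. */
    struct job *j = arg;

    j->result = j->id * 2;
    return arg;
}
```

A caller would pass &some_job, and the compiler would happily accept a pointer to anything else as well.  Nothing checks that arg really points to a struct job, which is exactly the trade made when type information is thrown away.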

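A char[] is the next C type that will be discussed here.  In almost every expression, an array in C is converted to a pointer to its first element.  Learn this fact.  Internalize it.  The exceptions are few (sizeof and & are the main ones), and for indexing you can use the two interchangeably, like so: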

char* foo = (char*) malloc(1024 * sizeof(char));
foo[0] = '\0'; // 1
*(foo + 0) = '\0'; // 2

Lines 1 and 2 are equivalent.  So a pointer to many chars can be treated as an array of chars: we can access any array offset through the pointer with brackets, or use pointer arithmetic to reach specific elements.  Now, the next time you see char*, you can think of it as an array of characters.  (In a later post, we'll cover char** and other more complex pointers.)
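
To make the equivalence concrete, here is a small sketch that sums a buffer twice, once with brackets and once by walking a pointer.  The function names are my own, chosen for illustration.  The last function shows one place where an array and a pointer differ: sizeof sees the whole array.

```c
#include <assert.h>
#include <stddef.h>

/* Array-style access: p[i] means *(p + i). */
int sum_indexed(const char *p, size_t n)
{
    assert(p != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Pointer arithmetic: advance p itself until it reaches the end. */
int sum_walked(const char *p, size_t n)
{
    assert(p != NULL);
    int total = 0;
    for (const char *end = p + n; p != end; p++)
        total += *p;
    return total;
}

/* sizeof applied to an array gives the array's size in bytes,
   not the size of a pointer. */
size_t bytes_in_local_array(void)
{
    char buf[16];
    return sizeof buf;
}
```

Both sum functions accept an array name or a pointer into the middle of one (e.g., buf + 1), since the array name turns into a pointer at the call.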

The third type is the struct* pointer.  Structs are merely logical arrangements of data that a programmer specifies.  Accessing a field through a pointer compiles to a memory access at some fixed offset from the base address.  If you want to ignore this compiler support, such things are your purview.  And in seeing how you can do this, we'll learn what the compiler is doing and what a struct* pointer is.  We want a struct with two shorts followed by a long, and we'll assign 'A', 'B', and 1024 to the three fields respectively.

typedef struct _pop2 {
    short a, b;
    long c;
} POP2, *PPOP2;

// Option 1
PPOP2 structP = (PPOP2) malloc(sizeof(POP2));
structP->a = 'A';
structP->b = 'B';
structP->c = 1024;

// Or option 2 (assumes sizeof(short) == 2 and sizeof(long) == 4, as on 32-bit x86)
char* no_struct = (char*) malloc(8);
*(short*)(no_struct + 0) = 'A';
*(short*)(no_struct + 2) = 'B';
*(long*)(no_struct + 4) = 1024;

You might be scratching your head, but options 1 and 2 do the same thing, at least on a platform where a short is 2 bytes and a long is 4.  Option 1 is how a programmer should write the code.  Option 2 is roughly what the computer will actually be executing.  So just remember that structs and struct* pointers are just logical arrangements of the data for your convenience.
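
If you'd rather not hard-code the sizes and offsets, the standard offsetof macro from <stddef.h> reports where the compiler actually placed each field.  The sketch below mirrors the struct above (with a plain tag name) and fills it in through raw byte offsets; make_pop2_by_offset is a name I made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct pop2 {
    short a, b;
    long c;
} POP2, *PPOP2;

/* Builds a POP2 without ever using ->, by writing at the offsets
   the compiler chose.  Returns NULL if allocation fails. */
PPOP2 make_pop2_by_offset(void)
{
    /* The first member of a struct always sits at offset 0. */
    assert(offsetof(POP2, a) == 0);

    char *raw = malloc(sizeof(POP2));
    if (raw == NULL)
        return NULL;

    *(short *)(raw + offsetof(POP2, a)) = 'A';
    *(short *)(raw + offsetof(POP2, b)) = 'B';
    *(long *)(raw + offsetof(POP2, c)) = 1024;

    return (PPOP2)raw;
}
```

Reading the result back with p->a, p->b, and p->c gives 'A', 'B', and 1024 whatever the sizes of short and long happen to be, because the offsets and the allocation size both come from the compiler rather than from constants like 4 and 8.  The caller is responsible for calling free on the result.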

Lastly, the function pointer cannot be ignored.  A vital tool in modern architectures is the ability to call functions.  Most calls use built-in addresses (established at compile time), but sometimes where the code wants to go isn't known until runtime.  Function pointers are what enable inheritance in OO development and dynamic linking, and they are often used to implement switch-like dispatch and other constructs.  (They are especially fun to use with runtime code generation, but that's another post.)  At their core, the programmer is merely telling the computer that execution should continue at some address.
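
Here is a minimal sketch of choosing a call target at runtime: a table of function pointers indexed by an operation code, doing the job a switch statement might otherwise do.  All of the names (binary_op, op_table, apply, and the OP_ constants) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer to any function taking two ints and returning an int. */
typedef int (*binary_op)(int, int);

static int add(int x, int y) { return x + y; }
static int sub(int x, int y) { return x - y; }
static int mul(int x, int y) { return x * y; }

enum op_code { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

/* Each slot holds the address where execution should continue. */
static const binary_op op_table[OP_COUNT] = { add, sub, mul };

int apply(enum op_code code, int x, int y)
{
    assert((int)code >= 0 && (int)code < OP_COUNT);

    /* Indirect call: the target is looked up at runtime. */
    return op_table[code](x, y);
}
```

Calling apply(OP_MUL, 6, 7) loads the address of mul from the table and jumps to it.  Adding an operation means adding a table entry rather than another case label.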

To summarize, memory is just a sequence of bytes, given meaning by the types and usage in the source code and, consequently, in the program itself.  Different pointer types are different lenses for looking at the same memory.
