Tuesday, October 26, 2010

A Pointer on Pointers Part 1

People often express uncertainty about the workings of pointers, structures, and the like.  They may not be my readers, but hearing their plight moves me to write nonetheless.  Others have noticed this too; one has only to look at StackOverflow.  Everything in this post is written from a C-centric point of view, unless otherwise noted.

There are two separate things to understand about pointers: first, where they come from, and second, what they point to.  We will roughly be working off of the following picture of memory usage, but understand that it is so simplified that it is almost wrong.


Memory can be allocated from two places: the stack and the heap.  Both terms are generalizations of the actual mechanics.  Stack allocations are the explicitly named variables in the source, declared either globally or locally to a function.  Global variables actually use space delineated in the binary image rather than stack pages, but a programmer should consider them identically, in that the allocations are only good within their scope.  This leads to my first example:

struct foo* newFoo()
{
    struct foo F;  // allocated in newFoo()'s stack frame
    return &F;     // BUG: returns the address of a local that dies with the frame
}

Managed-language programmers often attempt the above.  newFoo() creates a foo and returns a reference; however, the referenced foo is on the stack, and the reference is invalid the moment the function returns.  In case there is any uncertainty, let's see the assembly for newFoo():

; Using 32-bit x86 assembly, as 64-bit has calling-convention differences
  push    ebp
  mov     ebp, esp
  sub     esp, 0x10  ; Allocate 16 bytes for a "foo" on the stack
  mov     eax, esp   ; Put the address into the return value
  mov     esp, ebp   ; Cleanup: the 16 bytes are reclaimed here
  pop     ebp
  ret

esp points to the current top of the stack (the stack grows downward), so the subtract instruction moves esp, thereby setting aside the required space for a foo on the stack; but the space is only available while inside the function.  When the cleanup section is reached, the space is "reclaimed" as esp is restored.  (The funny thing is, an optimizing compiler may inline newFoo() and thereby change the scope of this stack allocation.)  So to make an allocation persist beyond its allocating scope, we need the "heap".
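
First, though, here is a minimal sketch of how a caller trips over the stack version.  The fields of struct foo are my own invention, chosen only to give it 16 bytes; a modern compiler will warn about newFoo(), and any use of p or q is undefined behavior.

#include <stdio.h>

struct foo { int a, b, c, d; };  // hypothetical fields, 16 bytes total

struct foo* newFoo()
{
    struct foo F;
    return &F;  // BUG: F dies when newFoo() returns
}

int main()
{
    struct foo* p = newFoo();  // p dangles immediately
    struct foo* q = newFoo();  // typically reuses the very same stack space
    printf("%p %p\n", (void*)p, (void*)q);  // the addresses usually match
    return 0;
}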

Heap memory is memory that won't be reclaimed until the program explicitly requests it (or the program terminates).  While malloc() is the most common method of heap allocation, it is not the only one.  Whatever the method, a section of memory is set aside for the program's use.  To again simplify our picture of computing, a Java implementation of newFoo would disassemble to something like the following:

  push    ebp
  mov     ebp, esp
  sub     esp, 0x4          ; Space for arguments
  mov     dword [esp], 0x10 ; "Push" the argument: request 16 bytes
  call    malloc            ; Returned pointer arrives in eax
  mov     esp, ebp          ; Cleanup
  pop     ebp
  ret

A language like Java might prohibit allocating an object like foo on the stack and therefore force the allocation to be made from the heap.  It does so by not using "sub esp, 0x10" to make the allocation, but instead calling out to malloc() to set aside the necessary space.  Now, when this function returns, the memory behind the returned address is not reclaimed.
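
Back in C, the persistent version of newFoo looks like the following minimal sketch (struct foo's fields are again my hypothetical 16-byte stand-in):

#include <stdlib.h>

struct foo { int a, b, c, d; };  // hypothetical fields, 16 bytes total

struct foo* newFoo()
{
    struct foo* F = malloc(sizeof(struct foo));  // heap space, not stack
    return F;  // safe: the allocation outlives this call
}

The caller, of course, becomes responsible for eventually handing the result back to free().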

An entire series of courses would be required to fully explain the trade-offs and reasoning, and since this post is growing long, I will conclude here and discuss the usage of this memory in a later post.

Thursday, October 21, 2010

Accessing Structs with Side Effects

In my recent research, I've needed to take certain additional actions when the fields in a data structure change.  Since all of the code involved is my own, my initial approach was to annotate every access with explicit calls.  Obviously, this isn't sustainable in larger projects, but for quick prototyping it is sufficient.  Let's see a simple example (in pseudo-C, as usual):

typedef struct _tree {
    struct _tree *left, *right;  // pTree is not yet defined at this point
    int data;
} Tree, *pTree;

Tree t;
t.left = t.right = NULL;
// Now take action

Since we want to ensure that we take action after every update, get / set routines could be introduced for each field.  Yet now I have the overhead of calling these routines on every access, and I still have to trust that all accesses use the routines (though in another language like C++, I could make the fields private and force access through the routines).
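
In C, such routines would look something like this sketch, where onUpdate() is a hypothetical hook standing in for my explicit calls:

// Reuses the Tree / pTree definitions above
void onUpdate(pTree t);  // hypothetical: performs the side effect for t

pTree getLeft(pTree t)
{
    return t->left;
}

void setLeft(pTree t, pTree value)
{
    if (t->left == value) return;  // discard redundant stores
    t->left = value;
    onUpdate(t);                   // side effect on every real update
}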

My research is currently using C#, so I have further tools available to me, specifically properties with get / set accessors.  Initially, they seemed to be yet another silly hoop to jump through while writing code.  However, they are a great tool for my present problem, where modifications to fields need to incur side effects.  Now for a C# snippet:

private Node pLeft;  // Internal storage
public Node left     // Public accessor
{
    get { return pLeft; }
    set
    {
        if (pLeft == value) return;  // discard redundant stores
        pLeft = value;
        pStack.Push(this);           // side effect on update
    }
}

So my data structure now has implicit side effects on updates (good for the research), and any future development doesn't need explicit annotations.  All in all, I was quite pleased with this change (except for the time spent debugging the differences between implicit and explicit detection).

Thursday, October 14, 2010

Book Review: Beautiful Code

I came across the book Beautiful Code: Leading Programmers Explain How They Think (O'Reilly, Theory in Practice series) several months ago and was immediately intrigued.  Could this be a book that does so much of what I want to achieve here?  In short, no.  The editors want us to read thirty or so examples and learn something about what makes code beautiful.  In principle, I can agree with this thesis, as I have learned a lot about writing better code from reading what others have programmed.

The book provides 33 chapters; the shortest is 6 pages and the longest is over 30, and the quality is inversely proportional to the length.  The shorter contributions are far more likely to achieve their goal of demonstrating beauty in programs, as their programs are more self-evidently beautiful and simply need fewer pages to show it.  Thus, in reading 550 pages, one finds long stretches of slow development toward an uncertain verdict on the quality of a contributor's code, punctuated by shorter gems of programming.

Compounding the difficulty, I have yet to discern any method to the organization of the chapters.  Programmers use a wide variety of languages, and the book is no different.  For example, I skipped the chapters using Fortran, due in part to my lack of knowledge of the language.  Yet the final chapter used Lisp and was rather interesting, even though I have virtually no experience with it either.  That chapter succeeds because its point is not grounded in the language; changing its examples to pseudo-code would work just as well.

To conclude, the premise is still valid, and so I remain hopeful for a future rewrite that discards about 20 chapters and provides some degree of transition between the rest.  Perhaps it could even confine the languages to a handful, though just deciding which might be intractable.

Friday, October 1, 2010

Is elegance pretty?

Through the course of my work and reading, I encountered a write-up about what makes code "pretty", with reasonable suggestions for establishing and maintaining a code style.  The essential objective is to make the code readable by others.  The undergrads I work with are not always thrilled about the effort required for "pretty" code, yet they can accept that I have to read and understand their programs.  Even that barely compares to the effort required in a commercial context, where code may have a multi-year lifetime and pass through many programmers' hands.

Is pretty code elegant?  No, although the reverse is true (and thus the title).  Elegant code has an aesthetic aspect that can be termed pretty, yet it must be more than aesthetic.  My continuing proposal is that elegant code must also be efficient, extensible, reliable, and so on.  Basically, elegant code leaves its objectors behind; rather than defending it, you want to show it to others.

So go and write pretty code.