Thursday, December 30, 2010

A Pointer on Pointers Part 2

In this second part of the series on pointers, we'll cover what they point to and how that relates to reality.  Let's consider four pointers: void*, char[], struct*, and void (*f)(int) (i.e., a function pointer).  With each pointer, we'll learn something further about C, assembly, and how a processor interacts with memory.

To begin, there is the all-purpose void* pointer (and yes, void* means pointer to void, but I'll keep writing void* pointer for emphasis).  A C type that means a pointer to nothing, or anything.  This pointer is important for allocating space and for casting (changing the type of) that space.  The cast only has representation in the programming language; in assembly, a pointer carries no type information.  So by casting, the programmer tells the compiler that this block of memory is now to be treated differently.  Therefore, void* pointers are used when the type is not known (e.g., pthread_create(..., void* arg) or CreateThread(..., LPVOID lpParameter, ...)) or to discard existing type information.

A char[] is the next C type that will be discussed here.  Every array expression in C decays to a pointer (the few exceptions, like sizeof, need not concern us yet).  Learn this fact.  Internalize it.  The two are so close that you can use them interchangeably, like so:

char* foo = (char*) malloc(1024 * sizeof(char));
foo[0] = '\0'; // 1
*(foo + 0) = '\0'; // 2

Lines 1 and 2 are equivalent.  So a pointer to many chars is an array of chars.  And we can access any array offset with the pointer.  Or we can use pointer arithmetic to access specific elements.  Now, next time you see char*, you can think of it as an array of characters.  (In a later post, we'll cover char** and other more complex pointers).

Third, consider a struct* pointer.  Structs are merely logical arrangements of data that a programmer specifies.  So accessing this data via a pointer compiles down to reading or writing at some offset from the base pointer.  If you want to ignore this compiler support, such things are your purview.  And in seeing how you can do so, we'll learn what the compiler is doing and what a struct* pointer is.  We want a struct with two shorts, followed by a long.  And we'll assign 'A', 'B', and 1024 to the three fields respectively.

typedef struct _pop2 {
    short a, b;
    long c;
} POP2, *PPOP2;

// Option 1
PPOP2 structP = (PPOP2) malloc(sizeof(POP2));
structP->a = 'A';
structP->b = 'B';
structP->c = 1024;

// Or option 2
char* no_struct = (char*) malloc(8);  // assumes 2-byte shorts, a 4-byte long, and no padding
*(short*)(no_struct + 0) = 'A';
*(short*)(no_struct + 2) = 'B';
*(long*)(no_struct + 4) = 1024;

You might be scratching your head, but options 1 and 2 do the same thing.  Option 1 is how a programmer should write the code.  Option 2 is roughly what the computer will actually be executing.  So just remember that structs and struct* pointers are just logical arrangements of the data for your convenience.

Lastly, the function pointer cannot be ignored.  A vital tool in modern architectures is the ability to call functions.  Most calls use built-in addresses (established at compile time), but sometimes where the code wants to go isn't known until runtime.  Function pointers are what enable inheritance in OO development and dynamic linking, and they often find use in switch statements and other constructs.  (They are especially fun with runtime code generation, but that's another post).  At their core, a programmer is merely telling the computer that execution should continue at some address.

To summarize, memory is just sequences of bytes that are given meaning by the types and usage that exist in the source code and consequently the program itself.  To be different types of pointers is to look at memory through different lenses.

Sunday, December 19, 2010

Ray Tracing Performance and Practice Part 2

In this post, we shall explore creating a scene worth rendering.  In my prior work, my ray tracers were creating simple assemblages of spheres and triangles, with some having textures and others mirrored surfaces, etc.  This time, I wanted to have a scene that others might enjoy.  So for my first step, I turned to the Stanford 3D Scanning Repository and selected the angel model, Lucy.

The first problem in rendering is that the model is about 116 million triangles, or over 500MB in size.  Perhaps slightly more detail than is required for all but the largest of images.  Some time was spent searching for a tool that would simplify the model, and I settled on MeshLab.  However, the GUI version would crash trying to load the initial model, so I had to script the required transformations and have the command-line application create the reduced model.  This brought the mesh down to 250k triangles, and subsequent transformations in the GUI gave a 50k-triangle model, oriented appropriately for the ray tracer.

The second problem in rendering is that the model is stored in the ply file format.  I extended my scene parser to handle ply files without really understanding them. While I would like to have a generalized parser, I confined my work to the specific format that MeshLab was generating and did not worry about whether there were triangles or rectangles, normals present, colors specified, etc.  Yet, even this single format took some work.

Initially, the parser took in a text version of the ply file; however, the time to parse 100k lines of text and convert them to the appropriate representations was approaching the time to render the scene.  Switching to a binary representation seemed a reasonable step; however, how does one take a stream of bytes and convert it to a specific type?  In C, you could do the following:

Vec3f v;  // Struct of three floats
fread(&v, sizeof(float), 3, sceneFile);

But in C#?  I eventually settled on reading each value and converting via the "BitConverter" class.

Support.Assert(4 == pFile.BaseStream.Read(data, 0, 4), 
    "INVALID Ply - Fail to parse vertex");
newV.x = BitConverter.ToSingle(data, 0);

Great!  The statue is now in the tracer.  However, 50k triangles is too much for my naive ray tracer.  So there was performance work to be done.  At the time, I would render a 5k triangle version as a test model, as seen below (pending while in flight).


Thursday, December 16, 2010

Ray Tracing Performance and Practice Part 1

As the MICRO conference came to a close, my thoughts were stimulated toward new projects and undertakings.  One task in particular was developing a ray tracer as an additional workload for my research.  The first reason for specifically a ray tracer is that I developed one in C many years ago that used now-deprecated libraries.  Second, prior to my last paper deadline, another professor suggested that ray tracers have many useful characteristics.  So I've been porting my old code from C to C#.  This series of posts will discuss some design decisions relating to the porting, additional optimizations that have been introduced, and other interesting headaches / pitfalls.

To a first order, porting the implementation was fairly straightforward.  The original code had some level of object-oriented (OO) design, with triangles and spheres both encapsulated within "ray objects" (see Object Oriented C Part1).  But much of the implementation was a mangled mess.  For example, both colors and positions were represented by float triplets.  And so, some glue logic was required to convert from the ARGB color mode into the float space used by the tracer.  But being object-oriented, I could set up type conversions and operator overloads that significantly reduced the inelegance of the original implementation.

Now with a reasonably working port of the original code, the question arose: how to make it perform well?  The original used all sorts of tricks, some useful, like bounding spheres around the triangles, and some not, like preallocating the intermediate space.  As always, there are a variety of levels of abstraction to target for performance work.  My original implementation focused only on micro-optimizations like avoiding memory allocations.  (And performance profiles showed that the largest single contributor was sqrt, but I didn't have a "fix" for that.)  However, especially within a higher-level language, many of these choices are denied, and so my improvements have turned first to the algorithmic level.

Two (or more) posts will yet follow.  The first on finding and handling a sufficiently interesting scene to render.  And the second will be focused on reducing the computations required to render this scene.

Wednesday, December 8, 2010

Conference Time: MICRO-43 Day 2

I went to several talks yesterday, but my main purpose was achieved by meeting and discussing my research with the other attendees.  Most conversations are positive and constructive; however, one person remained thoroughly convinced that my research is without purpose.  Nonetheless, these opinions provide me with further guidance for how to present my work to others and provide better motivation for my work.

Monday, December 6, 2010

Conference Time: MICRO-43 Morning Day 1

As of noon, I've spent most of my morning as a volunteer at MICRO; however, I did slip in to hear the keynote and AMD's talk about an ISA extension for supporting transactional memory.  Transactional memory (TM) is a technology of which I am deeply skeptical.  But given that AMD's talk was from the ISA perspective rather than the HW itself, I gave it a go.

Their approach was to try and find the minimal set of instructions required to provide some meaningful TM functionality for programs.  SPECULATE, COMMIT, and LOCK MOV are the three instructions.  Furthermore, in providing minimal support, programs are not burdened by unnecessary overhead.  For example, SPECULATE begins a speculative region (i.e., transaction), but only checkpoints the instruction and stack pointers.

Within this region, only the memory operations that are made with "LOCK MOV" are given transactional semantics.  All other operations are non-speculative and therefore have normal semantics.  This enables a speculative region to only incur the overhead from required operations and not from incidental work.  And the HW support for transactions is reduced as fewer operations need to be tracked to provide sufficient support for applications.

COMMIT closes the speculative region and clears the transaction bits from the pending memory operations.  If there is a conflict during the region, then the HW rolls back to the old RIP which has a trampoline to determine the appropriate fail behavior, like retry, etc.

This approach had far greater appeal to me than previous works.  I think the minimalist approach is favorable from a performance standpoint.  This approach also provides greater flexibility to use the TM support for other applications.

For more details see: AMD's ASF or the MICRO 43 proceedings (not yet available)

Wednesday, December 1, 2010

Elegance in Structure

Recently, I came across a request for an elegant implementation of linked lists (again on stack overflow).  And now that I'm finally through conference deadlines, etc, let's consider this request in further detail.

Specifically, the issue is a singly linked, circular list.  I'd love to hear the benefit of making a singly linked list circular, but I'll assume that there is a reason.  Therefore, we'll explore three design questions.
  1. Layout of the structure
  2. Inserting elements
  3. Deleting elements
Structure layout revolves around a key issue that will decide the rest of the design.  Is the head element tracked in a particular way?  To specifically track the head element, the data structure would need two pieces: a container and individual elements.  This is the questioner's design.  Alternatively, one would treat the head implicitly as the first element, and solely use the "node" structure.  The advantage is that there is only one data structure, and the necessary code for managing the special "head" is already in place.  However, with an implicit constraint, the data structure can break if the constraint isn't obeyed.  The constraint stands in for the container structure, but without the compiler's support / guarantees.

The second issue is preparing an elegant insertion of elements.  Two design questions are present: whether to order the list, and if it is unordered, whether it acts as a queue or a stack.  Based on the poster's design, we have an unordered queue.  Unfortunately, this gives an unpleasant O(n) insertion cost, unless we include a tail pointer.  Since elegance encompasses more than simplicity, an elegant tail insertion should have such a pointer whenever possible.  The potential inelegance of the implementation (for insertion and removal) is dealing with three different insertion cases: zero elements, one element, many elements.  Can we do it in fewer?

typedef struct _node {
    int value;
    struct _node* next;
} Node;

typedef struct _list {
    Node* head;
    Node* tail;
} List;

bool Insert(List* L, int value)
{
    Node* N = (Node*) malloc(sizeof(Node));

    if (N == NULL) return false;

    N->value = value;

    if (L->head == NULL)  // case zero elements
    {
        L->head = L->tail = N->next = N;
    }
    else  // cases 1+ elements
    {
        L->tail->next = N;
        L->tail = N;
        N->next = L->head;  // keep circular
    }

    return true;
}

The final elegance piece rests on element removal. Again, we're trying to reduce the inelegance by combining the different cases into fewer tests and code.  The original implementation has three cases: remove only element, remove head, and remove element.  The distinguishing step about removing the head element is that the head pointer needs to be updated.  Additionally, when any element is removed, the previous element needs to be updated.  However, we already have the previous element for the head, the tail!  Thus, element removal has a common case of finding the element, and then a couple of quick checks based on that element.

void Remove(List* L, int value)
{
    Node *T, *E, *P;  // all three are pointers

    if (L == NULL) return;

    T = L->head;
    P = E = L->tail;

    if (T == NULL) return;

    do {
        if (T->value == value) break;
        P = T;
        T = T->next;
    } while (T != E);

    // Ideally, the break would skip into this block
    if (T->value == value)
    {
        P->next = T->next;   // Remove elem
        if (T == E) L->tail = P;  // Removed tail
        if (P == E) L->head = P->next;  // Removed head
        if (T == P) {L->head = L->tail = NULL;} // Removed only elem
        free(T);  // Cleanup
    }
}

Initially, I wanted to treat the list implicitly, but I find myself liking the inclusion of a tail pointer. Yet on small lists, the trade-offs may be different.  I wouldn't present this as the be-all and end-all of circular lists, but I would offer it as my generic implementation.

Monday, November 1, 2010

To research the unknown

As part of my duties as a graduate student, I presented some of my research on Friday.  Beyond the presentation itself, which went well, an intriguing question arose: "If something is so difficult / expensive to implement that no one has written the code, then how do you know what it is?"  Indeed.  This is an honest criticism: how am I going to explore performance improvements to particular problems when the problems are so computationally expensive that they aren't implemented in existing applications?

Fortunately for my research, the scope of computationally expensive is still well within the feasible realm.  Furthermore, many of these expensive problems are already known, and are not so far removed from the current applications as to be independent.  So in the end, perhaps it isn't a horrible road block, but rather just one more piece to *research.*

For bonus points, the discussion portion of my presentation ran about the same length as my presentation itself.  And spawned further emails in the subsequent days.  So I'm delighted by how thought provoking (or perhaps contentious) my research is proving to be.

Tuesday, October 26, 2010

A Pointer on Pointers Part 1

People often express uncertainty about the working of pointers, structures, etc.  They may not be my readers, but hearing their plight moves me to write nonetheless.  Others have noticed this too, again looking at StackOverflow.  Everything in this post is written from a C-centric point of view, unless otherwise noted.

There are two separate items to understand about pointers.  First, where they come from.  And second, what they point to.  We will roughly be working off of the following picture of memory usage, but understand that this is so simplified that it is almost wrong.


Memory can be allocated from two places: the stack and the heap.  Both are generalizations of the actual mechanics.  Stack allocations are explicitly named variables in the source, either global or local to a function.  Global variables use space delineated in the binary image, rather than the stack pages, but a programmer should consider them identically in that the allocations are only good within their scope.  This leads to my first example:

struct foo* newFoo()
{
    struct foo F;   // allocated in newFoo's stack frame
    return &F;      // BUG: returns an address about to be reclaimed
}

Managed language programmers often attempt the above example.  newFoo() creates a foo and returns a reference; however, the referenced foo is on the stack.  In case there is uncertainty, let's see the assembly for function newFoo():

; Using 32bit x86 assembly, as 64bit has calling convention differences
  push    ebp
  mov     ebp, esp
  sub     esp, 0x10  ; Allocate 16 bytes for a "foo"
  mov     eax, esp   ; Put the address into the return value
  mov     esp, ebp   ; Cleanup
  pop     ebp
  ret

Esp points to the top of the stack, so the subtract instruction changes the location pointed to, thereby setting aside the required space for an object foo on the stack, but the space is only available while in the function.  When the cleanup section is reached, the space is "reclaimed" as esp is changed back.  (The funny thing is, an optimizing compiler may inline newFoo() and thereby change the scope of this stack allocation).  So to make an allocation persist beyond its allocation scope, we need the "heap".

Heap memory is memory that won't be reclaimed until the program explicitly requests it (or the program terminates).  While malloc() is the most common method for heap allocation, it is not the only one.  Through these methods, a section of memory is set aside for the program's usage.  To again simplify our picture of computing, a Java implementation of newFoo would disassemble to something like the following:

  push    ebp
  mov     ebp, esp
  sub     esp, 0x4     ; Space for arguments
  mov     dword [esp], 0x10  ; "Push argument"
  call    malloc
  mov     esp, ebp     ; Cleanup
  pop     ebp
  ret

A language like Java might prohibit allocating an object foo on the stack and therefore force the allocation to be made from the heap.  And it does so by not using "sub esp, 0x10" to make the allocation but instead calling out to malloc() to set aside the necessary space.  Now, on return from this function, the address returned is not reclaimed.

An entire series of courses would be required to fully explain the trade-offs and reasoning, and since the length is growing long, I will conclude here and discuss the usage of this memory in a later post.

Thursday, October 21, 2010

Accessing Structs with Side Effects

In my recent research, I've needed to take certain additional actions when the fields in a data structure are changed.  Since all code involved is my own, my initial approach was to annotate any access with explicit calls.  Obviously, this isn't sustainable in larger projects, but for quick prototyping, it is sufficient.  Let's see a simple example (with pseudo-C as usual):

typedef struct _tree {
    struct _tree *left, *right;
    int data;
} Tree, *pTree;

Tree t;
t.left = t.right = NULL;
// Now take action

Since we want to ensure that we take action after every update, get / set routines could be introduced for each field.  Yet, now I have the overhead of calling these routines on every access.  And I still have to trust that all accesses will use the routines (though using another language like C++, I could make the fields private and force access to the routines).

My research is currently using C#, so I have further tools available to me, specifically accessors.  Initially, they seemed to be yet another silly hoop to jump through while writing code.  However, they provide a great tool for my present problem, where modifications to fields need to incur side effects.  Now for a C# snippet:

private Node pLeft;  // Internal storage
public Node left     // Public accessor
{
    get { return pLeft;}
    set
    {
        if (pLeft == value) return;  // discard redundant stores
        pLeft = value;
        pStack.Push(this); // side effect on update
    }
}

So my data structure now has implicit side effects from updates (good for the research).  And so any future development doesn't need explicit annotations.  All in all, I was quite pleased with this change (except for spending the time debugging the differences between implicit and explicit detection).

Thursday, October 14, 2010

Book Review: Beautiful Code

I came across the book, Beautiful Code: Leading Programmers Explain How They Think (Theory in Practice (O'Reilly)), several months ago and was immediately intrigued.  Could this be a book to do so much of what I want to achieve here?  In short, no.  The editors want us to read thirty or so examples and learn something about what makes code beautiful.  In principle, I can agree with this thesis, as I have learned a lot about writing better code from reading what others have programmed.

The book provides 33 chapters, the shortest is 6 pages and the longest is over 30.  And the quality is inversely proportional to the length.  The shorter contributions are far more likely to achieve their goal of demonstrating beauty in programs, as their programs are more self evidently beautiful.  Short contributions just need fewer pages for their beauty.  Thus in reading 550 pages, one will find long sections of slow development to reach an uncertain conclusion as to the quality of a contributor's code, punctuated by shorter gems of programming.

To compound the difficulties in reading, I have yet to discern the method behind the organization of the chapters.  Programmers use a wide variety of languages and the book is no different.  For example, I skipped the chapters using Fortran, due in part to my lack of knowledge of the language.  Yet the final chapter used Lisp and was rather interesting, even though I have virtually no experience with it either.  But that chapter succeeds because its point is not based in the language; changing its examples to pseudo-code would be just as workable.

To conclude, the premise is still valid.  And so I will remain hopeful for a future rewrite that discards about 20 chapters and provides some degree of transition between each.  Perhaps even confining the languages down to a handful, but just deciding this might be intractable.

Friday, October 1, 2010

Is elegance pretty?

Through the course of my work and reading, I encountered a write-up about what makes code "pretty", with reasonable suggestions for establishing and maintaining a code style.  The essential objective is to make the code readable by others.  As I work with undergrads, they are not always thrilled about the effort required for "pretty" code, yet they can accept that I have to read and understand their programs.  Yet this barely compares to the effort required in a commercial context, where the code may have a multi-year lifetime and pass through many programmers.

Is pretty code elegant?  No, although the reverse is true (and thus the title).  Elegant code has an aesthetic aspect that can be termed pretty, yet it must be more than aesthetic.  My continuing proposal is that elegant code must also be efficient, extensible, reliable, etc.  Basically, to be elegant, code must leave the objectors behind.  Instead, you want to show others your elegant code.

So go and write pretty code.

Monday, September 27, 2010

Stack Overflow: String Replace

I've recently begun exploring Stack Overflow.  Helping to answer people's questions has a considerable appeal to one, such as I, who enjoys teaching.  Sometimes you can find a question that no one has answered, as few know the answer.  Other times, almost any professional programmer can answer the question, and often these are from CS students trying to solve their homework.  On occasion, I may repost interesting questions here.

For my first repost, we have a StringReplace function that needs optimizing.  Independent of the questioner's implementation, let's consider the core algorithm.

StringReplace(OldText, NewText, BaseString)
    for SubString of length OldText in BaseString
        if (OldText == SubString) {Replace text}

This is how we want to replace.  Find each instance of OldText in BaseString and replace.  Given the nature of the question, our implementation will be written in C and not use any libraries (like regex, CRT, etc).

Working backwards, we'll need to return a final string of some indeterminate size.  Then the BaseString, as it is modified, is stored in this final string, in the following form, where each base is a subset of the string:

When the problem is viewed this way, a recursive implementation suggests itself.  The following code expands upon this approach (though avoiding some casting requirements, necessities of sizeof(char), and the possibility of Unicode support).

#define StringReplace(x, y, z) StringReplaceHelper(x, y, z, 0)
char* StringReplaceHelper(char* OldText, 
                          char* NewText, 
                          char* BaseString,
                          unsigned long space)
{
    char *start = BaseString, *ret = NULL;
    unsigned long offset = 0;

    while (*BaseString != '\0')
    {
        if (Match(OldText, BaseString))
        {
             offset = (BaseString - start);
             // The next search will begin after 
             // the replacement text
             ret = StringReplaceHelper(OldText, NewText,
                                 BaseString + strlen(OldText), 
                                 space + offset + strlen(NewText));
             break;
        }

        BaseString++;
    }

    // If the end of string, then this is the last piece
    // Else copy in subset and replacement piece
    if (*BaseString == '\0')
    {
        offset = (BaseString - start);
        ret = (char*) malloc(space + offset + 1);
        memcpy(ret + space, start, offset);
        ret[space + offset] = '\0';
    }
    else
    {
        memcpy(ret + space, start, offset);

        // Don't actually do the following, the processor
        // will have to walk NewText twice
        memcpy(ret + space + offset, NewText, strlen(NewText));
    }

    return ret;
}

First, the routine Match() is basically a string compare, which can have its own interesting optimizations, including partial skip-ahead.  Second, a final version wouldn't use strlen, except perhaps at the start; the lengths can be passed between iterations.  Finally, using memcpy and malloc isn't cheating, as I've written my own implementations in the past, and their omission here is just for the sake of brevity.

But this is just what I'd currently write for a string replace.  You can follow the link to Stack Overflow and see the original request (and my suggestion for improvement).  Or perhaps consider your own.

Friday, September 24, 2010

Elegance in Function

A friend shared today a post about an elegant worm, Stuxnet.  While I am not a security researcher and cannot comment on the claims of elegance, I can delight to hear that others are evaluating their fields with aesthetics in mind.  If the thought of an elegant virus bothers you, consider that I would also claim that there are beautiful swords and firearms.

The post begins: "Brilliance and elegance are powerful adjectives most often used to describe things of beauty -- people, places, and things.  Works of art whether they are literary or visual; aural or editable often bear these designations but it is rare indeed when we see them used to describe malicious code & content..."

Edit: Since this post, I've been seeing many other places Stuxnet has been mentioned, including the NYTimes.

Tuesday, September 21, 2010

Review: Performance Anti-Patterns

Being a performance engineer, I always appreciate when others give solid guidance for how to write performant code.  One such work is Performance Anti-Patterns.  While you can read this article, there are a couple of things I'll reiterate here.

First, performance should be part of the entire development process.  Trying to optimize code at the end of the cycle limits the options that are available.  Conversely, adding micro-optimizations early is potentially wasteful as future development may negate the effect.

Second, benchmarks are key to proper measurement of performance. Establish the important scenarios for the application and the metrics of interest for the benchmark.  Is the goal to increase throughput?  Or perhaps cycles per request?  Power consumption at a given load?  Without a specific measurement of performance, a benchmark just serves as an application that is run regularly.

This also defines the common case.  Take a storage driver and three operations: read, write, and recovery.  In all likelihood, the read and write operations occur far more often than recovery.  If a disk fails, does it matter whether the user is notified 10ms versus 20ms after the event?  Does it matter whether a read takes 10ms versus 20ms?  Spend the optimization effort likewise.

Third, think parallel.  This requirement is beginning to sink in; however, just being multi-threaded is not enough.  How many threads?  One per task, or per CPU?  Is the data properly partitioned?  A scientific application will operate very differently from a web server.

Finally, there is a cost and trade-off to performance.  Reliability and security are obviously paramount.  But also the readability of the code and future extensions to the application.  Taking these elements into account can distinguish performant code from truly elegant implementations.

Thursday, September 16, 2010

Object-Oriented C (part 3)

In this exciting conclusion to the object-oriented C programming series, I will discuss a couple of drawbacks to my prior solution, as well as compare my implementation to those provided by OO compilers.

Drawback #1: Branch Prediction
It is important to remember that a branch is any instruction that changes the instruction pointer, which includes function calls.  Most function calls are straightforward; however, some branches use a register (i.e., a runtime value) for their destination, most commonly with function pointers and switch statements.  Ray tracers may already use octrees to reduce the set of intersection objects.  Further sorting a set of objects by their types would present an easier-to-predict sequence of indirect function calls.

Drawback #2: Inheritance is Explicit
All of the OO inheritance is up to the programmer.  There is no assistance from the compiler to ensure that classes inherit correctly, nor for "constructors" or protection of fields.  This forces the programmer(s) to be quite disciplined in their usage.  Under particular circumstances, I'd not worry about this, but in general I would want the compiler to keep an eye on the usage of this model.

To reprise, for someone who has limited experience with C++ and its notation (like I did at the time), using the limited OO capabilities provided by this solution proved ideal.

All of this work hinged on an important "law of programming": If a compiler / run-time can do something implicitly, the programmer can do so explicitly.  (See More Rules).  Because a compiler for C++ is generating machine code to support OO, there must be a way to write code to give an equivalent compilation in C.  So then what does the C++ compiler generate?  (As someone who has not written a compiler and is not disassembling C++ programs, the following will remain speculation, though clarifying comments are appreciated).

I can conceive of three ways for a compiler to store an object's functional information:
1) Type field.  The type of an instance is always clearly known, but types must be globally unique.  Any code wishing to perform operations needs multiple levels of indirection to get from a "type" to the appropriate function.  Bleh.
2) Direct inlined fields.  This is effectively the solution in C.
3) Indirect fields.  Keep a single table of function pointers for this type in memory.  While this saves the memory for a couple of function pointers, there is an additional layer of indirection that must be traversed on every function call.

My remaining question is: "how does a compiler / run-time address 'diamond' inheritance"?  That is, classes that inherit from multiple classes which themselves share a common ancestor.

Thursday, September 9, 2010

Problem Solving via Programming Languages

As an interlude to the OO series, I want to clarify something about language preference.

BASIC, C, C++, C#, IA64, Java, Lisp, MIPS, ML / OCAML, Perl, Python, and X86 / X64 -- I've written executable code in them all (some far more than others).  While C dominates my coding experience, and is my general preference, I recognize that sometimes another language provides better support.  Usually, this support is primarily a reduction in development time, but I'd like to delve a little deeper into why I choose three particular languages: C, C#, and Perl.

C -- "Machine independent assembly"
When I choose to write C code, I'm often thinking of precisely how a computer will execute the high-level solution that I've sketched.  Usually, given a problem, I decompose it into executable bits: looping through a list, building a tree just so, etc.  I know what I want, and if I'm wrong then it's my own foot that I'm shooting.

C# -- "Rapid application development"
I'm only beginning to use C# regularly, but I'm finding it very useful for achieving simple tasks, particularly those that require a UI.  .NET provides an extensive collection of support libraries, so loading graphics, interacting with them, etc., is mostly a matter of connecting the pieces together.  To me, C# is called for when my problem is UI-centric: click the button and an image appears....

Perl -- "Text, files and strings"
My Perl use has been declining over the years, not from a dislike of the language, but rather from a shortage of problems that call for its use.  Perl is my tool of choice when I need to work with lots of text and files.  I've used it to write everything from scripts that copy and rename files between directories to ones that parse MBs of raw data and compute statistics for particular results of interest.  (The latter was eventually rewritten in C for speed and portability.)

I'm always thinking that one of these days I'll try learning more about a functional language.  However, I quickly fall into the trap of thinking procedurally.  "I know you're on an x86, so just let me write my assembly and we can be done."

Tuesday, September 7, 2010

Object-Oriented C (part 2)

In part 1, I outlined the problem with writing a ray tracer in C, particularly how to hide the presence of different object types behind a common interface.  Since my roommate was a TA for the course, he and I discussed some of the crude solutions he feared he would have to grade.  For example, the intersect calls might be based on a switch ... case or an if ... else chain, as follows:

switch (object->type)
{
    case TRIANGLE: ...
    case SPHERE: ...
...
}

While it clearly handles the different object types, it sets up regular patterns of duplicated code: once for intersection, again for finding a color, and so on.  Each pattern also has to be modified if a new scene object is introduced.  I wanted something better; in particular, I wanted to show that carrying a type field is unnecessary for my design.  I concluded that I could have my scene objects carry their own function pointers.  (Note: the following code is still pseudo in nature and is provided for illustration, not compilation.)

typedef struct _SCENE_OBJECT {
    float (*Intersect) (struct _SCENE_OBJECT *, Vec3);
    void (*Color) (struct _SCENE_OBJECT *, Vec3, ...);
    struct _SCENE_OBJECT *next;
} SCENE_OBJECT, *pSCENE_OBJECT;

Then I declare my sphere scene object (and similarly for other scene objects), as follows:

typedef struct _SCENE_OBJECT_SPHERE {
    SCENE_OBJECT generic;
    ....
} SCENE_OBJECT_SPHERE, *pSCENE_OBJECT_SPHERE;

Now, my scene object types have inherited from the base object.  Therefore, the intersection loop from part one now has a single line intersection call:

dist = object->Intersect(object, ray);

This approach still has drawbacks, but it achieves a simple inheritance between structs in C, thereby providing a rudimentary OO capability to a language known for not having one.  In part 3, I look forward to exploring some of the drawbacks, as well as the elusive discussion of how OO intrinsically works in C++ or other languages.

Monday, September 6, 2010

Object-Oriented C (part 1)

For most programmers, the very idea of writing object-oriented (OO) C code seems wrong on several levels.  First, there are several, quite usable, programming languages that already support OO designs, so why mess with adding something to C that is as core to a language as OO?  Second, how would someone do this anyway?

Disclaimer: I am not recommending that OO programming be generally accomplished in C, but rather am writing to argue how some OO constructs can be integrated into a program to improve its design / readability (i.e., elegance).

Many years ago when I was an undergraduate in Computer Science, one assigned project was to write a ray tracer.  The provided framework was in C++, but having not studied the details of inheritance, overloading, etc., I elected to rewrite the framework in C.  Since the framework was very simple, the few classes were merely renamed to structs and their member functions were given a prefix.  Yet, this is not the interesting part, merely the starting point.

In brief, ray tracers cast (i.e., test for intersection) a ray (i.e., vector) against the scene (i.e., a list of objects).  The object that is first intersected provides the color for that part of the image.  Different objects will have different intersection tests / other properties.  In pseudo-C, the code for intersection might read:

object = objectList->head;
minDist = INF;  minObject = NULL;
while (object != NULL) {
    dist = Intersect(object, ray);
    if (dist < minDist) {... }
    object = object->next;
}

The fundamental design problem is therefore how to properly set up the objects so that the ray tracer's operations upon its scene objects are still elegant.  How to accomplish this will be discussed in part 2, along with how OO intrinsically works.

Friday, September 3, 2010

Elegance in Coding

To start this blog, I thought I'd quote the first paragraph from my statement of purpose, which I wrote as part of applying for grad school.

Elegance is when an object achieves its telos, designed end, well and simply. For software to achieve elegance, it must be performant as well as satisfying other fundamental requirements like security and reliability. This refined software does not waste its resources: CPU cycles, memory, and device accesses. Instead it makes efficient use of what it requires. Reaching software elegance is often limited by the experiences of developers; important developments require looking beyond the standard realms for which applications are designed, whether the realms consist of workloads or hardware configurations or even the problems being solved.  Academic research provides one setting for the long term investigations required for exploring potential solutions of the next realms of software.

Writing software is not just a logical act of solving a given problem, though it is often viewed this way, even by me.  Yet there are days when I see my code as a creative art: one that isn't about merely solving the problem, but about doing so well.  Because in being written well, the source code can be called beautiful.  You may wish to share the code with others, to take it before the class or the team and let us all admire the work, admire its elegance and simplicity.

Yet this art of computer science is not a matter of font selection or pleasing quantities of white space.  Rather, we see that the problem is solved with efficiency at run time and without defect.  Still, the code has style too: it is consistent in its spacing, its variables are named meaningfully, its comments are useful.

So I'll be writing sometimes about elegance in programming, more commonly from the negative direction.  But most posts will be about ordinary problems in writing software, brainstorming designs, or the many other tasks that a computer scientist faces.