Wednesday, February 2, 2011

Ray Tracing Performance and Practice Part 3

When last we left the ray tracer, it could read in PLY files and then render a scene.  But while it rendered, you'd probably want to get dinner, read some blogs, and write some code.  Very slow.  Improving performance is very straight forward: either reduce the work done or find a way to do the same work faster.

At the time of original development, I knew that there existed further performance opportunities via Oct Trees.  The idea behind these and other similar structures is to partition the scene into sub-scenes that can be treated in the aggregate.  Let's work through a simple example.

While the actual ray tracer is in 3D, we'll reason through a 2D tracer.  We'll have lots of circles and triangles in standard XY coordinates.  We can start partitioning the scene into quadrants: +X +Y, +X -Y, etc.  Now, when we draw a line (i.e., the ray) through the scene, rather than checking whether the ray is intersecting every circle and triangle, we can first check whether it is passing through any quadrant.  If the ray doesn't pass through the quadrant, then the ray will also not pass through any object in the quadrant.  The Oct Trees work very similarly, except besides X and Y, the scene is also split by +/- Z.

If the scene is split just once, this can likely provide some benefit, but for real practicality each octant of the tree must also be split repeatedly until only a few scene objects (i.e., spheres and triangles) remain in each octant.  This is all well and good; however, four particular problems arise: first, where should each octant be split.  Second, how many scene objects should be in each octant.  Third, how should objects that straddle octant boundaries be handled?  Finally, how does one practically render with an Oct Tree?

Where?  Two possibilities were tried: splitting in the "center" of the objects and splitting in the center of the octant.  If the split point is based on the objects, then each octant should have a similar number of objects.  This bounds the depth of the Oct Tree by making each octant have a subset of the parent octant.  However, this also forces every ray to traverse several levels of the Oct Tree, as every octant should contain scene objects.  Instead, if each octant is split at its center, then many octants may be empty, but the ones with scene objects may have significant number of levels in the tree before reaching individual objects.

How many?  Ideally, this number would be reached based on considerable experimentation and analysis.  At present, the code just splits an octant when it contains more than 16 scene objects.  Given the 50k triangle scene from the previous post, this would create roughly 3000 final octants, 375 parent octants, 47 grandparents, 6 great grandparents, and 1 root.  Ideally.

Yet, the split points are never ideal and so the octants are imbalanced.  Especially as some scene objects will cross the octant boundaries.  For example, a sphere placed at the origin is present in all 8 octants.  So the Oct Tree places the scene object into every octant that it is present within.  Rather than computing whether the object is precisely within the octant, the Oct Tree creates a bounding cube around the object.  Back to the 2D space, draw a triangle from (1,1), (-1, 1), and (-1, -1).  A square around this triangle would also include (1, -1), and therefore the triangle would also be "part" of the +X -Y quadrant.  I would be happy to switch to computing whether the object intersects the octant boundary planes; however, that would be when I understand the math involved.

The final issue is how to put the Oct Tree into practice.  In the immediate, the intersect code follows:
    if (No Octants) 
        foreach SceneObject in OctTree
        {
            Compute Distance for Ray to Intersect Object
        }
        Return closest object or none
    foreach Octant in OctTree {
        Compute Distance for Ray to intersect Octant
    }
    Sort Distances
    Select closest octant
    do {
        Recurse on octant
        if (Object returned) return object
        Select next closest octant
    } while (Octants remain)
    return none

A potential optimization is to handle shadows, reflected rays, etc from the lowest octant rather than starting at the root.  However, implementing such an optimization requires significant state tracking, which also impedes making the ray tracer multi-threaded.

In a future part 4, I'll explore the hurdles of making the tracer multi-threaded rather than its current implementation of one thread.

No comments: