Designing a Compile-Time Ray Tracer

Introduction

This is part 1 of the CTRT series. You can read the complete series here:

In today’s travel, we will be discussing how to design the world’s fastest runtime ray tracer. Specifically, we are going to be implementing a ray tracer that performs all of the calculations at compile-time, so that once the compiler finishes running, we are left with the fully rendered image. This project provides us with an excuse to dive into the world of constexpr functions and explore several compile-time patterns such as CRTP, compile-time polymorphism, amongst others.

Before we get started, let’s lay down some ground rules:

Only C++17 features are allowed. This choice is mostly due to the fact that neither MSVC nor CMake have a dedicated C++20 flag yet. In addition, I haven’t read enough about the new features to figure out how they could be applied.
All computations must be done at compile-time through constexpr functions (so no template meta-programming).
The main function must be trivial. In other words, it should only be allowed to grab the final rendered image and save it. Nothing else.
To make things more interesting, we are only allowed to use the following external libraries:
- The standard library (let’s not get too carried away here),
- STB (for saving the image to JPEG), and
- Zeus (mostly for utilities, FMT, and Catch).

Now that we have set the ground rules, let’s begin where most graphics projects often start: the linear algebra classes.

⚠ WARNING ⚠
We will not be discussing the fundamentals of ray tracing beyond what is required for the design of the ray tracer. Furthermore, we will also not be talking about all of the C++ features used in the implementation aside from those that are new or are used in a different way.

Note
At the top of each section, you will find links to all the files that are discussed there in case you want to read the code along with the explanations. The entire source code for this post can be seen here

From the Ground-Up: The `Vector` Class

Vector.

The vector class we’ll be designing for this project is relatively simple. Since we may want to switch between floats and doubles (if nothing else for the sake of testing performance), we will be templating the class as follows:

1
2
3
4
5
6
template<typename T>
class Vector
{
public:
    // ...
};

Our vector needs some data, so let’s decide how we’re going to hold it in the class. We could:

Hold three variables of type T representing the $x, y, z$ coordinates, or
Hold an array of type T and size 3.

“But wait! There’s a third option” You may say. Indeed, we could use a fairly common trick in computer graphics and declare our class with something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
template<typename T>
class Vector
{
public:
    union
    {
        struct
        {
            T x, y, z;
        };
        std::array<T, 3> data;
    };
    // ...
};

This option offers us the best of both worlds: we can access each component of the vector individually as a coordinate or we can treat it as an array. So why wasn’t that mentioned in the 2 options? Well, let’s try it! After adding a couple of constructors to the class, consider then the following code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
template<typename T>
constexpr Vector<T> add_vectors()
{
    Vector<T> u{1.0f}; // Initialises all 3 coordinates to 1.0f.
    Vector<T> v{2.0f}; // Initialises all 3 coordinates to 2.0f
    Vector<T> w;       // Empty constructor initialises to 0.0f.
    w.x = u.x + v.x;
    w.y = u.y + v.y;
    w.z = u.z + v.z;

    return w;
}

int main()
{
    constexpr auto result = add_vectors<float>();
    return 0;
}

Compiling with MSVC gives us the following errors:

1
2
3
main.cpp(37,27): error C2131: expression did not evaluate to a constant
main.cpp(28,12): message : failure was caused by accessing a non-active member of a union
main.cpp(28,12): message : see usage of 'Vector<float>::x'

So, what is going on? This pattern will work correctly if we remove the constexpr qualifiers, but since that would defeat the purpose of the entire exercise, we need to figure out what the problem is. If we read through the documentation regarding constant expressions, we will find the following:

A core constant expression is any expression whose evaluation would not evaluate any of the following: […]
an lvalue-to-rvalue implicit conversion or modification applied to a non-active member of a union or its subobject

In simple terms: we are not allowed to change the value of a non-active member of the union. In our case, v.x is not active, and therefore we cannot modify it. Since we aren’t allowed to use the union, we are left with the two options we discussed earlier. To simplify things, we are going to be doing a hybrid of options one and two by declaring the array of type T and then writing functions to allow us to access the individual coordinates of the vector. In short, we are left with something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
template<typename T>
class Vector
{
public:
    std::array<T, 3> data;

    // Constructors...

    constexpr T& x()
    {
        return data[0];
    }

    constexpr T x() const
    {
        return data[0];
    }

    // Similarly for the remaining coordinates.
};

Note that we are providing overloads for const access as well as regular access (similar to how we would overload operator[] which we will be overloading).

Now that we have the basic class defined, let’s discuss the operators.

Operators

For starters, we are going to need to perform the following:

Add and subtract two vectors,
Multiply a vector by a scalar (and divide for convenience),
Negate a vector,
Dot and cross products,
Take the length and squared length of a vector, and normalise it.

We could implement the operators as one would expect, but instead we’re going to dig into our bag of tricks and realise that ultimately all of the arithmetic operators can be implemented as a function that gets applied to the array held by the vector. So, instead of manually setting the individual components, we’re going to use lambdas to transform them instead. Our first attempt would look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
template <typename T, typename UnaryOp>
constexpr Vector<T> unary_op(Vector<T> const& vec, UnaryOp&& fun)
{
    Vector<T> out{vec};
    std::transform(out.begin(), out.end(), out.begin(), fun);
    return out;
}

template<typename T>
constexpr Vector<T> operator-(Vector<T> const& vec)
{
    return unary_op(vec, [](T a) { return -a; });
}

Unfortunately this won’t compile on the back of std::transform. As of C++17, the algorithms in the STL are not declared as constexpr functions, and therefore cannot be used in this context. This is not a problem, since we can simply replace the call to std::transform with a loop. Adding a version that takes an rvalue for the sake of completeness, and we are left with the following implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
template<typename T, typename UnaryOp>
constexpr Vector<T> unary_op(Vector<T>&& vec, UnaryOp&& fun)
{
    Vector<T> out{std::move(vec)};
    for (auto& val : out.data)
    {
        val = fun(val);
    }
    return out;
}

template<typename T, typename UnaryOp>
constexpr Vector<T> unary_op(Vector<T> const& vec, UnaryOp&& fun)
{
    Vector<T> out{vec};
    for (auto& val : out.data)
    {
        val = fun(val);
    }
    return out;
}

Similarly, we will implement a binary_op that takes two vectors. Using these functions allows us to quickly implement all of the arithmetic operators by simply passing in lambdas with that perform the corresponding operations.

The rest of the functions are implemented normally, until we reach the length function. Initially, we would implement it like this:

1
2
3
4
5
template<typename T>
constexpr T length(Vector<T> const& vec)
{
    return static_cast<T>(std::sqrt(dot(vec, vec)));
}

And we would be greeted with yet another compiler error informing us that std::sqrt is not a constant expression and therefore cannot be used. Unfortunately this means that we are going to have to implement our own square-root function, which we can do fairly easily by using the Newton-Rhapson approximation and a recursive function. The code shown here is adapted from this gist.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
namespace detail
{
    // Adapted from https://gist.github.com/alexshtf/eb5128b3e3e143187794
    template<typename T>
    constexpr T sqrt_newton_rhapson(T x, T curr, T prev)
    {
        return curr == prev ? curr
                            : sqrt_newton_rhapson(
                                  x, T{0.5} * (curr + x / curr), curr);
    }
} // namespace detail

template<typename T>
constexpr T sqrt(T x)
{
    if (x >= 0 && x < std::numeric_limits<T>::infinity())
    {
        return detail::sqrt_newton_rhapson(x, x, T{0});
    }

    return std::numeric_limits<T>::quiet_NaN();
}

At this point, we have successfully implemented the entire vector class and all the necessary functionality for our ray tracer! Now let’s talk about the ray class.

Shooting `Ray`s and `Camera`s

The ray class we’re going to be using here is a simplified version of the ray class that is presented in PBRT. The only thing that is special about this class is the fact that we’re going to be overloading operator() to act as the evaluation function for our ray. Mainly, it will look something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
template<typename T>
class Ray
{
public:
    // Constructors

    constexpr Vector<T> operator()(T t) const
    {
        return origin + t * direction;
    }

    Point<T> origin;
    Vector<T> direction;
}

Where Point<T> is just an alias for a vector. This is the only interesting part of the class, so lets move on to the next part: the camera.

The goal for the camera class is to give us the direction of the ray for a given position on the image plane. Beyond this, the camera doesn’t need to deal with much else. Now we could argue that it would be worth implementing a common interface so we can create different cameras, but that isn’t worth it at this stage since adding things like fish-eye lenses or depth-of-field are more complex and generally tend to work better with random sampling, which present a problem as we will see later on. For now, let’s jump immediately to the thing we are rendering: the scene.

Light, Camera, `Scene`!

Scene

Under normal circumstances, the design for a scene class would involve creating a generic interface and then setting up a series of sub-classes to represent the different types of scenes that we would want to render. So, we would set up something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Scene
{
public:
    // Some list of virtual functions.
};

class ExampleScene : public Scene
{
public:
    // Override necessary functions here.
};

We would then setup a render function that would take a pointer to the base class and use it to render the final image:

1
2
3
4
void render(std::shared_ptr<Scene> const& scene_ptr)
{
    // ...
}

Unfortunately, we are not in normal circumstances. Specifically, since we are running everything at compile-time, we can’t use any form of dynamic memory allocation, which means we can’t do runtime polymorphism. This would mean that we can’t use our nice interface setup… or can we?

While it is true that we can’t use runtime polymorphism, this does not mean that we can’t use compile-time polymorphism. Now, depending on how you define it, there are a couple of different ways that this could be accomplished in.

If it Looks Like a Duck…

A first attempt would be to simply remember that we have templates, and as a result we could write something like this:

1
2
3
4
5
template<class Scene>
void render(Scene const& scene)
{
    // ...
}

This would solve the problem since we could pass in any object we wanted provided it satisfied the way the scene variable is used and provides all of the required interfaces. This is fine (and certainly a valid solution), but this type of setup is more akin to duck-typing than anything else. Furthermore, notice that we just lost our interface. If we wanted to create additional scenes, we would have to read through the code of the render function to figure out what the class is supposed to do (or look at the declaration of whatever scene class we already have) and then implement it. Yes, the compiler will enforce the contract that is established by the function, but this whole affair seems very tedious and not very readable, so let’s try something else.

“Fine”, you may say, “since we want an interface, then let’s combine templates with the interface for the Scene class. That way we can still derive from it and have a generic render function”. Indeed, we could implement something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Scene
{
public:
    // Some interface here.
};

class ExampleScene : public Scene
{
public:
    // Override necessary functions here.
};

template<class Scene>
void render(Scene const& scene)
{
    // ...
}

And this does solve the issue, but the render function is still using duck-typing, which goes against what our original implementation had. What we ultimately want is to have a render function that explicitly says that it only takes scenes, not a function that requires the templated type to provide a certain interface. The return of the Scene base class is certainly a step in the right direction, so the question now becomes: can we figure out a way of setting up the Scene class so that we can write the render function to be as close as possible to the runtime polymorphism version? Unsurprisingly, the answer is yes. Yes we can.

Curiously Recurring… CRTP

What we’re ultimately trying to achieve here is a function signature for render that looks something like this:

1
2
3
4
5
template<typename T> // We still need the template so we can pass any type in.
void render(<scene-of-type-T> const& scene)
{

}

Let’s look at that “function” again. We want the render function to receive a scene of type T, which in C++ would look something like this: Scene<T>. This implies that our base class would have to be templated, and the type of T would now be the derived class… this seems like an oddly recurring pattern doesn’t it? Almost like… yes! The Curiously Recurring Template Pattern (CRTP) solves the problem quite nicely.

Note:
If you’re unfamiliar with CRTP, then I would recommend this excellent article from FluentC++.

So, how are we going to set things up to use CRTP? Well the first thing we’re going to do is to re-write the Scene base class as follows:

1
2
3
4
5
6
7
8
9
template<typename T>
class Scene : public utils::StaticBase<T, Scene>
{
public:
    constexpr Colourf trace_simple(Rayf const& ray) const
    {
        return this->self().trace_simple(ray);
    }
};

Where StaticBase is the base class for CRTP classes that was adapted from here. The trace_simple function will pass a ray along to test for intersection. If it hits something, then it will return the colour of the object that the ray hit. We can then implement our first scene like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class FirstScene : public Scene<FirstScene>
{
public:
    constexpr FirstScene() = default;

    constexpr Colourf trace_simple(Rayf const& ray) const
    {
        return background(ray);
    }

    // ...

};

And the render function can now be re-written to look like this:

1
2
3
4
5
template<typename T>
static constexpr void render(Scene<T> const& scene)
{
    // ...
}

And we’re all set! The render function now looks as close as it can get to the runtime version, and it allows us to explicitly pass in a Scene object.

Our scene is currently looking a bit empty, so let’s figure out how to add geometries.

`Sphere`s and `Plane`s

Since we already have a pattern for defining a base and derived classes, we’re going to re-use it for our shapes. For the interface, all that we’re interested in is the following: does the ray hit? If so, return the material of the shape along with the hit information (hit point, normal, etc.). If not, then return nothing.

Based on this description, we would end up with an interface like this one:

1
2
3
4
5
6
temmplate<typename T>
class Shape : public utils::StaticBase<T, Shape>
{
public:
    constexpr std::optional<ShadeRec> hit(Rayf const& ray) const;
}

Where ShadeRec is the container for the hit information. We can then derive this class to create two new shapes: Sphere and Plane. The question now is: how are we going to store them?

`visit`ing a `variant`

Let’s pretend for a second that we aren’t running everything at compile-time. If we wanted to store a list of shapes, we could have this:

1
std::vector<std::share_ptr<Shape>> shape_list;

And then add shapes as we wanted. The big advantage here is that if we added more shapes, we wouldn’t have to change anything to be able to add them to the list.

So, how do we do this at compile-time? Well, to figure this out we’re going to have to look at our list in a different way. Consider this: if we ignore for a bit the fact that we’re treating the shape classes polymorphically, we can imagine that the list is “holding” all of the different types of shapes that we have. So we can imagine that internally it holds a list that contains two entries: sphere and plane. Now obviously this analogy is completely wrong for the code that we actually wrote, but it serves quite nicely to motivate how we’re going to solve this problem. In essence, we would need two things:

Some type of list of all the shape types, and
A container that holds the list of all shape types.

With this setup, we can then add new shapes into the container, and the type will be handled by the list. So, how do we actually set this up in C++?

First things first, let’s talk about the list of all shape types. What we’re describing here is something very similar to what unions are in C. Basically we want a structure that can hold different types. When we construct it, we would assign it one of the types in the list and then use it as if it was that type. Now the problem with unions is that they aren’t type-safe and there’s no handling of constructor/desctructor or anything of the like. Fortunately, the STL has something that does: std::variant.

In order to bundle up all of the functionality attached to std::varaint, we are going to create a wrapper class for the shapes. This class will mirror the interface of the shape classes, but will also manage the code that handles how to use the std::variant itself. So, we end up with something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
template<class... Shapes>
class ShapeWrapper
{
public:
    template<typename T, typename = utils::Contains<T, Shapes...>>
    constexpr ShapeWrapper(T&& elem) : m_shape{std::forward<T>(elem)}
    {}

    constexpr std::optional<ShadeRec> hit(Rayf const& ray) const
    {
        return std::visit([ray](auto const& elem) { return elem.hit(ray); },
                          m_shape);
    }

private:
    std::variant<Shapes...> m_shape;
};

using Shapes = ShapeWrapper<Plane, Sphere>;

There’s a lot to unpack here, so we’re going to take it piece by piece.

First, let’s start with the template declaration. Notice that we’re taking a parameter pack for one very simple reason: if we hard-code the shape types directly into m_shape, every time that we wanted to add a new shape we would have to tack it on to the declaration of the variable. On the other hand, by having the parameter pack in the template declaration of the class, we can just modify the Shapes alias without having to worry about the wrapper class itself.

Next is the constructor, which does two things:

The template uses SFINAE to ensure that the type that we’re trying to store in the wrapper is one of the ones we declared the wrapper with (to prevent any misuses of the class and provide better compiler errors). This is handled through utils::Contains.
It takes an rvalue reference to the object that it will store. Note that this must be either an rvalue reference or a const& since those are the only constructors that std::variant has. For the purposes of this project, it makes more sense to take an rvalue reference since that way we explicitly transfer full ownership of the object to the wrapper.

The last piece of the puzzle is the hit function. Note that this mirrors the corresponding function in the shape class. This is on purpose, since this way we can use the wrapper as if it was an object. The interesting part is the contents of the function itself. Since std::variant holds a list of possible types, we need to be able to visit the active type so we can do something with it. Hence, we use std::visit to grab the active type held by the variant and call its hit function in turn. By necessity, the lambda takes it’s only parameter by auto since the type is itself a template.

Now that we have the wrapper, the only thing that is missing is the container for all the wrappers. The class looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
template<typename Container>
class ShapeContainer
{
public:
    constexpr ShapeContainer(Container&& list) :
        m_shapes{std::forward<Container>(list)}
    {}

    constexpr std::optional<ShadeRec> hit(Rayf const& ray) const
    {
        for (auto const& elem : m_shapes)
        {
            if (auto result = elem.hit(ray); result)
            {
                return result;
            }
        }

        return {};
    }

private:
    Container m_shapes;
};

Once again, we’re going to examine each part of the class individually, starting with the Container template. The point of this is to provide a generic class that can hold any type of container (so it could use std::array or std::vector). Other than generality, the intent is to be able to explicitly state whether the container can run at compile time. So, if we used an std::array it can be run at compile-time, whereas an std::vector can’t.

The constructor also takes an rvalue reference to the container it will hold for two reasons. First, the wrapper classes can only be constructed by rvalue reference, so the container itself must be taken by rvalue reference. The second reason is to, once again, explicitly show that we are taking ownership of the container (and all its items).

The final part is the hit function, which does basically what you’d expect: loop through every item in the container and check if the ray hits the shape. If it does, return the result. If nothing hits, return nothing.

You’ll recall that whenever the ray hits a shape, it will return hit-data, which includes a material. For the sake of simplicity, the materials will be referenced by ID (just an index) in the shape classes. The materials themselves follow the exact same pattern that the shapes do: a CRTP base, derived classes, a material wrapper with std::variant, and finally a container.

A Red Sphere

So, now that we have all of our foundations setup, let’s finally see how to create our scene and render it. We first start with creating a new Image and a new camera:

1
2
3
4
5
6
StaticImage<image_width, image_height> image;

        Camera camera{Pointf{0.0f, 0.0f, 500.0f},
                      Vectorf{0.0f},
                      Vectorf{0.0f, 1.0f, 0.0f},
                      500.0f};

Next comes the shapes and materials:

1
2
3
4
5
std::array<Shapes, 1> shapes_list{Sphere{Pointf{0.0f}, 150.0f}};
std::array<Materials, 1> materials_list{DefaultMaterial{}};
ShapeContainer<decltype(shapes_list)> shapes{std::move(shapes_list)};
MaterialContainer<decltype(materials_list)> materials{
    std::move(materials_list)};

Notice the usage of decltype to avoid having to explicitly write out the types that the containers will hold. Finally, we forward all of this to the renderer as follows:

1
Renderer::render(scene, image, shapes, materials);

The entire setup is then wrapped inside a constexpr lambda to ensure that all of the calculations happen at compile-time. The lambda then returns the final image, which leaves our main looking like this:

1
2
3
4
5
6
7
int main()
{
    auto image = render_scene_2();
    save_image("../render_test.jpg", image);

    return 0;
}

And that’s it! All that is left to do is to compile the code… and be prepared to wait for a while. Roughly 35 minutes later, the compiler will finish. If we inspect the generated assembly, we will notice that the bulk of the code is comprised of move instructions which are just assigning the different pixel values into the array, which is exactly what we wanted! All that is now left to do is run the program and we’ll be left with an image like this:

Sphere render

Conclusion

So, what did we learn in this trip?

We have to be very careful when designing constexpr functions to ensure that all called functions are themselves constexpr. If the aren’t, we have to implement them ourselves.
We can use CRTP to produce static polymorphism which can be more expressive than more generic templates.
We can use std::visit to further static polymorphism by allowing us to have generic containers that hold a collection of classes with a common interface.
The compiler is able to perform very complex operations and create a ray-traced image.
Compile-time ray-tracing is very slow (it’s at least 600000 times slower!).

This is not the end of the journey though! Join me next time where we will explore the following questions:

Why is the compilation so slow? And can we do anything about it?
Can we extend the material/shape system to create more complex scenes?
Can we add multi-sampling?

Notes

std::transform and related algorithms are declared constexpr in C++20, so it isn’t necessary for us to change the original implementation of the unary_op and binary_op functions if we enable the C++20 flag.
There is in fact a proposal P1383 that seeks to mark <cmath> functions as constexpr. Unfortunately it didn’t make the cut for C++20, but maybe in the future they will be.
The reason we had to create the wrapper/container pattern for the shapes and materials was that we couldn’t perform dynamic memory allocations in constexpr functions (and as such couldn’t pass pointers to the base classes). In C++20 it is now possible to perform these, so theoretically we should be able to implement the code normally (through the smart pointers aren’t constexpr yet, so we’d have to do all the memory management ourselves).

References

The following are the references used for this article in case you wish to read more about the discussed concepts.

Edits:
22/09/2024: Improved TOC and fixed some phrasing due to blog style changes.