In graphics programming, "frustum culling" is about determining whether an object is inside or intersect with the camera's view frustum.
In the last post, we've discuss about an efficient and precise "AABB vs. Frustum Planes" algorithm. However, it's not quite a good choice to iterate over each object and do the test.
Is there a better way? Let's find out!
The problem
In this post, we will solve the culling problem where:
There are lots of objects scatter around the scene within a reasonable boundaries and the number of them is known ahead of time.
Those objects barely move.
For example:
Tree culling for terrain object, as implemented in Polaris 3.
Stop calculating something very expensive offscreen.
The solution: cell based culling
We propose a divide & conquer algorithm, where a quadtree will be used to arrange nearby instances into "cells", their visibility state will be decided mostly by that cell.
Overall, the processing steps are:
At initialization time, we create a quadtree of "cells" fits into a predefined world boundaries. The quadtree remains unchanged during the app lifetime since those objects barely move, then we put the objects into their containing cell.
At update time, we perform AABB vs. Frustum Planes test on the quadtree, this happen pretty quickly since children cells can inherit state from their parent (children cells for sure will be culled if the parent cell is culled, etc.). In some cases, we need to do the test on an individual object.
Example project
You can download the example project we use in this post using this link.
About quadtree
Quadtree is a tree data structure where each node has exactly 0 or 4 children.
For culling problem, we use a quadtree of bounding boxes. The root node encapsulates the entire scene then recursively subdivide into 4 smaller boxes at each depth level.
We are processing 3D scene, why don't use use Octree instead?
A quadtree can be expressed in 2 ways:
Nodes and pointers.
Or, flatten to a 1D array. We prefer this way because we can easily make our code compatible with Jobs system later (multi-threading). The drawback is memory consumtion.
To store a quadtree in an array, we iterate through each depth level, and put all nodes at that depth next to each other in the array, see the images:
To make things clear, we define:
Depth: The number of "layer" in the tree, starting from 1 (the root node)
Depth Index/Layer: The index of depth layer, starting from 0
Calculate the array length
Try drawing a quadtree on paper, you can see that the number of nodes is:
Depth Index | Node count | Increment |
---|---|---|
0 | 1 | 1 |
1 | 5 | 4 |
2 | 21 | 16 |
3 | 85 | 64 |
Can you guess the formula? Not so hard:
L(di) = L(di-1) + 4^di
In code, we can calculate this with a simple loop:
Calculate the number of sibling nodes
Siblings are node at the same depth. Take a look again at the Increment column of the table above, we can clearly see that the number of siblings at depth index di is 4^di.
Precalculate array length and sibling count
Since the array length and sibling count is unchanged at each depth, we can precalculate them, and limit the maximum depth of your quadtree as follow:
Calculate elements indices
Sometimes we want to jump to the first node at a depth level, the picture below demostrate a quadtree array with indices:
Since the nodes at each depth level are places next to each other in the array, the first-node-index-at-a-depth-level is the array-length-of-previous-depth-level:
To retrieve the depth index of a node, simply look for the greatest number that less than the element index in the TREE_LENGTH array:
We also need to access the children nodes from the parent, that can be query by getting the first node index of the next layer, then add to the "local index" of the node in the current layer multiply by 4, demostrated in this animation:
How about other children nodes?
Initialize a quadtree of cells
Let's create a attach a new component to a game object in the scene, we will call it CellBasedCulling_Demo.
We'll define a field for limiting the scene size (levelSize), assign a small value first like (100,10,100) so we can move around quickly.
To make it simple, we make the quadtree depth a constant, in this example, 6.
We use Bounds struct to reprensent cells in the quadtree.
We also declare an array for storing cell cull results.
At startup, we create and recursively split the tree to the defined depth:
First, we need to create a Bounds[] array with the the length returned from our GetTreeLengthAtDepthIndex() function.
Before recursively splitting the tree, we must assign value to the first node (the root, which encapsulate the entire scene defined by levelSize).
At each depth level, we iterate through every sibling nodes, split them into 4 and write to their children.
The splitting stop once we've reach a leaf node, whose first child index is outside of the array.
Let's add some Gizmos so we can see it in the scene:
Back to the editor, enter Play mode and adjust the slider:
Spawn some objects
Let's spawn some game objects within the levelSize defined before:
It's just a very basic object spawning code, where you clone a prefab assigned via the Inspector. We store the spawned instances in an array for later.
After spawning an instance, we need to pre-calculate its leaf cell index so we can quickly get its culling state in each loop. Starting from the root cell, we look for which one from the 4 children cells that contains the object world position and repeat until we reach the leaf cell.
We also declare and initialize an array for storing instance cull results, just for later.
Back to the editor and enter Play mode, remember to assign a prefab & set the instance count:
To verify that we've calculated the correct cell indicies, let's add some visualization. We will draw a cyan box at the instance position and the cell it belongs to, using instance index:
You should see this when entering Play mode:
What if my game objects can move?
Cull the quadtree
The culling should happen every frame, Update() is somewhat perfect place to do this. You can also do this in the camera's culling event.
First we need to calculate the camera's frustum planes:
Then, we traverse through all nodes in the tree, starting from the root, and perform AABB vs. Frustum Planes tests:
You can see that we're not performing tests on every nodes, because children node can inherit culling state from their parent if:
The parent is fully VISIBLE
The parent is CULLED
That makes sense, since children nodes are fully encapsulated by their parent.
The remaining case is that the parent node is PARTIALLY_VISIBLE, we need to do the test on each children nodes to determine their actual state.
What is that CullUtils.TestFrustumAABB() function?
Let's visualize that:
Cull the instances
After culling the quadtree, it's quite simple to cull the instances. Simply iterate through each instance:
If the containing cell is VISIBLE or CULLED, then the instance culling state will inherit from that.
If the containing cell is PARTIALLY_VISIBLE, then you can perform an additional test on the instance bounding box (or just assume that it is visible anyway depending on your app).
In this example, we use the instance positon and scale as its bounding box, and deactivate the game object if it's culled.
Analyze our cell based culling technique
The main purpose of our cell based culling technique is to reduce the amount of AABB vs. Frustum Planes test performed. There is performance benefit when we have the number of tests less than the number of instances. But does it?
Let's add a counter to the TestPlaneAABB() function:
Now it will record how many times the function was call each frame and display the number in the Inspector.
In the best case, we only have 1 test occur, where the quadtree's root node is fully visible or culled:
On an average case, we have a few tests happened but the number of them usually less than the instance count:
However, in the worst case, where all the leaf nodes are partially visible (e.g: cells dimension is rarely encapsulated by the view frustum, such as very tall cells, etc), the number of test is instanceCount + quadTree.Length:
In our low poly terrain tool Polaris 3, the tree renderer does a further optimization step that shrink those cells to just encapsulate their content, so the cell dimension is as small as possible or even stripped away from the tree if it has no tree. That way, the worst case is less likely to happen.
Polaris is a complete toolset for low poly creation in Unity, it provide everything you need from optimized mesh generation, blaze fast foliage rendering, painting, spline, stamping & utilities. Learn more:
Wrap up
In this post we've learned about quadtree and how to use it to perform efficient scene culling. But it's not everything yet. In the next post, we will turn the implementation into a multithreaded one using Jobs.
Stay tuned!
留言