Research & Development

We follow the method of separating axes, optimized for the case at hand. The assumed scenario is a looping SIMD implementation, where it is often beneficial to perform the full calculation for each test unconditionally. In other situations, it may be preferable to early-out as soon as a separating axis is found. The usual approach of preceding the test with a quick, conservative one (using bounding spheres, for example) to prune out easily classified cases also applies.

Read: obb vs obb.pdf
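A minimal scalar sketch of the full separating-axis test for two oriented boxes follows. It uses the standard 15-axis formulation (the three face axes of each box plus the nine edge-edge cross products, as in Gottschalk et al.'s OBBTree work), which may differ from the optimized version in the referenced PDF. In the spirit of the looping SIMD scenario, it evaluates every axis and ORs the results together instead of returning early. The Vec3 and Obb types, the small epsilon added to the absolute rotation terms, and the bounding-sphere pre-test SpheresDisjoint are illustrative assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Hypothetical OBB layout: center, three orthonormal axes, half-extents along them.
struct Obb {
  Vec3 center;
  Vec3 axis[3];
  float half[3];
};

// Conservative pre-test: disjoint bounding spheres imply disjoint boxes.
bool SpheresDisjoint(const Obb& a, const Obb& b) {
  auto radius = [](const Obb& o) {
    return std::sqrt(o.half[0] * o.half[0] + o.half[1] * o.half[1] + o.half[2] * o.half[2]);
  };
  const Vec3 d = Sub(b.center, a.center);
  const float r = radius(a) + radius(b);
  return Dot(d, d) > r * r;
}

// Full test: true if the boxes overlap. All 15 axes are evaluated unconditionally.
bool ObbOverlap(const Obb& a, const Obb& b) {
  const float kEps = 1e-6f;  // assumption: stops near-zero cross-product axes reporting false separation
  float R[3][3], AbsR[3][3];  // R[i][j] = a.axis[i] . b.axis[j]
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) {
      R[i][j] = Dot(a.axis[i], b.axis[j]);
      AbsR[i][j] = std::fabs(R[i][j]) + kEps;
    }
  // Center offset expressed in a's local frame.
  const Vec3 d = Sub(b.center, a.center);
  const float t[3] = {Dot(d, a.axis[0]), Dot(d, a.axis[1]), Dot(d, a.axis[2])};

  bool separated = false;
  for (int i = 0; i < 3; ++i) {  // face axes of a
    const float rb = b.half[0] * AbsR[i][0] + b.half[1] * AbsR[i][1] + b.half[2] * AbsR[i][2];
    separated |= std::fabs(t[i]) > a.half[i] + rb;
  }
  for (int j = 0; j < 3; ++j) {  // face axes of b
    const float ra = a.half[0] * AbsR[0][j] + a.half[1] * AbsR[1][j] + a.half[2] * AbsR[2][j];
    const float proj = t[0] * R[0][j] + t[1] * R[1][j] + t[2] * R[2][j];
    separated |= std::fabs(proj) > ra + b.half[j];
  }
  for (int i = 0; i < 3; ++i) {  // edge-edge axes a.axis[i] x b.axis[j]
    const int i1 = (i + 1) % 3, i2 = (i + 2) % 3;
    for (int j = 0; j < 3; ++j) {
      const int j1 = (j + 1) % 3, j2 = (j + 2) % 3;
      const float ra = a.half[i1] * AbsR[i2][j] + a.half[i2] * AbsR[i1][j];
      const float rb = b.half[j1] * AbsR[i][j2] + b.half[j2] * AbsR[i][j1];
      const float proj = t[i2] * R[i1][j] - t[i1] * R[i2][j];
      separated |= std::fabs(proj) > ra + rb;
    }
  }
  return !separated;
}
```

In a batched version, SpheresDisjoint would run first to discard easy cases, and the surviving pairs would go through ObbOverlap, whose branch-free body maps naturally onto several box pairs per SIMD register.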

Reposted from Andreas’s blog:

At Insomniac Games we’re currently doing a performance push, which means lots of opportunities to look at compiler output. In this post I want to share a story about fighting a particular compiler optimization that was causing me problems.

One of my areas has been the decal system, which is very heavy on 3D data processing: a perfect system for SIMD work. While working on one of the bigger compute loops I noticed something odd about the generated PPC code. All of the actual computation was being done on the vector unit (as it should be!), but there was a lot of stack traffic originating from the integer unit. It seemed the compiler was overcommitting integer registers in what was essentially a pure-SIMD loop. Puzzled, I dug in and tried to work out what was going on.

Let’s look at the code. The loop I was working on processes 4 triangles at a time and has a structure like this:

Vertex* input = ...;
Vertex* output = ...;
int count = ...;

for (int i = 0; i < count; i += 12) {
  VecSimd t1a = SimdLoad(&input[0].m_Position);
  VecSimd t1b = SimdLoad(&input[1].m_Position);
  VecSimd t1c = SimdLoad(&input[2].m_Position);
  // ... 9 more loads
  VecSimd n1a = SimdLoad(&input[0].m_Normal);
  VecSimd n1b = SimdLoad(&input[1].m_Normal);
  VecSimd n1c = SimdLoad(&input[2].m_Normal);
  // ... 9 more loads
  // (crunching)

  SimdStore(out0, &output[0].m_Position);
  SimdStore(out1, &output[1].m_Position);
  SimdStore(out2, &output[2].m_Position);
  // lots more stores

  // increment input and output pointers
  input += 12;
  output += 12;
}

What I found was that each vector load was being done through a different (integer) register, and the compiler was incrementing each of those registers individually, where I was expecting a single base register and 12 offsets. Because there are so many loads and stores going on, the compiler would run out of registers and start spilling them to the stack, generating load-hit-stores and memory penalties all over the loop!

Mike Day offered the following insight:

… in the case where there’s a sensible number of pointers – small enough not to incur spilling onto the stack – the ‘create n pointers’ approach is sensible for the compiler as long as it follows through by doing all the loads using the indexed addressing form of the load instruction.

The benefit is that instead of incrementing n pointers per pass (or one pointer n times), it only needs to increment a single offset by n times the stride, saving n-1 adds.
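To make the two addressing shapes concrete, here is a scalar C++ sketch with hypothetical names, using plain floats instead of SIMD vectors and assuming n = 4. SumNPointers keeps n base pointers and advances one shared offset by n times the stride per pass, which corresponds to indexed (base + index) loads with a single add per pass. SumBumpedPointer walks one pointer and bumps it after every load, costing n adds per pass but only one register. Both read the same elements in the same order; which instructions a given compiler actually emits for either shape is not guaranteed.

```cpp
#include <cstddef>

constexpr int kN = 4;  // loads per pass (the quote's n); hypothetical value

// Shape 1: n base pointers plus one shared offset.
// Each load is base[k] + offset (indexed addressing); one increment per pass.
float SumNPointers(const float* data, std::size_t passes) {
  const float* base[kN];
  for (int k = 0; k < kN; ++k) base[k] = data + k;  // n pointer registers
  float sum = 0.0f;
  std::size_t offset = 0;  // counted in elements, so n * stride == kN here
  for (std::size_t p = 0; p < passes; ++p) {
    for (int k = 0; k < kN; ++k) sum += base[k][offset];
    offset += kN;  // the single increment of n * stride
  }
  return sum;
}

// Shape 2: one pointer register, bumped after each load (n adds per pass).
float SumBumpedPointer(const float* data, std::size_t passes) {
  const float* ptr = data;
  float sum = 0.0f;
  for (std::size_t p = 0; p < passes; ++p)
    for (int k = 0; k < kN; ++k) {
      sum += *ptr;
      ++ptr;
    }
  return sum;
}
```

With a handful of pointers, shape 1 is cheaper per pass; the problem described in this post arises when n grows large enough that the n base pointers no longer fit in registers.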

It turns out the optimizers for both our PPC compilers were so keen on using the indexed vector load instruction with a single increment that they would blindly overcommit the integer register pool to achieve that particular instruction selection.

The question then became: how do we force the optimizer to stop doing it? The optimizer can see that the input and output pointers are being moved consistently by a certain stride, and any attempt on my part to vary the indexing expressions just ended up generating slightly different variations of the same bad behavior.

So let’s look at what we’d want if we were writing the loop ourselves in assembly. In this case, better code would use a single input register and increment it after each SIMD load. The key to getting this code generation from C++ is to invalidate the optimizer’s assumptions about the pointers and strides, so that it can’t create an array of independent pointers. But how do we do that? After all, everything here uses linear memory accesses with a stride that’s well known at compile time, which is exactly what enables the unwanted optimization.

Time for some trickery! The loop structure I ended up using looks like this:

static volatile uintptr_t s_secret_sauce = uintptr_t(ptrdiff_t(-1));
static volatile uint32_t  s_stride       = sizeof(Vertex);

// load our constants in from memory into registers
const uintptr_t secret_sauce             = s_secret_sauce;
const uint32_t  stride                   = s_stride;

Vertex* input = ...;
Vertex* output = ...;
int count = ...;

for (int i = 0; i < count; i += 12) {
  // establish base pointer for all loads in the loop
  uintptr_t base = (uintptr_t(input) & secret_sauce) +
                   offsetof(Vertex, m_Position);

  // load and bump base pointer with stride
  VecSimd t1a = SimdLoad((void*)base); base += stride;
  VecSimd t1b = SimdLoad((void*)base); base += stride;
  VecSimd t1c = SimdLoad((void*)base); base += stride;

  // ... more loads
  // update input pointer for next loop iteration; subtract the member
  // offset added above so that input points at a Vertex again
  input = (Vertex*)(base - offsetof(Vertex, m_Position));

  // ... rest of loop

  // ... stores handled similarly
}

There are two key things going on in this code that break the optimization pattern:

  • We’re loading stride and secret_sauce from memory before the loop, thereby preventing any compile-time knowledge of them.
  • We’re breaking any loop-wide analysis of the input and output pointers by masking with a value that is unknown to the optimizer, forcing it to rely on the sequence of statements we’ve laid out exactly.

This generated the desired instruction selection and removed all stack spills from the loop. In the case I was using as a performance test, this saved over 30k instructions over the lifetime of the loop, a significant chunk of work. It also removed several hundred load-hit-store penalties.

This does generate an additional “and” instruction, which is an artifact of the technique, but compared to the much worse stack-spilling code this is by far the better deal. The “and” could also be replaced with an add of zero or something similar, but that will just amount to the same 1- or 2-cycle overhead in the end.
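A self-contained sketch of the add-of-zero variant mentioned above follows. Plain 16-byte memcpy copies stand in for SimdLoad, the loop handles four vertices per iteration instead of twelve to stay short, and the Vertex layout, s_opaque_zero, and GatherPositions are hypothetical. The only point is that the base pointer is derived through a value the optimizer cannot see at compile time; whether a particular compiler then emits a single bumped base register is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Vertex {  // hypothetical layout
  float m_Normal[4];
  float m_Position[4];
};

// Opaque at compile time: the optimizer cannot assume these values.
static volatile std::uintptr_t s_opaque_zero = 0;
static volatile std::uint32_t s_stride = sizeof(Vertex);

// Copies the positions of `count` vertices (a multiple of 4) into `out`.
void GatherPositions(const Vertex* input, float (*out)[4], int count) {
  const std::uintptr_t zero = s_opaque_zero;  // read once, before the loop
  const std::uint32_t stride = s_stride;

  for (int i = 0; i < count; i += 4) {
    // Base pointer derived through an unknown add instead of an unknown mask.
    std::uintptr_t base = std::uintptr_t(input) + zero + offsetof(Vertex, m_Position);

    std::memcpy(out[i + 0], (const void*)base, sizeof(float[4])); base += stride;
    std::memcpy(out[i + 1], (const void*)base, sizeof(float[4])); base += stride;
    std::memcpy(out[i + 2], (const void*)base, sizeof(float[4])); base += stride;
    std::memcpy(out[i + 3], (const void*)base, sizeof(float[4])); base += stride;

    // Remove the member offset so `input` points at the next Vertex again.
    input = (const Vertex*)(base - offsetof(Vertex, m_Position));
  }
}
```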

This has to have been the first time I’ve used static volatile variables to improve performance. Hopefully it’s useful to someone else encountering this behavior!

Ron Pieket wrote a bit more about our game tools webapp architecture on his blog. Reprinted here:

A Client/Server Tools Architecture

I mentioned Insomniac’s client/server tools architecture during my GDC talk earlier this year, and this topic has generated considerable interest. This article gives a very high level overview of the system as it is currently implemented.

All interactive productivity tools developed over the last two years at Insomniac Games have been built on this architecture. The basic idea is that the edited document is not kept in the memory space of the editor application itself, but rather each individual modification is transmitted to a server application, running on the same machine as the editor. The server maintains the authoritative document. Any number of documents may be open at any time. Any number of editors may communicate with the same local server application. The server provides various document related services.

There are several benefits from this architecture:

  • Crash Proofing

    The server application is comparatively simple, and matures early on. Editors are in constant development and may therefore be unstable. But since your work is managed by the server, up to the very last edit, it will survive a crash. Just restart the editor application and all your changes will still be there. Even the undo queue will survive.

  • Multiple Views

    LunaServer allows multiple editors to display/edit the same document. Changes that are made in one will immediately appear in the others.

  • “Free” Undo/Redo

    Undo/redo is handled by the server. All editors (clients) get undo/redo without a single line of code.

  • “Free” Load/Save/Revert

    All disk operations are handled by the server. All editors get this for free.

  • “Free” Perforce Integration

    Perforce integration is handled by the server. All editors get this for free.

  • Consistent Behavior

    Because undo/redo, file operations and Perforce integration are handled by the server, the user is presented with a very consistent interface.



At the heart of the system is LunaServer. This is a server application running on the user’s own machine. No network connection is required. Although LunaServer uses network protocols for communication with the editors, and it is quite capable of running remotely, we have not found it necessary to do so. It was never intended to be used over a network. Every user machine runs its own independent LunaServer.

LunaServer implements a RESTful HTTP protocol.

Any editor that we write at Insomniac Games runs as a LunaServer client. We currently have implemented a world editor, an animation set editor, a visual script editor, an effect editor, a material editor, and several more. Any combination of editors and any number of instances of the same editor may be running at the same time. The same document may be opened in any number of editors. Most editors are written in JavaScript, one is written in C++, and one is written in Flash.



LunaServer acts as a JSON document manager. All assets are JSON documents. All editors “speak” JSON, even the C++ and Flash based editors. Even though the C++ application maintains a version of the document in binary form, this is considered a cache for 3D rendering and mouse interaction. The JSON document is the authoritative version, and it is always synchronized immediately with any changes made in the binary version.

Delta JSON


Changes are transmitted between LunaServer and the clients in what we call a “delta JSON” format.

Say, LunaServer contains a document representing an asset named DreamCar. The document might look like this:


Now imagine that the user is using one of the editors to change the “color” property. The editor would update its local copy of the document, and transmit the following delta JSON:

  { "color": "red" }

In order to transmit this change, the delta JSON itself is wrapped into a “change document” like this:

  "deltaJSON": {

When this is received by LunaServer, it knows to update the “color” property of the DreamCar document with the new value.

New properties can be added and existing properties can be removed. We have adopted the convention that a null value in a delta JSON object means “remove”. (As a consequence, LunaServer documents cannot contain null values; only delta JSON can.)

For example, if an editor needs to send a change to LunaServer to add a new property “price”, and remove property “topSpeed”, it would make those changes locally, and send the following change document to LunaServer:

  "deltaJSON": {

After processing both change documents, DreamCar will read the same on the client side and in LunaServer:


Property “model” was unchanged, “topSpeed” was removed, “color” was modified, and “price” was added.
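The merge rule described above (a key with a value adds or modifies a property, a null removes it) can be sketched in C++ for a flat document. To stay self-contained, the sketch assumes a document is a map from property name to a JSON-encoded value string and a delta is a map to std::optional values, with std::nullopt standing in for JSON null. Nested objects are left out, and Document, Delta, and ApplyDelta are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>

// Property name -> JSON-encoded value, e.g. "\"red\"".
using Document = std::map<std::string, std::string>;
// Property name -> new value, or std::nullopt for JSON null ("remove").
using Delta = std::map<std::string, std::optional<std::string>>;

void ApplyDelta(Document& doc, const Delta& delta) {
  for (const auto& [key, value] : delta) {
    if (value)
      doc[key] = *value;  // add a new property or modify an existing one
    else
      doc.erase(key);     // null means "remove"
  }
}
```

Applying the color change and then the price/topSpeed change to a copy of DreamCar reproduces the result above: “model” untouched, “topSpeed” gone, “color” modified, “price” added.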

Synchronizing Clients


As I mentioned, multiple editors may be running at the same time. They may even display and edit the same document. In addition to transmitting changes as delta JSON to LunaServer, each client must poll LunaServer for document changes. If any editor changes a document, the delta JSON that LunaServer receives is also transmitted to the other clients as they poll for changes.

(LunaServer implements a RESTful protocol and therefore cannot push changes to the clients, so clients must poll.)
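A minimal in-process sketch of this polling scheme follows, assuming the server keeps an ordered log of change documents and each client remembers how far into that log it has read. The real LunaServer does this over its RESTful HTTP protocol, whose endpoints are not described here; ChangeLog, Post, Poll, and the integer client IDs are hypothetical, and a change is kept as an opaque string.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Change {
  int sourceClient;       // editor that sent the change
  std::string deltaJson;  // opaque change document
};

class ChangeLog {
 public:
  // The sending editor has already applied the change locally; the server records it.
  void Post(int client, std::string deltaJson) {
    log_.push_back({client, std::move(deltaJson)});
  }

  // Returns changes after `cursor` that came from other clients,
  // and advances the client's cursor past everything in the log.
  std::vector<Change> Poll(int client, std::size_t& cursor) const {
    std::vector<Change> result;
    for (; cursor < log_.size(); ++cursor)
      if (log_[cursor].sourceClient != client) result.push_back(log_[cursor]);
    return result;
  }

 private:
  std::vector<Change> log_;
};
```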



LunaServer itself is backed by MongoDB. This was a natural choice. JavaScript, JSON, and MongoDB work very well together.



Asset documents are shared between team members through Perforce, and Perforce maintains a document version history. LunaServer will interact with Perforce when an editor attempts to modify an asset document that is not currently writable. LunaServer manages load, save, revert, check-in, check-out, and other operations. It will prompt the user for file names and confirmations as needed. LunaServer does all of this automatically, relieving all editors from these responsibilities. All the editor code needs to do is send and receive asset changes in delta JSON format.

LunaTracker and files on disk


Perforce does not interact directly with LunaServer or MongoDB. Instead, it synchronizes files on the user’s hard drive. Every document in LunaServer’s database has a counterpart on disk. Synchronization of the database with changes on disk is the responsibility of LunaTracker. LunaTracker is a service that watches the asset document folders, and updates LunaServer when a new version of the file is detected.


Before LunaServer applies a delta JSON to its copy of the document, it will compute an inverse delta JSON.

If you recall, our DreamCar document started out looking like this:


And our first change was to set the “color” to “red”:

  "deltaJSON": {

LunaServer will store this change document in its redo queue, and transmit it to clients that poll for changes. It will also store the inverse change in the undo queue. The inverse change document will look like this:

  "deltaJSON": {

When told to perform an undo operation, LunaServer will simply process the last document in the undo queue, and the car is black again.
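The inverse computation can be sketched for the same kind of flat document, again assuming a map of property names to JSON-encoded values with std::nullopt as null (real documents are nested JSON). For each key in the delta, the inverse records the current value, or null if the property does not exist yet, so applying the inverse restores the old state. ComputeInverse, ApplyWithUndo, and Undo are hypothetical names, and the redo queue is left out.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

using Document = std::map<std::string, std::string>;
using Delta = std::map<std::string, std::optional<std::string>>;

void ApplyDelta(Document& doc, const Delta& delta) {
  for (const auto& [key, value] : delta) {
    if (value) doc[key] = *value;  // add or modify
    else doc.erase(key);           // null means "remove"
  }
}

// Must be computed BEFORE the delta is applied.
Delta ComputeInverse(const Document& doc, const Delta& delta) {
  Delta inverse;
  for (const auto& [key, value] : delta) {
    auto it = doc.find(key);
    if (it != doc.end())
      inverse[key] = it->second;    // restore the old value on undo
    else
      inverse[key] = std::nullopt;  // property was absent: remove it on undo
  }
  return inverse;
}

// Applies a change and pushes its inverse onto the undo queue.
void ApplyWithUndo(Document& doc, const Delta& delta, std::vector<Delta>& undoQueue) {
  undoQueue.push_back(ComputeInverse(doc, delta));
  ApplyDelta(doc, delta);
}

// Processes the most recent inverse change, if there is one.
bool Undo(Document& doc, std::vector<Delta>& undoQueue) {
  if (undoQueue.empty()) return false;
  ApplyDelta(doc, undoQueue.back());
  undoQueue.pop_back();
  return true;
}
```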



If multiple editors are open at once, their undo queues must be kept separate from each other. When an editor is launched, LunaServer creates a unique session ID and a session document. Undo records are organized by session ID. The session document contains various bookkeeping data such as the list of documents currently opened by the editor instance, object selection, and other settings.

Multiple editor instances may make changes to the same document. Changes that are made in one instance are transmitted to all. But undo for any particular change is only available from the instance or session where it originated.
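Organizing undo records by session ID can be sketched as a map from session to its own stack of inverse change documents. Changes from every session land in the shared document, but Undo(session) only ever pops that session's records. SessionUndo, the integer session IDs, and the string-encoded records are hypothetical, and applying the returned record is left to the caller. Two executables sharing one session ID, as in the world editor, would simply share one stack.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using SessionId = int;

class SessionUndo {
 public:
  // Records the inverse change produced when `session` modified a document.
  void Record(SessionId session, std::string inverseChange) {
    stacks_[session].push_back(std::move(inverseChange));
  }

  // Returns the most recent inverse change from this session only, if any.
  std::optional<std::string> Undo(SessionId session) {
    auto it = stacks_.find(session);
    if (it == stacks_.end() || it->second.empty()) return std::nullopt;
    std::string last = std::move(it->second.back());
    it->second.pop_back();
    return last;
  }

 private:
  std::map<SessionId, std::vector<std::string>> stacks_;
};
```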

In some cases (actually only one), what appears as a single editor to the user is in fact implemented as two (or more) separate executables. This is the case with the world editor. The 3D view is a C++ application, and the supporting 2D user interface is a separate application written in JavaScript and HTML 5. In such a case, multiple editors may share the same session ID and session document. Their undo records and selections are shared and synchronized.

Session document changes are handled in the same manner as asset document changes. They are transmitted, received and processed in the same delta JSON format. The target will be the session document. This makes changes in selection, opening and closing of documents undoable operations.



Although the architecture allows for multiple simultaneous editors, it is designed for a single user on a single machine. There is therefore no need to arbitrate multiple simultaneous changes, and there is no provision for it. Although multiple editors may have the same document open, the single user can operate only one at a time. If somehow simultaneous changes are made to the same property in the same document, in two different editor instances, some changes may be lost.



The Insomniac LunaServer architecture was started about two years ago. It seemed like a good idea at the time. I don’t think anyone involved realized back then how good the idea really was. The server-side document storage, the simple synchronization between instances, the centralized undo/redo system and disk access have made our tools more robust, more flexible, easier to use, and their development much simpler.

In game software development, we face a unique challenge: the program that we are developing is itself an important tool in our production pipeline, even in its unfinished state. Our artists and designers rely on it to see their work. We should expect it to be broken a good deal of the time. How we deal with the daily imperfections of this vital tool is an important factor in successful development.

In this talk I discuss three different approaches to making this expected breakage less of an issue: reducing data/code dependencies in a game-data-friendly manner, making assertion failures less intrusive and more effective, and using a client/server architecture to prevent loss of work when our custom productivity tools crash.

For source slides and other related articles see: Developing Imperfect Software: The Movie at

To better empower our developers, Insomniac has emphasized rapid iteration in our tools. With that goal in mind, we have developed a system to build assets that is both quick and effortless to use. To minimize downtime, the system was constructed so that crashes and mistakes within assets would have as localized an effect as possible. During the development of the build system, we’ve found that just as important as build speed is clearly communicating when errors occur and what can be done to fix those problems. This presentation covers the evolution of the build system and explains some of the design choices that were made.


Fluids Techniques
Speaker: Jim Van Verth

This talk was part of the all-day Physics for Game Programmers tutorial. It discussed three different methods for handling fluids in games. First, the Navier-Stokes equations for fluid dynamics were presented and broken down into their component parts. Then two methods that use Navier-Stokes as a basis were shown. The first is a grid-based system such as the one used in LittleBigPlanet. The second is a particle-based system. The final method is not based directly on Navier-Stokes but instead approximates the surface of water: the R2O system created at Insomniac by Mike Day.

Background assumed: Multivariable calculus would help (particularly calculus of vector fields), but it’s not necessary. Similarly, an understanding of Fourier transforms and frequency space would help.


Understanding Rotations
Speaker: Jim Van Verth

This talk was part of the all-day Math for Game Programmers tutorial. It covers various rotation formats in both 2D and 3D, discussing their benefits and disadvantages. The formats covered are 2D angles, Euler angles, axis-angle, matrices, complex numbers and quaternions. For each format, the talk considers its memory footprint versus its degrees of freedom (matrices take up more space than angles, for example), how easy it is to rotate with, how easy it is to concatenate, and how suitable it is for interpolation. There’s also some introduction to how matrices encapsulate transformations, and to how quaternions are just an extension of complex numbers.
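As a small illustration, not taken from the talk, of how complex numbers represent 2D rotations: the unit complex number cos θ + i sin θ rotates a point x + iy by θ when multiplied with it, and concatenating two rotations is a single complex multiplication. The sketch uses std::complex<double>; Rotation2D and Rotate are hypothetical helpers.

```cpp
#include <cmath>
#include <complex>

// Unit complex number representing a rotation by `radians`.
std::complex<double> Rotation2D(double radians) {
  return std::polar(1.0, radians);  // cos(radians) + i * sin(radians)
}

// Rotates the point (x, y), stored as x + iy.
std::complex<double> Rotate(std::complex<double> point, std::complex<double> rotation) {
  return rotation * point;
}
```

Two quarter turns concatenate into a half turn with one multiplication (quarter * quarter), which is the 2D analogue of concatenating quaternions in 3D.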

Background assumed: Vectors, matrices and linear interpolation.


This was part of the full-day Math Tutorial on Monday. To a degree, it turned into a bit of an anti-math talk (an appropriate slide was inserted at the last second during the day), in the sense that most of what I was discussing (and the real philosophical point behind DOD) is that a real understanding of your data and your context is necessary for engineering a correct solution, as opposed to a totally generalized and abstracted “solution”. You do not need to solve problems you don’t have, and there is no way to know what problems you have without understanding your data. That said, I did try to make the specific point that it’s also not necessary to over-constrain your problem to a single set of examples, but rather to just be aware of the finite nature of both the range of data you’re working with and the (hardware/software stack) context you’re working within. Some of you may also recognize a few of the later slides from my earlier “Typical C++ Bullshit” presentation, although the overall presentation came from a different direction.


What You Don’t Know IS Hurting You: How Aggressive User Research Improved Resistance 3
Speakers: Drew Murray (Creative Director, Insomniac Games)
Track: Game Design
Format: 60-Minute Lecture

After participating in formal usability tests near the end of Resistance 2 production, Insomniac Games began a fundamental shift in how we make games. While Insomniac has always been focused on making great games – and has a long history of doing so – we’ve relied largely on our gut feelings of what’s going to be fun, utilizing limited playtesting near the end of production to find and fix major problems. Formal usability testing on Resistance 2 showed us that there was a surprising and often painful gap between how we, the developers, played and understood the game and how typical gamers did. Based on the difficulties we saw players having in our Resistance 2 usability tests, Insomniac as a company decided to move beyond paying lip service to user-research and fully committed to systematically integrating it throughout the Resistance 3 development cycle, not just near the end.

What Testing Methods Worked? We used many types of user-research over the course of Resistance 3 production – external RITE usability tests, weekly company-wide playtests, informal one-on-one internal playtests, large-scale external playtests, and other methods. We’ll discuss the pros and cons of the different methods we used, the different information we were able to get from different kinds of testing, and the realities of financial and time costs of each.

Overcoming Culture Shock. There was more than a small amount of opposition on the team to the amount of user-research that we did on Resistance 3. User-research can put a tremendous burden on all members of the team, whether it’s running the tests, gathering and analyzing data, quickly iterating to respond to feedback, or just carving out an hour or two a week to actually play the game! We’ll discuss our experiences on Resistance 3, ways to get your team excited about doing user-research on your game, and how to handle common team complaints.

Did It Work? As we ship Resistance 3 (on the day this talk is being submitted!), we’ll examine whether we accomplished our goal with user-research for the game and some of the specific – and sometimes very significant – changes that we made to the game based on the findings. We’ll also talk about some of the things that didn’t work as well as we’d hoped with our testing methods for Resistance 3, and what we plan to do differently in the future.

Attendees of this session will get an insider’s view of the sometimes-bumpy road that Insomniac Games has traveled over the past three-and-a-half years as we’ve revamped our game-development process to focus on frequent and consistent user-research. Different user-research methods will be illustrated by specific examples from Resistance 3, such as videos of the RITE testing we did for core controls and aim-assist, and graphs created to track how the rating-scores assigned by playtesters for different levels and features trended over production. Attendees will also learn about common pitfalls and how to overcome them, and how to convince a team that user-research is an invaluable tool for improving your game.