Andreas Fredriksson: The Register Pressure Twilight Zone

Reposted from Andreas’s blog:

At Insomniac Games we’re currently doing a performance push, which means lots of opportunities to look at compiler output. In this post I want to share a story about fighting a particular compiler optimization that was causing me problems.

One of my areas has been the decal system, which is very heavy on 3D data processing and therefore a perfect fit for SIMD work. While working on one of the bigger compute loops I noticed something odd about the generated PPC code. All of the actual computation work was being done on the vector unit (as it should!), but there was a lot of stack traffic originating from the integer unit. The compiler seemed to be overcommitting integer registers in what was essentially a pure-SIMD loop. Puzzled, I dug in and tried to work out what was going on.

Let’s look at the code. The loop I was working on processes 4 triangles at a time and has a structure like this:

Vertex* input = ...;
Vertex* output = ...;
int count = ...;

for (int i = 0; i < count; i += 12)
{
  VecSimd t1a = SimdLoad(&input[0].m_Position);
  VecSimd t1b = SimdLoad(&input[1].m_Position);
  VecSimd t1c = SimdLoad(&input[2].m_Position);
  // ... 9 more loads
  VecSimd n1a = SimdLoad(&input[0].m_Normal);
  VecSimd n1b = SimdLoad(&input[1].m_Normal);
  VecSimd n1c = SimdLoad(&input[2].m_Normal);
  // ... 9 more loads

  // (crunching)

  SimdStore(out0, &output[0].m_Position);
  SimdStore(out1, &output[1].m_Position);
  SimdStore(out2, &output[2].m_Position);
  // lots more stores

  // increment input and output pointers
  input += 12;
  output += 12;
}

What I found was that the vector loads were all being done from different (integer) registers, each incremented individually by the compiler, where I was expecting a single base register and 12 offsets. Because there are so many loads and stores going on, the compiler would run out of integer registers and start spilling them to the stack, generating load-hit-stores and memory penalties all over the loop!

Mike Day offered the following insight:

… in the case where there’s a sensible number of pointers – small enough
not to incur spilling onto the stack – the ‘create n pointers’ approach
is sensible for the compiler as long as it follows through by doing all
the loads using the indexed addressing form of the load instruction. 

[...] 

The benefit is that instead of incrementing n pointers per pass
(or one pointer n times), it only needs to increment a single offset by
n times the stride, saving n-1 adds.

It turns out the optimizers for both our PPC compilers were so keen on using the indexed vector load instruction with a single increment that they would blindly overcommit the integer register pool to achieve that particular instruction selection.

The question then became: how do we force the optimizer to stop doing it? The optimizer can see that the input and output pointers are being moved consistently by a certain stride, and any attempt on my part to vary the indexing expressions just ended up generating slightly different versions of the same bad behavior.

So let’s look at what we’d want if we were writing the loop ourselves in assembly. In this case, better code generation would be to have a single input register and increment it after each SIMD load, using just one register. The key to getting this code generation out of C++ is to invalidate the optimizer’s assumptions about the pointers and strides so it can’t create an array of independent pointers. But how do we do that? After all, everything here is using linear memory accesses with a stride that’s well known at compile time, which is exactly what enables the unwanted optimization.

Time for some trickery! The loop structure I ended up using looks like this:

static volatile uintptr_t s_secret_sauce = uintptr_t(ptrdiff_t(-1));
static volatile uint32_t  s_stride       = sizeof(Vertex);

// load our constants in from memory into registers
const uintptr_t secret_sauce             = s_secret_sauce;
const uint32_t  stride                   = s_stride;

Vertex* input = ...;
Vertex* output = ...;
int count = ...;

for (int i = 0; i < count; i += 12)
{
  // establish base pointer for all loads in the loop
  uintptr_t base = (uintptr_t(input) & secret_sauce) +
                   offsetof(Vertex, m_Position);

  // load and bump base pointer with stride
  VecSimd t1a = SimdLoad((void*)base); base += stride;
  VecSimd t1b = SimdLoad((void*)base); base += stride;
  VecSimd t1c = SimdLoad((void*)base); base += stride;

  // ... more loads

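  // update input pointer for next loop iteration
  // (remove the member offset again; a no-op when m_Position is at offset 0)
  input = (Vertex*) (base - offsetof(Vertex, m_Position));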

  // ... rest of loop  

  // ... stores handled similarly
}

There are two key things going on in this code that break the optimization pattern:

  • We’re loading stride and secret_sauce from memory before the loop, thereby preventing any compile-time knowledge of them.
  • We’re breaking any loop-wide analysis of the input and output pointers by masking with a value that is unknown to the optimizer, forcing it to rely on the sequence of statements we’ve laid out exactly.
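
To make the pattern easy to try in isolation, here is a minimal scalar sketch of the same idea. It assumes a hypothetical Vertex layout and uses plain float loads in place of SimdLoad, since neither the real Vertex nor the SIMD wrappers are shown in this post; SumPositionX is an illustrative name only. Whether it changes the generated code depends entirely on the compiler and target, so treat it as a starting point for inspecting assembly rather than a guaranteed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the post's Vertex; the real layout is not shown.
struct Vertex {
  float m_Position[4];
  float m_Normal[4];
};

// volatile forces real loads, so the optimizer cannot fold these to constants.
static volatile std::uintptr_t s_secret_sauce = ~std::uintptr_t(0); // all bits set
static volatile std::uint32_t  s_stride       = sizeof(Vertex);

// Sums the x component of every position, three vertices per pass,
// walking one base address instead of indexing input[0], input[1], input[2].
float SumPositionX(const Vertex* input, int count) {
  assert(count % 3 == 0);
  const std::uintptr_t secret_sauce = s_secret_sauce;
  const std::uint32_t  stride       = s_stride;

  float sum = 0.0f;
  for (int i = 0; i < count; i += 3) {
    // The AND changes nothing at run time, but the compiler cannot prove that.
    std::uintptr_t base =
        (reinterpret_cast<std::uintptr_t>(input) & secret_sauce) +
        offsetof(Vertex, m_Position);

    sum += *reinterpret_cast<const float*>(base); base += stride;
    sum += *reinterpret_cast<const float*>(base); base += stride;
    sum += *reinterpret_cast<const float*>(base); base += stride;

    // Remove the member offset before reusing base as the next Vertex pointer.
    input = reinterpret_cast<const Vertex*>(base - offsetof(Vertex, m_Position));
  }
  return sum;
}
```

Calling SumPositionX on an array of six vertices visits each position exactly once; comparing the disassembly with and without the mask is the interesting part.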

This generated the desired instruction selection and removed all stack spills from the loop. In one case I was using as a performance test, this saved over 30k instructions over the lifetime of the loop, a significant chunk of work. It also removed several hundred load-hit-store penalties.

This does generate an additional AND instruction, which is an artifact of the technique, but compared to the much worse stack-spilling code this is by far the better deal. The AND could also be replaced with an add of zero or something similar, but it will just amount to the same 1 or 2-cycle overhead in the end.

This has to have been the first time I’ve used static volatile variables to improve performance. Hopefully it’s useful to someone else encountering this behavior!

Episode 62 – Outernauts Announcement

Join Community Lead James Stevenson, Chief Creative Officer Brian Hastings, Designer Rowan Belden-Clifford and Associate Community Manager Brandon Winfrey as they discuss Insomniac’s latest game, Outernauts. This show discusses the process of making the game, along with the challenges and surprises along the way. We also discuss our favorite beasts and parts of the game. It’s the first in-depth chance to hear directly from the development team on Insomniac’s first ever game on Facebook!

Insomniac Games and EA Explore the Vast Reaches of Facebook With Outernauts

Award-Winning Console Game Developer Aims To Tame The Galaxy With Its Social Gaming Debut

REDWOOD CITY, Calif. (May 9, 2012) - Have you ever dreamed of a career in cosmic exploration? Today Electronic Arts Inc. (NASDAQ: EA) and independent video game studio Insomniac Games announced the first details of Outernauts™, a completely new intellectual property and the indie developer’s first game tailored for Facebook. Combining Insomniac’s immersive storytelling with a unique art style and sense of wit, Outernauts is an adventure role-playing game that casts players as members of United Earth’s elite Outernaut force. The Outernauts are charged with capturing and training exotic alien beasts as they uncover the riddle behind mysterious “ancients” while battling pirates and evil corporations seeking to control the galaxy. Players will explore planets, harvest loot, and fight asynchronously alongside or against friends to master a wild, untamed universe. Outernauts marks the first foray into the social gaming space for Insomniac Games, creators of the Spyro the Dragon™, Ratchet & Clank™, Resistance™, and Overstrike™ franchises.

“We see a huge opportunity to reach an entirely different audience of gamers through Facebook,” said Ted Price, President and Founder of Insomniac Games.  “As we have demonstrated for nearly twenty years in the console games space, we’re confident we can help evolve the definition of a game experience on Facebook. With Outernauts, we are delivering a deep story with real RPG strategy, coupled with Insomniac’s signature sense of humor.”

Outernauts is currently in a closed beta testing phase and will launch on Facebook this summer. To learn more about Outernauts before launch, visit the game’s Facebook page at www.facebook.com/outernauts.

 

About Insomniac Games

Insomniac Games is an independent videogames developer that has released award-winning hits exclusively for PlayStation consoles for 18-plus years. In 2009, it announced a partnership with EA Partners to release its first multiplatform game, Overstrike.  The studio has created world-famous game franchises such as Spyro the Dragon, Ratchet & Clank and Resistance, resulting in more than 38 million games sold globally. Insomniac is also known for its collaborative workplace culture, having earned 12 local, regional and national “best places to work” honors since 2004. In January 2009, Insomniac opened a Durham, N.C. studio. Additional information can be found on both Insomniac studio locations at http://www.insomniacgames.com.

About Electronic Arts

Electronic Arts (NASDAQ:EA) is a global leader in digital interactive entertainment. The Company’s game franchises are offered as both packaged goods products and online services delivered through Internet-connected consoles, personal computers, mobile phones and tablets. EA has more than 100 million registered players and operates in 75 countries. In fiscal year 2012, EA posted GAAP net revenue of $4.1 billion. Headquartered in Redwood City, California, EA is recognized for critically acclaimed, high-quality blockbuster franchises such as The Sims™, Madden NFL, FIFA Soccer, Need for Speed™, Battlefield™, and Mass Effect™. More information about EA is available at http://info.ea.com.

The Sims and Need for Speed are trademarks of Electronic Arts Inc. Mass Effect is a trademark of EA International (Studio and Publishing) Ltd. John Madden, NFL and FIFA are the property of their respective owners and used with permission.  EA and the EA logo are trademarks of Electronic Arts Inc. Ratchet and Clank, Resistance and Outernauts are trademarks of Insomniac Games, Inc.  Facebook is a registered trademark of Facebook, Inc.

Ron Pieket: A Client/Server Tools Architecture

Ron Pieket wrote a bit more about our game tools webapp architecture on his blog. Reprinted here:

A Client/Server Tools Architecture

I mentioned Insomniac’s client/server tools architecture during my GDC talk earlier this year, and this topic has generated considerable interest. This article gives a very high level overview of the system as it is currently implemented.

All interactive productivity tools developed over the last two years at Insomniac Games have been built on this architecture. The basic idea is that the edited document is not kept in the memory space of the editor application itself, but rather each individual modification is transmitted to a server application, running on the same machine as the editor. The server maintains the authoritative document. Any number of documents may be open at any time. Any number of editors may communicate with the same local server application. The server provides various document related services.

There are several benefits from this architecture:

  • Crash Proofing

    The server application is comparatively simple, and matures early on. Editors are in constant development and may therefore be unstable. But since your work is managed by the server, up to the very last edit, it will survive a crash. Just restart the editor application and all your changes will still be there. Even the undo queue will survive.

  • Multiple Views

    LunaServer allows multiple editors to display/edit the same document. Changes that are made in one will immediately appear in the others.

  • “Free” Undo/Redo

    Undo/redo is handled by the server. All editors (clients) get undo/redo without a single line of code.

  • “Free” Load/Save/Revert

    All disk operations are handled by the server. All editors get this for free.

  • “Free” Perforce Integration

    Perforce integration is handled by the server. All editors get this for free.

  • Consistent Behavior

    Because undo/redo, file operations and Perforce integration are handled by the server, the user is presented with a very consistent interface.

LunaServer

 

At the heart of the system is LunaServer. This is a server application running on the user’s own machine. No network connection is required. Although LunaServer uses network protocols for communication with the editors, and it is quite capable of running remotely, we have not found it necessary to do so. It was never intended to be used over a network. Every user machine runs its own independent LunaServer.

LunaServer implements a RESTful HTTP protocol.

Any editor that we write at Insomniac Games runs as a LunaServer client. We currently have implemented a world editor, an animation set editor, a visual script editor, an effect editor, a material editor, and several more. Any combination of editors and any number of instances of the same editor may be running at the same time. The same document may be opened in any number of editors. Most editors are written in JavaScript, one is written in C++, and one is written in Flash.

JSON

 

LunaServer acts as a JSON document manager. All assets are JSON documents. All editors “speak” JSON, even the C++ and Flash based editors. Even though the C++ application maintains a version of the document in binary form, this is considered a cache for 3D rendering and mouse interaction. The JSON document is the authoritative version, and it is always synchronized immediately with any changes made in the binary version.

Delta JSON

 

Changes are transmitted between LunaServer and the clients in what we call a “delta JSON” format.

Say, LunaServer contains a document representing an asset named DreamCar. The document might look like this:

{
  "model":"veyron.dae",
  "color":"black",
  "topSpeed":267
}

Now imagine that the user is using one of the editors to change the “color” property. The editor would update its local copy of the document, and transmit the following delta JSON:

{
  "color":"red"
}

In order to transmit this change, the delta JSON itself is wrapped into a “change document” like this:

{
  "targetDocument":"DreamCar",
  "deltaJSON": {
    "color":"red"
  }
}

When this is received by LunaServer, it knows to update the “color” property of the DreamCar document with the new value.

New properties can be added and existing properties can be removed. We have adopted the convention that a null value in a delta JSON object means “remove”. (As a consequence, LunaServer documents cannot contain null values; only delta JSON can.)

For example, if an editor needs to send a change to LunaServer to add a new property “price”, and remove property “topSpeed”, it would make those changes locally, and send the following change document to LunaServer:

{
  "targetDocument":"DreamCar",
  "deltaJSON": {
    "topSpeed":null,
    "price":1.7e6
  }
}

After processing both change documents, DreamCar will read the same on the client side and in LunaServer:

{
  "model":"veyron.dae",
  "color":"red",
  "price":1.7e6
}

Property “model” was unchanged, “topSpeed” was removed, “color” was modified, and “price” was added.
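
The merge rule described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not LunaServer’s code: documents are flat (no nested objects), a value is either a string or a number, and std::nullopt stands in for a JSON null. The names Value, Document, Delta and ApplyDelta are made up for the sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>

// Flat documents only: a property is a string or a number (an assumption;
// real asset documents are presumably nested).
using Value    = std::variant<std::string, double>;
using Document = std::map<std::string, Value>;
// In a delta, std::nullopt plays the role of JSON null, meaning "remove".
using Delta    = std::map<std::string, std::optional<Value>>;

// Applies a delta in place: non-null entries add or overwrite a property,
// null entries remove it, and properties not mentioned are left untouched.
void ApplyDelta(Document& doc, const Delta& delta) {
  for (const auto& [key, value] : delta) {
    if (value.has_value()) {
      doc.insert_or_assign(key, *value);
    } else {
      doc.erase(key);
    }
    // Postcondition: the property exists exactly when the delta gave a value.
    assert((doc.count(key) == 1) == value.has_value());
  }
}
```

Applying the two DreamCar deltas from this section in order yields the three-property document shown above. A real implementation would also have to recurse into nested objects.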

Synchronizing Clients

 

As I mentioned, multiple editors may be running at the same time. They may even display and edit the same document. In addition to transmitting changes as delta JSON to LunaServer, each client must poll for document changes from LunaServer. If any editor changes a document, the delta JSON that is received by LunaServer is also transmitted to others, as they poll for changes.

(LunaServer implements a RESTful protocol, and therefore cannot push changes to the clients. So clients must poll.)
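
One way to picture the polling model is an ordered change log on the server plus a cursor remembered by each client. The sketch below is a guess at the shape of such a scheme, not a description of LunaServer: it is in-memory only, ignores HTTP entirely, treats change documents as opaque strings, and the names Change, ChangeLog, Submit and Poll are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Change {
  int         originClient;    // hypothetical id of the editor that sent it
  std::string changeDocument;  // serialized change document, opaque here
};

class ChangeLog {
 public:
  // Called when an editor transmits a change to the server.
  void Submit(int client, std::string changeDocument) {
    log_.push_back({client, std::move(changeDocument)});
  }

  // Returns every change made by *other* clients since this client's cursor,
  // then advances the cursor so the next poll only sees newer changes.
  std::vector<std::string> Poll(int client, std::size_t& cursor) const {
    assert(cursor <= log_.size());
    std::vector<std::string> out;
    for (; cursor < log_.size(); ++cursor) {
      if (log_[cursor].originClient != client) {
        out.push_back(log_[cursor].changeDocument);
      }
    }
    return out;
  }

 private:
  std::vector<Change> log_;
};
```

The originating editor is skipped because it already applied the change locally before sending it.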

MongoDB

 

LunaServer itself is backed by MongoDB. This was a natural choice. JavaScript, JSON, and MongoDB work very well together.

Perforce

 

Asset documents are shared between team members through Perforce, and Perforce maintains a document version history. LunaServer will interact with Perforce when an editor attempts to modify an asset document that is not currently writable. LunaServer manages load, save, revert, check-in, check-out, and other operations. It will prompt the user for file names and confirmations as needed. LunaServer does all of this automatically, relieving all editors from these responsibilities. All the editor code needs to do is send and receive asset changes in delta JSON format.

LunaTracker and files on disk

 

Perforce does not interact directly with LunaServer or MongoDB. Instead, it synchronizes files on the user’s hard drive. Every document in LunaServer’s database has a counterpart on disk. Synchronization of the database with changes on disk is the responsibility of LunaTracker. LunaTracker is a service that watches the asset document folders, and updates LunaServer when a new version of the file is detected.

Undo/Redo

Before LunaServer applies a delta JSON to its copy of the document, it will compute an inverse delta JSON.

If you recall, our DreamCar document started out looking like this:

{
  "model":"veyron.dae",
  "color":"black",
  "topSpeed":267
}

And our first change was to set the “color” to “red”:

{
  "targetDocument":"DreamCar",
  "deltaJSON": {
    "color":"red"
  }
}

LunaServer will store this change document in its redo queue, and transmit it to clients that poll for changes. It will also store the inverse change in the undo queue. The inverse change document will look like this:

{
  "targetDocument":"DreamCar",
  "deltaJSON": {
    "color":"black"
  }
}

When told to perform an undo operation, LunaServer will simply process the last document in the undo queue, and the car is black again.
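
Computing the inverse is mechanical: for every property the delta touches, record what the document held before. The following sketch assumes flat documents with string or number values and uses std::nullopt for JSON null; the type and function names are invented for illustration and are not taken from LunaServer.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>

using Value    = std::variant<std::string, double>;
using Document = std::map<std::string, Value>;
using Delta    = std::map<std::string, std::optional<Value>>;  // nullopt = null

// Must be called BEFORE the delta is applied, while doc still has old values.
Delta ComputeInverse(const Document& doc, const Delta& delta) {
  Delta inverse;
  for (const auto& entry : delta) {
    const std::string& key = entry.first;
    auto old = doc.find(key);
    if (old != doc.end()) {
      inverse[key] = old->second;   // changed or removed: restore old value
    } else {
      inverse[key] = std::nullopt;  // newly added: undo removes it again
    }
  }
  assert(inverse.size() == delta.size());
  return inverse;
}

// Same merge rule as for any other delta: null removes, anything else sets.
void ApplyDelta(Document& doc, const Delta& delta) {
  for (const auto& [key, value] : delta) {
    if (value.has_value()) doc.insert_or_assign(key, *value);
    else doc.erase(key);
  }
}
```

Because the inverse is itself just a delta, undo needs no special code path: it goes through the same ApplyDelta as a normal edit.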

Sessions

 

If multiple editors are open at once, their undo queues must be kept separate from each other. When an editor is launched, LunaServer creates a unique session ID and a session document. Undo records are organized by session ID. The session document contains various bookkeeping data such as the list of documents currently opened by the editor instance, object selection, and other settings.

Multiple editor instances may make changes to the same document. Changes that are made in one instance are transmitted to all. But undo for any particular change is only available from the instance or session where it originated.
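
Keeping undo records per session might look roughly like the sketch below. It assumes integer session IDs and treats inverse change documents as opaque strings; UndoQueues, Record and Undo are hypothetical names, and the real bookkeeping is certainly richer than this.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// An inverse change document, kept opaque (serialized) for this sketch.
using InverseChange = std::string;

class UndoQueues {
 public:
  // Stores the inverse of a change under the session that made the change.
  void Record(int sessionId, InverseChange inverse) {
    assert(!inverse.empty());
    queues_[sessionId].push_back(std::move(inverse));
  }

  // Pops the most recent inverse change recorded for this session only.
  // Changes made by other sessions are never returned here.
  std::optional<InverseChange> Undo(int sessionId) {
    auto it = queues_.find(sessionId);
    if (it == queues_.end() || it->second.empty()) return std::nullopt;
    InverseChange last = std::move(it->second.back());
    it->second.pop_back();
    return last;
  }

 private:
  std::map<int, std::vector<InverseChange>> queues_;
};
```

Editors that share a session ID, as described next for the world editor, would simply record into and undo from the same queue.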

In some cases (actually only one), what appears as a single editor to the user is in fact implemented as two (or more) separate executables. This is the case with the world editor. The 3D view is a C++ application, and the supporting 2D user interface is a separate application written in JavaScript and HTML 5. In such a case, multiple editors may share the same session ID and session document. Their undo records and selections are shared and synchronized.

Session document changes are handled in the same manner as asset document changes. They are transmitted, received and processed in the same delta JSON format. The target will be the session document. This makes changes in selection, opening and closing of documents undoable operations.

Limitations

 

Although the architecture allows for multiple simultaneous editors, it is designed for a single user on a single machine. There is therefore no need to arbitrate multiple simultaneous changes, and there is no provision for it. Although multiple editors may have the same document open, the single user can operate only one at a time. If somehow simultaneous changes are made to the same property in the same document, in two different editor instances, some changes may be lost.

Conclusion

 

The Insomniac LunaServer architecture was started about two years ago. It seemed like a good idea at the time. I don’t think that anyone involved realized back then how good the idea really was. The server-side document storage, the simple synchronization between instances, the centralized undo/redo system and disk access have made our tools more robust, more flexible, easier to use, and their development much simpler.

Insomniac Games and Epic Games at Muck Ruckus!

Teams from Insomniac Games and Epic Games are gearing up for some friendly competition at this year’s Muck Ruckus charity event, raising money and awareness for the National Multiple Sclerosis Society.

Muck Ruckus MS Carolinas is a 5-mile, military-style run with obstacles that have been surrounded by or consist entirely of mucky fun. People cheer as teams of contestants slip, slide and slosh their way to the finish line.

The pictures from last year show just how messy the event can get!

Insomniac's Muck Ruckus Team

The team from Insomniac so far is Team Captain Jason Anderson, Aaron Butler, Duncan More and William Parmenter. The team is still growing and recruiting!

The Insomniac team is preparing by doing a 5k Benefit Walk in May as well as the Global Corporate Challenge.

The team from Epic Games is currently in training for the event. They are working on their core strength and cardio. Team Captains Mike Capps and Prince Arrington are at the helm to be sure the participants are ready for the event.

Along with Mike and Prince, the team from Epic also includes Julianne Capps, Rod Fergusson, Peter Hayes, Wes Hunt, Tanya Jessen, Tim Johnson, Aaron Jones, Kevin Lanning, Eric Newman and Ben Shafer.

Both teams are veterans of the event, having also participated last year. The teams each have a goal set for $5,000.00. Let’s help them achieve it!

If you would like to donate to Team Insomniac, visit their Muck Ruckus page. If you would like to donate to Team Epic, you can visit their Muck Ruckus page. You can donate to either of the teams or individual team members.

#GDC12 Ron Pieket: Developing Imperfect Software: The Movie

In game software development, we face a unique challenge: the program that we are developing is itself an important tool in our production pipeline, even in its unfinished state. Our artists and designers rely on it to see their work. We should expect it to be broken, a good deal of the time. How we deal with the daily imperfections of this vital tool is an important factor in successful development.

In this talk I discuss three different approaches to making this expected breakage less of an issue: reducing data/code dependencies in a game data friendly manner, making assertion failures less intrusive and more effective, and using a client/server architecture to prevent loss of work when our custom productivity tools crash.

For source slides and other related articles see: Developing Imperfect Software: The Movie at itshouldjustworktm.com

Bob Sprentall: Asset build management

To better empower our developers, Insomniac has emphasized rapid iteration in our tools. With that goal in mind, we have developed a system to build assets that is both quick and effortless to use. To minimize downtime, the system was constructed so that crashes and mistakes within assets would have as localized an effect as possible. During the development of the build system, we’ve found that just as important as build speed is clearly communicating when errors occur and what can be done to fix those problems. This presentation covers the evolution of the build system and explains some of the design choices that were made.

BuildManager.pdf

#GDC12 Jim Van Verth: Fluids Techniques

Fluids Techniques
Speaker: Jim Van Verth

This talk was part of the all-day Physics for Game Programmers tutorial. It discussed three different methods for handling fluids in games. First, the Navier-Stokes equations for fluid dynamics were presented and broken down into their component parts. Then two methods that use Navier-Stokes as a basis were shown. The first is a grid-based system such as that used in Little Big Planet. The second is a particle-based system. The final method shown is not based directly on Navier-Stokes, but approximates the surface of water: the R2O system created at Insomniac by Mike Day.

Background assumed: Multivariable calculus would help (particularly calculus of vector fields), but it’s not necessary. Similarly, an understanding of Fourier transforms and frequency space would help as well.

GDC2012_JMV_Fluids.pdf

#GDC12 Jim Van Verth: Understanding Rotations

Understanding Rotations
Speaker: Jim Van Verth

This talk was part of the all-day Math for Game Programmers tutorial. It covers various rotation formats in both 2D and 3D, discussing their benefits and disadvantages. The formats covered were 2D angles, Euler angles, axis-angle, matrices, complex numbers and quaternions. For each format, the talk considered its memory footprint vs. degrees of freedom (matrices take up more space than angles, for example), how easy it is to rotate with, how easy it is to concatenate, and how suitable it is for interpolation. There’s also some introduction to how matrices encapsulate transformations, and how quaternions are just an extension of complex numbers.

Background assumed: Vectors, matrices and linear interpolation.

GDC2012_JMV_Rotations.pdf

Ratchet & Clank HD Collection Announced!

Hey everyone,

It’s amazing that it has been almost ten years since the release of the original Ratchet & Clank on PlayStation 2. A lot has changed since 2002, but your loyalty to our furry Lombax and his faithful robot companion has endured. As we reach the 10th Anniversary of Ratchet & Clank, we wanted to commemorate the achievement of the Ratchet & Clank saga.

Today, we’re delighted to announce the Ratchet & Clank 10th Anniversary Celebration, which also addresses the biggest current request from our fans: The Ratchet & Clank Collection. This will contain Ratchet & Clank, Ratchet & Clank: Going Commando and Ratchet & Clank: Up Your Arsenal, all remastered by Idol Minds, working closely with our team at Insomniac. In addition to a full 1080p HD remaster of all three games, 720p 3D support, and the inclusion of three platinum trophies, we’re happy to confirm that the online competitive multiplayer from Up Your Arsenal will be included. Now you can check out the acclaimed multiplayer mode that many of you didn’t get to experience the first time on PlayStation 2.

The collection will release this fall in North America. I know it’s a little bit later than the Spring European release, but it’s because we have something very special planned to celebrate the tenth anniversary, and we’ll share all the details very soon. We’re also working to make a special bonus available for fans who wait in North America; more details when we can share them!

We’re so excited that Sony has worked to give our original Ratchet & Clank PlayStation 2 adventures the HD remaster treatment. Stay tuned in the coming months for more on the Ratchet & Clank 10th Anniversary Celebration, as well as more details, screenshots and footage as we near the release of the Ratchet & Clank Collection.

-Ted
