Overbyte - Blogs from Tony Albrecht http://overbyte.com.au/index.php/overbyte-blog/blogger/listings/overbyteadmin Mon, 27 Mar 2017 07:54:18 +1030 en-gb Vessel Post Mortem: Part 3 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-3 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-3 This is the final part of the Vessel Porting Trilogy. The previous parts can be found here 

After the Christmas break (where I was forced into yet another family vacation) I dove straight back into the code, hoping to get the final issue fixed and the game submitted as soon as possible. There was still time to submit and if I passed Sony’s submission process first time then we could still have the game on the PSN store by the end of January. I fixed the remaining problem with the start up error reporting and added in a little more error checking just to be sure. I was pretty paranoid about the submission process as I didn’t want to delay the game anymore by failing - I carefully checked everything twice, crossed my fingers and hit the ‘submit’ button to Sony America (SCEA) on January 9.



In parallel to the code submission process there is a ‘Metadata’ submission process. This corresponds to the PlayStation Store assets and the game’s presence there. It consists of all the text, images, trailers, prices etc and they all have very specific requirements that must be met in order to pass submission. James (our QA guy) managed most of this and involved communicating with Strange Loop’s art guy Milenko (who was incredibly responsive - I’m not sure he ever sleeps) and consisted of asking for different resolution images and screenshots, as well as sourcing the different language versions of the game text. It took us a few submissions of the meta data to get it all right, but the turnaround was pretty quick and a continuous dialog with the Sony helped a lot.

The code submission process consists of uploading the game in a special package format plus some extra documents that describe the trophies and other bits and pieces. We had to submit to both SCEA and Sony Europe (SCEE) so we could have the game released in both those regions. We hadn’t submitted to SCEE at the same time as SCEA as we were still waiting on some publisher registration details to come through, so all I could do was wait for that and for the response from SCEA on our initial submission while I busied myself with some other work.

On January 18th I received the first fail from Sony. As fails go, it was a pretty good one. There were three “Must Fix” bugs: one was due to my entering the wrong data in a field when submitting the game package, and two were due to saving games failing when there was no disk space left. They’d also mentioned some slowdown in some of the levels - I’d expected that, and as it wasn’t a critical bug, I ignored it. The save game bugs proved troublesome - Ben had written all of the save game code and with him gone I now had to learn how Sony wanted you to do it, how Ben had done it and how I could make it work correctly. It took me a few days to find and fix the problems and by this time the details that SCEE required had come through so I resubmitted to SCEE and SCEA at the same time (January 24)

I was quietly confident that it’d all pass this time. I’d thoroughly tested the save game code now and it all worked. What could go wrong? I seriously considered heading out and buying a bottle of Chimay Blue as a pre-emptive celebratory reward.


Image from https://bonbeer.wordpress.com/category/places/mechelen/

The first fail I saw came back from SCEE on February 2nd. I’d messed up the submission data again (I hate filling out forms) plus there was a legitimate bug they’d found where once you’d finished the game you couldn’t continue on from that save game if you wanted to continue playing to complete all the trophies. When I had fixed that I re-submitted the new build to SCEE on February 4th and then began to wonder what had happened to the SCEA build - it should have come back at about the same time as the SCEE one. I also worried that they would pass it with the “continue when finished” bug in it. I needn’t have worried, as the SCEA team came up with a whole slew of new bugs that weren’t mentioned in any of the previous tests. 

This new bug report that SCEA had sent back was very concerning. The crashes they were reporting sounded like bugs I’d previously fixed - I was also impressed with their thoroughness. One of the bugs was something like “Hang from the rope in level blah and throw seeds for at least 5 minutes into an area with no water. Crashes 70% of the time”.  While looking at this bug I received notice from SCEE that they had passed Vessel. Which was great, but it was also worrying as I now had numerous repeatable crashes in the build - so I failed the SCEE build myself in order to be able to submit a fixed build which would match the SCEA build.

By now I was getting a little frazzled. It was like one of those movies where the protagonist has slain the evil uber zombie, only to have it rise from the dead over and over again. Would this submission process never end?



I eventually tracked the crash bugs down to a simple stupid error. There was a section of code that was responsible for limiting the amount of fluid, fluros and seeds in a given section. When the frame rate dropped below 60fps, this code would kick in and drop the number of drops/fluros or seeds to the minimum value deemed suitable for that level. During the final stages of development we had created a FINAL_RELEASE mode which turned off all of the unnecessary debugging code. Unfortunately, this erroneously included some code which updated the time values that the limiting code used, so the fluid, fluros and seeds were never being reduced. Ever. This meant that the game would have been running much slower than it should have, and was prone to crashes whenever certain hard coded limits were exceeded. I've never been so relieved to fail anything.

Something I’d like to focus on here is the level of performance that Vessel was submitted with; Ben and I spent a huge amount of our time just trying to squeeze as much performance out of this game as possible on the platform we had. For the most part, I'm pretty happy with the results. Yes, you can abuse the game and make it slow down quite easily - if you try to fill a room with fluros and water and seeds then you’ll most likely see some slowdown - but during normal play, the game maintains 60fps simulation at 30fps frame rate (the graphics is 30fps, so there are two simulation passes per visible frame). However there are a few levels where the sheer number of fluoros and fluid required to solve the puzzles means that the frame rate will drop to 20fps. Given another month I reckon I could have fixed that too, but given how far over time and budget we were, that wasn't an option. So, while I’m happy with how much we improved the frame rate of the game, I'm still a little disappointed that we didn't hit a full 60fps simulation everywhere.

Sure, the PlayStation 3 hardware did slow down the porting of the game initially, but given the complexity of the fluid simulation required I doubt that there will ever be an Xbox 360 version released. The SPUs allowed us to perform at least 80% of the physics simulation completely in parallel - with no impact on the main processor at all. There is no way you’d get the same level of performance on the X360.


With the FINAL_RELEASE bug fixed and the game resubmitted to SCEE and SCEA (which passed on the 20th and 22nd respectively) I was finally free! The metadata was all sorted and we’re now expecting the release on March 11 in the SCEA territories and March 12 in the SCEE regions. We’re all hoping it does well enough to cover our costs (as all devs with newborn progeny are).

What went right

Having a QA guy in house was invaluable. An impartial, non-technical third party to throw our fixes at kept us honest. As a developer you get very close to the game and like to assume that your fixes have made a positive difference and you’re moving forwards. That’s not always the case.

Honest communication with the Publisher/client. I tried to keep John at Strange Loop Games constantly up to date with the progress (or lack thereof) of the game. He was incredibly understanding with the continuous delays, and while it was hard to deliver bad news, at least it reduced the surprises.

Having good, experienced coders on the job. Bringing Ben onboard was a good choice, even though we had to let him go in the end. There is no way I could have done this without him. Working from the same physical office was also beneficial. I worked remotely for over 5 years leading up to this port, and I think we’d have delivered even later if we’d worked separately.

What went wrong

Work estimation. I fundamentally underestimated the amount of work required. I should have applied Yossarian's Circular Estimation Conjecture and multiplied it by PI - that would have been closer to the mark. The big killer was the misunderstanding with which threads needed to run at 60fps. If it was just the physics I'd underestimated, we probably would have been a shade over a month late (maybe two), but with both the physics and game threads needing to execute in under 16.6ms, we really had our work cut out for us. The amount of time taken for submission shouldn't be forgotten either; 4 to 8 weeks after the initial submission should see your game on the store.

I’d also recommend to anyone doing a console game that they get as much TRC compliance done as soon as possible in the development of the game. Get the load/save stuff working, trophies, error reporting, quitting, controller connection, file error reporting and the like in place and functioning sooner rather than later.

In Closing

The porting of this game was a trial for me. There was a week or two around September/October where I was really doubting that we could deliver a game that ran at a playable frame rate. I'm proud of what we delivered in the end, and if you’d taken away the financial and time pressures I would have thoroughly enjoyed the challenge. I've learnt a lot from this experience, and I'm looking forwards to the next one.

Just not right now, OK? Maybe a bit later in the year.


tony@overbyte.com.au (Tony Albrecht) News Tue, 11 Mar 2014 09:00:05 +1030
Vessel Post Mortem: Part 2 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-2 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-2 In the last episode, our intrepid heroes had discovered that rather than having almost finished the PlayStation 3 port of Vessel, they were barely half way there. Read on to find out what happens next…

With the realisation that both the game and render threads needed to run a 60fps, we had to reassess where we focussed our optimisation efforts. The game thread was doing a lot of different jobs - Lua scripting, playing audio, AI, animating sprites - all sorts of things. There was a lot to understand there, and on top of that, both the physics and render threads still needed more work. But performance wasn't the only concern - there was TRC compliance, game play bugs, new bugs introduced by our changes and, as we eventually discovered, the need for a PC build (more on that later).


Now, Vessel  was ‘mostly’ bug free on PC when we started - we did find a few bugs in the Lua and in game code and the few compiler and shader compiler bugs added to that, but there were subtle differences in the way the two platforms dealt with numbers which meant that weird things sometimes happened. For instance, a door on one of the levels would get stuck and your character would have to bump into it in order for it to open. This was due to a variation in position values about 5 decimal places in causing the jamming on PS3. Additionally, there was some code that was used in a number of places that did something like the following;

x = MIN( y/z, 1.0);

What this did on PC was catch the case where z tended to zero and clamped x to 1.0 ( MIN(infinity, 1.0) = 1.0). Unfortunately, on PS3 value/0.0 was QNAN and MIN(QNAN, 1.0) was QNAN. So the clamping never happened resulting in much weirdness in hard to reproduce places (it only occurred when z was zero), and was therefore a bugger to find.

This meant that we weren't just optimising and adding in new PS3 functionality, we were also debugging a large foreign codebase. I hadn't expected that we'd need to change Vessel much - it was a shipped game after all and so I had assumed that it was pretty much bug free.

I had also assumed that for the most part we wouldn't need to change any assets significantly. Things like platform specific images for controllers and buttons were just texture changes and so weren't a problem. Text was fine too. Unfortunately, problems like the sticking door mentioned above meant that we need to go into some of the levels and jiggle bits around. Later in the life of the port we had to tweak fluid flows, emission rates, drains and other fluid based effects to help with performance. 

All of this meant that we needed to have the editor up and running, and the editor was built using the same engine as the game so we needed to maintain two builds of the game - one for PS3 and one for PC solely for the editor. This was done crudely by #defineing most of our PS3 changes out and leaving the PC code in there inside the #else conditional. This did slow us down a bit and also made our code that much uglier, but it had the side effect of allowing us to easily test if a bug was present in the PC build or was specific to the PS3.

By the end of September we had fixed many audio issues and managed to get LuaJIT (an optimised version of Lua) working again (it was mostly working but was causing some repeatable problems in some places). Ben also profiled the Lua memory usage (article here) and tweaked its garbage collection to be more CPU friendly (it was occasionally peaking at 13ms in a call). So, slowly, we were scraping back time spent on the game thread. It was still a serious problem, but it was getting better.


Back on the physics system, more and more the CPU code was being ported to the SPUs - in many ways, this was getting easier as I was building a lot of infrastructure which made it more convenient to port. Many of the patterns that I used for one system would translate well to others. I started reducing the amount of memory used by parts of the physics, like dropping the size of the WaterDrop struct to 128 bytes from 144 bytes meant better cache usage and simpler processing on SPU (nicely aligned structs meant that I could assume alignment for SIMD processing).

Some optimisations were straightforward.  For example, I changed the way that we processed drops on fluros skeletons. Instead of iterating over the entire drop list and checking each drop to see if it was part of skeleton ‘N’ I changed it to pass over the drop list once, partitioning the drops into buckets of drops per skeleton. Then we could just process a given skeleton, touching only those drops on that skeleton. Obvious in hindsight, and unnecessary on the PC build, but it takes time to get to the point where you really start to understand the code and can begin to change the way it works (rather than optimise a function to perform the same algorithm, just faster).

At this time we also saw the transition of the render thread to the status of low hanging fruit. So we started working on that - unrolling loops, prefetching, minimising data copying. Tweaking, poking, prodding and breaking, testing, measuring, then swearing or cheering and checking in.

The end of September rolled around and we still had no real end in sight. I kept talking to Strange Loop, trying to keep them up to date while we furiously worked to finish this game. As we’d already blown the initial quote, our contract specified  that we would have reduce our daily rate for the remainder of the project, to be recouped on sales. So not only did we have significant time pressure, we now had financial pressure on top of that.


October was very productive - it saw over 150 bug fixes and optimisations checked in for the month. As the game was improving in performance and becoming more playable James, our QA guy, was able to play more and more of it. We hadn't actually managed to do a complete play through of the game by this stage, so we focussed heavily on removing all blocking bugs and getting the game in a functionally shippable state (assuming that we would continue to make performance improvements).

Second week in, after discussions with Strange Loop Games we decided to postpone the release of the game to January. This meant submitting the game to Sony in Europe (SCEE) and in America (SCEA) by the end of the year, a mere 4 months after our initial estimate. There was no way I was going to let this date slip again.


Ben was still working on optimising the Lua execution and audio (which at this stage was still taking 4 or 5 ms per frame with peaks way higher than that). I’d thought that as Vessel used FMOD that it would all just work when I turned it on. Unfortunately, the audio was a complete mess on the PS3. While profiling the game thread in detail we discovered that it was taking up huge amounts of time, spiking and causing frame stutters as well as just plain behaving badly. It took weeks of intermittent work plus lots of discussions with the FMOD boys to figure out what the problems were. Turns out, many of the sounds were just set up incorrectly for the PS3 build - effects and layers that were disabled on PC were left on for PS3. Some sounds would loop infinitely and multiple sounds would play at the same time. All these things took time to discover, time to investigate a solution and time to fix. Something we had very little of.

To make things worse, Vessel performed no culling for the rendering, AI or audio. At all. The entire level was submitted to the renderer which culled it and then passed off the visible bits to the SPU renderer. Also, Lua scripts (which are slow at the best of times) were being called even when the things they were modifying weren't in view. Audio was also being triggered by objects that were a long way from the player, relying on distance attenuation to reduce volume. All of this meant that there was a lot of work going on under the hood for no visible or audible impact.

We were still stumbling across weird little issues. For example, the animated textures in the game were never animating. They were loading correctly, but never changing. Ben tracked this down to an endian swap function which was broken (the PC and PS3 have different byte ordering for their numbers and so the same data when loaded from file must be modified depending on platform). This endian swap was only ever called in one place and only by this particular animated material. 

The fix is often easy. Finding what to fix is hard.


Even though we had a tight timeline, we also had other client obligations which saw us each lose 2 weeks in November. This helped financially (well, it would have if they had paid on time) but put even more time pressure on us. Regardless, we were confident that we’d make the end of December deadline. Plenty of time.

We finally managed to get a complete play though of the game in November. Only to realise that the endgame didn’t work. Fix, patch, hack, test.

It was during the last week of November that Ben had a breakthrough. He bit the bullet and implemented a simple container based culling system for the game thread. All objects in a loaded level were placed within a rectangular container and there were an arbitrary number of these per level. This meant that we could quickly check what should be rendered or processed (Lua or audio) by looking at the container in view and those next to it. Conceptually simple, but implementation required that we modify the editor (which we’d not done before) and renderer, then visit each level in the editor and manually add the new containers then re-export everything.

This fix made a massive difference to performance. Audio was faster as it wasn’t playing as many sounds. There was less Lua processing happening as there were fewer things active. And rendering was faster as there was less being sent to the render thread. So it worked, and it worked well. In fact, we had to limit the frame rate to 30fps as some places were now rendering at 60fps on all threads!



The container fix was instrumental in getting the game thread to a more playable speed, but it also introduced a lot of little bugs that took a month to fix. And still, it was too slow.


With only three weeks until the Christmas break we figured we needed to start cutting back on the content in some of the levels in order to get the frame rate up. We picked a few of the worst and closely examined them to try and figure out how we could speed them up. We found the following; 

  • Trees were very costly (as were their roots). So we cut some of them down.
  • Drains were expensive, so we removed some of them.
  • Areas with lots of water were obviously expensive, so we reduced the amount of water in some areas, cutting back on the flow or adding drains to reduce the amount of water in pools.
  • We tweaked the object limiter code to ensure that there were always enough fluros to finish a given level, yet not too many to make it run too slow. Same with the number of drops and seeds.

All of the above helped, and the game was now running pretty well. There were still areas where it would slow down, and you could (and still can) slow down some areas intentionally by spraying lots of water and generating lots of fluros, but there was no time left for further optimisations. I'd already built 13 different SPU tasks to speed the physics up and one or two for the render thread - it was getting very hard to speed things up more, and at this stage it was risky to make any extra significant changes. This would just have to do.

James was now noticing some play specific bugs. Some fluros weren't moving correctly - jumping too high and destroying themselves on the scenery or just behaving badly. Which is fine in most cases, but this was happening with a key fluro that you needed to use to complete the level. We had to modify the level to make it work again.

In addition to the optimisations that we were still implementing, and the bug fixes for the odd things that were happening, we also had to make sure that the game was TRC compliant. James trawled through that massive missive, highlighting things that we needed to check - the ways that save/load games functioned, how the game shut down, what would happen in the case of no hard drive space left, the order that trophies unlock, so many things. And so little time.


And, on top of that, the financial pressure on the company due to the length of time that Vessel was taking to port, plus the reduced pay rate we were getting for the overage and the fact that there was very little work lined up for the new year meant that I had to notify Ben that we were going to have to let him go.  

It was one of the hardest things I've ever had to do. I felt like I’d betrayed him - he is an awesome coder and he’d become a good friend. And just before Xmas FFS. To make things worse, the day I had to do it he was working from home and I had to let him know over Skype. It was just awful.

He called me back within a couple of hours to tell me he had just managed to land a new job. In two hours. I told you he was good.


From the left: Tony, Ben and James.


So, back to the code. With a week and a bit to go we mainly focussed on the remaining TRC issues. The game was running as good as we were going to get it to in the time we had and I was satisfied with that. TRC violations were disappearing and it looked like we might just make it.

In the last week (on the Wednesday)  I realised that the file format that we’re using needed to be DRM enabled in order to be TRC compliant, so I spent 17 hours straight trying to fix it. Did my first all nighter in a long time and managed to solve the problem at 5am. I pushed through the next day, fixing, tweaking, hacking, testing while full of sugar and caffeine and then dragged myself home to sleep on the Thursday night.

Submission Time

The final Friday arrived. We performed some clean ups and readied the code for submission. We were going to make it! On one last pass over the TRC checklist I realised that we’d missed something. We needed to be able to report an error if any of the required startup files were missing, but the way that the engine was designed, there was no way to display anything until those critical files were loaded. We had a few hours so I quickly hacked together an emergency renderer that I could use in the case of a critical failure. I gave myself 2 hours to do it and it was working within 30 minutes. Awesome! But, upon further testing, the loading screens refused to work. I still have no idea why - the code I added was never executed and yet still affected the loading screens. The game itself played fine, but those bloody loading screens didn't bloody work. I wasn't willing to submit with what I knew would be a TRC failure so, once again, we’d missed our deadline.

We'd have to delay submission until next year. 


Continued here: Vessel Post Mortem: Part 3

tony@overbyte.com.au (Tony Albrecht) News Fri, 07 Mar 2014 08:53:50 +1030
Vessel Post Mortem: Part 1 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-1 http://overbyte.com.au/index.php/overbyte-blog/entry/vessel-post-mortem-part-1 I started looking at Vessel in January 2013 - initially just in my evenings. In my ‘spare’ time. I was looking at it because John and Martin (the founders of Strange Loop Games and the creators of Vessel) had asked me to consider porting it to PS3.  We knew each other  from our time together at Pandemic Studios in Brisbane where I had worked closely with Martin in the Engine team while John worked as an AI programmer. I was keen to do the port but I wanted to be as sure as possible that I knew how much work was involved in it before committing.


I found the idea of porting such a game daunting - it had a custom rendering engine, a bespoke fluid physics system with major parts of the game code written in Lua - but I figured it was time to step up and take on such a challenge. I have an immense amount of respect for John and Martin - they are both incredibly smart coders and I relished the chance of working with them again. I wanted to do it but I didn't want to let them down. They were going to be my clients, but they were also friends.

I started by looking at their code - in order to give a quote on the porting process I needed to have the game running so that I could profile it. Vessel had already been partially ported to the PS3 by a third party, but wasn't compiling when I first got my hands on it. So I spent a few weeks of long evenings fixing the FileIO, the template compilation errors (GCC and MSVC disagree on some of the finer points of templates), and just commenting out anything I didn't think was relevant in order to get the game up and running. Once it was working, I ran my trusty profiler over the game and tried to guess how much time it would take to get this baby to a shippable state. What I found was a little scary. 


Visualisation of the code changes made during the porting of Vessel to PS3.

Vessel runs predominantly on three threads. The game thread which runs all the Lua scripts which control the majority of the game logic and in game objects, the render thread which takes the output of the game thread and converts it to a format that can be fed to the SPUs which then build the graphics command buffer and feed it to the GPU in parallel, plus the physics thread which manages all the physics processing - fluid dynamics, rigid body collisions, everything. John had built the physics system himself and it was pretty impressive (and complicated). It was too much to grok initially, so all I could do was look at the game’s performance on some of the worst levels, profile it there and see how much I needed to speed it up.

I investigated a few of the levels on the PS3 build and picked two that I thought were the worst. What they told me was that the physics thread was the primary cause of slowdown and so I concentrated my analysis on that thread. The game was running at about 60ms per frame in the worst levels I was testing - about 15 to 20 frames per second. From discussions with Strange Loop, I knew that the physics thread had to run at 60fps so this meant that I needed a factor of 4 speed up. From experience, I knew that this was something that was achievable. The game thread was running at around 30ms per frame and the Render thread was clocking in at around 5ms per frame. SPU rendering was pretty fast, and I figured that wouldn't be much of a problem.

Months of fannying about took place and a contract was finally signed in June. So I did the only appropriate thing one could do in that situation - I did two weeks of work on Vessel and then went on a 6 week family holiday to the UK.




In anticipation of the Vessel contract I’d hired an experienced programmer, Ben Driehuis. Ben came highly recommended and had worked on the Bioshock series of games, plus, most importantly, he was living in the same city as me. Having spent the last 6 years working remotely I knew that my best bet of delivering this game on time was to be in the same physical space as my team. I threw Ben in the deep end. He was hired and then given work on platforms and engines that he’d never had to deal with before. Then when I buggered off to the UK he had to manage by himself, dealing with a performance contract on a big name title and then kicking on with Vessel in my absence. Poor bugger.

I also had a friend who was interested in the QA side of things, so James (an old Uni mate) was responsible for playing Vessel over and over and over again, reporting bugs to us and telling us that the performance improvements we’d just made hadn't made any difference. James also helped us out on the testing for the TRC (Technical Requirements Checklist - a massive list of tests that need to be checked off in order to get your game through submission to Sony) and the actual submission of the game.

Once I got back from my holiday we moved into the new office and started working in earnest on the Vessel port. Our goal was to finish Vessel by the end of August - we had about four man-months to do the work, and I had carefully calculated how much time each section of the port would take. Surely that was enough time?

Ben and I divided up the work between us: I took on the optimisation of the physics and Ben was to deal with pretty much everything else - audio was broken, some of the in game objects didn't animate, loading needed to be optimised, memory usage had to be reduced, we needed save games and trophies and a whole swathe of PS3 specific features implemented. Ben has written about some of the fun he had during the port here, here and here.

From my initial inspection of the code, I had noticed a lot of memory allocations taking place, so I was optimistic that I could reduce (or remove) these and scrape back a lot of frame time. I also figured that rearranging the memory usage to be more cache friendly would help the PlayStation 3’s main processor so I could leave the core of the physics code alone. The last thing I wanted to do was to port all of the physics to SPU.

Three weeks later I started porting the physics to SPU. The changes I was making were incrementally improving the performance, but I could see that we’d never hit the speed we needed without dramatic changes. Porting code to SPU isn’t like porting code from one platform to another. You can program SPUs in C++, but they have very limited memory available to them - only 256kb each. So, in order to port any significantly complex task, you need to understand how that task uses memory - where it is, what order it reads it, where it writes it - the things that most programmers of high level languages take for granted. Not only that, but you have to consider how the task will parallelise. Running on one SPU is good, but running on 6 is awesome.  (I’ll go into more detail on the SPU porting process in later articles.)

The first SPU task I built was one that looked for drops that were near a set of objects. My first shot at this shaved almost 2ms off the frame time - the function in question dropped from a peak of 2.4ms to 0.6ms! That meant that in order to hit 16ms per frame I just needed to do this 21 more times! 


Still, Ben and I kept hammering away at the code. By the end of the first week in August I’d shaved 15ms off the Physics thread and yet that thread remained the bottleneck. Ben had freed up over 200MB of memory and we were close to running on a retail PS3. But we’d started to notice a few things - some of the levers and triggers in the game weren’t working, fluid shaders weren’t working properly, we were getting spikes of over 200ms in some cases and the game still wasn’t playable due to not just the levers not working but the AI of the fluros seemed to be broken (some would teleport away).

It was evident at this point that we weren’t going to hit the end of August deadline. Other projects had taken some time from both Ben and myself, so we still had billable time available going into September, so I let Strange Loop know that things were going to take a little longer than I had expected. They were very understanding and comfortable with the release date pushing out a little more.

Things were progressing well though - by the end of August the physics thread was running at better than 33ms per frame - we were over halfway there! Save games were in place, loading was optimised, and trophies were functioning. Unfortunately, the game thread was slowing down - as Ben was fixing problems with the code and data, more things were being processed. For example, audio wasn’t working before and now there was a lot of audio being triggered so this obviously took more processing time.

It was the second week in September when I realised that I’d made a horribly flawed assumption about the game thread in Vessel. I discovered that it had to function in lock step with the physics thread and so it too had to run at 16ms per frame. Reducing the framerate of the physics wasn’t an option as it was fixed to 16ms per frame - experimentation with varying that resulted in fluid collisions breaking horribly.

So, at that point we had the physics running at sub-30ms per frame, and the game thread running at 30ms and above. We were close, but we still had a long way to go - we now had to figure out how to optimise the Lua heavy game thread down to 16ms per frame. Actually, it had to be faster than that - as the PS3 has only two hardware threads, the 3 software threads have to share those two hardware threads which meant that the game thread plus the render thread had to be faster than 16ms per frame.

Then Ben discovered the reason for the teleporting fluros - a compiler bug triggered only in optimised builds was causing fluros to incorrectly calculate their trajectory during jumps, making them disappear. Turning off optimisation for that function fixed the problem. That was a tricky bug to find, so we were pretty stoked - until we realised that this meant that the number of fluros in view was now higher that it was before. Much higher. More fluros means more processing - more physics, more drops, more rendering, more audio. This one fix resulted in the Physics thread jumping back up to 50ms from around 24ms, the render thread jumping up to 12ms and the Game thread climbing to 40ms.

I was horrified. We were effectively back to where we had started. We now needed to save a total of 60ms over three threads. I wasn't sure we could do it. It was like that bit in horror movies where the protagonist has killed the evil monster, only to have it rise up, stronger than it was before.


The initial optimisations on a project are the easiest. As time goes on and all the low hanging fruit has been picked, the remaining optimisations are harder and harder to perform. We really had our work cut out for us - and we'd come too far to go back now.

Visualisation of the code and data changes made for the Vessel PS3 port.

Click here to read on...

tony@overbyte.com.au (Tony Albrecht) Programming Thu, 27 Feb 2014 23:11:19 +1030
Official Vessel Release Date! http://overbyte.com.au/index.php/overbyte-blog/entry/official-vessel-release-date http://overbyte.com.au/index.php/overbyte-blog/entry/official-vessel-release-date Overbyte and Strange Loop Games are proud to announce that the PlayStation 3 version of Vessel is due for release on 11 March in the US regions and 12 March for the European and Australasian regions. The official announcement can be found on the PlayStation Blog


So, for US$9.99 (or the equivalent in your local dollarcoins) you get over 10 hours of puzzley platforming goodness with a fluid system built just for you, lovingly ported to the PlayStation 3 by your caring friends at Overbyte.

But wait! That's not all - if you order through here, you not only get the PlayStation 3 version but the Steam version as well and, (while available) you will also get the Turbine toy. How cool is that?

 We're really happy to be finally releasing this game, and hope that you'll have as much fun playing through it on the PlayStation as we will. 


tony@overbyte.com.au (Tony Albrecht) News Thu, 27 Feb 2014 09:15:37 +1030
Looking For a Good Sort http://overbyte.com.au/index.php/overbyte-blog/entry/looking-for-a-good-sort http://overbyte.com.au/index.php/overbyte-blog/entry/looking-for-a-good-sort Vessel for PlayStation 3 has finally been submitted to Sony for approval so I have a little more time to write up some of the optimisation challenges we had to overcome during its porting process. Well, OK, I don’t really have any more time but I’ll try to write a bit more about it before wet bit rot sets in and I forget everything.


Vessel ran predominantly on three threads; one for Physics, one for Game and one for the Render process. The Render thread translated the output from the Game thread into a command buffer that could be parsed by the SPUs and as such was pretty short - we had allocated up to 3 ms per frame for it and unfortunately, in certain places it was hitting 11ms.

Looking at a Tuner profile we found the following


In this case, the Render thread was taking around 6ms, with half of this taken up with the generation of the quads for each of the fluid mesh maps. Of that, 1 millisecond was spent in the STL sort function. This seemed a little excessive to me, so I looked closer. The red bars in the image above correspond to the following code:


with the following functors to do the gritty part of the sorting.


Basically, what the above code does is, for each fluid mesh map (the outer loop code isn’t shown but the first code fragment is called once per FluidMeshMap) sort the FluidVertexes by water type (lava, water, glow goo etc) and then by whether or not it is an isolated drop (both edges in the g_Mesh are equal) or if it is an edge (e1 != e2). Once that is done this information is used to populate a vertex array (via calls to SetEdge()) which is in turn eventually passed off to the GPU.

From looking at the Tuner scan we can see that the blue parts take up a fair amount of time (they correspond to PROFILE_SCOPE(sort) in the code) and correspond to the standard sort() function. The question is, can we write a better sort than std::sort?

std::sort is already highly optimised and was written by smart people that have produced a generic solution that anyone can plug into their code and use without thinking too much about it. And that is exactly why we can write a faster sort than std::sort for this case - because we know exactly what we want. We have the input and we know the output we want, all we need to do is work out the fiddly transformation bit in the middle.

The output we desire is the FluidVerts ordered by sort and then by edge/drop type. The final order should look something like this:


I decided to base the solution loosely on the Radix sort. We do a first pass over the data to be sorted and count up how many of each fluid ‘sort’ there is. This is then effectively a histogram of the different fluid types - we now know how many of each type there is and we can set up a destination array that is the size of the sorted elements and we can calculate where in that contiguous array the ‘sorts’ of fluid should go.

Our second pass steps through the source array and places the FluidVertex into the correct bucket (based on fluid ‘sort’), but adds it to the beginning of the bucket if it is an edge and at the end if it is a drop. And that’s it. Done. Two passes. A super fast, super simple sort that does just what we need.

“So, how much faster is it then?” I hear you ask.

“About five times faster.” I would respond. This sort runs in about .25ms compared to the original 1ms but it also has the added benefit of moving the FluidVertex data into contiguous sorted arrays (the original just produced an index map that could be used to traverse the original array (vSorted)). This is good because we can then easily optimise the second part of the FluidMeshMap loop which copies the verts to a vertex buffer for rendering and also modifies the edges via calls to SetEdge() (for those that are interested, the SetEdge() function does some simple arithmetic and inconveniently calls Atan2() to set up the angle of the edge for the GPU to use). This contiguous data means fewer cache misses, but on PS3 it also means that we can easily pass it off to the SPUs to do the transform and copy for us while we process the next FluidMeshMap on the PPU. Nice huh?

The result is that the BuildFluidMeshmap section is reduced from 2.25ms down to 0.24ms - a saving of 2ms


The key message to take away from this is that generic implementations of algorithms are great for getting your code up and running quickly, but don’t expect them to be lightning fast in all cases. Think about what you are doing, reduce you problem to the simplest inputs and outputs and then build a simple transformation that does just that. It will make you feel good on the inside. Simplicity is its own reward.

(For anyone interested in more sorting goodness, check out an old blog post of mine A Questions of Sorts.) 

tony@overbyte.com.au (Tony Albrecht) Programming Thu, 30 Jan 2014 12:13:26 +1030
How Smart Programmers Write Stupid Code http://overbyte.com.au/index.php/overbyte-blog/entry/how-smart-programmers-write-stupid-code http://overbyte.com.au/index.php/overbyte-blog/entry/how-smart-programmers-write-stupid-code This is article is based on the presentation I did at GCAP 2013. You can find the slides in Keynote and Powerpoint formats here. The slides include extensive presenter notes, so you should be able to understand what I was getting at even though the slides are purposely light on content.

I've spent a significant part of my professional career working with other people’s code. I've worked with code from big studios and small ones, from successful ones to struggling ones, with experienced devs and green. I’ve seen code that is beautiful, code that inspires as well as code that makes no sense, code that stinks, code that is impossible. I've seen code you wouldn't believe. 


But all the code I’ve seen, both good and bad, has one thing in common - it was written the way it is for a reason. Part of my job, and the job of any code maintainer, is to find that reason and understand why the code was written in that particular way. 

When confronted with new, complex code, there are 4 stages that a programmer goes through:

  • [horror] WTF is this?
  • [confusion] How did this ever work?
  • [blame] I hate you and I wish you would die.
  • [enlightenment] Ahh, I get it now.

The forth stage is the important one. Enlightenment. Comprehension. When you finally understand the code completely and it becomes your own. All too often if the function of some code isn't obvious, then it is often dismissed as being stupid - but this is rarely the case. The code in question isn't necessarily stupid, but it is obscure. Complex. Some of the smartest people I know are programmers, yet they can write some bloody awful code. Stupid coders don’t last long in this industry and their code doesn't last much longer than they do. So why do smart programmers write stupid code?

There are two ways of constructing a software design: 
One way is to make it so simple that there are obviously no deficiencies
and the other way is to make it so complicated that there are no obvious deficiencies.

 — C.A.R. Hoare, The 1980 ACM Turing Award Lecture

Time Constraints

Time constraints have a significant impact on the complexity of our codebases. There are two types of time constraints: the time available to write the code in the first place as well as the time available for the code to complete execution in. 


Time to get the code running

The first constraint is a very common one - game devs are time poor. We generally have only just enough time to get the code working in the first place - who has time to revisit their code just to clean it up? We've got a game to ship, right? Personally, I’ve seen some awful game engines ship great games - and conversely, great engines that have never shipped a game. So, back to my point; given a limit in time to develop the code, what is often produced isn't always the cleanest or simplest solution. When that code is revisited for bug fixing or some other modification, its opacity can mean that it won’t be touched - rather it will be worked around or removed completely as the new coder may not understand why or how it has been done.

Time for the code to run in.

The second constraint is one we've all seen too. A portion of code was recognised to be a bottleneck and so it has been optimised into something unrecognisable. In its optimised state it is difficult to debug, difficult to add to, difficult to understand. 

Both these forms of time constraint can result in complex, hard to maintain code. ‘Stupid’ code. Code that wasn't necessarily written by stupid people, but taken at face value is at best, obtuse.

Code Entropy

Code is like a thermodynamic system. Over time it tends to a state of disorder. In fact, given enough time it tends to a state of perfect internal disorder (which is a very apt description of most shipped games’ codebases). Disorder within a codebase manifests as complexity (and instability) which, if not actively resisted, increases over time. 


There are a couple of factors which contribute to this increase of complexity;

Problem Domain Evolution

The problem you end up solving is rarely the one you set out to solve.

Consider a programmer who sets out to solve a problem. She settles on an appropriate approach an initial solution is implemented. At a later date the initial problem is deemed to be incorrect or requires incremental modifications and so you she has to readdress the problem. The new direction is similar enough to the old one that it makes sense to simply modify the existing code - so she slowly twists the original solution to fit the new problem. It works, it’s a little ugly, but it works. If this happens enough times, the final problem can be significantly different to the original one and the initial design may not be as appropriate any more. Or even remotely appropriate. But it works. 



The issue here is the evolution of the code. The programmer has designed a clean system to solve the given problem and that problem has changed so she needed to modify, or dirty, the system. Edge cases and singularities can pop up which can dirty the implementation as well.

So what you have is an evolving system which gets more and more complicated as the problem domain changes. This is even more evident over longer periods of time leaving you with dead code and data. As new programmers are swapped in and out to iterate on the code things get progressively worse.

 Solution Evolution

Iterating without readdressing the initial design convolutes the eventual solution.

A programmer starts with a given problem. She designs a system to solve this problem and then implements that solution and tests it to see if it solves the problem. If it does, then the problem is solved, the code is checked in and beer is consumed in a celebratory manner. If not, then the implementation is revisited and iterated upon until a working solution has been achieved (possibly while everyone else is at the pub having beers). 



The issue here is that the initial design may not have been the most appropriate for this problem - the programmer learns more about the problem as she implements and iterates on it, but rarely revisits the initial approach to the problem. The more work that is put into a particular approach, the less likely the programmer is to change it. The ideal time to design the code for this problem is once she’s finished implementing and testing it.

What we end up with is sort of like DNA - there is a working solution there, but there is also a lot of cruft left in there from other iterations which has no effect on execution (except for when it does). It’s hard to tell what is part of the core solution and what is cruft - so what happens is nothing is touched when changes need to be made. New code is added to the periphery of the solution, often duplicating what is already in that original solution.

Complexity has momentum

The problem with complexity is that it breeds more complexity. The more complex a solution is, the more complex modifications to it become, either through necessity (the complex implementation requires complex changes) or convenience (it’s too hard to understand so its easier to tack something onto the existing solution or even to rewrite what is there). 


Controlling complexity is the essence of computer programming.
    — Brian Kernighan 


It takes a real effort to simplify a complex implementation - a coder needs to completely understand the existing solution before she can simplify it. The urge to rewrite a complex system is strong, but unless you fully understand the existing implementation you run the risk of making the same mistakes the original programmer(s) did.


Any intelligent fool can make things bigger, more complex, and more violent.
It takes a touch of genius — and a lot of courage — to move in the opposite direction.



In closing

My recommendations are this; Treat existing code and previous coders with respect. In all likelihood it is the way it is for a valid reason. Once you understand why the code was written the way it was, you are then in a position to modify, simplify or optimise it - and when you do, leave the code in a state which is better than when you found it. Your future self with thank you.

Where possible, take the time to refactor your code once it is working. Simplify it, clean it, make it beautiful. Beautiful code is its own reward. 


tony@overbyte.com.au (Tony Albrecht) Programming Thu, 21 Nov 2013 21:16:57 +1030
A Profiling Primer http://overbyte.com.au/index.php/overbyte-blog/entry/a-profiling-primer http://overbyte.com.au/index.php/overbyte-blog/entry/a-profiling-primer The most interesting thing about this port is the optimisation feat ahead of us. Vessel was already (mostly) running on the PS3 at the start, but needed to run much, much faster. So, where does one begin? The first thing you need to do, is profile it - if you don't know where the program is slow (or why) then you can't optimise it. But in order for me to be able to talk about profiling and optimisation, I'll need to delve (briefly) into the profiling tools on the PS3 (If you're an experienced PS3 dev, then your probably familiar with this information - skip ahead, I won't mind). Here's a screen capture of part of the tool I use for PlayStation3 profiling (SN System's excellent Tuner - comfortably the best profiler I've used)


Like all the best profilers, it has lots of colourful bars (the PS2 Performance Analyser was awesome like that too). Colourful bars are crucial for dazzling and confusing artists who may wander by and happen to glance at your monitor. The top of the screen has a time line in seconds - you can see it ends at 0.073, or 73ms. Yup, that's right, 73ms. (Note that in this blog I will almost always use milliseconds rather than frame per second. Frames per second is a useless metric only used by teenage boys who like to put lights inside their PCs).

The display is broken up into roughly 3 parts that we can see here. Each group of horizontal bars corresponds to a thread (click on the image to see it in higher resolution in another window). The top thread is the Physics thread, the next the Game thread and then the Render thread. 


The top, segmented bar is the overall thread usage. When the thread isn't running (like at the start during the stall for sync) you'll see a gap or a bar indicating an OS function (like a mutex or even fileIO). The colourful collection of bars beneath that are the result of marking up important functions in the code base with calls that tell the profiler when the function starts and stops. The longer the bar, the more time it takes. Bars that appear under other bars indicate function calls from that parent. For example, WaterTime calls MoveWaterFromInternalPressure which calls ApplyViscosity and the functions that call DoubleDensityRelaxation and then ProcessSkeletons. It is a very powerful way to view the flow of your code and I often find myself using Tuner just to see how a foreign codebase flows. For those of you out there who don't have access to the PS3 devkits or Tuner, you can use Chrome Tracing to read the output of your profiling in JSON format and display it in your browser (assuming of course, that your browser is Chrome).

So, basically, the longer the bar the longer the function it represents takes to run. Now, how do we choose which bar to optimise?

Generally, when looking for what to optimise first, I try and find something easy, something obvious. Occasionally you'll get lucky and make a simple fix that will save milliseconds - check your compilation options (you wouldn't believe how often people miss one library/project), check that your code is inlining properly (check the generated assembly), look for lots of string based functions as well as intra frame allocations - basically make sure that everything you're seeing makes sense and that there are no piles of stupid lying around. There are two ideal candidates for optimisation: One function that does a lot of processing or one or more small functions that are called a lot.

There is another feature of Tuner (and most other good profilers) and that is Program Counter sampling. Program Counter (PC) sampling is done by the profiler reading the program counter register to get an indication of which line of code is being executed at that time. The more times a particular line is sampled, the more often it is being executed - on PC the free Very Sleepy does a good job of this. You'll either get a hierarchical view of the functions being called, or a flat view of what is being called when - like in the following image:


The red and blue marks correspond to samples that are performed during the frame. On the left you can see the names of the functions that the samples have occurred in. In Tuner you can also drill down into the actual functions and see which lines and even which assembly instructions have samples attributed to them. This allows you to find hotspots - sections of the code that are called a lot and contribute heavily to the cost of each frame.

Tuner has a lot of additional functionality that I won't go into here - this should be enough to help you to understand captures that myself and Ben will post over the upcoming weeks.

In my next post I'll be looking in detail at the optimisation of one of the sections of the Physics thread. I'll just say this; my initial hope was that I could gain a lot of performance back by simply rearranging memory and minimising memory allocation in the physics code. That hope was very quickly and resoundingly dashed - this wasn't going to be an easy task.

tony@overbyte.com.au (Tony Albrecht) Programming Fri, 27 Sep 2013 09:15:00 +0930
Any Port in a Storm http://overbyte.com.au/index.php/overbyte-blog/entry/any-port-in-a-storm http://overbyte.com.au/index.php/overbyte-blog/entry/any-port-in-a-storm The Overbyte team has been busy beavering away on a number of projects over the last few months, but there is one that I thought might be of interest to a few programmers out there. We recently started work on a PlayStation 3 port of the successful Indie PC title, Vessel, and what we plan to do over the coming weeks is to document the porting process. We'll look at what we started with, what our goals were, what assumptions we made, what mistakes we made along the way and have a look at some really gritty optimisation problems.

tony@overbyte.com.au (Tony Albrecht) Programming Sat, 14 Sep 2013 22:21:10 +0930