Vessel: Common Performance Issues
It is time for me to write my final blog on the adventure that was ‘Vessel PS3’, since my time at Overbyte is coming to an end. This project was tough, but it is one I am proud we have conquered. I did a lot of interesting optimisations on this project, but since I won’t have time to write a blog about each of them, I thought I would highlight a few of the common issues I found, with some examples of what we did about them. I will leave some of the more interesting ones for Tony to cover later.
Using containers badly
There were quite a few places in Vessel where we had to fix up the use of vectors, lists and other similar containers. Obviously you need to pick the right container for the job, which means understanding what your code needs the container to do.
But what if you choose the right container or a near enough one? There are still many performance traps you can quickly run into.
Let’s do a quick example. What is wrong with the following code?
vector<PipeEffectDefinitionItem> defs = m_EffectDefinition.GetRef()->m_Definitions;
for(int i=0; i<m_vfxInstances.size(); ++i)
PipeEffectVfxReference& vfx = m_vfxInstances[i];
Did anyone notice the problem? We are grabbing a vector of these PipeEffectDefinitionItems from another object, but instead of using a reference, we are copying it. This happens in the update code of a particular class, which can be called a few times a frame due to other interactions, and every call made the same copy. Sometimes it is valid to copy a container, if you know the data might change or the copy is temporary. However in this case we are referencing static data, so changing the declaration on the first line to vector<PipeEffectDefinitionItem>& (or const vector<PipeEffectDefinitionItem>&, since the data is only read) means we don’t need to allocate a whole lot of memory.
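To make the difference concrete, here is a minimal sketch of the two declarations side by side. The struct layouts are hypothetical stand-ins (only the type and member names come from the snippet above), and the helper functions exist purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the Vessel types; the real definitions are not shown in the post.
struct PipeEffectDefinitionItem { int id; };

struct PipeEffectDefinition {
    std::vector<PipeEffectDefinitionItem> m_Definitions;
};

// Copying version: allocates a new buffer and copies every element on each call.
std::size_t CountByCopy(const PipeEffectDefinition& def) {
    std::vector<PipeEffectDefinitionItem> defs = def.m_Definitions; // heap allocation + copy
    return defs.size();
}

// Reference version: binds to the existing vector, so nothing is allocated.
std::size_t CountByRef(const PipeEffectDefinition& def) {
    const std::vector<PipeEffectDefinitionItem>& defs = def.m_Definitions;
    return defs.size();
}

// Returns true when the local name refers to the original storage rather than a copy.
bool AliasesOriginal(const PipeEffectDefinition& def) {
    const std::vector<PipeEffectDefinitionItem>& defs = def.m_Definitions;
    return defs.data() == def.m_Definitions.data();
}
```

Both functions return the same answer; the only difference is that the first one pays for an allocation and a copy every time it runs.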
Memory allocation and de-allocation was a huge problem for us, with all the threads fighting each other. I have seen many code bases suffer from the same problems in my career. One thing we found ourselves doing often in Vessel was working out the size a container needed before a loop, instead of just adding items to it inside the loop and causing lots of resizing.
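As a small illustration of sizing a container outside the loop, the sketch below counts how often a std::vector has to grow while items are pushed, with and without a call to reserve first. The function name is made up for this example, and the exact number of growth steps without reserve depends on the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes n items and counts how many times the vector's capacity changed,
// which is a proxy for how many reallocations the loop triggered.
std::size_t CountReallocations(std::size_t n, bool reserveFirst) {
    std::vector<int> v;
    if (reserveFirst) {
        v.reserve(n); // one allocation, sized before the loop starts
    }
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) { // the buffer was regrown
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With the reserve, the loop itself never allocates. Without it, the vector regrows repeatedly, and each regrow is an allocation, a copy and a de-allocation that other threads may have to wait on.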
Another common way to solve performance issues is to cache data. One of Vessel’s biggest function costs on the game thread was a lookup which searched all persistable items in the game for one matching a HashID. The idea behind it was that, since some things are spawned at run time and some during load, they could safely reference each other: the system would look the item up and let you access it if it existed. This was done throughout the code base every frame, and whilst each lookup was quite efficient, in total it was still expensive.
When dealing with a code base you don’t really know, the hardest thing is learning why things are done the way they are. Tony’s blog a few weeks ago talked about this. You cannot assume that the programmer before you was stupid. They wrote the code the way they did for a reason. Sometimes they just wrote it in the simplest way and didn’t have the same needs or performance considerations you do. Sometimes the game or the systems around it changed, which means either a system was adjusted or it is now used in a way that was not intended.
In this case it was clear that on PC the performance cost of looking it up each time was minimal to nothing, whereas tracking object lifetimes, filling and clearing the cache, handling level transitions and everything else would have taken a lot of effort for such an insignificant gain that it was never looked at. For PS3 however we needed to cache these lookups! Here is the original call to GetRef.
And the cost of the function over a typical frame:
Caching this value required a few changes and a few assumptions. We knew that everything referenced by this system was persistable, so we could add the callback functions we needed, both for destruction of the persistable and for removal of the reference to it on that class. We also assumed that the constructor would only be used to create new items, not to re-assign items. This meant that we never had to check the cache in constructors. For all of our assumptions we had to write code to catch the cases where we were wrong, and adjust lots of other code around these. But in the end, what we were able to do is change GetRef to:
And now looking at its performance, we see a 0.5ms improvement.
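The general shape of this kind of cached reference can be sketched as follows. This is not the Vessel code: Registry, Persistable, CachedRef and Invalidate are hypothetical names, the registry here uses a hash map where the original searched the persistable items, and the destruction callback is reduced to a manual call.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

typedef std::uint32_t HashID;

struct Persistable { int value; };

// Hypothetical registry standing in for the game's global lookup.
// It counts lookups so the effect of caching is visible.
class Registry {
public:
    void Add(HashID id, Persistable* p) { m_items[id] = p; }
    void Remove(HashID id) { m_items.erase(id); }
    Persistable* Find(HashID id) {
        ++m_lookups;
        auto it = m_items.find(id);
        return it == m_items.end() ? nullptr : it->second;
    }
    int LookupCount() const { return m_lookups; }
private:
    std::unordered_map<HashID, Persistable*> m_items;
    int m_lookups = 0;
};

// A reference that resolves its target once and then reuses the pointer.
class CachedRef {
public:
    CachedRef(Registry& registry, HashID id)
        : m_registry(&registry), m_id(id), m_cached(nullptr) {}

    Persistable* GetRef() {
        if (!m_cached) {
            // Cache miss: pay for the lookup (repeated while the target does not exist).
            m_cached = m_registry->Find(m_id);
        }
        return m_cached;
    }

    // Assumed to be called from the target's destruction callback,
    // so a stale pointer is never handed out.
    void Invalidate() { m_cached = nullptr; }

private:
    Registry* m_registry;
    HashID m_id;
    Persistable* m_cached;
};
```

The cost moves from every GetRef call to the first call after each invalidation, which only pays off if every path that destroys or re-targets the object remembers to invalidate.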
Inefficient Data Searching
The first thing you should probably do in a code base when looking to optimise is to search for something like TODO: Make faster. Chances are it is probably slow. Whilst I am kidding, I did find one such comment after noticing a function using a lot of time in the hierarchy view. This function was used to find both functions and properties so that Lua and the game code could talk to one another. Here are the pieces of code in question.
There are two inefficient things in this code. Firstly we are searching a vector to find the functors and properties, even though we never need to iterate through the list, at least not in any time-critical runtime code. Secondly we are doing a case-insensitive string comparison for each element, and those aren’t cheap either. Here is a summary of a profile of this code running in one of our levels.
The solution here is twofold. Firstly we should be using an associative container, as we know that:
a) The container is only added to at start-up, when we set up the links between the strings and the actual functors/properties.
b) The time-critical part of this function is a lookup based on a key.
c) We almost never iterate through all the elements (except to find the key, as above).
Secondly we know that the string compare is not fast. In a case like this you want to convert the string to a hash, which will be a lot faster to compare. The Vessel code base already uses hashes instead of strings in most places, but this one was never changed. So we now have the following code:
We can see the difference below. It is a small saving overall, but the key thing is that the maximum is reduced, so there are fewer peak slowdowns. It is always good to review any functions which are called a lot, especially if they seem to be taking a lot of time in comparison to the work they should be doing.
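For readers who want to see the two changes together, here is a rough sketch of a hash-keyed associative lookup. It is an illustration only: FunctorTable, HashNoCase and DoubleIt are invented names, the hash is plain 32-bit FNV-1a over lower-cased characters rather than whatever Vessel uses, and hash collisions are ignored for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>

// FNV-1a over lower-cased characters, so "SetSpeed" and "setspeed" hash alike.
// The case folding happens once here instead of on every comparison.
std::uint32_t HashNoCase(const std::string& s) {
    std::uint32_t h = 2166136261u;            // FNV offset basis
    for (unsigned char c : s) {
        h ^= static_cast<std::uint32_t>(std::tolower(c));
        h *= 16777619u;                        // FNV prime
    }
    return h;
}

typedef int (*Functor)(int);

// Hypothetical table linking script-visible names to game functions.
class FunctorTable {
public:
    // Start-up cost: hash the name once and store the functor under the hash.
    void Register(const std::string& name, Functor f) {
        m_byHash[HashNoCase(name)] = f;
    }
    // Time-critical path: a single keyed lookup, no string compares.
    Functor Find(std::uint32_t hash) const {
        auto it = m_byHash.find(hash);
        return it == m_byHash.end() ? nullptr : it->second;
    }
private:
    std::unordered_map<std::uint32_t, Functor> m_byHash;
};

// Example functor used only to exercise the table.
int DoubleIt(int x) { return x * 2; }
```

Callers that look up the same name repeatedly can compute the hash once and keep it, so the runtime path never touches the string at all.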
Doing things unnecessarily
If you don’t understand a codebase, like in our case when we started Vessel, knowing what each function and system is trying to do and stopping it from doing things it doesn’t need to do can be extremely difficult. Now that we are at the end of the project, we are pretty familiar with the code and thus have found many areas where the code was doing things it didn’t have to, or being overly complicated.
For instance in Vessel there are objects called Water Instances. There is one water instance for each specification of liquid in the game, and there could be several specifications of Water or Lava. The problem was that the water instance code also managed sounds and lights for the liquids based on flow, collisions and other information. This code would find all liquid of a liquid type (Water, Lava, Red Goo) and play/stop sounds and generate lights for it. As we could have up to 9 water instances of a single type, this meant the same liquid was processed 9 times and 9 times as many lights and sounds were being played.
Obviously the issue here was that originally water instances and water types had a 1 to 1 relationship, but as the game grew the two moved apart, and as there was no performance impact on PC, it wasn’t revisited. But on PS3 this was causing us huge issues. Going through all the liquid in the game is slow, so doing it 9x more than necessary was bad. FMOD was also not appreciating the heavy load it was being put under, and the GPU was struggling to draw 9 lights in the same location. Here is the profile:
We simply cached whether we had already done the lighting and sound checks for a liquid type in the current frame. If we had, we didn’t do them again. This did have some repercussions, as the content was all tweaked around the old behaviour. So we had to increase sound volumes, the intensity of lights and some other settings.
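A per-frame guard of this kind can be sketched very simply. The enum values, the LiquidEffectsGate class and the CountEffectPasses helper below are hypothetical; the real code and its data layout are not shown in this post.

```cpp
#include <cassert>

// Hypothetical liquid types, named after the examples in the text.
enum LiquidType { Water = 0, Lava, RedGoo, LiquidTypeCount };

// Remembers which liquid types have already had their sound/light pass this frame.
class LiquidEffectsGate {
public:
    LiquidEffectsGate() { BeginFrame(); }

    // Call once at the start of every frame to clear the flags.
    void BeginFrame() {
        for (int i = 0; i < LiquidTypeCount; ++i) {
            m_done[i] = false;
        }
    }

    // Returns true only for the first caller per liquid type in the current frame.
    bool TryClaim(LiquidType type) {
        if (m_done[type]) {
            return false;
        }
        m_done[type] = true;
        return true;
    }

private:
    bool m_done[LiquidTypeCount];
};

// Simulates several water instances of one type updating in the same frame
// and counts how many of them actually run the expensive pass.
int CountEffectPasses(LiquidEffectsGate& gate, LiquidType type, int instanceCount) {
    int passes = 0;
    for (int i = 0; i < instanceCount; ++i) {
        if (gate.TryClaim(type)) {
            ++passes; // the expensive sound and light work would happen here
        }
    }
    return passes;
}
```

Nine instances of the same type now trigger one pass per frame instead of nine, which is also why any content tuned against the stacked result needs re-balancing.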
Further inside this class it was handling collisions and keeping a list of data about the interactions between drops of the fluid. This list was reusing a pool of data, to stop memory allocations and de-allocations. The list was checked each frame to remove any which were now obsolete and then checked for room to add new records later.
This code again was quite slow, as it was going through 100 records for each of the 35 water instances. Most of the time, as most water instances do not have liquid on screen, there were no records to clean up. However the code did not know this and continued to check each empty record. So we made a minor optimisation: the object now knows whether it has any records in use. This way the code does nothing for the majority of the water instances.
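The early-out can be sketched like this. CollisionPool and its members are invented for the example, the 100-slot capacity is taken from the text, and the rule for when a record becomes obsolete (a simple frame countdown) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

struct CollisionRecord {
    bool inUse;
    int framesLeft; // assumed lifetime rule: the record is obsolete when this reaches zero
};

// Fixed pool of records that tracks how many are in use, so the
// per-frame clean-up can be skipped entirely when the pool is empty.
class CollisionPool {
public:
    enum { kCapacity = 100 };

    CollisionPool() : m_activeCount(0), m_slotsVisited(0) {
        for (std::size_t i = 0; i < kCapacity; ++i) {
            m_records[i].inUse = false;
            m_records[i].framesLeft = 0;
        }
    }

    // Claims the first free slot; returns false when the pool is full.
    bool Add(int lifetimeFrames) {
        for (std::size_t i = 0; i < kCapacity; ++i) {
            if (!m_records[i].inUse) {
                m_records[i].inUse = true;
                m_records[i].framesLeft = lifetimeFrames;
                ++m_activeCount;
                return true;
            }
        }
        return false;
    }

    // Per-frame clean-up with the early-out described above.
    void RemoveObsolete() {
        if (m_activeCount == 0) {
            return; // nothing in use, so do not touch any of the slots
        }
        for (std::size_t i = 0; i < kCapacity; ++i) {
            ++m_slotsVisited;
            if (m_records[i].inUse && --m_records[i].framesLeft <= 0) {
                m_records[i].inUse = false;
                --m_activeCount;
            }
        }
    }

    std::size_t ActiveCount() const { return m_activeCount; }
    std::size_t SlotsVisited() const { return m_slotsVisited; } // instrumentation only

private:
    CollisionRecord m_records[kCapacity];
    std::size_t m_activeCount;
    std::size_t m_slotsVisited;
};
```

The counter costs one increment and one decrement per record, and in exchange an empty pool is dismissed with a single comparison.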
This was a much smaller saving, but it is an indication of the really simple changes you can make to stop code doing things it doesn’t need to. Let’s have a final look at the water collisions:
Anyway, I hope there was some interesting information in there for you. Thank you to everyone who took the time to read one of my blogs. Tony has heaps more in store for you, so watch this space!