Around the Verse: Performance and Optimization

Written Thursday 8th of March 2018 at 08:13pm by Sunjammer, Desmarius, CanadianSyrup and Nehkara

As per usual, anything said during the show is subject to change by CIG and may not always be accurate at the time of posting. Also, if you see any mistakes that I may have missed, please let me know so I can correct them. Enjoy the show!

TL;DR (Too Long; Didn't Read)

Project Update

  • UI has rewritten the visor vehicle status holograms to use RTT so it is a true reflection of the in-game entity

  • UI is using same RTT to render comms so it can be displayed on MFDs, mobiGlas or visor

  • Introducing the ability to set the camera angle (from predefined list) when viewing own or target vehicle

  • Added ability to connect auxiliary displays (visor, 3D radar, etc.) to vehicles dashboard to use as additional screens

  • Added new way to support multi-channels and improved colour blending/breakup to the Procedural Planet tech

  • Improved the existing tools to reposition all planetary outposts automatically

  • Finalising last pieces of truckstop exterior and incorporating new common element hangars

  • ProceduralLayout tool is now automatically generating lots (1000+ and counting) of successful layouts

  • Anvil Terrapin, MISC Razor, Tumbril Cyclone, and Aegis Reclaimer approaching flight ready and getting VFX added

  • Focusing on impact reworks for Gemini R97, PRAR Distortion Scattergun, and Scourge Railgun

  • Migrating Star Marine customisation screen to common (Arena Commander, mobiGlas) codebase: will allow players to select individual armour pieces in SM

  • UI hard at work on implementing visual improvements and prototyping the new layout for the mobiGlas

  • TechArt working on Character Customiser: new looping idle, more/randomise facial animations, work-in-progress icons, specular highlight and lighting passes, and improvements to camera system

  • Working on Service Beacons for 3.1 to allow players to pay each other for services (personal transport, combat assistance, etc.)

  • Fixing character bugs such as persistent helmet lights and wearing hats with helmets

Optimization

  • The biggest problem with optimization in a game like Star Citizen is the sheer number of entities.

  • Game simulation is dependent upon inputs, the state of the game's world and the rendering of scenes within that world in simple looping steps.

  • Performance optimization is basically finding less resource intensive ways of performing these said steps without losing the quality in gaming fluidity or experience.

  • GPUs and CPUs can work together in either lockstep or in parallel forming a pattern called pipelining.

  • The management of pipelining is a main way to improve resource efficiency and thereby reduce lag and latency.

  • Because a system will always be bound by the slowest component, multithreading CPUs add to the complexity.

  • More pipelining can actually increase perceived latency though better frame rate might be achieved.

  • Again, because of the latency issues in dealing with a game like Star Citizen, especially considering the internet use, lockstep is not being used

  • Another way of optimizing is to decrease the distance on which an entity is visualized and thereby reducing the amount of tracking.

  • Often multiple approaches are used in unison to increase and optimize performance through good resource management.

  • Tech art and tech content tend to focus on the GPU budget of 2500 draw calls, and try to use LODs, detail meshes, and skinning to reduce the number of draw calls.

  • Damage tech is being reworked to have less debris, and therefore less extra geometry.  This reduces the number of draw calls required.

  • Another trick to limit draw calls is vis areas and portal culling - this essentially is only having the game render what you can see.  They are currently fighting some significant bugs with this as seen on the Caterpillar where you can frequently see through the walls.

  • Another improvement is moving from using voxels to using signed distance fields for local physics grids.  This is much more precise and the actual shape can be described in much higher resolution. This will allow for much more ship-hugging shield tech.

  • Signed distance fields are also the first step in allowing ship wrecks to be fully accessible and therefore salvageable and explorable.

  • Network code on the server has been a big performance bottleneck. The main thread will get delayed because the network thread is too busy trying to send updates to all of the players in the game, causing everything to slow down or stall.

  • 3.0 included a lot of optimizations for network code including conversion to serialized variables and remote methods.

  • The overload of network instructions on a single thread has now been parallelized onto multiple threads.

  • Network code has never really been a performance issue on the client side - the issue has always been on the server side.

  • Bind culling is intended to address this mismatch.

  • Essentially bind culling will mean the client will not load entities for which the player is not in range and therefore will not spend any CPU processing time on them and will not get updates from the server about them.

  • The issue, and why bind culling has taken so long to implement, is what to do when a player does go in range of an entity which the server has not been updating. How do you create the entities without causing the game to slow down massively?

  • Object container streaming is the answer to the bind culling issue - needed entities will be predicted and loaded on a background thread so the main thread is not halted while trying to do so itself.

  • Full implementation of bind culling won’t be coming to the player until object container streaming is in the game but bind culling is needed beforehand for the developers so that they can work to eliminate all the problems it can cause - such as how do you point a player to a destination for a mission when that destination does not currently exist in the game (due to being culled out).

  • A lot of the performance gains that were anticipated with bind culling have been able to be achieved with serialized variable culling which essentially just tells the server to stop sending updates to the client about entities which are not within a certain distance of the player.

  • When designing a game like Star Citizen, it becomes more difficult over time to optimize and improve performance as it’s being built due to the complexity and ambitions the game is trying to achieve.

  • One of the issues developers have had for a while was not only trying to simulate a proper load on the server, but also getting enough QA and other devs to participate in internal playtests.

  • With 3.1 they’ll be introducing two new tools in order to improve the data that’s being captured and be able to seek out where the performance hits are coming from.

  • They’ve designed headless clients which are able to simulate the random player actions in a server, and multiply that by the server player count. This allows developers to quickly populate a server internally and get a more accurate idea of what the server load will be like on live.

  • Secondly they’ve designed a tool that is able to capture data more accurately than before and give them detailed information on the specifics a player is doing instead of getting a log of data that has to be combed through to get to what they actually want.

  • When it comes to new features coming online, there usually is a performance drop, but with the new tools they should be able to quickly get performance back to the level players found acceptable and improve upon it.

  • With Star Citizen if they built the game as a regular studio where they didn’t have to have a playable build throughout development, and only polish it a few months beforehand then production would be way different, but because they took the approach they did, there’s a level of complexity involved in order to make the builds playable, while continuing development.

  • As Mark Abent puts it, there’s no one solution to fix all the problems, there’s billions of possibilities that incrementally improve performance and stability over time.
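The serialized variable culling summarised above comes down to a distance test on the server before updates are sent. A minimal sketch, assuming entities are stored as a dictionary of 3D positions; the function name, data layout and distances are hypothetical and not taken from the show:

```python
import math

def entities_to_update(player_position, entity_positions, max_distance):
    """Return the ids of entities close enough to the player to receive updates.

    entity_positions is a hypothetical mapping of entity id -> (x, y, z).
    """
    return [
        entity_id
        for entity_id, position in entity_positions.items()
        if math.dist(player_position, position) <= max_distance
    ]

positions = {"ship": (0.0, 0.0, 100.0), "outpost": (0.0, 0.0, 50000.0)}
nearby = entities_to_update((0.0, 0.0, 0.0), positions, 1000.0)
```

Here only "ship" would keep receiving updates; the distant outpost is skipped until the player gets closer.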


Full Transcript

Intro With Sandi Gardiner (VP of Marketing), Sean Tracy (Tech Content Director).

Sean Tracy (ST): Hello and welcome to another episode of Around the ‘Verse. I am Sean Tracy, Technical Director of Content, Tech Art, Tech Animation, General Tools, and Engine Tools.

Sandi Gardiner (SG): And I’m Sandi. Sandi Gardiner.

ST: So, this week we bring you a very special episode. We’ll get to our monthly Star Citizen project update, followed by a look at the ongoing optimization processes and Jesse Spano will learn about the dangers of too much caffeine.

SG: What?! We’re joking of course.  But this episode is special as it is our first to be filmed in front of a live studio audience.

--Cheering from the audience--

ST: That’s right, we’re very excited to welcome some of our subscribers, joining us here today in our LA studio.

SG: They’re getting an up close look at our studio as well as the absolutely thrilling world of ATV hosting segments.

ST: And that’s not sarcasm folks. That is.

We’ve already had two PAs storm off set and just minutes ago Jared flipped the craft services table.

SG: We have craft services?

ST: Not anymore.

SG: Of course not.  All joking aside, we do have a lot to cover in our Star Citizen project update from March. Let’s check in with Ricky Jutley and see what’s been keeping the devs busy since our last update.

ST: Take it away Ricky!

Project Update With Ricky Jutley (Producer).

Ricky Jutley (RJ): Hi and welcome to this month’s update on progress for the Persistent Universe.

The UI Visual team have been working on the visor vehicle status holograms which have been extensively rewritten to utilise the new RTT technology developed by the render team. Previously vehicles displayed on the HUD visor were constructed using the same hierarchy as a separate entity from the vehicle you were viewing. With the introduction of RTT the vehicle seen on the visor - or MFD - is now actually a rerender of the full vehicle. This means rather than being an imitation of the target, it’s now a true reflection of what’s in world or space and accurately reflects all the real parts and attachments like the thrusters and weapons.

This was accomplished by building a list of the render nodes that were selectively rendered to a render target. Using a custom material shader we can apply the colour tinting to visualise the damage status of the vehicle parts whilst avoiding some of the more expensive features of doing a full physically-based forward render. This render target of the vehicle can then be rendered whenever the player selects the “own” or “target” view on the display.

This was hardwired to exist on the helmet visor in 3.0 but in 3.1 you’ll be able to enable it on the MFD displays. The visor and MFD displays all have their own render targets - which the 2D UI gets displayed on - plus an optional camera that can show a vehicle too. Yes we have a render target object rendering other render target objects. It’s all very Inception and can be quite confusing working out who’s rendering what sometimes.

We also use the same RTT technology when rendering the communications system often referred to as “comms”. Instead of selectively rendering the vehicle we actually render the full scene camera of the pilot to a render target so the player can see who they are communicating with. This comms render target - similar to the vehicle visor status holograms - can then be displayed on any given display’s render target whether that is a vehicle’s MFD screen, the mobiGlas or the visor.

One thing we’re introducing for the 3.1 release is the ability to dictate the camera angle when viewing your vehicle or the focus vehicle. This allows the player to view them from a series of fixed camera angles such as top, bottom, left side, right side, etc. This is in addition to the live view which shows the vehicle’s orientation from the pilot’s perspective.

Another piece of new tech we added for 3.0 - and are improving for 3.1 - is how the vehicle displays are described. Initially we started with a fixed list of displays which were available per dashboard describing each of the displays available. For example, the Gladius pilot seat has one heads-up display and six MFD screens. We have now added the ability to connect additional auxiliary displays.

We currently utilise this by connecting the visor display and the 3D radar display. This feature will eventually allow the player to customise their cockpit by allowing them to add or remove auxiliary displays as they desire. When the visor connects to the dashboard it plugs in as multiple additional screens for the UI to utilise. Meaning we can pipe views from the various controllers to these optional visor screens. Some examples would be the target vehicle hologram, the energy view, and the weapon view.

The Procedural Planetary team continue to make improvements to the planet tech. We added a new way to support multi-channels while improving the colour blending and breakup. All of which operate at the same runtime performance cost as the current tech. Transitioning from the ground to orbit is now a much smoother experience thanks to the strides made in colour blending.

We also separated the ecosystem procedural distribution set up from the rendering channel’s art set up which means that a large number of these changes will be mostly visible on Yela on the 3.1 release.

Finally we also improved the existing tools to reposition all planetary outposts automatically.

The Environment team are finalising the last few pieces of the truckstop exterior and are incorporating the new common element hangars with work on lighting, polish and advertising to follow.

For interiors the ProceduralLayout tool is now automatically generating lots of successful layouts for us overnight. This is using only the base library of rooms available but we are already seeing some unusual and visually diverse layouts. We estimate we’ve crunched through one thousand layouts so far during the initial period of development. Not all have been suitable for use but it’s a good indicator of the amount of variety we can achieve using the tool.

We have a number of ships and vehicles approaching the flight ready phase -  and the VFX department have been busy adding effects to them - including the Anvil Terrapin, the MISC Razor, and the Tumbril Cyclone .... as well as the Aegis Reclaimer.

They’ve also been focusing on impact reworks for weapons such as the Gemini R97, the PRAR Distortion Scattergun, and Scourge Railgun.

During their recent sprint EU Gameplay Feature Team 1 completed all their new task work for the PMA/VMA and are now in the process of bug fixing and polishing for inclusion in 3.1.

Another big push for the next release was to replace the bespoke customisation screen we currently have for Star Marine to bring it in line with the codebase already shared between Arena Commander and the loadout apps in mobiGlas. This will allow us to update only a single piece of code that is shared across all loadout customisation screens. It’s a big saving for code as we only need to fix a bug once rather than multiple times. It also means that players only need to learn a single system, so any user experience improvement will be felt on all loadout screens in the future.

The appearance is admittedly still very rough as we’ve been focusing our attention on making sure the code works correctly and that the functionality is not lost. In fact switching to the new interface means that players can now mix and match light, medium, and heavy armour pieces individually. With the weight of armour playing a more important role the ability of Star Marine players to customise various parts adds an extremely good level of strategy to the game.

The code work for this functionality is now in place so the UI team will begin updating the visuals and bringing up the quality over the next week for 3.1. At the same time the UI team is also hard at work on wider visual improvements to the mobiGlas app itself along with prototyping the new layout we showed a few weeks ago on ATV. We’re also looking at ways to improve the visibility of mobiGlas in general. One of the big improvements we are hoping to achieve is for the mobiGlas to occupy more of the screen space in an effort to increase the legibility and clearly display the many visual elements.

Finally the team’s looking to bring up the quality of the apps by adding visual improvements to the rendering rules and shaders applied to the 3D models which should create a very good, noticeable improvement to inventory management in 3.1.

The Tech Art team in LA has been continuing to work on the Character Customiser. A new looping idle animation was chosen to provide some variation in body movement. Instead of the single deadpan facial animation, a selection of more expressive facial anims were implemented and randomised in Mannequin to bring more life to the character model. In addition all selected options now have work in progress icons with the technology needed to display preview icons for selectable items - like skin tones and hair colours - which will now be complete. The final polished icons for each selection type are planned for the near future.

Next the non-head body parts had a specular highlight pass by the Character team - after the team decided that the heads needed a polish pass to match the rest of the body parts - as well as a lighting pass to add fidelity to the head detail.

Another area of focus is the transition between the head and body textures. Previously a visible seam was noticeable between the head and the torso, so the team worked on making the transition much smoother thus removing the seam.

We have also had significant improvements to the camera system. The field of view, camera position, and the depth of field parameters were adjusted to create a more dramatic presentation of the selected character in the environment.

And finally for this team various issues with persistence - specifically saving and entering the PU - have been addressed. Some bugs popped up  while working on the Character Customiser - like selecting items, saving and then entering the PU. Other issues such as quickly skipping through selections or cancelling were also discovered and addressed.

Teams in Austin and Los Angeles have been working on the Service Beacons which marks the beginning of the player generated content. Although this is only the beginning, for 3.1 we’re planning on allowing players to pay each other for services such as personal transport or combat assistance. Once a contract is accepted Quantum Travel markers will be created on the contract initiator so the service provider can easily get to them. Both parties will be able to see where the other person is while the contract is active. Either party involved in the contract can cancel at any time but be warned you’ll be able to rate the other person where contract completion is not easily determined. For instance when does combat assistance end?

We’re close to finishing our second sprint on this feature. We’re still trying to get the backend services for the system online and we’ve been running into a few issues where the system is not yet communicating properly with the Diffusion services that keep us multi-server and multi-thread safe.

The Character team spent some time fixing a few bugs that were preventing helmet lights from operating as expected. Such as one example where lights persisted when you take off your helmet. The team has also been working on an annoying bug where players were able to wear their hats and helmets at the same time.

That’s it for this month’s update for the work being done on the Persistent Universe. As always thanks to our subscribers for sponsoring this and all of our shows. And of course thanks to all our backers for continuing to support the development of both Star Citizen and Squadron 42. We’ll see you next month.

Back to Studio With Sandi Gardiner (VP of Marketing), Sean Tracy (Tech Content Director).

SG: Thanks Ricky! As you can see from ship HUD improvements to planet tech to mobiGlas enhancements, the devs remain busy with the push towards Alpha 3.1.

ST: Yep, there are many features and upgrades like these planned for all four of our quarterly releases throughout the year. One item that will be particularly focused on now is performance optimization.

SG: Yes, these ongoing improvements to frame rate will continue to be developed throughout the Alpha phase and beyond.

ST: Let's hear more about the entire optimization process from the team that's been working on it in this week's feature.

Optimization With Mark Abent (Gameplay Programmer), Matt Intrieri (Technical Artist), Clive Johnson (Software Engineer), Rob Johnson (Lead Gameplay Programmer), Christopher Bolte (Principal Engine Programmer).

Mark Abent (MA): Optimization … one of the big... big, big, big things that we're trying to focus on this year. Biggest problem is we have a … We're a space game, right. We have a lot, and I mean a lot of entities. Too many to count, and I'm sure someone put a number somewhere. The more entities you have sometimes they need to do some complex logic, and they may have to update to add a timer, check a timer, do something to continue the game going forward, and if you have a lot of these it could be a very problematic thing if everyone has to update these every frame. If there's 5,000 things that's 5,000 things someone has to do before the next frame can occur.

Christopher Bolte (CB): From a bird's-eye view a video game is the realization of a simulated virtual world constructed out of a basic update loop consisting of a few simple steps.

Sample Input

External inputs are examined at the beginning of the frame so that the game can react to those. Such input can be keyboard, mouse, controller input, but also network data in case of a multiplayer game.

Update World-State

The virtual world has a state. Such a state is a multitude of things like the kind of objects which exist, their position, their color, values, and so on. This state can change over time. For example, an AI can walk from point A to point B. During this step the changes over time are computed.

Lastly, Render Scene

The last step during each frame is the rendering. Here it is determined what is visible to allow the player a look into the simulated virtual world. Objects which are determined to be visible are then drawn on the display. We are focused on the update simulation and render scene steps, as those are the performance critical ones. Without going into too much detail it can be said that each of those steps refers to a fixed amount of operations, mathematical or logical, to update the world-state and refresh the displayed image.
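The three steps above can be sketched as a single frame function. This is only an illustration under simple assumptions: a one-dimensional world, a "spawn" command as the only input, and a fixed visible range. The names `Entity`, `World` and `run_frame` are hypothetical and do not come from the engine.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    x: float      # position along a single axis
    speed: float  # units per second

@dataclass
class World:
    entities: list = field(default_factory=list)

def sample_input(pending_inputs):
    """Step 1: take everything that arrived since the last frame."""
    inputs = list(pending_inputs)
    pending_inputs.clear()
    return inputs

def update_world_state(world, inputs, dt):
    """Step 2: react to input, then advance every entity by dt seconds."""
    for command in inputs:
        if command == "spawn":
            world.entities.append(Entity(f"npc{len(world.entities)}", 0.0, 1.0))
    for entity in world.entities:
        entity.x += entity.speed * dt

def render_scene(world, view_min, view_max):
    """Step 3: determine what is visible; a real game would then draw it."""
    return [e.name for e in world.entities if view_min <= e.x <= view_max]

def run_frame(world, pending_inputs, dt):
    inputs = sample_input(pending_inputs)
    update_world_state(world, inputs, dt)
    return render_scene(world, 0.0, 10.0)

world = World()
pending = ["spawn"]
drawn = run_frame(world, pending, 1.0 / 60.0)
```

Every entity is touched once per frame in step 2, which is why the entity count mentioned by Mark Abent feeds directly into the cost of a frame.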

If all those operations can be done within 1/60th of a second, in other words 16 milliseconds, we will do the world update loop 60 times each second resulting in 60 fps or frames per second. But if these operations take 50 milliseconds, the game will refresh the scene only 20 times a second resulting in 20 fps. The real point of performance optimization is to find ways to reduce the fixed amount of operations to be able to execute them within the target frame rate on the target hardware. To better explain what performance optimization consists of in modern games, the previous mental model of the basic update loop must be extended by some more details. Though the upcoming explanation will still omit many parts, it will still give a more accurate explanation of the interaction of various types of components like the CPU and GPU.
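The relationship between frame time and frame rate is a plain division. A small sketch with hypothetical helper names; note that 1/60th of a second is closer to 16.7 milliseconds, which the transcript rounds down to 16:

```python
def frames_per_second(frame_time_ms):
    """Frames completed per second if every frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms

def fits_budget(frame_time_ms, target_fps):
    """True if a frame of this length is short enough for the target rate."""
    return frame_time_ms <= 1000.0 / target_fps

slow_fps = frames_per_second(50.0)   # the 50 ms example from the transcript
budget_60 = 1000.0 / 60.0            # per-frame budget at 60 fps, in ms
```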

To go back to the initial example, a single frame looks like this. Here we use an imaginary Execution Unit which does all the processing for the world update. The input sampling is omitted as it generally doesn't affect performance. Nowadays practically each computer has a chip specifically developed for rendering, the GPU. Thus, the work is normally split between the CPU and GPU. It is immediately clear that such a setup is not very resource efficient. The GPU and CPU are running in lockstep and only one of the two chips is actively doing computations at a time.

We can improve this situation by letting the GPU work on one frame while the CPU updates the simulation of the next frame in parallel. This pattern is called pipelining and appears everywhere when working with computers. For example, in watching online videos the next few frames are downloaded while the already downloaded frames are processed and displayed in parallel.

But pipelining doesn't come without implications. When the CPU produces the data faster than the GPU can process it, the GPU will lag more and more behind. This lagging behind can be perceived as additional latency. On top of that, we need to store the CPU computed state somewhere in memory while the GPU is not ready to process it, and as no device has infinite memory, we need to stop the CPU at some point and allow the GPU to catch up as we cannot keep storing more state forever. Here we speak of the CPU being blocked waiting for the GPU.

For Star Citizen, we generally opt for better latency and only allow the GPU to lag behind up to one frame. The same problem can happen in different ??? if the consumer is processing the data faster than the producer, which is called starvation. In general one component will always be slower than all other components, which implies that the game will always be bound by something – here CPU or GPU. Thus, the first code of performance optimization is to identify by which component the performance is bound. The frame rate won't be affected if you optimize GPU performance while the GPU is already starving. This approach is also called critical path analysis as it helps to identify the path which dominates the performance.
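The difference between lockstep and pipelining, and the effect of limiting how far the GPU may lag behind, can be illustrated with a small timing model. This is a sketch under simplifying assumptions (every frame costs the same on each chip, and the function and parameter names are hypothetical), not a description of the engine's scheduler:

```python
def lockstep_time(frames, cpu_ms, gpu_ms):
    """CPU and GPU take turns, so each frame costs the sum of both."""
    return frames * (cpu_ms + gpu_ms)

def pipelined_time(frames, cpu_ms, gpu_ms, max_lag=1):
    """Total time when the CPU may run ahead of the GPU (frames >= 1).

    The CPU may only start frame i once the GPU has finished frame
    i - 1 - max_lag; otherwise it is blocked waiting for the GPU.
    """
    cpu_done = []
    gpu_done = []
    for i in range(frames):
        cpu_start = cpu_done[i - 1] if i > 0 else 0.0
        blocker = i - 1 - max_lag
        if blocker >= 0:
            cpu_start = max(cpu_start, gpu_done[blocker])
        cpu_done.append(cpu_start + cpu_ms)
        # The GPU needs the CPU's output and must finish its previous frame.
        gpu_start = cpu_done[i]
        if i > 0:
            gpu_start = max(gpu_start, gpu_done[i - 1])
        gpu_done.append(gpu_start + gpu_ms)
    return gpu_done[-1]

gpu_bound = pipelined_time(4, cpu_ms=5, gpu_ms=10)
cpu_bound = pipelined_time(3, cpu_ms=10, gpu_ms=5)
```

In both pipelined cases the total is dominated by the slower chip, which matches the point that the game is always bound by one component and that optimizing the other one does not change the frame rate.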

A similarly complex situation emerges with multithreading CPUs. Each CPU now has multiple execution units which we will refer to as hardware threads. To cope with this, strategies needed to be found to distribute operations over the different hardware threads. The obvious choice is to use the same pipelining operation as with GPUs, but there are two large drawbacks on CPUs. Work balancing: As shown in the GPU/CPU example, the system will always be bound by the slowest component. To pipeline the CPU work, you would need to find chunks of work which logically belong together and always take the same time. This in itself is very hard to impossible in a dynamic game, and with the increasing number of hardware threads/CPU cores you would soon run out of systems to put beside each other. We would also need a specific setup for each different CPU core count, resulting in a maintenance nightmare, which is why our chosen approach is to batch the operations and to utilize the inherent parallelization found in video games. For example, there is rarely only one single character to update, so we can put each different character onto a different CPU core. The same happens with any kind of objects which we have. This pattern allows us to scale more flexibly with the CPU cores. The process to change all the game logic is ongoing, and we are moving more and more parts of the game logic over to this approach to utilize more CPU cores more frequently and improve overall performance.
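The batching idea, many independent character updates handed to however many workers are available, can be sketched with Python's standard thread pool. The tuple layout and function names are hypothetical, and in CPython the global interpreter lock means this shows the shape of the pattern rather than a real speed-up for CPU-heavy work:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def update_character(character, dt):
    """Advance one character; it does not depend on any other character."""
    name, position, velocity = character
    return (name, position + velocity * dt, velocity)

def update_all(characters, dt, workers=None):
    """Spread the batch over the available workers, whatever their count."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input batch.
        return list(pool.map(lambda c: update_character(c, dt), characters))

batch = [("a", 0.0, 1.0), ("b", 10.0, -2.0)]
updated = update_all(batch, 0.5, workers=2)
```

Because the worker count is a parameter rather than part of the design, the same code runs on any core count, which is the flexibility the pipelined alternative lacks.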

Input Latency

The root of the 60 fps discussion in the context of games is not only about how quickly a game frame refreshes and its visual quality. It's also about input latency. Input latency is normally measured in frames as each frame needs a fixed amount of operations regardless of the underlying hardware. So, if we need three frames from user input to displaying an effect, the input will have an input latency of three times 16 milliseconds resulting in a 48 millisecond latency when running with 60 fps. If our game runs with 30 fps which means a frame duration of 33 milliseconds, it will have an input latency of 99 milliseconds, roughly doubling the time between input and the visual reaction, resulting in a less fluid impression of the game. Adding more pipelining here would only increase the perceived latency which is why we try to not depend on pipelining for performance on the CPU even if it would result in a better frame rate as it would cost us worse and not really acceptable input lag.
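The latency figures follow from multiplying the number of frames of delay by the frame duration. A short sketch with a hypothetical helper; it uses exact frame durations, so it gives 50 and 100 milliseconds where the transcript, rounding to 16 and 33 milliseconds per frame, quotes 48 and 99:

```python
def input_latency_ms(frames_of_delay, fps):
    """Time from input to visible effect when it takes this many frames."""
    return frames_of_delay * 1000.0 / fps

latency_60 = input_latency_ms(3, 60)
latency_30 = input_latency_ms(3, 30)
```

Each additional pipeline stage adds one more frame of delay, so at 60 fps a fourth frame would add roughly another 16.7 milliseconds.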

Clive Johnson (CJ): One of the questions we get asked quite a lot is about the server having a target frame rate of 30 frames per second and clients having a target frame rate of 60 frames per second. How is that possible? Why aren't they tied together? And the reason for that is that we're not operating in something called lockstep, and lockstep would be where the server sends an update and the client has to wait until it receives an update before it goes, “Okay this is where this entity should be.”, and then has to wait for another update and goes, “Right the entity moved to here.” and so on.

What we actually do is the server will send its updates at 30 frames a second, the client's rolling at 60 frames a second, so in theory the client should receive an update for entities every two frames, so it has to kind of guess what's happened in the middle frame. So basically what it's doing is it's locally simulating all the entities. So an entity moving across a flat plane at a fixed speed and a fixed direction it can guess where the entity should be at each point, and it'll get the next update from the server and go, “That's slightly out from where it should be. I need to nudge that one a little bit over to the left.”, and a little bit later down the line, “Oh the entity is turned a bit, so I need to nudge it a little bit over the other way.” So that allows the client and the servers to run at different frame rates, and we don't have to worry about doing something like operating in lockstep or anything like that which … which can cause big issues, 'cause of latency on the … on the internet and how that varies quite randomly all the time. It'll be quite a bad experience. It's alright for things like turn based games, but for something like a live action game like we're trying to do it's just not going to work.
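The "guess, then nudge" behaviour described here can be sketched as local simulation plus a partial correction whenever a server update arrives. This is a one-dimensional illustration; the class name, the `blend` factor and its value are assumptions for the example, not details given in the show:

```python
class PredictedEntity:
    """Client-side copy of an entity that is simulated between server updates."""

    def __init__(self, position, velocity):
        self.position = position
        self.velocity = velocity

    def predict(self, dt):
        """Run every client frame: assume the entity keeps its last velocity."""
        self.position += self.velocity * dt

    def correct(self, server_position, server_velocity, blend=0.5):
        """Run when a server update arrives: nudge part of the way towards it."""
        self.position += (server_position - self.position) * blend
        self.velocity = server_velocity

entity = PredictedEntity(position=0.0, velocity=60.0)
entity.predict(1.0 / 60.0)   # client frame 1, no server data
entity.predict(1.0 / 60.0)   # client frame 2
entity.correct(server_position=2.2, server_velocity=60.0)
```

Because the client never waits for the server before drawing a frame, the two can run at different rates, and a late or missing update only makes the next correction slightly larger.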

CB: Reducing the amount of operations can be achieved in multiple ways. Which one is used mostly depends on the current situation, time to implement it, and so on. Most often multiple approaches are also combined. Detecting objects too far away to be observed by the player: Here we need to decide after which distance we want to disable objects, what to do at the gray area where objects are barely visible, how to make it look like objects moved while the player wasn't looking at them and various other corner cases. Let us assume a system where we need to keep track of 1,000 objects in a single list. If you delete such an object you must remove it from all lists. The straightforward approach is to check each entry in our list for our object. When we find it we will remove the object. Afterwards we copy each following object into the slot before it to fill up the hole from removing the object. This will always require 1,000 operations.

Such an approach can work well for a small list, but if the list is large or if you remove objects very frequently it would take quite a chunk of our precious operations budget, thus we need a smarter solution. Inside each object we will remember its position inside our list, so when we remove an object we immediately know its position. Then we can take the last element of the list, place it into the hole and update its stored position to that of the removed object. With this we are down to four operations from 1,000 which is a nice performance gain, with the side effect of a change in behavior, as we now reorder the objects in our list. Such freedom can be okay or not depending on how the objects in the list are used. This was only one example of an optimization of a very small part of the whole game. In general the whole optimization process is iteratively finding where the game is bound - CPU or GPU or which CPU thread batch - and coming up with a better solution that uses fewer operations for the most expensive sections.
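The list trick described here is commonly called swap-and-pop removal. A minimal sketch, with hypothetical class names, in which each object remembers its own slot so that removal never has to search or shift:

```python
class TrackedObject:
    def __init__(self, name):
        self.name = name
        self.index = -1  # slot in the owning list, -1 when not stored

class ObjectList:
    def __init__(self):
        self.items = []

    def add(self, obj):
        obj.index = len(self.items)
        self.items.append(obj)

    def remove(self, obj):
        """Constant-time removal; the order of the remaining items changes."""
        last = self.items.pop()
        if last is not obj:
            # Move the former last element into the hole and fix its slot.
            self.items[obj.index] = last
            last.index = obj.index
        obj.index = -1

tracked = ObjectList()
a, b, c, d = (TrackedObject(n) for n in "abcd")
for obj in (a, b, c, d):
    tracked.add(obj)
tracked.remove(b)
names_after_remove = [obj.name for obj in tracked.items]
```

After removing "b" the list holds a, d, c: the cost no longer depends on the list length, but anything that relied on the original ordering would need a different approach.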

Matthew Intrieri (MI): There are many moving pieces to this puzzle. Generally the tech content or tech art people will be focused on the GPU side of things, the number of draw calls. We have a budget of about 2,500 draw calls for all ships on screen at one time, and a ship can be anywhere between 500 and 800 draw calls at its highest resolution. So, we try to optimize that as best we can. We have a few tricks in our bag to do that. One of the things is LODs - we have artists creating level of detail meshes which reduce the number of polygons and reduce the materials as you go into the distance. These are being hand done and also generated by our Simplygon tool.

We're also reducing the damage we're doing, or reworking some damage on old legacy model ships like the Mustang. When you blow up a ship you have all these little pieces of debris that used to be hand modeled. Now we actually just break the ship hull in half and change the shader so you see some of what we call UV2 damage. This reduces the number of debris items and therefore reduces the number of draw calls quite a bit.

Draw calls are like the material passes on a mesh. So, one of the other ways we can reduce draw calls is by combining meshes together in a process called skinning. As I work on landing gear, I can take many pieces of the landing gear, the pistons and gears and the feet, and combine them into one mesh, and I drive that all through a skeleton, so instead of 10 pieces with 10 draw calls each, you get one mesh with 10 draw calls. So, really a good significant savings there.
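
To put rough numbers on the figures quoted above, the arithmetic can be written out as below. The function names are invented for this example, and the model (one draw call per material pass per mesh) is the simplification MI describes, not a full renderer cost model.

```python
SHIP_DRAW_CALL_BUDGET = 2500  # approximate on-screen ship budget quoted above


def ships_within_budget(draw_calls_per_ship, budget=SHIP_DRAW_CALL_BUDGET):
    """How many ships of a given cost fit inside the budget."""
    return budget // draw_calls_per_ship


def mesh_draw_calls(mesh_count, material_passes_per_mesh):
    """Simplified model: each mesh costs one draw call per material pass."""
    return mesh_count * material_passes_per_mesh


heavy_ships = ships_within_budget(800)   # ships at the expensive end
light_ships = ships_within_budget(500)   # ships at the cheap end

separate_gear = mesh_draw_calls(10, 10)  # 10 landing gear pieces, 10 passes each
skinned_gear = mesh_draw_calls(1, 10)    # one skinned mesh, 10 passes
```

Under these assumptions only three to five full-detail ships fit in the budget, which is why LODs and skinning matter: the skinned landing gear costs 10 draw calls instead of 100.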

Another trick that we do is with our vis areas and portal culling. So, when we're on the outside of the ship we don't want to draw the interior and even when you're in another room in the ship, we don't want to draw the next room. You can think of portals as like a door - when it's open it draws the inside of the room, when it's closed it culls the room and the geometry out. We use our vis area portals to connect to the doors which animate, open and close, and then drive the portal to turn on and off.  And like say the Caterpillar, those doors are skinned so they're optimized and they have portals on them. You will have noticed there are some issues with the portals recently and we are working on about four or five major portal issues but we will get them and it will be faster and cleaner and you can walk outside again and then see the exterior in space by standing on your deck.
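
The door analogy above can be sketched as a walk through connected areas that stops at closed portals. This is a deliberately simplified illustration: the area names and the `visible_areas` function are hypothetical, and a real engine would also test each open portal against the camera view rather than treating every connected room as visible.

```python
from collections import deque


def visible_areas(start, portals):
    """Return the areas reachable from `start` through open portals.

    `portals` is a list of (area_a, area_b, is_open) tuples.
    """
    adjacency = {}
    for area_a, area_b, is_open in portals:
        if is_open:  # a closed door culls everything behind it
            adjacency.setdefault(area_a, []).append(area_b)
            adjacency.setdefault(area_b, []).append(area_a)

    seen = {start}
    queue = deque([start])
    while queue:
        area = queue.popleft()
        for neighbour in adjacency.get(area, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


ship_portals = [
    ("exterior", "cargo_bay", False),  # ramp closed
    ("cargo_bay", "cockpit", True),    # interior door open
]
from_outside = visible_areas("exterior", ship_portals)
from_cargo_bay = visible_areas("cargo_bay", ship_portals)
```

With the ramp closed, a player outside draws only the exterior, while a player in the cargo bay draws the cargo bay and the cockpit but not the exterior.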

Another thing that's being worked on right now is called signed distance fields and this is a technology that Chris Raine is working on. It's a different way of describing a volume by recording these distances. So, it stores distances and what we've done to this point is use a volumetric approach where we have these voxels and we fill the ships with voxels and this gives us our local physics grid but when we move to a signed distance field it'll be much faster, we can describe shapes in a much higher resolution, and we'll know more immediately if you're inside the ship or outside the ship, colliding with the ship or not. This is gonna open the door for new shield tech - a new visual on the shields - so instead of a bubble that's kind of loosely shaped around the ship, you'll see a very tight shield that pushes just out from the plates of the ship.  
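
As a rough illustration of the idea, a signed distance function returns a negative number inside a shape, zero on its surface, and a positive number outside. The sketch below uses a sphere as a stand-in for a ship hull; the function names and the shield offset value are assumptions for the example, not details from the show.

```python
import math


def sphere_sdf(point, centre, radius):
    """Signed distance from `point` to a sphere: negative inside, positive outside."""
    return math.dist(point, centre) - radius


def classify(distance, tolerance=1e-6):
    """Turn a signed distance into an inside/outside answer."""
    if distance < -tolerance:
        return "inside"
    if distance > tolerance:
        return "outside"
    return "on surface"


def offset_sdf(distance, offset):
    """Distance to a surface pushed outward by `offset`, e.g. a tight shield."""
    return distance - offset


hull_centre = (0.0, 0.0, 0.0)
hull_radius = 5.0

inside_point = sphere_sdf((1.0, 0.0, 0.0), hull_centre, hull_radius)
outside_point = sphere_sdf((5.5, 0.0, 0.0), hull_centre, hull_radius)
shield_check = offset_sdf(outside_point, 1.0)  # shield sits 1 unit off the hull
```

The point at 5.5 units is outside the hull but inside a shield that hugs the hull one unit out, which is the kind of tight-fitting shield test the paragraph above alludes to.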

Also one of the things we're excited about is when we break the ships in half, the signed distance field is like a first step in getting us our multi physics grids, so we'll be able to keep the interiors around and then allow you to reclaim and walk around in the ships after a battle, so you'll find new items and things that way.

CJ: Network code was a big performance bottleneck on the server, and the reason being that for the 24 clients that we were supporting in PU at the time it had to gather all the information from the simulation and had to write it out 24 times, once to each different channel. The way it would do this is, if you think about different threads, each CPU doing a different amount of work, you kind of have the main thread running along this way going like - there's a frame, there's a frame - and at each point in the frame, towards the end of the frame, it would go, “Right, now - network code, go off, do your thing, write some stuff out,” and the network code would go, “Okay, right!” and start doing stuff, and as long as it had finished before the main thread started the next frame everything was fine, but if the network code was running a bit slow then it would stall the next frame on the main thread. It caused that to wait until the network thread had finished.

By the time 3.0 came out… well it came out with quite a few of the network changes that we'd been planning. So, we had had the introduction of serialized variables, that's been in for quite a while, by the time 3.0 came out pretty much all of the game code had been converted from using the old Aspect and RMI systems to the replacements which is serialized variables and remote methods and what that meant for us is that we could focus our efforts on optimizing one particular code path and we just leave the legacy system not performing as well as we'd like but because not so much stuff is using it, it's not really an issue. The big difference we made with serialized variables is that the fetch from the game, saying, “Okay, what's the state of everything? What are the things that have changed and we need to update? What’s your current state because I need to send that across the network?” That was parallelized, so instead of the network thread doing all the work at a particular sync point with the main thread it would just go, “Okay, how many threads have we got available? Okay, all of you go and grab some data and send that to the network.” Then, the next part would be, “Okay, now we need to take that data and we need to send it off to each of the different channels so that they know that they need to do some work.” The channel being the network code’s representation of a client. So, then that was parallelized as well and then the final part that we parallelized was the, “Okay now all the channels have got the data that they need to send, we need to actually go and send it.” So, that was parallelized as well so we went from one thread doing all the work to splitting it over as many threads as we had available.
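
The three parallelized stages described above (gather changed state, hand it to each channel, send) might be sketched like this. It is only a toy: the entity dictionaries, `gather_changes`, and `send_all` are invented for the example, and Python threads are used purely to show the shape of the fan-out, not to claim the same speed-up as native threads.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_changes(entity):
    """Collect only the variables that changed since the last send."""
    changed = {k: v for k, v in entity["state"].items() if k in entity["dirty"]}
    return entity["id"], changed


def send_all(entities, channel_ids, workers=4):
    """Gather changes in parallel, then build one packet per channel in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: every available worker grabs some entities and fetches their state.
        updates = [u for u in pool.map(gather_changes, entities) if u[1]]
        # Stages 2 and 3: each channel (one per client) gets its own copy to send.
        packets = pool.map(lambda channel_id: (channel_id, list(updates)), channel_ids)
        return dict(packets)


entities = [
    {"id": 1, "state": {"pos": 3, "hp": 100}, "dirty": {"pos"}},
    {"id": 2, "state": {"pos": 7, "hp": 80}, "dirty": set()},
    {"id": 3, "state": {"pos": 9, "hp": 55}, "dirty": {"pos", "hp"}},
]
packets = send_all(entities, channel_ids=["client_a", "client_b"])
```

Entity 2 has nothing dirty, so it is left out of both packets; each channel receives the same two updates.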

So on the clients we've kind of got the opposite situation where they're receiving all these updates from the server, it'll come in through the socket, it'll get processed by the network thread, and then there'll be a sync point with the main thread, it'll go, “Okay, right, network thread’s ready, main thread’s ready. Here's all your data. This is what's been received from the server, right. Go off and do your next bit of simulation.” That part has always been pretty quick to be honest, and on clients we're talking about roughly a millisecond to process all that data. Even though it takes proportionally quite a lot longer to write the data and package it all up on the server, the unpacking of it has always been quite quick. So, the network code itself has never been a performance issue for clients because they're mostly just receiving data from the server and what they actually send is so much smaller than what the server sends that it's never really been an issue for them at all.

So, bind culling is supposed to address this mismatch between client and server: the sheer processing power that you have on servers and their ability to process everything, and the much smaller processing power that most clients will have. The idea behind bind culling is that a client is only really interested in what's immediately around it. Anything that's on a different planet or a couple of hundred kilometers away, and even out in space if there’s no way you could reasonably see it - it just shouldn't be there, and by that I literally mean that your client should not know about it at all. That would save all the processing of those entities on the client; it would save a bit of processing work on the server because it doesn't have to communicate information about those entities to your client. It's definitely something we want to do, it's just not a particularly easy thing to do. You need a way of controlling the updates on the server so that it's not updating all the entities it knows about but only the ones that it knows are in range of any player at all, so the ones that no player can see just don’t update and it saves processing time on the server. And we've had to change assumptions on the networking code side where previously it would say, “Okay, I've got an entity, it's a networked entity, every client needs to know about this.” There would be assumptions all the way through the network code that relied on that being true, so we'd have to change all those, and so it's a bit by bit process where we've kind of unraveled stuff and added new stuff and optimized stuff that would allow bind culling to be possible, and we're almost there. The steps that are left are really - if you imagine the situation that you've got bind culling in and entities that aren't immediately around your client don't exist, what happens when you fly to a different location? So, you quantum travel to a different planet.
You need all those entities to be there, so they have to be spawned. So, say I was going to tell you about those entities - information comes over the network and then your client has to go and create them. Up until now, all of the entity spawning has been synchronous, or what we call blocking, which means that at the point in the code where it says, “I need this entity,” it'll basically stop whatever it's doing, it'll go and load the data in for that entity, create the entity itself, register it with physics and whatever else it needs to do, and then go, “Okay, this function can exit. Let's carry on and do the rest of the game.” So, we need asynchronous entity spawning, which is also known as object container streaming. The idea being that a client will get a message from the server and go, “Okay, I need this entity,” and it’ll go off and go, “Okay, well... what files does this entity need to load - what data does it need to load - so I'll go off and do that on the background thread,” and then go, “Okay, I've finished loading that. Okay, now I can create the entity,” and it's not stalled the processing of the game - it's not stalled your client at all while that’s going on. You can continue to play. So, obviously that takes a bit longer, so the trick is to try and make sure that you finish spawning entities by the time they actually appear, so you've got to kind of preemptively decide to spawn stuff.
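
The difference between blocking and asynchronous spawning can be sketched as below: loading happens on background threads while the main loop keeps ticking, and an entity is created only once its data is ready. Everything here (`load_entity_data`, `run_client`, the sleep standing in for disk access) is a hypothetical illustration, not the engine's streaming system.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_entity_data(name):
    """Stand-in for slow file loading; the sleep simulates disk access."""
    time.sleep(0.05)
    return {"name": name, "mesh": f"{name}.mesh"}


def run_client(spawn_requests, max_frames=10000):
    """Keep simulating frames while entity data loads in the background."""
    spawned = []
    frames = 0
    with ThreadPoolExecutor(max_workers=2) as loader:
        pending = [loader.submit(load_entity_data, name) for name in spawn_requests]
        while pending and frames < max_frames:
            frames += 1  # the main thread carries on with the game each frame
            still_loading = []
            for future in pending:
                if future.done():
                    spawned.append(future.result()["name"])  # data ready: create it
                else:
                    still_loading.append(future)
            pending = still_loading
            time.sleep(0.001)  # stand-in for the rest of the frame's work
    return spawned, frames


spawned, frames = run_client(["miles_eckhart", "bar_seat", "levski_door"])
```

The main loop never waits on a load; it simply checks each frame which requests have finished, which is why spawning has to be requested ahead of when the entity is needed.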

So, we need this object container streaming - that's the technology that's gonna allow us to do the streaming of entities in - in order to be able to make bind culling work. So, what I really want to do is develop the bind culling technology and finish it off as early as possible so that in-house we can be testing it and making sure that it works, finding all of the problems that it's gonna cause. Perhaps an example of that would be, say you've got a mission from a mission-giver - Miles Eckhart - and you need to go and find Miles. Miles is sat in his bar in Levski and you're on Port Olisar, so we need a little marker rendering on your client to show where Miles is so you know where to head, but with bind culling Miles Eckhart and his seat won't be there. So, what do we draw the marker on? So, they’re the kind of problems we need to try and figure out ways to deal with. There's a couple of different ways we could deal with that situation, but until we do we can't really put bind culling in because it could break the gameplay.

Probably won’t see the full implementation of bind culling in until, as I said, we get object container streaming in. But, for us internally, we need to get it as soon as possible so that we can move on with some of the other stuff that we’re planning. So, what we've kind of done, which is another thing - so I've mentioned bind culling, I’ve mentioned object container streaming, then there's also serialized variable culling. Because we can't do bind culling just yet, because we can’t get it out into the hands of the backers, serialized variable culling is what we can do. So, the way that works is the same sort of distance checks that we were doing before, and if an entity is too far away from the client, the server will just say, “You know what? I'm not actually going to send you any updates for that entity,” and what that will mean is that because the entity is not receiving any updates on the client, the client can go, “Okay, well I can put this entity to sleep. I don't need to do any CPU processing on it. It can just stay where it is.” So you can get a lot of the performance gain that we'd hope to see from bind culling from this serialized variable culling.
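
A minimal sketch of the distance check and the resulting sleep behaviour is shown below. The function names, the 5,000 unit cut-off, and the positions are assumptions chosen for the example; the show does not give the actual distances or data structures.

```python
import math


def server_select_updates(client_position, entity_positions, max_distance):
    """Server side: pick the entities close enough to be worth updating."""
    return {
        entity_id
        for entity_id, position in entity_positions.items()
        if math.dist(client_position, position) <= max_distance
    }


def client_partition(known_ids, updated_ids):
    """Client side: entities without updates can sleep and skip CPU work."""
    awake = sorted(e for e in known_ids if e in updated_ids)
    asleep = sorted(e for e in known_ids if e not in updated_ids)
    return awake, asleep


positions = {
    "ship_near": (100.0, 0.0, 0.0),
    "ship_far": (250000.0, 0.0, 0.0),  # hundreds of kilometres away
}
updated = server_select_updates((0.0, 0.0, 0.0), positions, max_distance=5000.0)
awake, asleep = client_partition(positions.keys(), updated)
```

Unlike bind culling, the far ship still exists on the client here; it just stops receiving updates and can be left asleep.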

CB: Games of our size are generally not very optimized during development. It's very clear that the faster version is more complex and has more state, even in this small example. More complex code means more places where something can go wrong, resulting in a bug. Because of this, you always have to decide on the trade-off between production speed and performance. If we focus everything on performance, feature development will be slowed down a lot, including future features, as they need to be developed against a more complex code base. To make it even worse, we could optimize all this code just to later realize that the specific part was not really needed, or that we need to implement something else entirely as a feature was simply not fun. There's a famous quote: “premature optimization is the root of all evil”. In other words, we need to view the final system to correctly understand and analyze it to make meaningful performance optimizations. But so much for the theory. The reality is our game can already be played, and playing at 15 fps is no fun.

To help the code leads and production decide which features optimization should be focused on, we perform a performance analysis. Sometimes features need optimizations to work, or a feature is needed to implement an optimization, resulting in dependencies between developers and making the group progress more complex. Other times code changes are trickier than expected or introduce too much risk shortly before release, resulting in a trickle-down effect of delays for other optimizations or features.

We use a set of different tools for performance analysis. The very basic tool we are using is the classic sampling profiler. Such a profiler works by stopping the CPU at a fixed time interval and recording what the CPU was doing at that time. If a function needs a lot of operations, it has a higher chance of being executed when the CPU is sampled, resulting in a higher sample count, which indicates a high cost function. One drawback of a sampling profiler is that it only shows the function where the time was spent, but no context of why the function was executed, on which CPU, in relation to what other code, or the high-level state of the game like the number of active vehicles. For this we mostly use an instrumented profiler, which shows executed functions at a lower granularity but shows which CPU core was executing what, and when. This profiler can also show us plots of certain performance values we are interested in. Those tools are very useful to analyze a specific situation, but Star Citizen is a different beast. We have up to 50 players, and many more planned, on a single shared server which has to execute the physics simulation. This means what any player is doing can affect the other players, as well as making the server slow by causing it to run more operations, which can be observed indirectly as jumping movements on the clients, rubber banding and so on: all effects of the server not sending updates frequently enough. Then, we need to replicate the server behavior.
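
The sampling idea can be shown with a small simulation. Rather than interrupting a real CPU, the sketch below samples a made-up timeline of function calls at a fixed interval and counts what was running at each sample; the `sample_profile` function, the function names, and the timings are all invented for illustration.

```python
from collections import Counter


def sample_profile(timeline, interval_us):
    """Count which function is running at each fixed-interval sample.

    `timeline` is a list of (function_name, duration_us) run back to back.
    """
    boundaries = []  # (end_time_us, function_name) for each call
    end = 0
    for name, duration in timeline:
        end += duration
        boundaries.append((end, name))

    counts = Counter()
    sample_time = 0
    index = 0
    while sample_time < end:
        # Advance to the call that covers this sample time.
        while boundaries[index][0] <= sample_time:
            index += 1
        counts[boundaries[index][1]] += 1
        sample_time += interval_us
    return counts


frame = [("physics", 700), ("render", 200), ("audio", 100)]
samples = sample_profile(frame, interval_us=100)
```

The function that takes the most time collects the most samples, but note that the result says nothing about why `physics` ran or on which core, which is the drawback mentioned above.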

Robert Johnson (RJ): We’ve also been looking at some of the telemetry frameworks that we use to actually help us visualize and track what the actual performance issues within the game, inside the codebase, actually are. Some of the improvements we've done to the telemetry, for example, allow us to capture data automatically during our play tests. The data would automatically be captured when the framerate drops below a certain threshold. So for example we can use this kind of auto capturing to make sure we grab moments where we see these nasty bugs occurring which cause big performance drops. We would then take this data onboard, analyze it as a team, and then look to see where the problems were lying. The various problems could then be either addressed by some of the engineers on the performance team, or in certain cases we might actually find that the ownership of the issues in question will be with engineers on other teams, so in that case we'd have to coordinate with them to look at fixing up certain issues that might have cropped up in the play tests.
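
One way to picture threshold-triggered capturing is a small rolling buffer of recent frames that gets saved whenever the framerate dips. The `AutoCapture` class, the 20 fps threshold, and the three-frame history below are assumptions for the sketch, not details of CIG's telemetry system.

```python
from collections import deque


class AutoCapture:
    """Hypothetical recorder that saves recent frames when fps drops too low."""

    def __init__(self, fps_threshold, history=3):
        self.fps_threshold = fps_threshold
        self.recent = deque(maxlen=history)  # rolling window of recent frames
        self.captures = []

    def on_frame(self, frame_number, frame_time_ms):
        self.recent.append((frame_number, frame_time_ms))
        fps = 1000.0 / frame_time_ms
        if fps < self.fps_threshold:
            # Keep the lead-up as well as the slow frame itself for analysis.
            self.captures.append(list(self.recent))


capture = AutoCapture(fps_threshold=20)
for number, frame_time in enumerate([16, 17, 100, 16]):
    capture.on_frame(number, frame_time)
```

Only the 100 ms frame (10 fps) triggers a capture, and the saved snapshot includes the two frames leading up to it.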

It's currently quite early stages for the auto capturing and for the new telemetry system, but the initial tests seem quite positive, so we're hoping for 3.1 that we'll be seeing a lot of these nasty issues a lot sooner and we're able to fix them up in plenty of time for the actual release.

Some of the other things that we've looked at adding to track the performance are some auto tests. The auto tests would be run by a test build machine once per internal build that we do. These internal tests would be something like having an automated player walk around a busy area of the game such as Levski, or it could be having the player spawn into a ship alongside 30 AI ships. These auto tests would track various frame rate statistics, allowing us to see as we go from build to build: is our performance improving, or have we accidentally pushed the performance back a bit? In which case these tests allow us to catch issues as they arise, and whilst they don't necessarily give a perfect measure of the performance that we're likely to see in the live environment, they can at least give us a good indication that we're moving in the right direction, rather than us putting in a lot of optimization code changes and not actually knowing whether the changes are having the desired effect.
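
A build-to-build comparison of this kind could be as simple as the sketch below. The `compare_builds` function, the use of average fps, and the 5% tolerance are all assumptions for illustration; the show does not say which statistics or thresholds the real auto tests use.

```python
from statistics import mean


def compare_builds(previous_fps, current_fps, tolerance=0.05):
    """Classify the change in average fps between two builds of the same test."""
    previous_average = mean(previous_fps)
    current_average = mean(current_fps)
    change = (current_average - previous_average) / previous_average
    if change < -tolerance:
        return "regressed"
    if change > tolerance:
        return "improved"
    return "unchanged"


levski_walk = compare_builds([30, 30, 30], [24, 24, 24])  # 20% slower
ai_ship_test = compare_builds([30, 30], [33, 33])         # 10% faster
noise_only = compare_builds([30, 30], [30.5, 30.5])       # within tolerance
```

The tolerance band is there so that ordinary run-to-run noise is not reported as a regression or an improvement.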

So for 3.0.1 there's really two good ways at the moment for us to gather data from which we can decide what needs optimizing code wise. The two ways that we have at the moment are taking the auto captured data from the internal play tests we do with up to 50 internal QA and a handful of engineers participating, and the other way we would look to gather data to analyze to see what we can optimize is just by doing local tests on our machines. When we run those we can spin up a server, connect a client, we can potentially connect headless clients should we want, and we can play the game fairly normally, but with the additional debug commands that are available to us we can do things like jump outside Olisar, instantly put ourselves in a ship, we can spawn 50 AIs, and so we quite quickly can put ourselves in a scenario that in the real releases would see the framerate stressed more than just running around as a single player in Olisar on an otherwise empty server.

With those different situations I'd also tend to do things like spawn 50 AI, then move the player away to another location in the game that's far, far away from Olisar, because then in that situation we'd sort of expect really that the performance should be better, because there's now a bunch of vehicles on the server but those vehicles are now so far away from the player's vehicle that we shouldn't need to update them on the client with the same level of detail that we previously did when we were sat right next to them in Olisar. So we'd be running these various different scenarios locally, taking a look at the data, and by analyzing it we sort of look out for instances like that where we could get some clear gains by not updating things that didn't need to be updated. We could also take that data and look into things like what are the most expensive functions in the game, what are the functions that are being called most frequently, which functions take the longest to run, which functions do a lot of allocation and deallocation of memory.

We could also look at functions that were coming from code that we didn't actually necessarily even need to run for various game modes, or code that had been there since we inherited the engine. So we would be looking at ways to improve pre-existing code that we did need, but also often some of the biggest wins will be actually removing or refactoring older systems that we didn't need or didn't need to use in the ways that we were currently using them for PU.

For example one of the issues that we've always had is being able to test a server with a full load of clients. So particularly now we've gone to 50 players per PU map, you can imagine the amount of manpower it takes and, you know, the logistical nightmare of trying to organize 50 people in QA or, you know, volunteer developers to try and play test the game for a couple of hours and fill the server and get back performance data. And then equally, if you observe a slowdown while you’re doing this test and you want to ask someone what happened, you have to ask 50 people and you get 50 different answers.

So it's kind of, yeah, it's quite difficult to set up the tests and to get back the quality of information that you require. So we're working on these things we've got called headless clients, which are basically clients without players. They're all little dumb robots that press the keys in random ways. There's no player, it's just a program that runs a virtual keyboard; it's just mashing the joystick and mashing the keyboard, wiggling the mouse, and it's smart enough that it can get out of bed and get into a ship and then just fly around firing in random directions, and that generates enough load that we can kind of simulate roughly what we can expect a player might do.

Previously the headless clients have only been run on developers' machines, and to get enough headless clients to help fill the server, they're being migrated onto virtual machines running in the cloud. So they have to be spun up on demand and shut down.

The other thing I mentioned was getting back a better quality of information. So we've had telemetry in the game servers for a while now. We've had some telemetry from clients as well. It's been a bit of a fragmented approach in how that's being done, so we're working with the engine team and DevOps again to try and get a more unified approach, get richer information about what the performance is, where the bottlenecks are on each of the clients and on the server, and try and get more contextual information about what everyone was doing at each point in the game, so we could go, “Okay, there's a performance drop here, what was everyone doing? All right, that guy's Caterpillar blew up with lots of cargo. Okay, that's what's wrong.”

RJ: Really, yeah, with a traditional game you'd probably tend to find that the performance for the majority of the life cycle of the development would be actually pretty low, probably fifteen frames a second or less, whereas the issue we have as an early access title, if you like, is that we've got to get our features in, write our new code, write our new systems, but yeah, we have to get those systems performing in an optimal manner that allows people to play the game at a reasonable frame rate and enjoy it, rather than having the luxury of only testing the game internally for the majority of the development cycle and then knowing exactly what content we want in the game, knowing what works, and then being able to optimize that in the last few months of the project. So really we do have an extra overhead of sort of continual performance optimizations that we need to look at, but that's all part of the great challenge that we've got here, writing the code for what we all see as a massive groundbreaking title.

During development there's never going to be a line that we're going to cross and performance is suddenly going to be fantastic. It's always going to be this incremental process of getting better, and occasionally we're gonna get worse. So basically what's gonna happen is as new features come online, the first thing they're probably going to do, besides work, fingers crossed, is make performance a bit worse, and once we've measured the impact we’re going to try and optimize it back up again to the levels that everybody's kind of happy with. But development being the key thing, it's never going to get to that kind of 60 frames per second for everybody, because it's more important that we spend the time on actually developing the game, getting everything in, getting everything working. So the kind of optimizations that we're dealing with now, you know, there's a few small kind of micro optimizations making things go particularly fast, but it's mostly system level optimizations. So yeah, we know we're gonna need bind culling, we know we're gonna need server meshing, we know we're gonna need the entity component update scheduler, we know we're gonna need object container streaming. So these kind of system level things, these pieces have to go in so that we can build the rest of the game, and that's the primary focus for me: to give us the tech to build the rest of the game.

MA: Yeah, I guess the biggest takeaway out of this is there's never one answer in optimization, there's billions. I mean, that's why we have those profile markers in there, and we need to see what's going on and work towards generating a solution to beef up that part of the thread that's going slow. Maybe we have to turn off some updates that are, you know, out of range, using serialized variable culling, or maybe we have to optimize the way that we’re handling physics on the planet because this particular way takes a little bit longer. It's analyzing the data, figuring out where in a thread we are having issues, and then finding a solution that works for that and doesn't break the game. That's probably the biggest takeaway and the biggest challenge.

Outro With Sandi Gardiner (VP of Marketing), Sean Tracy (Tech Content Director).

ST: Optimization affects everyone from devs to players.  We all consider performance implications as we develop Star Citizen and we even have an optimization strike team to keep up with our own development.

SG: That's right and with 3.1 around the corner we have a few ships coming online. The Reclaimer, the Cyclone, the Terrapin, and the Razor and on the concept front, the Aegis Vulcan remains available through the end of the month. Add one of these versatile support ships to your fleet and become a beacon of hope for distressed pilots everywhere.

ST: We also have a ship package that features the Vulcan along with some other Aegis vessels and a standalone skin pack available to everyone.

SG: That's right, so even those of you waiting to pick up a Vulcan in-game can secure these eye-catching paint jobs now.

ST: Make sure you tune in to this week's new episode of Calling All Devs and Loremaker’s Guide to the Galaxy up on the site now. And on this week's Reverse the Verse, Clive Johnson and Rob Johnson - two sides of the performance optimization coin - will be stopping by to answer your questions following today's show that airs live tomorrow at 8 AM PST.

SG: A huge thanks to all of our subscribers for sponsoring all of our shows and a special thanks to the subscribers live with us in the studio.

--Cheering--

ST: Thanks for being part of this very special episode and thanks to everyone for tuning in.

SG: Thanks also of course to all of our backers. You make the development of Star Citizen and Squadron 42 possible.

ST: That's all for now. Until next week  we'll see you...

All: Around the Verse!

Sunjammer

Contributor

Desmarius

Transcriber

When he's not pun-ishing his patients or moderating Twitch streams, he's at Relay pun-ishing their patience as a member of the transcription staff. Otherwise, he can be found under a rock somewhere in deep East Texas listening to the dulcet tones of a banjo and pondering the meaning of life.

"If you can't do something smart, do something right." - Sheperd Book

CanadianSyrup

Director of Transcripts

A polite Canadian who takes pride in making other peoples day brighter. He enjoys waffles with Maplesyrup, making delicious puns and striving for perfection in screaming at the T.V. during hockey games.

Nehkara

Founder

Writer and inhabitant of the Star Citizen subreddit.