As per usual, anything said during the show is subject to change by CIG and may not always be accurate at the time of posting. Also any mistakes you see that I may have missed, please let me know so I can correct them. Enjoy the show!
CIG debated cutting critical foundational features from 3.0 but didn't to ensure they could deliver all of the 3.x series on a quarterly basis in 2018
Now that most of the uncertainty, R&D, and challenges associated with integrating new tech are out of the way the 3.x iterations will be more predictable
Switching to date-based from feature-based is going to dramatically change everything next year: no more holding back a version for a couple of features
3.0 was always going to be an anomaly and letting a critical piece of infrastructure slip might have prevented the release of an entire branch
Challenges: dynamic pricing and interdiction were being worked on right up to the end (and even into the holidays)
3.0 integrated a lot of new technology, new technology that hadn't been finalised, and performance could only be assessed once it was all glued together
3.1 primary focus is performance: identifying/addressing the biggest problems in terms of server & client frame rates
Single biggest feature in 3.0 is the procedural planetary tech: it dramatically changes the entire game
Now turning their attention to mining in a "significant fashion" as they are targeting 3.2 for the initial iteration
CIG were able to make changes during the holidays without a patch because those issues were controlled via the backend services
Probability volumes are areas of space in which events can happen to the player based upon inputs from the economy (i.e. flying through a probability volume that includes a pirate haven will likely lead to a pirate encounter).
Probability volumes hook up to the economy to provide a dynamic experience.
Probability volumes act as an optimization, as they allow for many different types of encounters without having to explicitly hard-code them.
The back end simulations are planned to be hooked up towards the end of this year.
The team did a lot of work on the back-end for publishing 3.0, and it was a huge come-together between the publishing side, the QA side, the network side, everyone.
The delta patcher helped hugely
The team also have the ability to issue hotfixes without kicking all players off the servers; they can issue a hotfix, and then ‘spin down’ servers gradually, only resetting them when there are 0 remaining players.
This allowed the team to do many hotfixes, even during the Christmas break, all without ever kicking all players out.
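The rolling-hotfix approach described above - new sessions go to patched servers while old servers drain and are only recycled at 0 players - can be sketched roughly like this. This is a speculative illustration; none of these class or function names come from CIG:

```python
# Hypothetical sketch of the "spin down gradually" hotfix strategy:
# patched servers take all new sessions, old servers drain and are
# only recycled once they are empty, so nobody is ever kicked.

class GameServer:
    def __init__(self, version):
        self.version = version
        self.players = set()
        self.draining = False

    def join(self, player):
        self.players.add(player)

    def leave(self, player):
        self.players.discard(player)

class Fleet:
    def __init__(self, servers):
        self.servers = servers

    def apply_hotfix(self, new_version):
        """Mark every old server as draining; spin up a patched replacement."""
        for s in self.servers:
            if s.version != new_version:
                s.draining = True
        self.servers.append(GameServer(new_version))

    def route_new_player(self, player):
        # New sessions only ever land on non-draining (patched) servers.
        target = next(s for s in self.servers if not s.draining)
        target.join(player)
        return target

    def reap_empty(self):
        """Recycle draining servers only once 0 players remain on them."""
        kept = [s for s in self.servers
                if not (s.draining and not s.players)]
        reaped = len(self.servers) - len(kept)
        self.servers = kept
        return reaped
```

The key design point is that `reap_empty` never touches a server that still has players, which is what lets a fix roll out mid-holiday without a forced disconnect.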
Sandi Gardiner (SG): Hello and welcome to another episode of Around the Verse; our weekly look at Star Citizen’s development. I’m Sandi Gardiner.
Chris Roberts (CR): And I’m Chris Roberts.
SG: To give you a more in-depth look into what we’re up to, we have a new format for Around the Verse focusing on one topic per episode. Each month we’ll cycle through a studio update from one of our offices or an update on Star Citizen’s Persistent Universe, a Squadron 42 update and an installment of Ship Shape.
CR: Yeah, I’m looking forward to it. I think it will allow us to dive a little deeper than we managed to last year on some of our ATVs. We’ll also be premiering a couple of new shows this year. Calling All Devs: a new Q&A series harking back to the days of 10 For the Chairman, 10 for the Producers, etc. And Reverse the Verse - which is coming back from the dead - will be a live weekly talk show where our devs will discuss the most recent Around the Verse and other Star Citizen news. I think that will be kind of fun and interesting.
SG: Yes it will be and Erin Roberts is our first guest on the first Reverse the Verse which airs tomorrow at noon PST.
CR: In addition to the new shows we’ve been refining our Subscriber experience based on feedback from the community.
SG: Yes we have. And we’re kicking off a new year of perks today with a series of Takuetsu action figures to collect and display in your hangar. January’s figure is a Vanduul warrior, with Imperator level members receiving a special variant as well.
CR: Yeah and they’re very cool. In this week’s Studio Update we head to Austin Texas where the team recently finished up publishing efforts for Star Citizen Alpha 3.0.
SG: Let’s take a look at what went into getting the latest build live in the PU.
Tyler Witkin (TW): Hey everyone. Tyler Witkin - Lead Community Manager in the Austin, Texas studio - and I’m joined by Director of Persistent Universe, Tony Zurovec. Tony, how you doing?
Tony Zurovec (TZ): Good.
TW: Awesome. Now we concluded 2017 by releasing Star Citizen Alpha 3.0 to the live servers, and there’s a lot that goes on behind the curtain to get something like that done. So let’s talk a little bit about the run up to 3.0.
TZ: Probably a bit more challenging than what we would have liked. We would all have liked to have finished a few months in advance …
TW: Of course.
TZ: … and been able to coast right up to the end of the year. In the end it came down to … there was actually a lot of debate as to whether or not we should cut some of the critical foundational features from 3.0 and push them into 3.1. But we really wanted to ensure that we got the basic skeleton for what we’re going to need for all of the evolutionary gameplay enhancements that we’re going to deliver in all of the 3.x series - 3.1, 3.2, 3.3 - on a quarterly basis next year.
And so we really … despite the fact that it really pushed us right up to the very edge we were determined to get all of the critical features - not just the big obvious ones like the procedural planetary stuff and everything like that - but I’m talking about the things like the dynamic shop prices. I’m talking about things like interdiction which is really necessary for things like cargo transport to have any sort of difficulty associated with them.
You had to have that functionality within the game - and the fact that we got that in there and we were able to work through most of the big … big complex issues puts us into a very good position going forward such that now most of the uncertainty, most of the research and development, most of the more serious challenge of getting these pieces of completely new technology integrated and functional within the game and talking to the other systems, with that out of the way it should make all of these 3.x iterations a lot more predictable.
It doesn’t mean that everything that we aim for ... ‘cause we’re always aiming fairly aggressively in terms of what we want to deliver because, just like the community, we want to see this in the game as soon as possible. And so we always try to push ourselves: regardless of whether we think it’s going to be difficult or easy to make it, we want to go ahead and slot it in there and try. And sometimes … sometimes we go ahead and reveal those more aggressive dates to the community.
I think that we’re being a bit more conservative - certainly in our planning and stuff - next year and the whole concept of going date-based as opposed to feature-based is going to dramatically change, I think, everything next year. All of a sudden we’re not going to hold up the entire version for one or two features that aren’t quite there. We’ll push out what we actually have. Which is going to make the game … each one of these iterations is going to push what makes up the game significantly forward. And so we’re going to be able to get the latest and greatest out to the community as quickly as possible and then if something does slide from, say 3.1 into 3.2, they’ll see it a few months down the road.
TW: Yeah and the community is actually really excited about that.
TZ: Us too. It’s painful from a development perspective to have so many things in there. Like I said it’s like for … 3.0 was a bit of an anomaly because we really wanted to get, like I said earlier, the structural foundation into the game. And we didn’t want to leave pieces out. And then we deliver something and yet there’s still big missing critical pieces of infrastructure that we were going to have to tackle on 3.1 or 3.2.
And because they are big critical pieces of infrastructure, if they slide from 3.1 then all of a sudden it’s not just one or two things that might slip: an entire branch of what you’re hoping to deliver to the community may not be possible without that actually being in there. There are a lot of knock-on effects from things like the dynamic pricing in terms of the economy: if you don’t have that piece of the puzzle working, all of that gets pushed.
And so now that we’ve got the basic structure in there’s certainly the potential for little individual pieces here and there to slide from one 3.x release to another.
TW: But it won’t stop the content flow.
TZ: It won’t stop the content flow and the impact is likely to be much more localised than it would be if we were missing a really, really big piece of the plumbing.
TW: No, absolutely. So you talked a little bit about some of the … releasing 3.0 and how it was tough, let’s say, at the end of December. What were some of the challenges you guys encountered?
TZ: Well I mentioned a couple of them there. The dynamic prices were right up until literally the very end. We had testing, some rudimentary testing, that was happening a few weeks before it went out the door but we were probably within two weeks of the timeline before it all finally started to click together.
Same thing with the interdiction stuff and you saw the outcome of this which is we were tweaking some of this stuff right up until ...
TW: Right down to the wire.
TZ: … the very end. And we finally got it in and it worked. There were a few minor things which … which we paid for with some late nights right on Christmas Eve, Christmas Day, etc. to where we had a very small number of people that just really wanted to make sure that what players were able to experience and enjoy over the holidays didn’t have any really egregious problems. A few of us just put in the time to go ahead and push out some … some temporary patches to make those things function considerably better than where they were a few days earlier.
TW: Yeah. It never ceases to amaze me. All the time I’ve been here I’ve seen consistently people almost saying, begging “Can I stay? Let’s get this out the door!” And it’s really a testament that we really want this just as much as our community does.
TZ: Yeah. And it’s actually interesting because near the end everybody wants to put as good a face, everybody wants make sure that what they’re working on is going to be perceived in the best possible light by the community. But as a company and you’re trying to close so many things down you really do have to be taking entire portions of the project and saying “Okay, this is … we can’t risk tweaking any more of this …”
TZ: “... because we may destabilize something.” And we’re doing these release builds and we’re trying to get the number of significant issues down to the bare minimum.
TZ: And so a lot of this stuff near the end, you’re looking at it, you see a problem, and you’re saying “Well, it’s a problem. It’s not very big. The impact is going to be fairly limited. This is what it’s going to do to gameplay.” And so you’re weighing all of these and making the decision of whether or not you actually have to fix it for 3.0 or whether you can go ahead …
TZ: … and push it - to where we’ll deal with it - ‘til 3.1. With some of these things you’re able to just close something down such that you don’t necessarily fix the problem entirely, but you’re able to patch it - with very little effort - so that it works well enough for 3.0 …
TZ: … and that buys you time.
TW: With the hard dates of the quarterly releases, it’s really going to help because it’s really going to help organise the entire process.
TZ: Yeah, yeah. The dates will change … the dates will change everything. 3.0 … 3.0 was always going to be a gargantuan release. There’s no doubt … there was never any question about that. So when you’re talking about integrating that much new technology, a lot of that technology hadn’t yet been finalised.
TZ: So there were still a lot of implementation issues, and certainly you saw some of the … the knock-on effects in terms of “Well, once you get it all glued together, what’s the performance like? Oh, the performance is not optimal! Okay, well we’ve got a limited amount of time to make that run as well as it should.” It’s not that we don’t see the issues. It’s not that we don’t want to fix the issues. It’s just that by the time you finally get all these pieces interconnected and working, and you’re able to test it …
TZ: … then time does start getting short. Obviously on the performance side - 3.1 - one of the primary focuses of 3.1 is going to be to deal with speeding the game up: finding the source of some of the biggest problems in terms of server and client frame rate and bringing those up substantially to improve the player experience.
The single biggest feature, I would say by far, is of course the whole procedural planetary thing.
TZ: This dramatically changes the entire game because all of a sudden we’ve got such a larger play field to … on which we can paint all of this content. In addition to that, what I personally like, is the fact that the types of gameplay that you’ll get in that environment - on that environment - is going to be significantly and dramatically different than what you’re going to get when you’re flying around in a spaceship.
All of a sudden, when you’re talking about things like first person activity down on a planet you’re talking about things like taking cover. You’re talking about … you’re much more exposed: you can’t take nearly as much damage as your ship potentially could. You’re talking about you may be dramatically outgunned - not because the other guy’s got a bigger or fancier ship - …
TZ: … but because he’s got a ship or a buggy and you’re on foot. So all of a sudden the magnitude of the differential between what different players - or a player and the NPCs - could be bringing to a given encounter, it increases exponentially. And what that’s going to do is open up a lot of interesting possibilities for the clever player ...
TW: Emergent gameplay.
TZ: … to take this and figure out - to use their brain - to figure out how best to exploit an opportunity to turn it around to where they’re at a disadvantage but because they come up with a more clever strategy they’re able to overcome in a way that another player that’s not as experienced wouldn’t necessarily be able to do. And so there’s going to be this whole aspect of getting to know more about the game and as you start to learn more about how things work of course you will be able to do … excel at whatever you’re trying to do within the game.
If you tailor this to something like mining - which, I’m happy to say, we’re now turning our attention to in significant fashion because we’re going to try to get this out for 3.2 ... the initial iteration of mining ... and resolve it to a level such that players are actually - kind of like the cargo transport - able to go down on the surface. You’re actually able to scout, find areas where there are worthwhile concentrations of valuable ore - the value of which will be determined by what the price is at a given trading post at that moment in time. Sometimes the price of a particular commodity may be higher. Sometimes it may be lower. And therefore the justification for you investing time and money and effort and enduring risk to actually extract these natural resources from an area and cart them back and sell them will ebb and flow along with the economy. So you’ll finally start to see some of this emergent gameplay that we’ve been talking about for so long …
TZ: … actually come to fruition.
TW: Which is awesome. Now a lot of folks noticed in the community that we had some bug fixes and changes come in over the holiday break while the team was supposedly out of office. Which begs the question “How do those changes take place?” because sometimes fixes and changes require a patch and sometimes they don’t. So how does that work?
TZ: Yeah. The reason why we were able to effect some dramatic changes to the game for a couple of specific issues was because those particular issues were controlled via these backend services as opposed to the game per se. The game takes care of all of the ... I’d say the “high fidelity simulation”, but it gets a lot of information from these backend services. And you’re going to see that, like …
For example in terms of shopping: the game server itself doesn’t actually know what the price of something is. It doesn’t know where certain commodities should … which shops should actually have them. It doesn’t know what the quantity should be. It doesn’t know what the relative demand is. It doesn’t know about any of that stuff: that’s all controlled via this whole back end side.
TZ: And so what the game does is it understands when it needs to request certain chunks of that information, and it knows how to display it to the players to allow them to make choices. And then it knows how to take the outcome of those choices and plug it back into the service infrastructure so that it can have the desired effect.

So when you’re talking about things like what we were working on Christmas Eve/Christmas Day, there were things along the lines of the environmental mission scenarios that were controlled via a probability volume service, and the actual problem we were running into was that some of those environmental scenarios weren’t being properly cleaned up. They were residing too long on the server, so all we wound up effectively doing was tweaking the probabilities of some of those things so that we made them a little bit less likely to occur… you know, tamped down the number of ships in some of the scenarios. What that in essence did is it gave the clean-up functionality that was running on the server a little bit more time to clean things up before the next scenarios, on average, would be getting created. Basically it just evened out the demand load at any given time a little bit.

The interesting part about this is that as we push more functionality of this sort from the server back to these backend services, it makes the iteration much, much easier. All of a sudden we don’t need to do a big, gargantuan build of the entire game, run it through QA, do a much more massive test. All of a sudden we can be much more focused and specific in terms of what we’re testing: it’s the same game version, so we’re just going to tweak the data the server actually utilizes.
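A toy illustration of why those holiday fixes needed no client patch: if the scenario spawn odds live in backend data that the game server queries on every roll, then "tamping down" an encounter is just a data edit on the service side. Everything here - names, scenarios, numbers - is invented for illustration:

```python
# Hypothetical sketch: spawn probabilities are backend-owned data,
# so a "hotfix" is a data change, not a new game build.

import random

# Backend-owned data: spawn probability per environmental scenario.
BACKEND_SCENARIO_ODDS = {
    "pirate_ambush": 0.30,
    "derelict_freighter": 0.10,
}

def hotfix_odds(scenario, new_probability):
    """A hotfix is just an edit to the backend service's data."""
    BACKEND_SCENARIO_ODDS[scenario] = new_probability

def maybe_spawn(scenario, rng):
    """The game server asks the backend for the current odds on each roll."""
    return rng.random() < BACKEND_SCENARIO_ODDS[scenario]
```

Calling `hotfix_odds("pirate_ambush", 0.05)` would make ambushes rarer on every server immediately, exactly the kind of "give the cleanup code breathing room" tweak described above, with no rebuild or QA pass over the whole game.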
You’ll see a lot more of this. For 3.2, for example, there will be a commodity service, there’s going to be a service beacon service, and fairly soon we’ll have the air traffic control system go to a service. So, in general, it’s just taking certain key select pieces of functionality from the game and putting them into the back end where they logically belong, and there are a number of reasons for this. You get some nice benefits, like the ability to more rapidly iterate and tweak this stuff, but one of the big reasons why we have to do this is that it’s necessary because we’re getting closer to the point where we’re going to try to get this whole server mesh functionality going, to where the game is not exclusively run on one individual server that knows everything about everything.
Where we're headed in the not-too-distant future is essentially: there's a solar system and there's hundreds and hundreds of different servers, and each one of those servers knows about the key points - they know where the sun is, they know where the planets are, they know where some major pieces are that they could actually see or that the players on them could interface with at any point - but in general most of their attention will be focused upon just the little localized area where they're at. This will give us all sorts of benefits in terms of performance and a whole number of other things, and you can start to understand why you need this when you think about it: imagine you've got a server and it just knows about a few key elements of the solar system - most of its knowledge is fairly local to the area that it represents - but now all of a sudden a player is going to go from that server to the far end of the solar system. He needs to be able to query, for example, all of the probability volumes that lie across this enormous expanse.
The problem is that server doesn't have knowledge of all of those probability volumes, and so what you really need is a higher level system that does have knowledge of those particular data structures to the level of fidelity that you need. And, like I said, the problems that we're usually concerned with on the back end are usually quite a bit different in concept from the ones that we're worried about over on the game side. The game side is really where, like I said, you take care of all the high fidelity simulation, all that type of stuff. The back end is, in general, more macro concepts.
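That split - each mesh server holding only its local probability volumes, with a higher-level service fielding the system-wide queries no single node can answer - could be sketched like this. This is a purely hypothetical structure, not CIG's actual design:

```python
# Speculative sketch of the local-vs-global knowledge split in a server mesh.

class MeshServer:
    """A game server that only simulates one localized region."""
    def __init__(self, region, local_volumes):
        self.region = region
        self.local_volumes = local_volumes  # volumes this server knows about

class VolumeIndexService:
    """Backend service with system-wide (lower-fidelity) volume knowledge."""
    def __init__(self, servers):
        self.by_region = {s.region: s.local_volumes for s in servers}

    def volumes_along_route(self, regions):
        """Answer the cross-server query a single mesh node cannot:
        which probability volumes lie along a route spanning many regions?"""
        found = []
        for region in regions:
            found.extend(self.by_region.get(region, []))
        return found
```

The game server handles the high-fidelity simulation of its own region; the index service only needs the macro view, which is cheap to keep for the whole solar system.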
TW: Okay cool. Now you keep mentioning probability volumes. I've heard that term thrown around a lot. What is a probability volume?
TZ: A probability volume is really just an area of space that dictates what you could potentially see within it, and by that I mean - are there a lot of pirates in this area? Should you be seeing a lot of cargo transports going back and forth? From where should they be coming and to where should they be going? If they're headed in one direction should they already be loaded up? If they're going in the other direction should they be empty? If they're going somewhere and they actually have cargo, what type of cargo should they have?
This is all dictated by these probability volumes, and the way the workflow generally works is that the designers set up these so-called environmental mission scenarios. We refer to them as missions, but they're not missions as you would normally think of them - where you have to explicitly accept something and there's some sort of definitive reward you're going to get. When we say environmental mission scenarios, what we're really talking about is just the world happening as you're driving by. So, the first step is that the designers create all of these environmental mission scenarios, and then for each one of those scenarios they go and tag it in various ways to identify what type of content it contains. They can also associate various rules and requirements with each one of these scenarios, and these scenarios can now be linked to each one of these areas, and what we're able to do is basically say what the probability of running into this particular type of scenario is. And these can be combined - you can have an area that's tagged for pirates and tagged for freighters…
TW: “Does it make sense that you run into pirates in this area?”
TZ: Yeah and then you can logically combine these things explicitly and implicitly and by that I mean if you have pirates and you have freighters, then it's only natural that you would also see a lot of debris because some pirates have actually blown up freighters and if you have security that's actually in that system, then sometimes - if you think about how big space is, the odds of you coming upon a freighter right as it's being assaulted by a pirate you know is actually really really low…
TW: Not realistic, yeah.
TZ: It's not. The vast majority of the time you would cross that area of space well before those two came into contact, or well after, and that's in general not that interesting. And so we are able to control the likelihood of you seeing more or less of something, so the player sees a little bit more of the interesting stuff and not quite as much of the after-effects - except insofar as we actually want that to establish the general atmosphere of a given area. In other words, if you're in a dangerous pirate area and you see lots of debris, that can actually be used to basically broadcast the fact that you're in a dangerous zone…
TW: And promote the salvaging profession.
TZ: Oh yeah, there's a whole discussion we can have about the salvaging.
TW: We’ll save that for another time.
TZ: So, in general, the designers right now are able to take these probability volumes and create these regions of space, and they can link them to these environmental mission scenarios - which have had tags and various other rules and such associated with them - so that we're able to instantiate them as appropriate based upon the distance from the source point of these volumes.
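The tag-and-match workflow described here - designers tag scenarios, volumes carry tag sets, and the eligible scenarios for a volume fall out of the match - might look something like this in miniature. All scenario names and tags are invented:

```python
# Hypothetical sketch: scenarios tagged by designers, matched against
# the tag set a probability volume declares.

# Each environmental mission scenario and the tags it requires.
SCENARIOS = {
    "freighter_convoy": {"freighters"},
    "pirate_ambush": {"pirates", "freighters"},
    "debris_field": {"pirates", "aftermath"},
}

def eligible_scenarios(volume_tags):
    """Scenarios whose required tags are all present in the volume's tags."""
    return sorted(
        name for name, tags in SCENARIOS.items()
        if tags <= volume_tags  # subset test: every required tag satisfied
    )
```

This is where the implicit combinations Tony mentions would come in: a rules layer could add `"aftermath"` to any volume tagged with both pirates and freighters, so debris fields appear without a designer wiring them up explicitly.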
The piece of the puzzle that we don't yet have operational which we're going to be aiming to implement later this year is gonna be all of the systemic connections so that what an area basically knows about is dictated by what's actually going on via just players and NPCs in general. In other words, right now the designers are having to do a little bit more hard coding of these volumes than what we're gonna be doing in six or nine months’ time. At that point, what will wind up happening is you'll see the macro economy will basically be providing inputs to the probability volumes which will then be querying all of the environmental mission scenarios to figure out what you should see at any given time and place.
TW: Very cool. So, why did we decide to go with this technology? Why probability volumes?
TZ: The big benefit to going this route is really one of optimization. Clock cycles on a server are, in general, a zero-sum game, and so we want to be able to put the vast majority of the clock cycles to work where there are actual players. But we also need to have the simulation aspects in terms of - well, if you've got a lot of pirates in an area and they're basically picking on shipping, then you would expect that the price of security goes up to a certain level, and if players aren't responding - or even if they are - there are also going to be NPCs that are responding. So, how should that gradual rise in additional security forces taper off the frequency of piracy within that sector? What the probability volumes give us in essence is really the ability to more effectively simulate a big complicated world where a lot of things would theoretically be happening, but we're controlling it via dials and sliders as opposed to having to explicitly instantiate individual pirate ships and physicalize them and actually have them run the whole flight control systems - basically tracking other players, other NPCs, etc. - to get this effect.
We're able to control a lot of that stuff at a much more algorithmic level and to the player if we get this all right, they should see no difference. What you should see is basically you're going from point A to point B through all of these areas that dictate certain probabilities of certain things and it varies, like I said with distance, such that you may be going through you know a low probability pirate area, then it's high, then it's low…
TW: For the player, it’s seamless.
TZ: When you're going from planet to planet, there may be 20 different areas where the likelihood of encounters kicks up, and then in between those areas, for whatever reason - maybe because there are more security forces, maybe because there's nothing of value in there - whatever the case may be, you're not likely to run into any encounters at all.
So really, you can think of them as a type of optimization so that we're able to get all of the benefits of explicitly simulating you know millions and millions and millions of NPC characters within the game without actually having to create millions and millions of ships and actually run that fairly computationally expensive logic on the servers which in the end would take away from how many clock cycles we could devote to the experience that immediately surrounds any particular player.
TW: It’s kind of like the LOD technology but for gameplay encounters.
TZ: That's exactly it.
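The "dials and sliders" optimization - storing an encounter probability per volume and only instantiating ships on a successful roll - reduces to something like this sketch. Volume names and probabilities are made up:

```python
# Speculative sketch: a route crosses several probability volumes;
# nothing is physicalized unless a roll against the volume's dial succeeds,
# which keeps server clock cycles focused where actual players are.

import random

def travel(route, rng):
    """route: list of (volume_name, encounter_probability) along the path.
    Returns the encounters that actually get instantiated."""
    spawned = []
    for name, probability in route:
        if rng.random() < probability:
            spawned.append(name)  # only now would a ship be spawned
    return spawned
```

Most rolls produce nothing, so millions of notional NPCs cost almost no simulation time - the "LOD for gameplay encounters" analogy from the conversation above.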
TW: That's really cool. So, the last question I have for you is: let's take a look forward, specifically with probability volumes. What's next? I imagine we're gonna iterate upon this technology.
TZ: Yeah, the next big step will be, like I mentioned, hooking this up to the back end economy so that you start to get more systemic linkage of all of these different behavioral characteristics.
Right now, for example, the price of items can basically rise and fall based upon inventory and stuff like that, but we don't have the macro system in place - once the quantity of titanium in this area reaches a certain point, then I really don't need any more titanium, regardless of what my inventory level is, until I've basically gotten a certain quantity of copper, because it's copper and titanium together that will be used to formulate ship hulls, and I need ship hulls because players are demanding lots of repair services, and they're demanding lots of repair services because there's lots of combat activity in this particular area. So, what you should see later this year will be more linkage of these probability volumes into this whole systemic economic backend, so that you start to get a lot more intelligent flow of prices and quantities, and of what NPCs and what types of scenarios you should see within a given area according to these higher level rule sets.
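The titanium/copper example reduces to a simple gating rule: demand for one input is suppressed until the complementary input is available. A toy version, with all thresholds invented for illustration, might look like:

```python
# Hypothetical sketch of a macro-economy rule: a trading post stops
# demanding titanium - regardless of inventory headroom - until it has
# enough copper, because both are needed to formulate ship hulls.

def titanium_demand(titanium_stock, copper_stock,
                    titanium_cap=100, copper_needed=50):
    """Units of titanium this post wants to buy right now."""
    if titanium_stock >= titanium_cap:
        return 0  # already holding enough titanium
    if copper_stock < copper_needed:
        return 0  # can't build hulls without copper, so no demand yet
    return titanium_cap - titanium_stock
```

Chained across many posts and commodities, rules like this are what would feed the probability volumes: high hull demand implies combat activity, which implies more pirate and security scenarios in that region.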
TW: Well thanks Tony, as you can tell, getting 3.0 out the door was a complex and ambitious process, so let’s take a moment and hear from some of the others on the team.
Mike Jones (MJ): The large part of our focus is on publishing the back-end systems. We do all the deployments for PTU and live releases, and so all the code fixes - everything that gets done by the developers and the development studios - come here for the builds, and then that goes to QA and eventually a publish goes out to the players. In our run-up to 3.0, I think things really got intense for most of the support teams that we have here in Austin. I know that the QA team was working pretty much around the clock, keeping up with the amount of build candidates that we had flowing through the system, and every time they’d find a bug the developers were really quick to send in fixes, and so we were really pushing a lot of builds, a lot of build candidates. As you know, we went through a lot of PTU publishes - multiple publishes per week - all of this driving us toward the final 3.0 release. We had hoped to put it out a little bit earlier, but every time we’d get a build we’d say: just a little bit more, a few more fixes, a little bit more improvement. And so toward the end we were really stressing all of our systems; our critical systems were at 4x capacity, and, you know, I think there was a buzz here, everybody was really having a good time. We were working long hours, but everyone had that energy and we were happy.
But of course, when you push the systems this hard, oftentimes you end up kind of overdoing it a little bit, and so we definitely had some system failures toward the end. In fact, right at the very end - it was our last candidate - we had overdriven our storage system to the point where some of the drives overheated, and we lost one of our flash arrays. This was a pretty critical time because that was our live release, and we were suddenly faced with this prospect of… what if we can’t publish before Christmas? But fortunately all the guys in this team are really used to this kind of pressure, and so everybody jumped to action. We were able to do some quick maneuvering within the build system and IT jumped in. We had six or seven drives to replace; it didn’t take long before we were back up and running. It made a late night for us, but we finally got it out. So, I felt really good that we actually accomplished it, in spite of the challenges we had.
Ahmed, he's like a one-man army for us. Once we released 3.0 we immediately saw a tremendous response. So many people came; it was really exciting to see them on the delta patcher, and all of our graphs were lighting up with the amount of bandwidth we were consuming. Of course people started jumping into the servers, and we immediately saw growth beyond our expectations. We thought we had planned pretty well, but Ahmed was very quick to grow the server space and capacity in the regions where we needed it, and I don't think he really stopped over the holidays. Everybody said goodbye, great job, job well done, and Ahmed just kept on going. I think he really enjoys doing that; that was probably the best holiday gift we could have given him, being able to keep working on the servers.
We did have a number of issues pop up during the holidays, some we expected and some we didn't, but fortunately we've now spent enough time on our back-end systems and procedures that we're able to do quite a bit without an actual patch. That's important, because I think the players noticed; I know some of the livestreamers were having a good time, and they could tell that systems were improving even as they were playing. That's Ahmed and his team doing their work on the back-end over the Christmas break that everybody enjoyed.
Ahmed Shaker (AS): 3.0 was such a big release, with a lot of anticipation on our end, and I'm sure on the backers' end as well, that we had to plan very, very well for it. Whenever you're about to make a release that large, to a large audience across regional locations in the cloud, with servers in the US, in the EU, in Australia, you have to coordinate with your cloud provider to make sure there's enough capacity for what you're about to do. Right before we released 3.0, AWS announced a new generation of compute-optimized VMs, called C5, and once we received the news that they were generally available we adopted them into our plan. It was important for us to make sure the game would run on the latest VMs, with the latest hypervisor and the latest CPUs we could get in AWS, but at the same time that meant trying to allocate a large number of resources on the newest, hottest generation.
So we had to file a lot of limit increases with them to make sure there was going to be enough capacity. We also had to have backup plans: if we have a region in Ireland, I can't rely solely on getting all my instances from there, so I can have a backup region in Paris. Same thing in the US: we'd have a main region in Northern Virginia and a backup region in Oregon. Even within a region itself, there are ways to allocate your requests and your networking to maximize availability and work around any quirks you see during deployment.
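The backup-region strategy Ahmed describes (Ireland falling back to Paris, Northern Virginia to Oregon) can be sketched as a simple allocator. This is an illustrative model, not CIG's actual tooling; the region pairs follow the transcript, while the function name and capacities are invented:

```python
# Hypothetical sketch of region-fallback allocation. Region pairs follow
# the transcript (Ireland -> Paris, N. Virginia -> Oregon); everything
# else is made up for illustration.
REGION_FALLBACKS = {
    "eu-west-1": "eu-west-3",   # Ireland -> Paris
    "us-east-1": "us-west-2",   # N. Virginia -> Oregon
}

def allocate(region, count, capacity, fallbacks=REGION_FALLBACKS):
    """Place `count` instances in `region`, spilling any remainder
    into the configured backup region when capacity runs out."""
    placed = {}
    for r in (region, fallbacks.get(region)):
        if r is None or count == 0:
            break
        got = min(count, capacity.get(r, 0))
        if got:
            placed[r] = got
            capacity[r] -= got
            count -= got
    if count:
        raise RuntimeError(f"insufficient capacity: {count} instances unplaced")
    return placed
```

This is what the filed limit increases guard against: if the primary region can't satisfy the whole request, the remainder spills into the backup instead of failing the deployment outright.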
It was definitely not an easy release, but I think it was the most cared-for release. I've witnessed a lot of releases; I've been dealing with live releases since 1.2 or 1.3, I believe 1.2, and I've been a major part of every single public release. This one was the most cared for, and it took the most hotfixes, the most changes done on the fly. We were also careful about the way we applied hotfixes, which in a lot of situations meant extra effort on our part to make sure a fix could go in without kicking all the players out, while maintaining the highest stability.
So the release started, and we allocated large numbers of VMs because we were anticipating a lot of players. With the delta patcher, a lot of players were already in the PTU from the night before, and we knew that the moment we opened the servers we'd have two, three, five thousand players; those were easy numbers, because so many had been playing on the PTU. It was a very acceptable day: we just kept allocating more resources and building more instances, and everything was going relatively well. We were watching the performance, because player behavior differs. When players have been in Evocati or on the PTU for a while, they get used to a daily routine: walk into the game, look around, see what's going on. That's completely different from a player who hasn't seen 3.0 at all and just wants to see it for the first time, and then you have fifty of them in the same instance.
So a few hours in, we started realizing we had some sort of deadlock in the DGS: the dedicated game server would get into a state where it spins on something, players end up disconnecting, and the game server crashes. And going through the logs that an environment the size of live generates is just insane. You're looking at sixty, seventy thousand messages every thirty seconds; that's the amount of logs that gets processed and indexed every thirty seconds, so you have to sift through them super quickly to figure out what's going on. There must be a common factor, a common medium: why are all of these crashing? Are they all crashing in a specific pattern we haven't seen before?
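The "common factor" triage Ahmed describes, sifting tens of thousands of log lines per half-minute for a shared crash pattern, is essentially signature bucketing: collapse the volatile parts of each line so crashes with the same root cause group together. A minimal sketch, with regexes and function names of my own invention rather than CIG's actual pipeline:

```python
import re
from collections import Counter

def crash_signature(line):
    """Collapse volatile details (hex addresses, numeric ids) so that
    log lines from the same root cause share one signature."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<addr>", line)
    line = re.sub(r"\d+", "<n>", line)
    return line

def top_patterns(log_lines, n=5):
    """Return the n most common crash signatures in a batch of lines."""
    return Counter(crash_signature(l) for l in log_lines).most_common(n)
```

Run over each thirty-second batch, the most common signature surfaces immediately, instead of someone eyeballing seventy thousand raw messages.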
And then we realized that one of the missions was causing an issue with the physics that got the DGS deadlocked. In no time, and we're talking the 23rd of December, when a lot of people had just gone home to enjoy their holiday, I had the backend server team with me, I had Tony Zurovec with me, and we made some tweaks; that was the first hotfix. We mitigated that crash, we dealt with it, and the game kept going. But as Jeffrey Pease would say, it's like an onion: you peel it, and as you uncover one layer of instability, one layer of an issue, you get to see another one. We started realizing that interdiction was really, really intense and affecting the gameplay experience; players were getting interdicted a lot with the amount of AI in the instance. We had to make changes in the matchmaker, in the way it matches players into instances, and tweak different aspects of the probability volume service, the service responsible for that logic. It was actually a good validation of the services model, of relying on services rather than keeping everything in the game server itself: having a lot of logic outside the game server allowed us to make a lot of hotfixes. Long story short, that took three or four days, I believe the 23rd, 25th, 26th. It was just one thing after another: watching change after change, trying to see the performance, trying to see the difference, and everyone was there. We had QA with us, Tony Zurovec working with us, Chris Roberts himself checking in almost every hour to see what was going on, making decisions, changing our strategy for how we dealt with it, how we looked at it. It was just a great thing.
We enjoyed the vacation with the players, watching them, watching those hours-long videos of gameplay, watching people really, really happy to be able to play 3.0 before 2017 ended. So yeah, it was not an easy release, but I don't think anything will top 3.0 for quite a while.
Justin Binford (JB): It's a lot of judging different priorities. One of the challenges was that in Austin we handle the publishes, and from the beginning of October we were publishing nearly every day, so we had to balance that with other testing priorities. I'd say that was one of our biggest challenges. It required us to work really closely with stakeholders in production to figure out the right priorities, so we could make sure we tested what needed to be tested on a case-by-case basis.
So when we find an issue we deem to be serious, we write it up in Jira, the project management software we use, and then email it out to the leadership of the team. That gives the issue visibility, and people can then decide whether it should stop the publish, or, if it wasn't a publish situation, it gets everybody together to solve the issue.
We have a test request system developed by our own Chris Eckersley, our QA lead in the UK, and as we approached the deadline for 3.0, development started utilising that process more and more, because build stability was becoming more and more important. We couldn't risk 3.0 becoming unstable right at that critical juncture, so development adopted the QA test request system more rigorously, and every single feature-related check-in toward the end had to be tested properly. That increased our workload immensely, at the same time as we had to continue these nearly daily publishes.
MJ: One of the things we're now doing is monitoring: we have systems that can see what players are doing, how many players are on a server, and what their performance is. If the monitoring systems sense any degradation, or any kind of problem, one of the things we can do now is mark that server for downgrade. Rather than a hard drop that dumps people out of their game session, we let them fade off, so that when they log out and come back in they're automatically switched to a new server. We use this as a method to apply hotfixes, so we don't have to immediately take down all the services and bring them back up, which would be very disruptive to gameplay, especially now that people have missions and really a lot more to do. We wanted to do it this way so that it happens behind the scenes. It doesn't work for every fix, but a lot of the fixes we made during the holiday break were things we could do on the backend through hotfixing in this manner.
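The mark-for-downgrade flow Mike describes can be modeled as a tiny state machine: a draining server accepts no new sessions, returning players land only on hotfixed servers, and a drained server retires once empty. All class, state, and method names here are invented for illustration; this is not CIG's actual code:

```python
import enum

class State(enum.Enum):
    ACTIVE = "active"       # accepting new sessions
    DRAINING = "draining"   # marked for downgrade: no new sessions
    DOWN = "down"           # empty and retired

class Server:
    def __init__(self, name, version):
        self.name, self.version = name, version
        self.players = set()
        self.state = State.ACTIVE

class Fleet:
    """Toy model of the drain-based hotfix rollout."""
    def __init__(self, servers):
        self.servers = servers

    def mark_old_versions(self, new_version):
        # step 1: stop routing new sessions to servers on the old build
        for s in self.servers:
            if s.version != new_version and s.state is State.ACTIVE:
                s.state = State.DRAINING

    def place(self, player):
        # returning players land only on ACTIVE (hotfixed) servers
        for s in self.servers:
            if s.state is State.ACTIVE:
                s.players.add(player)
                return s
        return None

    def logout(self, player):
        # when the last player fades off a draining server, retire it
        for s in self.servers:
            s.players.discard(player)
            if s.state is State.DRAINING and not s.players:
                s.state = State.DOWN
```

No session is ever hard-dropped: old servers simply stop receiving players and wind down on their own as people log out.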
AS: One of the things we saw in this release that was really, really interesting is player retention, how players want to keep playing. Every now and then we would face an issue with one of the services that might kick six or seven hundred players. In most industries that would cost you your users; in the web industry, where I came from, if a page starts loading half a second or a second slower, that really affects your user time, people just go read something else. That was not the case here. We would have a crash knock out four hundred players, and we would get back six hundred. Players knew the game was out there to be enjoyed, and also that they were helping us figure out bugs we cannot find without these insanely large concurrencies. Thanks to the new delta patcher, whoever had patched the PTU the day before, whoever was in Evocati, was able to get the new patch super fast. A lot of them were trying to figure out how the trade system works, building trade apps; a lot of them were trying to correlate bugs to each other. Whenever I was watching the servers during the holiday, I would be watching server stats and different metrics, and at the same time watching the forums, chat, all kinds of social media, and a lot of the time you would get a tip about something happening. Because they know. They know it's not just about spending time playing the game and doing the missions; they know they're part of the game. What they're doing is the game, they're building the game with us. So they start testing every single possibility and letting us know: oh, if I did this and I did that, this kind of ship is the reason behind the bad performance, or the reason behind this glitch we're seeing.
Bryce Benton (BB): We spent a lot of time watching the PTU, looking for the major crashes and the things players were encountering that made the game unplayable. We worked really hard to figure those out, and on Saturday we felt like we were ready, so we went ahead and published 3.0.
So on Saturday evening Ahmed and I were setting up various servers for Arena Commander, Star Marine, and additional PU instances, and we were looking at the crashes. We saw a lot of people having issues getting in, and a lot of servers crashing, so we had to work with Tony Z to figure out what was happening. We found out that players were still able to get asteroids over at Olisar, and that was causing a lot of crashes, so we had to duplicate the issue on QA servers and then test the fixes. At one point we did have to do some major restarts, but we did probably three or four hotfixes between Saturday and the end of our holiday break. Justin, myself, and a few others were logging in from home, checking the QA servers to see if the new fixes would cause any issues, and then we would hotfix the game servers on the fly with Ahmed.
AS: It's just a different experience, how close we are to the players. Usually, doing my job, you wouldn't have a lot of contact with your end users, or even know them. If I'm running a webserver that serves millions of users a day, no one would ever know me, know my name, or have to deal with me. Star Citizen is a different experience, because you feel like each one of these guys is dedicating hours of his day to come play the game, test the game, and try to have fun. It's a completely different, intimate relationship between you and your end user. With 3.0 especially it was much more so, because we knew it was the holidays and players were hanging out with their families, but at the same time we wanted to see the concurrency. Every single time we release, we want to see how high it can get, how many concurrent players we can have, and what kind of issues we'll have at that scale. We had insane concurrency, but like... people were definitely not having Christmas dinner. They were playing the game. It was a lot of fun.
MJ: On any other project I've been on, you would have a team in the dozens, if not scores, of people working to support this amount of service, but we really only have a handful of guys, and they're just so good at what they do. I know they're watching every little aspect, every frame rate, every piece of memory, keeping an eye on it and watching the traffic patterns of the users. One of the things Ahmed really enjoys doing is bringing servers up for certain time zones so that we have enough capacity, or taking them down to save a little money. All of this is critical work, and eventually we'll have a whole separate team to do just that, but for now it's pretty much Ahmed's fun project.
Eric Green (EG): The initial publish to the Evocati was our first step toward getting 3.0 out the door, and that first night when we were deploying, we actually had a six-hour Skype call with the deployment team just to make sure everything was going the way it should. The reason it took so long was that we really hadn't deployed with this new launcher and with a bunch of the backend service changes we had, so we wanted to make sure we got everything right. We updated our checklist so that everything went smoothly and we could monitor the servers as the players started to come in. There were just so many changes being introduced with 3.0: basically the entire game had changed, all the backend services had been rewritten, it was a brand new launcher. So while we had tested it for many, many weeks with QA prepping for the Evocati, you just can never predict. You'll give it to one backer who has some weird setup and it takes the whole system down, or you'll encounter some weird issue with the release channel, so we just wanted to make sure things went as smoothly as possible.
After the first Evocati push we had dozens and dozens of deployments. Now that we had the delta patcher, we could put out builds easily every day, which meant iterating on the feedback from the Evocati, and then ultimately from the public PTU, and pushing out builds as fast as we could with as many fixes and feature deployments as we could.
Once we hit the public group, things changed a little, because the deployment process for Evocati is fairly simple: we run a much faster checklist, just to get the build out to them. Once the build is going to the public audience, the QA checklist takes a little longer, to make sure things are tested a little better. Then, as we moved into late December and started heading toward live, we added more and more checks and had to recheck some of the older systems that had maybe been neglected for a little while, because we were pushing so many fixes and so many new features into the build. We had to double-check: hey, Arena Commander was working a couple of weeks ago, let's do another deep dive to make sure nothing broke in the meantime. That actually ran right into that Saturday, December 23rd, when a small team came into the Austin office to deploy.
We deployed around 2:30-ish Austin time on Saturday, December 23rd, and then immediately we were monitoring all the core services and the game servers, and talking to players to make sure things were working okay. There were a couple of issues with some of the missions and interdiction that we found immediately, and the great thing about some of the work Tony Z has done is that all of those backend services run on our side. He can make changes to the code, the missions, the interdiction timings or how often it happens, and we can deploy those hotfixes on our end and then gracefully restart all of the servers, so that players don't really even know anything's changed, but things get much better. So we ended up deploying hotfixes almost every single day of the Christmas break, from the 24th all the way through, I think, January 2nd.
BB: Christmas night was pretty interesting, because for the most part the instances were fine all day, and then all of a sudden things started to happen. A bunch of us were logging in remotely; we didn't really do any calls, but we were all in Skype trying to figure out what was happening. I think it was 3am when some people realised they still needed to put Christmas presents under the tree, but we were all watching the servers, providing FPS information to Chris and Tony, and providing other graphs to show them how the fixes had improved the gameplay.
There are things players can do that will sometimes put an instance in a bad state. We can see that in the graphs and can often predict it, and we can deal with it by marking an instance so that nobody new will go into it; once everybody has left, it will restart itself. We do a lot of things to make sure the players are affected as little as possible.
EG: So what would happen is we would identify an issue. For instance, that first day, on the 23rd, there were asteroids popping in: that was actually an environment mission spawning in places it shouldn't have been, causing all kinds of physics grid nonsense and massive FPS drops. What Tony ended up doing was just removing that mission from the actual service, and we rebooted everything. We have a way to ungracefully force everyone off the servers, shut everything down, and restart it, but we also have the option to gracefully restart services. What happens is GIM looks at which servers have a declining population, and once a server's population hits zero it reboots that one. So we'll actually have players experiencing a slightly different game at a different time, because not every instance has been restarted, but over the course of the next 24 hours all the servers will naturally restart, and then they'll all be running on the better code.
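The graceful path Eric describes, where GIM watches populations and reboots an instance only once it empties, reduces to a single polling step. A toy sketch with made-up field names, assuming `restart` is whatever actually reboots one instance:

```python
def rolling_restart(servers, new_build, restart):
    """One polling pass of a GIM-style graceful restart (illustrative):
    a server marked for restart is rebooted onto the new build only
    once its population has drained to zero."""
    for s in servers:
        if s["marked"] and s["population"] == 0 and s["build"] != new_build:
            restart(s)               # reboot just this one instance
            s["build"] = new_build   # it comes back on the better code
```

Run repeatedly, this converges over hours rather than instantly, which is exactly why some players see the old behavior and some the new until the fleet finishes turning over.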
I would say that right now, one of the biggest struggles we're having is that the connection to our database and our persistence cache can fall over, and when that happens there's a short window where, if a player does anything while that service is down, it causes all kinds of wild inconsistencies in the data. That often ends up with a player's account becoming broken, and we'll have to go in as game support and say, okay, I can reset your data here, or we can fix this. Unfortunately it means a lot of players lose out on missions or their cargo gets lost, but we've already identified most of those issues and are working on fixes.
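One common mitigation for this kind of window, offered here as a generic pattern rather than CIG's actual fix, is to queue player writes while the persistence connection is down and replay them in order on recovery, so actions taken during the outage are delayed instead of corrupting state:

```python
import collections

class BufferedPersistence:
    """Illustrative write-buffering guard (hypothetical, not CIG's code):
    while the persistence backend is down, queue player writes instead of
    dropping them, and flush the queue in order once it recovers."""
    def __init__(self, backend):
        self.backend = backend            # needs .alive and .write(id, data)
        self.pending = collections.deque()

    def save(self, player_id, data):
        if self.backend.alive:
            self.flush()                  # preserve ordering: old writes first
            self.backend.write(player_id, data)
        else:
            self.pending.append((player_id, data))

    def flush(self):
        while self.pending and self.backend.alive:
            self.backend.write(*self.pending.popleft())
```

The key property is ordering: a write made during the outage can never be overtaken by a later one, which is precisely the kind of inconsistency that breaks accounts.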
Jeff Daily (JD): We're doing something no other game I've worked on is doing: this combination of MMO, simulator, and FPS, all in one giant seamless universe. From a player's perspective it's like, that's crazy, how are they going to do that? From a QA perspective it's: how are they going to do that, I want to understand this. So it's been a real learning experience, and a lot of fun.
JB: The team has been doing such an incredible job and I'm just so proud of them. I have such an accomplished and capable leadership team, and from the QA managers all the way down to the QA testers, everybody on the team is a rockstar in their own right. Together they successfully support the development team, and I'm really proud of that.
MJ: I think a lot of people realized, at least with this particular publish cycle because it was such an important one for us, that all of our process kicks in when that final build comes in for that final publish. But what people don't remember is that this team is doing that constantly. Every time we publish to the PTU it's pretty much the same process, just to different servers with different security protocols. So for the most part the team is doing this every day. You may only see two or three publishes a week, but we're actually running the process for a publish candidate every day. It's not until the very end that QA tells us whether we can actually release that version, or whether there's some critical fix they're waiting for, but by then the work has already been done. We're constantly publishing, constantly exercising the systems, and this helps us clean out the rough edges and smooth things over, so that when we do have a major publish like 3.0, it goes down pretty much like clockwork.
In fact, I remember one thing that happened; it wasn't really frustrating, but I noticed it. We have a very rigid checklist we follow to make sure everybody does every step in the proper order, and the team was publishing so quickly that night that it was hard for the production team to even keep up with the checklist, just because things were happening so rapidly.
TW: Well, that’s it for this time, we hope you enjoyed the update, and we’ll see you next time!
CR: So the Austin team is going to be busy this year ‘cause we’re planning on delivering content enhancements and refinements in a quarterly release schedule which I think will be pretty cool. For an in-depth look at what’s been happening at all our studios be sure to head to the RSI website this weekend where you can read our monthly report for December.
SG: Yes, and that’s all for today’s show. Remember to check out the Squadron 42 webpage as well and enlist to receive the newsletter. Else you won’t receive it.
CR: Yes. So definitely enlist. And thank you to all our subscribers: your support and feedback enables us to do things like this show, Around the Verse, as well as the series that we're going to do, like Reverse the Verse.
SG: Remember to tune in to the premiere tomorrow at noon PST, and also the Tumbril Nova concept remains on sale through January 15th, so there's still time to add this bruiser to your fleet.
CR: Yes there is. And of course thanks to all our backers who make the development of Star Citizen and Squadron 42 possible: we wouldn’t be able to do this without you so thank you very much.
SG: We would not and until next week we will see you …
Both: Around the ‘verse!