Building an MMO in the cloud – what makes SC in the cloud unique?
When the ‘cloud’ movement started, people said it’s not for everyone. You can’t just take an app and put it in the cloud. You have to use the other layers the cloud provides: object storage, movement between VMs, firewalls, etc… and everything operates as a service. All really good stuff, but you’re relying on resources shared with other people. You have to design it all for failure and expect things to go down at any moment. Hence the ‘statelessness’: no single machine holds state that would be lost if it died. Everything can be accessed in different ways. It’s all about distributed systems. On the web, everyone does it. Games are the complete opposite. They’re very stateful, and MMOs are extremely stateful. You leave your machine running for 24 hours and you won’t tolerate a 404 or a disconnect, etc… Getting stability in the cloud is not easy. It takes a lot of control out of the devs’ hands. The challenge is really that MMOs are alien in the cloud. No-one else has done what CIG does. Not a lot of people manage to have a game server in the cloud with things working, and that’s the challenge. If you want to use a new tool in the cloud, you code in a modern language, and your use case is something like moving records, you’d find an example in five minutes. Making a game, that’s different.
CIG have plans in place to address the challenges. Some we can’t talk about, some are going to be revealed by Chris, but in the broadest sense, how are we facing the challenges?
They want to implement what’s known as an event-driven data centre. It’s about floating resources and ‘time to market’: how fast can they scale a game server, how can they repurpose resources, etc… Cloud is mostly utility computing. You pay for what you’re using. Every major event would go through buses, and the systems listening on those buses would react to it. They can orchestrate that however they want. All the logic the gameplay guys are writing needs to report what it needs, and the cloud infrastructure has to react to it by expanding, contracting, etc… That’s the plan. As for getting there, the way they usually work in DevOps, you can’t really draw everything out up front. They create the minimum viable product, get the players in the game, and upgrade from there. Move fast and break things. It’s like persistence: if they had laid out persistence and coded it all up front, they’d have hit errors. Implementation is easier when you have something to test with. At one point they didn’t even have the game data designed, so they didn’t even know what they had to persist. Disco remembers those meetings, deciding what the players get to persist, what they don’t, etc… and all that is still evolving. There are many game systems still to come online, still to evolve.
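The "gameplay reports what it needs, infrastructure reacts" idea above can be sketched roughly as follows. This is only an illustration, not CIG's design: the `EventBus` and `ServerPool` classes, the `capacity.needed` topic, and the figure of 50 players per server are all made-up assumptions, and a real bus would be a networked message system rather than an in-process object.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    topic: str     # hypothetical topic name, e.g. "capacity.needed"
    payload: dict  # whatever the reporting service wants to say


class EventBus:
    """In-process stand-in for a message bus (assumption, not a real product)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, event):
        # Deliver the event to every handler registered for its topic.
        for handler in self._handlers[event.topic]:
            handler(event)


class ServerPool:
    """Infrastructure side: expands or contracts when gameplay reports load."""

    def __init__(self, bus, players_per_server=50):
        self.servers = 1
        self.players_per_server = players_per_server  # illustrative number
        bus.subscribe("capacity.needed", self.on_capacity_needed)

    def on_capacity_needed(self, event):
        players = event.payload["players"]
        # Ceiling division, but never drop below one server.
        self.servers = max(1, -(-players // self.players_per_server))


bus = EventBus()
pool = ServerPool(bus)
# Gameplay side: report what is needed and let the infrastructure react.
bus.publish(Event("capacity.needed", {"players": 120}))
print(pool.servers)
```

The point of the indirection is that the gameplay code never talks to the pool directly; it only publishes what it needs, so the reaction can be re-orchestrated without touching gameplay logic.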
Scale of the game, and how SC dwarfs previous projects. Talk about the scale.
Itemization. In WoW, for example, you can have 100s of items on a character. Lots to keep track of. In SC, just with ships and the things you can buy on the site right now, you’re in the 1000s, and there will definitely be players with 50,000+ items associated with a character. Lots of thought is being put into how to deal with that. If they have a million players online at a time with 50,000 items each, that’s 50 billion items, which is a lot to move around. Before persistence, the architecture was very FPS-like. They could pretty easily hit 250,000 players with the code they had before, but once you add persistence, you have to change everything. They’re writing all the services in C++. They can scale in different ways, but they have to consider load balancing and availability. If any service goes down, something else has to pick it up. The service has to always be available, so there are multiple instances of any service up at any time. They want to apply megaserver methodologies: a single login and a single Universe where you can play with everyone, rather than locking people into servers. They’re designing the megaserver architecture up-front.
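A minimal sketch of "if any service goes down, something else has to pick it up", assuming several interchangeable instances behind a round-robin dispatcher. The class names are hypothetical and the real services are written in C++; Python is used here only to keep the illustration short.

```python
class ServiceInstance:
    """One running copy of a service (hypothetical stand-in)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class LoadBalancer:
    """Round-robin dispatch that skips instances that fail."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = 0

    def dispatch(self, request):
        # Try each instance at most once, starting where we left off.
        for _ in range(len(self.instances)):
            instance = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            try:
                return instance.handle(request)
            except ConnectionError:
                continue  # another instance picks the request up
        raise RuntimeError("no healthy instance available")


lb = LoadBalancer([
    ServiceInstance("a"),
    ServiceInstance("b", healthy=False),
    ServiceInstance("c"),
])
print(lb.dispatch("r1"))  # a handled r1
print(lb.dispatch("r2"))  # c handled r2 (b is down, so c picks it up)
```

This only covers stateless retry; carrying a player's state across a failed instance is the hard part the answer above is alluding to, and it is not modelled here.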
What’s a megaserver?
One giant world. A collection of servers with services working together to give the impression of one server, rather than something like WoW where you have 200+ servers to choose from. That makes it difficult because they have to work with players all over the world as well; it’s a global megaserver. Players have to feel like they’re there, can play with whomever, and it needs to feel like they’re next-door. No transitioning to another realm or anything.
Ever had a typo break everything?
The way they deploy environments is ‘blue-green deployments’. They stage a shadow environment, then switch the environments when they’re ready. It was around 2.2 or something, and they were breaking some rules because they had to go live. With DevOps, they deliver what the community needs as quickly as possible. Ahmed had an extra space in a URL that would generate lots of stuff going to servers, and it caused an issue with authentication. It caused login errors, and QA couldn’t get in. It wasn’t production though, so they were able to catch it, but it was hard to find the extra space.
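Staging a shadow environment and switching over only when it is ready is commonly called a blue-green deployment. A rough sketch, assuming each environment is just a dictionary of settings and that the pre-switch check looks for stray whitespace in URLs (the kind of typo described above). The names and the example URLs are invented for illustration.

```python
def urls_are_clean(environment):
    """Reject any configured URL that contains whitespace anywhere."""
    return all(
        not any(ch.isspace() for ch in url)
        for url in environment["urls"]
    )


class BlueGreenRouter:
    """Holds the live environment and a staged shadow copy."""

    def __init__(self, live, staged):
        self.live = live
        self.staged = staged

    def switch(self, health_check):
        # Only promote the staged environment if it passes the check.
        if not health_check(self.staged):
            return False
        self.live, self.staged = self.staged, self.live
        return True


blue = {"name": "blue", "urls": ["https://auth.example.invalid/login"]}
green = {"name": "green", "urls": ["https://auth.example.invalid/ login"]}

router = BlueGreenRouter(live=blue, staged=green)
print(router.switch(urls_are_clean))  # False: the stray space blocks the switch
print(router.live["name"])            # blue
```

Because the old environment stays up until the switch succeeds, a bad config like this never reaches players; it only blocks the promotion.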
Do you use a common framework like Spring or Struts on the backend?
Those are Java frameworks? The Universe cluster is primarily C++ with some 3rd party libraries. Web does lots of Java stuff though.
What config management tools are used? Log management analysis tools?
They use Chef, and a bunch of other homebrewed scripts. They have a graph to show explaining some of it. There’s a graph now on screen. Current build is based on Buildbot? This has gotten extremely difficult to explain. Incredible amount of detailed explanation here, about how the server stuff works. If you’re interested, watch the video at ~30 minutes. I simply cannot summarize this. Any cloud environment in CIG has three VMs: one is the hub, one’s the game server, etc… None of it is final though. They’ll be doing a lot of work on it. There are lots of different kinds of servers, they have stag drivers, analysis servers, they run Chef… What they’re showing is not final. They’re working on a new deployment pipeline. And that explanation is done. Time to get back to summarizing.
Patch reduction – Any insight on the progress of patch size reduction?
That project is running on lots of caffeine. It involves people from all different departments: DevOps, IT, engine programmers, etc… Everyone’s working to create a new patch delivery system. It affects players and developers. When they get it done, it’ll help devs as well, because devs have to pull new builds all the time too. They feel really bad when they deliver three builds in a day; they want to give a seamless experience. Having a smaller patch is key for everyone. It’s a challenge because they’re operating a live environment at a very early point in the game. Mike Pickett in Austin is working on the issue. He was one of the first working on it, but it extends to lots of people. The system has to be native; every part needs to use it. There’s a huge team involved in SC, and Disco likes to give shoutouts to people as much as he can, so Mike, Good Work! Wooo Thanks Mike! Keep it up!
Ahmed, why are you so amazing?
What colour network cable is your nemesis?
Dealing with stuff you can touch with your hands stopped years ago. They forget about colours of stuff now, because of the cloud. Now they just tell stuff to happen and it happens. Not always the right way, that’s their job, but Ahmed hasn’t touched a network cable in a while. Disco ran his own IT company for a while, always got interns to wire the cables.
In some MMOs, bot armies set up to grind currency are an issue. What’s being done to prevent that?
They don’t answer those questions because they don’t want the bad guys to know what they’re doing. They know we want to know how they’re going to do it, but the more info they put out, the more info the bad guys have to circumvent it. Jason has lots of tricks up his sleeve though. They’re watching, they know what the bad guys are going to try to do… they’ll catch ’em.
Client Framerate vs Server Framerate?
Client framerate is the rate at which your graphics are refreshing. On the server side, it’s the simulation rate. They’re not locked together. Lots of players are concerned about FPS. First off, thanks to everyone that plays PTU, and to all the Evocati. It’s amazing. What they try to do on PTU is to define points that should be addressed. Optimization is a rabbit hole. They have a product that’s out, that they want to make playable and fun. They try to point things out to engineers in Frankfurt etc… to try to make it okay, but there’s a lot they aren’t delivering on yet, because it’s not the time for it. It’s liveable right now, and that’s important. They do have tricks up their sleeves for later though. People have on their radar what’s most expensive on the game server and on the client side. But there are so many features going in that it’s premature to optimize them. Once you optimize code, it looks different. There’s a wide range of things they can optimize. On the client, there’s rendering and client-side simulation. On the backend, there’s physics. Game servers also rely on responses from the backend, etc… They optimize certain things over time, but there are many departments that’ll optimize later. If they optimize now, they might paint themselves into a corner later. The team is finite. When they know something is temporary, meant to hold over until the new system, they don’t want to spend time optimizing it.
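To make "not locked together" concrete, here is a small sketch of a fixed-rate simulation driven from a loop that renders at a different rate. It is a generic fixed-timestep pattern, not the game's actual loop, and the numbers (a 50 ms tick, 10 ms or 40 ms frames) are arbitrary assumptions. Integer milliseconds are used to avoid floating-point drift.

```python
class FixedRateSimulation:
    """Advances in fixed steps no matter how often it is polled."""

    def __init__(self, tick_ms):
        self.tick_ms = tick_ms
        self.accumulated_ms = 0
        self.ticks = 0

    def advance(self, elapsed_ms):
        # Bank the elapsed time, then spend it one whole tick at a time.
        self.accumulated_ms += elapsed_ms
        while self.accumulated_ms >= self.tick_ms:
            self.accumulated_ms -= self.tick_ms
            self.ticks += 1


def count_frames_and_ticks(total_ms, frame_ms, tick_ms):
    """Render a frame every frame_ms and report (frames, simulation ticks)."""
    sim = FixedRateSimulation(tick_ms)
    frames = 0
    for _ in range(total_ms // frame_ms):
        sim.advance(frame_ms)
        frames += 1  # a frame is drawn whether or not a tick happened
    return frames, sim.ticks


print(count_frames_and_ticks(1000, 10, 50))  # (100, 20): fast client
print(count_frames_and_ticks(1000, 40, 50))  # (25, 20): slow client, same ticks
```

In both runs the simulation performs 20 ticks per second; only the number of rendered frames changes, which is why a low client FPS and a low server simulation rate are separate problems.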
Will the site be hosted from the same servers as the game?
The website and backend are about to have a very close relationship. Previously it’s always been services and platform kept separate, but now they’re coming together. They’re getting married. They’ll have dedicated channels for bi-directional communication. They’re working on how to architect both sides to make data and responses as easy as possible. Right now, when you buy a ship on the site, there’s no way for the platform to tell the backend about it. The backend grabs your package, and only then does it know about it and can create the in-game items and persist them. With the new system, the moment you buy a ship, they’ll get notified, and it’ll be in-game.
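The "get notified the moment you buy a ship" flow could look something like the sketch below. Everything in it is hypothetical: the `PurchaseChannel` and `ItemStore` names, the order and account IDs, and especially the duplicate-order check, which is an assumption added here because notifications over a network are often delivered more than once, not something stated in the Q&A.

```python
from collections import defaultdict


class ItemStore:
    """Hypothetical stand-in for the persistence layer."""

    def __init__(self):
        self.items_by_account = defaultdict(list)
        self._seen_orders = set()

    def grant(self, order_id, account_id, item):
        # Assumption: ignore repeated notifications for the same order.
        if order_id in self._seen_orders:
            return False
        self._seen_orders.add(order_id)
        self.items_by_account[account_id].append(item)
        return True


class PurchaseChannel:
    """Website side: pushes purchases to whoever is listening."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def notify(self, order_id, account_id, item):
        for listener in self._listeners:
            listener(order_id, account_id, item)


store = ItemStore()
channel = PurchaseChannel()
channel.subscribe(store.grant)

# The site reports the purchase; the backend persists the item immediately.
channel.notify("order-1", "account-7", "starter_ship")
channel.notify("order-1", "account-7", "starter_ship")  # duplicate delivery
print(store.items_by_account["account-7"])  # ['starter_ship']
```

Compared with grabbing the package later, the push model means the item exists in the persistent store as soon as the purchase is reported.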
What hypervisor do you use in the cloud, and what core OS do you use?
Google Compute Engine (GCE), with its custom KVM. The hypervisor has different ways to extract instructions, but there shouldn’t be an issue whether they’re relying on Xen, KVM, or anything else. It won’t be the main issue. Currently they use Ubuntu 14.04 as the core OS that runs the game, but the stream just cut out, and I don’t know the rest.