March 31st, 2006
This always comes up, and I often seem to be on the opposite side of the argument from many players. I’ve usually found that those who have worked on the implementation side of both tend to feel that they are the same thing, but that those who haven’t see them as somehow categorically different.
So here’s my stab at explaining why I think both are really the same thing; in many ways, there are far larger differences between certain kinds of text muds than there are between graphical and text-based games.
What is a virtual world?
Doing definitions is always tricky; I’ve offered mine in the past, with these following core requirements:
- A spatial representation of the virtual world
- Avatar representation within the space
- A sandbox to play in that offers persistence for some amount of the data represented within the virtual world
Others have taken a whack at it as well, of course; to wit, Richard Bartle comments in the opening paragraphs of his indispensable Designing Virtual Worlds:
Virtual worlds are implemented by a computer (or network of computers) that simulates an environment. Some, but not all, of the entities in this environment act under the direct control of individual people. Because several such people can affect the same environment simultaneously, the world is said to be shared or multi-user. The environment continues to exist and to develop internally (at least to some degree) even when there are no people interacting with it; this means it is persistent.
He goes on to offer a list much like mine, which (paraphrased) gives the requirements list thusly:
- Underlying automated rules: a simulation running that defines the world’s physics.
- Players represent individuals within the world, at some level, though they may control more than one entity.
- Real-time interaction with the world.
- The world is shared.
- The world is persistent to some degree.
You’ll notice that graphics or text appear in neither of our definitions, and the reason is that word “simulation” that appears in Richard’s list.
Space and why it matters
Virtual worlds of all stripes are undeniably running a simulation model. Broadly speaking, the basis of this simulation is that it is modeling space, providing a sense of place. For example, the early hypertexts and the text muds both make use of brief database entries with a title and a block of text; they both make use of hyperlinks between these blocks of text to allow a user to move from one to the other.
Text muds, exploiting the power of hypertext, treated each block of text as a discrete location, but did not apply scale to the “room” that was described. In these room-based systems, therefore, you could move a very large “distance” by traversing a link. Many of these links were literally labelled with cardinal map directions (“north”), but some of them were instead “prepositional” such as “under,” “through,” or “into.” Indeed, more adventurous builders sometimes created links that were conceptual, carrying users to “places” that were not real, such as dream states.
Nonetheless, as with any node system with links, these environments could be mapped, using directed graphs. It’s important to note that nodes are therefore not required to rest on any form of coordinate system; nonetheless, mud maps reveal that frequently, the developers did indeed do this, for ease of navigation. Spatial relationships between nodes were generally established, and while the spaces may have been non-Cartesian or even non-Euclidean, there was an explicit desire to represent location.
It’s important to realize this, because the metaphor of location is a key characteristic. Space can be modelled in many different ways. The hypertextual directed graph is a model that compresses and elides; only locations of significance are included in its map. It’s akin to a subway map; only stops are shown, and moving to the next stop is assumed to involve an arbitrary amount of uninteresting travel. This doesn’t mean that you cannot build other maps of that space. The most obvious alternate method to use is the standard Cartesian coordinate system.
Many text muds employed Cartesian coordinates within their nodes. In fact, the modern successor to a MUD’s room is not a given location within an MMORPG, but in fact a zone. The term itself has a MUD heritage, where it meant a collection of nodes that were linked thematically or via database tags; but in effect, a zone today is a node in a directed graph, with an embedded Cartesian coordinate system. Its links to adjacent zones use a geographical metaphor, but are in fact hyperlinks just like a MUD’s exits.
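To make the "room versus zone" point concrete, here is a minimal sketch in Python. Every name in it (Node, extent, build_world, move, and the example places) is invented for illustration and not taken from any real mud or MMORPG codebase. It assumes a world is just a dictionary of nodes, where a classic room and a modern zone differ only in the size of the coordinate grid embedded in the node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A room or a zone: one node in the world's directed graph."""
    name: str
    # Size of the embedded Cartesian grid. (1, 1) is a classic room
    # with no meaningful interior positions (an assumed convention).
    extent: tuple[int, int] = (1, 1)
    # Exit label -> name of the destination node. Labels may be compass
    # directions or prepositions; the graph does not care which.
    exits: dict[str, str] = field(default_factory=dict)


def build_world() -> dict[str, Node]:
    nodes = [
        Node("Town Square", exits={"north": "Forest Zone", "into": "Well"}),
        Node("Well", exits={"out": "Town Square"}),
        # A "zone" is the same kind of node with a large interior grid.
        Node("Forest Zone", extent=(256, 256), exits={"south": "Town Square"}),
    ]
    return {node.name: node for node in nodes}


def move(world: dict[str, Node], here: str, label: str) -> str:
    """Follow an exit link; stay put if the label is not an exit."""
    return world[here].exits.get(label, here)


world = build_world()
print(move(world, "Town Square", "into"))   # Well
print(move(world, "Town Square", "north"))  # Forest Zone
```

Nothing in move() knows or cares whether the destination has a 1×1 interior or a 256×256 one, which is the sense in which a zone link is still just a hyperlink.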
Many muds even supplied “seamlessness,” allowing users in one node to look into another node to see what is going on via an explicit command (for example, the “scan” command on Diku derivatives). In fact, it is a trivial exercise to add something like the following:
if we're in a room classified as "outdoors" (and perhaps if visibility is good, etc.)
    for each room at the other end of an exit
        if the exit isn't closed
            list every person in that room, in the format "To the [direction], you see [person]."
            also list any objects over a given size
The result is something directly analogous to the very fancy code required to proxy avatars across processes for modern seamless MMORPG architectures.
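For the curious, the pseudocode above might look roughly like this as runnable Python. This is a sketch under assumptions, not Diku code: the Room and Exit classes, the min_size threshold, and the integer "size" units are all hypothetical, and the visibility check is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Exit:
    direction: str
    destination: "Room"
    closed: bool = False


@dataclass
class Room:
    name: str
    outdoors: bool = True
    people: list[str] = field(default_factory=list)
    # (object name, size) pairs; the size units are arbitrary here.
    objects: list[tuple[str, int]] = field(default_factory=list)
    exits: list[Exit] = field(default_factory=list)


def scan(room: Room, min_size: int = 5) -> list[str]:
    """Report people and large objects visible through open exits."""
    lines: list[str] = []
    if not room.outdoors:
        return lines
    for exit_ in room.exits:
        if exit_.closed:
            continue
        target = exit_.destination
        for person in target.people:
            lines.append(f"To the {exit_.direction}, you see {person}.")
        for name, size in target.objects:
            if size > min_size:
                lines.append(f"To the {exit_.direction}, you see {name}.")
    return lines


meadow = Room("Meadow", people=["a shepherd"],
              objects=[("a haystack", 8), ("a pebble", 1)])
barn = Room("Barn", people=["a farmer"])
square = Room("Square", exits=[Exit("north", meadow),
                               Exit("east", barn, closed=True)])

for line in scan(square):
    print(line)
# To the north, you see a shepherd.
# To the north, you see a haystack.
```

The farmer stays hidden because the east exit is closed, and the pebble is filtered out for being too small.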
It is equally possible to see space within a given node as a polar coordinate system, or some other scheme. Similarly, it matters not a whit what scalar factors you use for your coordinate system; there is no intrinsic difference between a text-based world with a coordinate system measuring 1×1 within a room, a text-based world with a coordinate system measuring 10×10 within a room, an isometrically displayed world with a coordinate system measuring 256×256, and a 3d world with a coordinate system measuring 64k×64k×64k.
The real characteristic here is that nodes are just a convenient way to chop up the data. Nodes are still a characteristic of pretty much every virtual world system today, even if sometimes they get called “shards” nowadays.
Because of this, we have seen all of the following:
- Text-based worlds with rooms that fall in implicit locations on grid maps
- Text-based worlds with rooms that do not fit onto grid maps
- Text-based worlds with rooms with Cartesian coordinates in them
- Text-based worlds with no nodes, and instead text descriptions dynamically built based on what was nearby in a Cartesian coordinate system — think, exactly like a seamless 3d world, except that textual room descriptions were built dynamically as you moved around.
- Graphical worlds with rooms (and a static picture)
- Graphical worlds with rooms that do not fit onto grid maps
- Graphical worlds with rooms with Cartesian coordinates in them, both with the ability to look into adjacent “rooms” and without
- Graphical worlds with large, seamless Cartesian coordinate environments
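The node-less text world in that list is perhaps the least familiar case, so here is a small Python sketch of the idea. It assumes a world stored as names with (x, y) positions and a viewing radius; describe_surroundings and the sample objects are made up for illustration and do not come from any particular game.

```python
import math


def describe_surroundings(viewer, things, radius=10.0):
    """Build a room-style description from whatever is within radius."""
    vx, vy = viewer
    nearby = []
    for name, (x, y) in things.items():
        distance = math.hypot(x - vx, y - vy)
        if distance <= radius:
            nearby.append((distance, name))
    if not nearby:
        return "You see nothing of note nearby."
    nearby.sort()  # closest things are mentioned first
    names = [name for _, name in nearby]
    return "Nearby you see: " + ", ".join(names) + "."


things = {
    "an oak tree": (3.0, 4.0),
    "a well": (0.0, 1.0),
    "a distant tower": (50.0, 50.0),
}

print(describe_surroundings((0.0, 0.0), things))
# Nearby you see: a well, an oak tree.
print(describe_surroundings((100.0, 100.0), things))
# You see nothing of note nearby.
```

There are no rooms anywhere in this data; the "room description" is regenerated from coordinates each time the viewer moves.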
This brings us to the next observation: that the client is stupid.
Nothing happens on the client
It is a core tenet of virtual world design that the simulation on the server is authoritative. The client may run its own sim, but this is generally done solely to cover up latency; the client’s sim concedes authority to the server in any cases where the two simulations disagree. Any cases where the client is modelling something that is not present in the server’s simulation are, as a matter of best practice, not transmitted to other clients. That means that as far as other people are concerned, things didn’t happen that way.
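A toy Python sketch of that arrangement follows. It assumes a one-dimensional world with a wall the client does not know about; ClientSim, server_move, and the wall position are hypothetical stand-ins, and real games reconcile far more state than a single coordinate.

```python
from dataclasses import dataclass


@dataclass
class ClientSim:
    """Client-side prediction that always defers to the server."""
    x: float = 0.0

    def predict_move(self, dx: float) -> None:
        # Move immediately so the player does not feel the latency.
        self.x += dx

    def on_server_update(self, server_x: float) -> bool:
        """Adopt the server's position; return True if we had guessed wrong."""
        corrected = self.x != server_x
        self.x = server_x
        return corrected


def server_move(x: float, dx: float, wall_at: float = 5.0) -> float:
    """Hypothetical server rule: nothing may pass the wall."""
    return min(x + dx, wall_at)


client = ClientSim()
client.predict_move(8.0)               # client optimistically shows x = 8.0
authoritative = server_move(0.0, 8.0)  # server stops the move at the wall
was_wrong = client.on_server_update(authoritative)
print(client.x, was_wrong)             # 5.0 True
```

The client's guess of 8.0 was never sent to anyone else; as far as other players are concerned, the avatar only ever reached 5.0.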
There are many reasons why this is done, but the fundamental point to take away from this is that the client is a dumb terminal. It is merely a filter for displaying the results of the simulation on the server, and a means to poke and prod that simulation. It does not authoritatively simulate anything itself.
Given a simulation, its method of display is arbitrary. Given simulations that have differing characteristics, you may well want to choose a display that is well-suited to displaying the nuances of the sim in question — but there are often reasons you do not (for example, you may wish to surface only certain characteristics). In the case of virtual worlds, the primary thing to convey varies depending on the end user. It is common, for example, for there to be custom clients that connect to a virtual world and do not display anything in a spatial metaphor whatsoever; instead, they display graphs and figures, node by node, and surface different elements of the simulation such as loop times, number of connected users, and so on, for the purposes of administration. This is just as real a client for a virtual world as the one you play the game with, even though it doesn’t even allow you to “move.”
This is an important concept, because what it tells you is that regardless of the design assumptions behind the server simulation, you can always build a client that filters the data differently. In that sense, the datastream down from a virtual world (and at this point, we must consider that as solely and exclusively residing on the server) is a lot like the datastream of HTML from a webserver: different clients can render it in vastly different ways.
The only thing a client cannot do is add data or rules to the simulation. A client could easily collapse a world with Cartesian coordinates into one without them, by simply not parsing that element of the datastream. What it cannot do is add a Cartesian coordinate system on top of a world that lacks one.
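Here is a hedged Python sketch of clients as filters. The update dictionary and its field names are invented for the example (no real protocol looks exactly like this); the point is only that three clients read the same server message and surface different parts of it.

```python
# One hypothetical server update, as every client receives it.
update = {
    "node": "Forest Zone",
    "entity": "an oak tree",
    "position": (1114.42, 1274.63),
    "loop_time_ms": 12,
    "connected_users": 340,
}


def spatial_client(msg: dict) -> str:
    """Uses the coordinates if the datastream supplies them."""
    position = msg.get("position")
    if position is None:
        # A client cannot invent coordinates the server never sent.
        return f"{msg['entity']} somewhere in {msg['node']}"
    return f"{msg['entity']} at {position} in {msg['node']}"


def room_client(msg: dict) -> str:
    """Collapses the world to rooms by ignoring the position field."""
    return f"You see {msg['entity']} here."


def admin_client(msg: dict) -> str:
    """No spatial metaphor at all: just operational figures per node."""
    return (f"{msg['node']}: {msg['connected_users']} users, "
            f"loop {msg['loop_time_ms']} ms")


print(spatial_client(update))  # an oak tree at (1114.42, 1274.63) in Forest Zone
print(room_client(update))     # You see an oak tree here.
print(admin_client(update))    # Forest Zone: 340 users, loop 12 ms
```

Dropping the position field is a one-line decision in room_client, while spatial_client has no way to recover a position that was never in the message.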
The scalable client
At one point during the development of Ultima Online, we had internally a mini-client written on a lark that got packets and displayed the results as text: a classic “collapse” scenario as described above. I’ve also played text muds that used ASCII graphics to display crude overhead maps updated in realtime as you moved around within their embedded Cartesian systems.
The real difference between the MUDs of yore and the modern MMORPG client isn’t the sim on the backend; it’s the fact that the datastream is tokenized. When you connect to a MUD and it attempts to inform you of the presence of an object, say a chair, it actually sends the definition of that chair down: the words that make up its description. When you connect to a graphical MMORPG, instead you are sent an index number, a token that lets you look up on your local client install the description of that chair (which these days, is likely to be a 3d model).
A client install is nothing more than an elaborate caching scheme. Tokens are used to minimize bandwidth during play, but these days we see more and more worlds returning to the older practice of sending down the descriptions of objects, and not just their lookups, with titles such as Second Life but also games like Dofus or Runescape, which “stream” off the web. Text was the original streaming technology. Non-streaming games are (to use a phrase that seems to get me in trouble a lot) a historical aberration, a transitional technological hack to get around bandwidth limitations and the idiosyncrasies of embryonic delivery systems.
Even with tokenized systems, it is trivial to have multiple clients where the tokens (which are from the server and therefore authoritative) can map to different sets of descriptions on the client end. Thus we get things like UO 3d versus the UO 2d client, just as we get selectable “themes” on webpages.
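In Python terms, the tokenized approach might be sketched as below. The token number, the asset paths, and the resolve helper are all assumptions made up for this example; they are not UO's actual data or file layout.

```python
CHAIR = 1042  # hypothetical token sent by the server instead of a description

# Each client install maps the same authoritative token to its own assets.
TEXT_ASSETS = {CHAIR: "a sturdy wooden chair"}
ASSETS_2D = {CHAIR: "sprites/chair.png"}
ASSETS_3D = {CHAIR: "models/chair.mesh"}


def resolve(token: int, assets: dict[int, str]) -> str:
    """Look a token up in the local install; fall back if it is missing."""
    return assets.get(token, f"<unknown object {token}>")


for assets in (TEXT_ASSETS, ASSETS_2D, ASSETS_3D):
    print(resolve(CHAIR, assets))
# a sturdy wooden chair
# sprites/chair.png
# models/chair.mesh
```

The server sends 1042 in every case; which chair the player ends up looking at is entirely a property of the lookup table on their end.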
In the future, all virtual worlds will almost certainly make use of streaming, just as they did in the beginning; descriptions of objects within the world may be cached and tokens sent thereafter, but the basic premise will still be that the client will become dumber, something that interprets descriptive markup of some sort, rather than supplying descriptions itself.
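A minimal Python sketch of "stream the description once, send the token thereafter" could look like this. StreamingServer, StreamingClient, and the (token, description) packet shape are hypothetical, and for brevity the server tracks only a single client.

```python
class StreamingServer:
    """Sends a full description the first time, the bare token after."""

    def __init__(self) -> None:
        self.definitions = {7: "a sturdy wooden chair"}
        self.already_sent: set[int] = set()  # assumes a single client

    def announce(self, token: int) -> tuple:
        if token in self.already_sent:
            return (token, None)
        self.already_sent.add(token)
        return (token, self.definitions[token])


class StreamingClient:
    """Knows nothing up front; caches whatever the server describes."""

    def __init__(self) -> None:
        self.cache: dict[int, str] = {}

    def receive(self, token: int, description=None):
        if description is not None:
            self.cache[token] = description
        return self.cache.get(token)


server = StreamingServer()
client = StreamingClient()

first = server.announce(7)
print(first, "->", client.receive(*first))
# (7, 'a sturdy wooden chair') -> a sturdy wooden chair
second = server.announce(7)
print(second, "->", client.receive(*second))
# (7, None) -> a sturdy wooden chair
```

Seen this way, a traditional client install is just this cache filled in ahead of time and shipped on a disc.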
The result is a scalable client; given a robust enough sim, a given client can scale down to display it under more limited circumstances. If a given client is incapable of displaying data contained within the datastream, it simply won’t, or it will substitute something it can display. Just as a WAP phone can browse a webpage, but not display the Flash animation, so will virtual world clients display what they can and skip what they can’t.
Different data, different datastreams
Different data surfaced by the server simulation will by necessity require different display methods. For example, a very common data type output by a server is text. Until such time as a server fails to supply text as an output, virtual world clients will continue to support text displays that perforce will look a heck of a lot like a text MUD. Hence the classic gripe about MMORPGs that “you can play the text window.”
The question isn’t whether the simulations the clients are connecting to are different — in virtual worlds, they aren’t significantly. The question is what display method is most appropriate for conveying the meaning of the information. A tree could be displayed as a graphical oak, with girth and collision volume conveyed iconically via a cylindrical texture-mapped 3d object, or you could get a textual description saying “there is an oak tree, diameter 1.21m located at coordinate 1114.42, 1274.63.” Clearly, for ease of use of the client, the former is more useful. The server-side data, however, is identical. The classic example of this is perhaps the way in which earlier graphical games such as EverQuest sent down text saying “It begins to rain” in parallel with the token indicating the graphical effect to play. One event: two media.
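The oak tree example can be sketched in Python as one record with two renderings. The Tree class, its field names, and the cylinder dictionary are illustrative assumptions; only the numbers and the wording of the text line come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """The server-side data; identical no matter how it is displayed."""
    species: str
    diameter_m: float
    x: float
    y: float


def as_text(tree: Tree) -> str:
    article = "an" if tree.species[0] in "aeiou" else "a"
    return (f"there is {article} {tree.species} tree, "
            f"diameter {tree.diameter_m}m located at coordinate "
            f"{tree.x}, {tree.y}.")


def as_cylinder(tree: Tree) -> dict:
    """What a 3d client might hand to its renderer and collision code."""
    return {
        "shape": "cylinder",
        "radius_m": tree.diameter_m / 2,
        "center": (tree.x, tree.y),
        "texture": f"{tree.species}_bark",
    }


oak = Tree("oak", 1.21, 1114.42, 1274.63)
print(as_text(oak))
# there is an oak tree, diameter 1.21m located at coordinate 1114.42, 1274.63.
print(as_cylinder(oak)["center"])
# (1114.42, 1274.63)
```

Both functions read the same four fields; neither adds anything the server did not already know.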
For the foreseeable future, clients will mix and match media to datastreams in order to convey meaning as efficiently as possible. This says nothing whatsoever about the nature of the simulation, and it says a heck of a lot about user interface design instead. Someday, the clients may well be truly immersive 3d worlds with direct cortical connections, and we may be able to skip text. But we’re a ways from that right now.
Because of all of this, the proper way to think of clients is as filters on the simulation. The sim that is a virtual world is always going to throw off far more data than we even want to perceive. The method of display of said data is truly arbitrary; while different simulation models will suggest different displays, there’s nothing that specifically binds a given set of information to one method of display. There are only best practices, not mandates.
In conclusion: they’re the same thing, dammit, but clients do matter a lot
While clients are nothing more than filters, it’s undeniable that presentation has an enormous cognitive impact on the user of a client. Many variables may fluctuate massively, such as degree of immersion, spatial awareness, ease of parsing data, and so on. However, we must not forget that different brains have different cognitive strengths. A common mistake made by MMORPG exceptionalists is to assume that graphics are necessarily more immersive. This is exactly the same sort of logic that says that movies must be more immersive than books. Different strokes for different folks, as always.
The thing we should never lose sight of is that in the end, what the player is participating in is actually the simulation on the server. Humans have an amazing ability to see abstractions behind displays, and to elide out insignificant information. Something like not being able to jump over a short wall because a 2d world is being represented as true 3d will definitely cause cognitive dissonance for a while; but it isn’t long before players see trees as collision volumes and milkmaids as XP bags, reducing displays down to functional descriptions.
As designers, we should make our best effort to supply the right sort of filters and displays for the sort of interactions we want to provide to our players. We design experiences and affordances, not just functionality. None of this discussion is to minimize, therefore, the importance of the client. But as designers, we should also be fully aware that the client is primarily a feedback agent connecting to something else. Feedback is critically important, but feedback is not the simulation.
The true typology of virtual worlds lies not client-side, but in the differing methods of handling databases, user interactions, and persistence. But that’s a discussion for a whole other day.