Welcome to Raph Koster's personal website: MMOs, gaming, writing, art, music, books.
Game talk

Avatar body language

June 8th, 2009

Regular blog reader mrseb has a blog post up on emotional avatars in virtual worlds inspired by this NYTimes.com article (it’s behind a reg wall).

In short, the research is about how important blushing is as a social lubricant, as evincing embarrassment or shame serves to reinforce the social rules held in common by groups of people. It’s a sign that the person knows they are transgressing to some degree and is sorry for it, and people judging them tend to treat them less harshly.

Which leads Sebastian to ask (emphasis mine!),

Why are we still running around in virtual worlds with emotionless, gormless avatars?

It’s not that the question hasn’t been asked before. For example, back in 2005 Bob Moore, Nic Ducheneaut, and Eric Nickell of PARC gave a talk at what was then AGC (you can grab the PDF here), which I summarized here with:

The presentation by the guys from PARC on key things that would improve social contact in MMOs was very useful and interesting. Eye contact, torso torque, looking where people are pointing, not staring, anims for interface actions so you can tell when someone is checking inventory, display of typed characters in real-time rather than when ENTER is hit, emphatic gestures automatically, pointing gestures and other emotes that you can hold, exaggerated faces anime super-deformed style or zoomed in inset displays of faces, so that the facial anims can be seen at a distance… the list was long, and all of it would make the worlds seem more real.

I was at that talk, and in the Q&A section, which was really more of a roundtable discussion, the key thing that came up was cost.

There are two equally challenging barriers to better emotional displays from avatars: input and output.

In text-based worlds we had very low-cost output. Text is extremely malleable, with very little overhead. You don’t need to bake out phrases in advance inside of dedicated software and ship it on a disc. It takes up little bandwidth. It has very low hardware requirements for display. And you aren’t confined to tricky, complex formats: nobody asks you what the poly count is of a sentence, or whether you have room for one more bone in your paragraph.

The flip side is that the creation tools for puppeteering are then very freeform and very simple. Just about all text worlds — even IRC and IM — offer the following two simple tools:

  • canned social commands (“smile,” “laugh,” etc.). Execute the command (which these days is often just a button press) and a pre-written line of text is delivered to the interlocutor. The advantage here lies in easy puppeteering, at the expense of flexibility.
  • the “emote” command, also known as “me” or “pose.” This is the reverse; you type in your own text, so you have maximum puppeteering control, which also takes the most work.
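As a rough illustration of those two tools, here is a minimal sketch of how a text world might dispatch them. The `CANNED_SOCIALS` table, the `render_command` function, and the exact wording of the output lines are all hypothetical, made up for this example rather than taken from any particular MUD or chat system.

```python
# Hypothetical canned socials: command name -> pre-written line template.
CANNED_SOCIALS = {
    "smile": "{actor} smiles.",
    "laugh": "{actor} laughs.",
}

# The freeform command goes by several names, as noted above.
EMOTE_ALIASES = ("emote", "me", "pose")


def render_command(actor: str, line: str) -> str:
    """Turn one line of player input into the text other players see."""
    command, _, rest = line.strip().partition(" ")
    command = command.lower()
    if command in EMOTE_ALIASES:
        # Freeform: the player writes the whole action themselves.
        return f"{actor} {rest.strip()}"
    if command in CANNED_SOCIALS:
        # Canned: one word in, one pre-written line out.
        return CANNED_SOCIALS[command].format(actor=actor)
    # Anything else is treated as ordinary speech in this sketch.
    return f'{actor} says, "{line.strip()}"'


print(render_command("Raph", "smile"))
print(render_command("Raph", "emote shuffles his feet and blushes."))
```

The tradeoff shows up directly in the code: the canned branch costs the player one word but can only ever say what is in the table, while the emote branch can say anything but makes the player type all of it.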

Once we got out of text, however, we were in trouble. Graphics add a very heavy load on the output side of the equation, reducing the available palette to only canned choices, and few of them at that! Systems such as Microsoft Research’s Comic Chat (remembered today as the place where the much-reviled Comic Sans font was born) offered “emotion wheels” and facial expressions, and pioneered text parsing to add mood to the avatars.

In the mid-90s, the addition of text parsing into the mix gained a lot of popularity in non-VW circles, allowing emoticons to get parsed automatically out of lines of chat and play sounds or insert small graphics like this one: :) Emoticons themselves, of course, were a way of adding emotional tone to text, a way to puppeteer in a very low-intensity way.
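To make the emoticon-parsing idea concrete, here is a small sketch, not a reconstruction of any real chat client. The `EMOTICON_MOODS` table and the `extract_moods` function are hypothetical names, and the mood labels are placeholders for whatever sound, graphic, or avatar expression a system would actually trigger.

```python
import re
from typing import List, Tuple

# Hypothetical emoticon -> mood table; a real client would have its own set.
EMOTICON_MOODS = {
    ":)": "happy",
    ":(": "sad",
    ";)": "wink",
    ":D": "laughing",
}

# One pattern that matches any emoticon in the table.
_EMOTICONS = re.compile("|".join(re.escape(e) for e in EMOTICON_MOODS))


def extract_moods(chat_line: str) -> Tuple[str, List[str]]:
    """Return the chat text with emoticons removed, plus the moods found."""
    moods = [EMOTICON_MOODS[e] for e in _EMOTICONS.findall(chat_line)]
    # Drop the emoticons, then collapse the whitespace they leave behind.
    text = " ".join(_EMOTICONS.sub(" ", chat_line).split())
    return text, moods


print(extract_moods("nice shot :) see you later ;)"))
```

The player keeps typing the way they already type; the system just listens for the emotional cues that are already there, which is what makes this such a low-intensity form of puppeteering.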

Text worlds merrily went on developing more elaborate systems, such as moods and adverbs, and eventually these made the jump to a couple of graphical worlds. We also saw systems like that of There.com, which managed to convey intensity of emotion in smilies by using tricks like including more pieces of punctuation in a social command in order to make it “stronger.”

When you type these commands, they appear in your chat balloon and your face and gestures match the emotion. If you don’t want the Smiley to appear in your chat balloon, type a tick before and after the command, such as 'surprise'.

You can intensify a gesture by beginning it with two or three ticks. If you think something’s pretty funny, type ''laugh. If you’re really excited, type '''yay.
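The tick convention described in that help text can be sketched as a tiny parser. This is only an illustration under stated assumptions: that a tick is a plain apostrophe, that a command starts with one to three of them, and that a closing tick is what hides the Smiley from the chat balloon. The `Gesture` type and `parse_gesture` function are hypothetical, not There.com’s actual implementation.

```python
import re
from typing import NamedTuple, Optional


class Gesture(NamedTuple):
    name: str
    intensity: int         # number of leading ticks, 1 to 3
    show_in_balloon: bool  # False when the command is closed with a tick


# Assumed syntax: one to three leading ticks, a word, an optional closing tick.
_GESTURE = re.compile(r"^('{1,3})([a-z]+)('?)$")


def parse_gesture(token: str) -> Optional[Gesture]:
    """Parse one chat token; return None if it is not a gesture command."""
    match = _GESTURE.match(token.lower())
    if match is None:
        return None
    ticks, name, closing = match.groups()
    return Gesture(name, len(ticks), show_in_balloon=(closing == ""))


for token in ("''laugh", "'''yay", "'surprise'", "hello"):
    print(token, "->", parse_gesture(token))
```

The appeal of the design is that intensity costs the player nothing to learn: more punctuation means more emotion, the same way it already does in ordinary chat.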

Needless to say, the number of these was limited by the graphical output. There had quite a lot of other features along these lines, including automatic head tracking to recent speakers and eyegaze on the head matching your mouselook.

The puppeteering challenge remains, of course, even as graphics have gotten better and the range of possible emotional animations has risen. In Star Wars Galaxies we had moods and a subset of them could affect body language, causing the base idle of the avatar to change. We also supported the stuff that There did (which I cheerfully stole). But the cost of all this stuff definitely adds up. Adding modifier animations on top of all the other animations that are required for actually playing the game can easily get prohibitive.

The other direction to go here, of course, is procedural. A glance over the animation work of Ken Perlin shows that it is quite possible to convey emotions in facial and body animation using relatively few procedural datapoints. (You’ll need Java for these.) His Responsive Face demonstrates how you can create 2D faces of remarkable expressivity by adjusting a few sliders; another demo shows how you could use this same system as an animation tool to build more complex sequences of emotion. Both of these are based in part on the research of Paul Ekman (best known these days for inspiring the TV show Lie to Me). Aspects of this are reportedly in Valve’s Half-Life 2. Finally, his Emotive Virtual Actors demonstrate how a fully procedural animation system can get across highly subtle information such as “interest in a member of the opposite sex” purely via body language!

I’ve wanted to make use of this stuff for forever… and have never quite found a way. The barrier is that it requires that the entire system be driven procedurally, which is a larger step than most art departments are willing to take.

These days, of course, all the news is around cameras instead, providing direct puppeteering of avatars by motion-capturing movements on the fly. This has gotten more and more sophisticated since the days of pioneering art installations like those of Zach Simpson, or even the EyeToy’s simple games. Among the demos of Project Natal at E3 was Milo, which mirrored your actions and did facial recognition.

The step beyond this is direct brain interfaces — which are no longer science-fictional crack. We can control movement via a brain interface today, and it is not a stretch to imagine avatars displaying emotions simply because you feel the emotions!

The difference, of course, is that at that point you are no longer puppeteering, and are instead engaging in a form of emotional telepresence. For many applications, it will be as critical to hide real emotions as to display them; woe betide the brain interface that displays what we really feel as opposed to what we are trying to show!

This is, of course, why voice is so often used now as a method of increasing emotional engagement. The puppeteering problems are bypassed entirely. It could be that at some point we use stressor analysis in voice chat in order to puppeteer in the same way that we use emoticons today.

At the moment, then, we are caught in a mode where the displays are almost good enough but the controls are bad — getting us back to where we were with the text displays originally. Alas, for most developers of virtual worlds, particularly game worlds, all of this stuff takes a serious backseat — even though depth of emotional connection is a major predictor of retention. With the highly operational gameplay mode being dominant in MMORPGs, we see very little attention paid to avatar expression beyond customization — and even that is oriented far more around the avatar as trophy case than around self-expression.

Given the cost of doing any of this stuff beyond the minimum, the hope for better avatar body language, then, rests in the casual market, and the worldy worlds, because those are the only markets which privilege it. And even these struggle with the puppeteering, because all of these systems have interfaces that increase in complexity and likely user confusion as the capabilities increase. In the name of balancing resources and keeping the interface clean, developers lean towards less emotional bandwidth.

It might be the case that World of Warcraft could significantly extend the lifetime of a customer by adding better puppeteering, but weighing the benefits against the opportunity costs of more armor sets has thus far come down in favor of less emotion and more ease of use, less communication and more monster killing.
