Thursday, January 14, 2010

Metrics

Nothing is perfect. That can be a kind of defeatist perspective if looked at the wrong way, but it's also a source of optimism. Everything provides an opportunity for improvement if you can just stop and learn the lessons of what went wrong.

This is often easier said than done. "What went wrong" is a terribly amorphous idea, especially when dealing with something that has lots and lots of moving parts but still lurches forward instead of failing outright. Some endeavors have outright failure conditions. If a car won't start or a house falls down, then there's clearly something wrong. But more often the car is making a strange noise or the house has intermittent electrical problems. These are much harder to address than a true failure, if only because failure demands action. Persistent, systemic problems that don't actually cause failure can be tolerated, often well past the point of reason.

This is why people in expensive suits talk a lot about "metrics". In the absence of a failure scenario, you need some way to improve upon things, and the only way to consistently do that is by measuring things, changing them, then measuring again to see if there's some improvement[1]. But what should you measure? This can be surprisingly hard to figure out, and getting it right is valuable enough to explain why those guys can afford such expensive suits.

So, take all those problems and complicate them further with ideas of "fun" and "entertainment" and you have the problem of applying this thinking to RPGs, where I would argue it's desperately needed.

First and foremost, how can you even spot a failure scenario with an RPG? Is it the game that doesn't happen? That doesn't finish? That finishes badly? Who's to say those breakdowns came from the game and not from secondary failures (perhaps in transportation or coordination)? As individuals, we may be able to think of some spectacular car crashes of games that we could deconstruct, but those tend to be so intensely personal and specific that there's not much room for common language. That is to say, if you want to deconstruct one of your failures, don't expect much help.[2]

And that's the big easy part. What about games that don't fail? What concrete things can you look at and say 'this didn't go so well' so that you can try something to fix it and see if it works? What kind of metrics can you even look for in an RPG?

I don't bring this up rhetorically. I genuinely don't know, and it frustrates me intensely. There are a few vague shapes - you can track resources in a resource management game and use that as a rough guideline, but that tends to be very rough indeed[3]. You can track concrete social elements like attendance and time played, but I'm not sure what those tell us.

Arguably, the best approach is to just pick some things, assign some numbers and fake it. If I rate "Player enthusiasm" from 1-5 at the end of every session, then there's lots of room for error, but over time I am likely to get some useful trending, even if the specific numbers themselves are only so reliable.[4]
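To make the "useful trending" idea concrete, here is a minimal sketch, assuming one 1-5 rating is recorded per session and that a simple rolling average is an acceptable way to smooth out the noise of individual ratings. The function name `rolling_trend` and the default window of 4 sessions are hypothetical choices for illustration, not anything from a particular game or tool.

```python
from statistics import mean


def rolling_trend(ratings, window=4):
    """Return the rolling mean of per-session 1-5 ratings (hypothetical helper).

    Each output value averages the most recent `window` sessions, so a single
    unreliable rating moves the trend much less than it moves the raw data.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Reject ratings outside the assumed 1-5 scale.
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range 1-5: {r}")
    # One averaged value per full window; fewer sessions than `window` yields [].
    return [
        mean(ratings[i - window + 1 : i + 1])
        for i in range(window - 1, len(ratings))
    ]


if __name__ == "__main__":
    # Invented example ratings, purely to show the shape of the output.
    print(rolling_trend([2, 3, 2, 3, 4, 4, 5, 4]))
```

The individual numbers can be off by a point either way; what the sketch relies on is that a rolling average which keeps climbing (or sinking) over many sessions is harder to dismiss than any one night's score.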

But what to measure? What should the metrics for a successful (or failed) game be? I'm intensely curious to hear people's thoughts on this.



1 - This is contrasted with just going by gut. Some people can do this, and that is true, but many, many more people THINK they can do this than actually can. And, of course, whether or not you can is probably best not judged by your gut.

2 - 'But I can discuss it on the Internet,' says the hapless optimist. And technically, that's true, and there might even be some faint insight to be gained that way if you're willing to dig and sift through the noise, but at best you'll usually get the answer for how someone else (with perfect hindsight) *would* have done it. This is kind of poisonous because, like the TSA, we see clear trails into the past, think they should have been equally obvious looking forward, and conclude that the problem is that we just missed them. From that perspective, it is too easy for the guy giving advice to be taken as a guru. And we don't need any more gurus.

3 - For example, in D&D you can track damage taken, healing surges and powers used, and use that to gauge how well you've been balancing encounters. Unfortunately, there are so many other variables (such as changes from leveling up, different monster abilities and dumb luck) that it's only so useful.

4 - There's a dirty trick implicit in this: you get what you measure for. As such, deciding what to measure is also an implicit declaration of what you want to see more (or less) of in a system. There's some really fascinating material about this and the impact of the Apgar Score (a measure of the health of newborns) in Atul Gawande's Better.

11 comments:

  1. For me there are a few things.
    •Does the game system resolve actions in roughly the way I expect?
    •Does the game system allow for my players to do any action/maneuver they can think up easily and fluidly?
    •If the game doesn't cover a given action or maneuver, is a fix easily implemented?

    That's system stuff. It encompasses things like dice mechanics, combat actions, etc.

    For setting I gauge it on...
    •Are there lots of things I want to explore?
    •Are there loose ends that I want to grab ahold of?
    •Does the premise of the setting compel me?
    •Do the pictures and flavor writing inspire me?
    •Do I fall asleep reading it? (I get about 5.5 hours of sleep a night, not by choice. Regrettably, books have to be amazing not to put me to sleep nearly instantly; if an RPG book can keep me awake reading it, the writing is superbly compelling.)

    And then there are game aspects, which I only recently realized were important. My friends have always been RP'ers who wanted accurate simulation rules and had pretensions of telling some sort of TV-worthy scripted story with them, and they would totally overlook the fact that their players were showing up to play a GAME. Now I have a new set of metrics that only a precious few games I've come across even come close to passing. Interestingly, the game I've found that gets the best marks in this category is one that I've despised for over a decade.

    •How easy is it to build adversaries/obstacles/scenarios?
    •What tools does the game offer to facilitate this?
    •How many premade adversaries/obstacles/scenarios are there?
    •How easy is it to customize adversaries, obstacles and scenarios and what tools does the game offer to do this?

    Those are my metrics... take them as you will.

  2. More than anything, I like that your metrics are easily pointed at the player rather than the GM. I must think on this.

    -Rob D.

  3. Maybe "nothing is perfect," but I want the Holy Grail (a universal system), so my meter may set the bar too high. I think the real test is to separate system and setting. A system may rock in its native setting, but how easy is it to plug a new setting into the system? Sometimes system and setting are so tightly entwined that you can't separate them. When the two can be separated, I turn to the forums. If people are struggling to hack the system to a new setting, then something is missing.

  4. Lots of things going off in my head right now, but here's what comes to mind first: You're wrong, Rob. (Man, it feels weird to say that.)

    You're right that the group needs a feedback loop. However, for something as non-quantitative as the social space occupied by gaming (and for the small number of people that make up the group), I can't imagine that metrics are the best way to do that.

    Metrics are almost always substitute measures. They stand in for things that you want to directly observe but can't. With a gaming group, it seems to me that a much better approach is to just have regular retrospectives, where people can talk through as a group what their guts are telling them, so that together you can figure out why their guts are telling them that.

  5. Paul: We are totally beering this at Origins.

    I'll concede that metrics are more useful with a larger sample set than we're probably talking about, but I think there's a lot more flex in their utility than you're giving them credit for. Consider that the step of checking something, even something almost entirely arbitrary, can be enough to make you _think_ about it, even if you might normally take it for granted.

    That said, I concede that some of the benefits have serious bleed-over with Checklists, but that's its own post.

    -Rob

  6. Beering is on.

    Have you read Results Without Authority by Tom Kendrick? It's about project management, and the fourth chapter has one of the best discussions of the whats and wherefores of metrics (like the differences between predictive, diagnostic, and retrospective metrics).

    Regarding your checklist/retrospective comment: That's what end-of-session Artha awards in Burning Wheel/Burning Empires/Mouse Guard do.

  7. @Paul Oh, man, if you have not read Checklist Manifesto by the time Origins rolls around, you are so getting an earful. :) Short form, there's a lot more to it than that.

    Kendrick added to the queue, and I'll check it out.

    All of that out of the way, the thing I feel is lacking is any language or ideas for describing our failures. I propose metrics on the idea that bad measurement is still going to be more useful than no measurement at all, which is what we currently have. Worse, since this is not business or sport or the like, we don't even have an external yardstick for success (Did this produce a good? Did it make money? Did we win?) to wrap the process in, so this is _totally_ grasping at straws.

    Which may be futile, but I still feel we need _something_.

    -Rob D.

  8. @ biff-dyskolos
    Not to be contrary, but let's not forget that some people like to use the opposite metric; that is, if a system is not tied to and supporting the setting or genre it was built to support, then it's not doing its job.

  9. @ Nick
    You are not being contrary at all. I think we are saying similar things. However, I don't think that "tied to" and "supporting" are the same thing. Any system that does not support its native setting is a bad system to begin with; no more need be said about it. What I am interested in is taking a good game and seeing if its system can support a different setting. Sometimes the system is "tied to" its native setting so tightly that you cannot separate the two.

