Quantifying the unquantifiable – Expert Evaluations

At a recent UXBrighton talk, iCrossing presented an interesting idea about applying metrics to expert evaluation. This is a potentially controversial topic, yet it has numerous benefits if it can successfully turn qualitative data (impressions and thoughts) into quantitative numbers. I've outlined the method below, along with my thoughts on the issues around it.

The UXBrighton event used a new format: a series of short talks, from Harry Brignull's tips on time-stamping notes to Danny Hope's templates for understanding user roles. Also interesting was a talk on using Google Analytics, although the short slot meant the topic could only be skimmed, which was disappointing as I'm an analytics fan. The most interesting idea presented was iCrossing's talk, "The iCrossing Connected Brand Index: how to measure a brand's effectiveness online", given by Ifraz Mughal.

Expert Evaluation

As I've mentioned before, an expert evaluation is a useful tool for gaining insight into potential usability and user experience issues on a website or game when resources are limited. Although it can never replace testing with real users, it provides a quick approximation and helps highlight the biggest issues.

The ‘method’ for an expert evaluation is simple. Get an expert to look at the site, or game, and tell the client what they think. Job done.

[Image: scientist with a test tube. Caption: "My expert eye tells me you need smarter users"]

However, an expert evaluation can only ever be subjective, and this is its biggest weakness. A client can look at your page full of recommendations and dismiss it as the opinion of one person. There's no easy way to track progress as changes are made, and a comparison with other sites can only ever be abstract.

Quantifying an Expert Evaluation

iCrossing's solution is to quantify their expert evaluations. As part of their 'Connected Brand Index' idea, they rate their clients' sites (and those of competitors) on UX-centric areas such as "usefulness", "usability" and "desirability".

A traditional expert evaluation would give a qualitative rating and back it up with examples, e.g. "Poor – little emphasis, and diffuse calls to action". Instead, iCrossing gives the site a score on a scale of -2 to 2 (2 being very good). This can of course be backed up with examples in a more in-depth report.

[Image: kittens in a cup. Caption: "after the first few pages, the report can just be pictures of kittens. No-one reads that far."]
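To make the scoring idea concrete, here's a minimal sketch of how scored results might be recorded. The dimension names and scores are illustrative placeholders, not iCrossing's actual criteria or weightings:

```python
# Hypothetical sketch: record per-dimension scores on the -2..2 scale
# described above. Dimension names and values are made up for illustration.

SCALE = range(-2, 3)  # -2 (very poor) up to 2 (very good)

def validate(scores):
    """Check every dimension's score falls on the -2..2 scale."""
    for dimension, score in scores.items():
        if score not in SCALE:
            raise ValueError(f"{dimension}: {score} is outside -2..2")
    return scores

site_scores = validate({
    "usefulness": 1,
    "usability": -1,
    "desirability": 0,
})

# A simple unweighted average as one possible overall figure.
overall = sum(site_scores.values()) / len(site_scores)
print(f"Overall: {overall:.2f}")
```

In practice the overall figure would likely be weighted or broken out per dimension in the report, but even this flat average shows how opinions become a number a client can track.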

The advantages:

There are numerous reasons why a client would prefer a scored ‘rating’, rather than comments.

  • A ‘score’ makes it easy to benchmark, and compare your own scores against competitors. By dividing the expert evaluation into separate topics and scoring each, a fine-grained comparison can be made and communicated.
  • Similarly, a score makes it easy for a client to see progress. If they scored -1 before hiring you, and 1 after, your work can be justified (as long as no-one questions who is doing the scoring!)
  • Because this produces a concrete score, clients can handle and communicate the data. Graphs can be made, which wouldn’t be possible with subjective comments. These can be invaluable for justifying work to managers and project sponsors, who don’t need to see the details, just a high-level overview.
  • This expert evaluation can be incorporated as one aspect of a larger ‘score’ given to websites or games. This is the idea behind iCrossing’s Connected Brand Index.
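The benchmarking and progress-tracking advantages above can be sketched in a few lines. All the scores here are hypothetical, invented purely to show the before/after and client-versus-competitor comparisons the list describes:

```python
# Hypothetical scores on the -2..2 scale: a client site before and after
# redesign work, plus a competitor's site scored on the same dimensions.
before = {"usefulness": -1, "usability": -1, "desirability": 0}
after = {"usefulness": 1, "usability": 1, "desirability": 1}
competitor = {"usefulness": 0, "usability": 1, "desirability": -1}

def compare(a, b):
    """Per-dimension difference (a minus b) between two score sets."""
    return {dim: a[dim] - b[dim] for dim in a}

progress = compare(after, before)    # improvement from the work done
gap = compare(after, competitor)     # current standing vs the competitor

for dim in progress:
    print(f"{dim}: {progress[dim]:+d} since redesign, {gap[dim]:+d} vs competitor")
```

These per-dimension deltas are exactly the sort of numbers that turn straightforwardly into the graphs a project sponsor wants to see.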

Conclusions:

There is an argument that this can be seen as a bit of a scam: attaching arbitrary numbers to your opinions doesn’t make them any less subjective. The data could be misleading if presented incorrectly, and the client should be made aware of the method behind the scoring system. This could become an issue when running comparative studies before and after your work, since you’d be biased towards giving the site a better score once you’ve worked on it.

The point of this method is to aid communication with the client, and give them data in a format that is useful to them. As I discussed in the review of Selling Usability, management and non-technical people typically much prefer pretty graphs and statistics to a list of comments. This method helps manage client expectations, and gives them what they want.

To make the method more valid, it would be useful to perform a study to ensure it is sound: get a wide range of experts to independently rate a wide range of websites on this scale, and measure the correlation between their scores. It’d be a first step in countering complaints that the method is still inherently subjective, and help turn an art into a science.
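That correlation study could be analysed with something as simple as mean pairwise Pearson correlation between the experts' scores. This is a rough sketch with invented ratings; a real study would want a proper inter-rater reliability statistic and far more sites:

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical ratings: each expert scores the same five sites on the
# -2..2 scale for a single dimension (say, usability).
ratings = {
    "expert_a": [-2, -1, 0, 1, 2],
    "expert_b": [-1, -1, 0, 1, 2],
    "expert_c": [-2, 0, 0, 2, 2],
}

# Mean pairwise correlation as a crude indicator of agreement:
# close to 1 suggests the experts rank sites consistently.
agreement = mean(pearson(a, b) for a, b in combinations(ratings.values(), 2))
print(f"Mean pairwise correlation: {agreement:.2f}")
```

High agreement wouldn't prove the scores are objectively "right", but it would show that independent experts applying the scale converge, which is the complaint the study is meant to answer.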

One Comment On “Quantifying the unquantifiable – Expert Evaluations”

  1. Hi Harry,

    The IBF (Intranet Benchmarking Forum) go into orgs around the world and measure (amongst other things) the usability of their intranets.

    Last year they asked me to help revise their model, which involves a brief expert evaluation against fixed criteria and a day of quantitative-based user observation.

    I was super sceptical at first, even feeling a bit dirty about the idea of ‘usability-by-numbers’. So we built in lots of measures to:
    a) ensure that all evaluators could get close in their findings, and
    b) give the opportunity to identify subtlety in the scoring.

    So the metrics are highly specified, with exact interpretations of what should score 0, 1 or 2; the report is presented with lots of co-located expert commentary; the reports are all re-scored by another evaluator; and scoring is monitored to see whether individual evaluators are generally more generous, etc.

    The user sessions are split, with the tasks completed first so that users can be timed and tracked for comparison, and then users are encouraged to provide the kind of qualitative feedback that makes studies so powerful. Also, some of the orgs (I think BT included) use the results to drive performance-related pay, so objectivity is essential.

    So I agree with your comments. IBF provide a cunning blend of stats and scores – to grab the attention and interest of the management, and screenshot examples and comments – to point the way forward for change.

    Also, the IBF is effectively the study you mention in your last paragraph – a group of independent usability evaluators applying a consistent framework across a big range of intranet sites. Check out the website for more info if you’re interested in the issues, http://www.ibforum.com, or I’d be happy to chat.

    Lou.
