Posts Tagged ‘qualitative’



26
Jul

Test with real users – not your team

‘Free pizza and coke! Just play our game for an hour’. Sounds like a good deal, right? And pretty easy to organise, just pulling kids off the street. It can even be done in the pub, for mobile devices. Even this ‘free pizza’ recruitment is better than testing your game (or website, or application) with people from within your office. But why?

Game development teams need a constant supply of fresh users to test the ‘new user’ experience with. I’ve seen teams keep their project secret from their colleagues, not for official reasons, but so their colleagues can be tested as ‘new users’. Other teams test their games with their HR and secretarial staff, since they are unlikely to have had much exposure to the game.

However it’s a good guideline to never test with your team (unless of course you are building something for them). It’s understandable why this situation arises – often budgets are too tight for intensive user testing, forcing teams to perform ad-hoc tests with their colleagues; however this often causes problems further down the line:

accident at work

such as ...accidents

Don’t test with your team

  • Your team are not your users – Unless you are in a very specialist field, or are developing an internal project like an intranet, it’s unlikely that your team are the same people as your users. And they are very unlikely to act in the same way a typical user would.
  • Your team know things users wouldn’t – It’s likely your team will have had prior exposure to your game or application that a new user wouldn’t, and will be bringing prior knowledge to the testing session. This also applies to people who do not work directly on your team. To get a true outside perspective, you need to seek outside users.
  • Your team know you – Unlike a stranger, your team already know you, and (hopefully) like you. Their answers and interactions will be biased to please you, and to tell you what you want to hear based on what they know about your job, the project you’re working on, or your beliefs (for example, attempting to validate your design choices).

Advantages of testing with real users.

How people act can often be surprising. If this wasn’t the case, there would be little point in user testing. That’s why it’s extremely important to gather real data, from the people who will actually be using your product. Only real users will approach your product from an authentic ‘new user’ angle, and give an insight into how your product will be perceived and used.

Getting real users involved with product development will get them engaged with the product. Asking their opinions, and being interested in their experience will make the user feel positive about you, and your product, and will mean they will be more likely to purchase it when it’s ready. In newsrooms, this has been widely known for years – hence the proliferation of lists of names in local papers.

Most importantly, involving users will get them talking about your product, generating true grassroots ‘word of mouth’ promotion (hopefully without breaking any NDAs!). Giving customers early exposure to your product can build excitement, and market your product for free!

megaphone

saving millions on megaphones

Conclusion

Finding real users can be cheaper than you think. Not only is it possible to pull people off the street, using the methods suggested above, but new usability testing methods such as remote user testing allow you to find and test real users from the comfort of your office, for very low cost. These days there’s almost no excuse not to test with real users, and it can be just as easy as testing with your team, with much more rewarding returns.

7
Jul

The Problems with Surveys for User Experience Tests

In the run-up to Margaret Thatcher’s election victory in 1979, a poll was taken to estimate who would vote for her. Only 1 in 100 said yes. However, as revealed by the final results, 1 in 3 actually voted for her. The poll was inaccurate, and inappropriate for the task.

Surveys are a common tool used to evaluate a participant’s opinions of the user experience, and usability of a system. I’ve written about how to make good questionnaires before, and have often seen them used as a tool when analysing a large group of participants. However, as a method of understanding users, they are imperfect, and not just because they are poorly designed – instead it’s a fundamental problem with surveys. Let’s look at why this is the case, and why people are tempted to use surveys despite this.

Where are surveys used?

When I’ve been involved with user tests for games, I’ve often seen surveys used as a way of recording the player’s experience. For example, after completing a level, or game mode, they would be asked to rate their experience on a Likert-style rating scale (1-10), on categories such as how difficult they found the level, how fun it was, and how it compared to other levels. This is often complemented by text notes, where the participant can write in things they particularly liked or disliked.

Outside of gaming, surveys can often be found on the internet – such as website satisfaction surveys, or on professional survey sites, like Survey Monkey.

Monkey being Surveyed

Survey Monkey in action

Why are surveys used?

It’s easy to understand why surveys are often used when testing user experience. Most obvious is that they are easy to quantify, since the scores are given as a numeric value, which can then be averaged, and given an overall ‘score’. This can then be stuck on a graph, to impress people too busy and important to be involved with the testing itself. Compared to moderated testing, simple analysis is easy, and ‘results’ can be gained with little effort – particularly if an online survey tool is used.

Similarly, with surveys it’s easy to get a large number of opinions quickly, and in a largely un-moderated setting. Hence, 10 (or 10,000) people can test a game at the same time, with only light moderation, and fill out a survey after to record their views. Surveys also don’t require a large degree of specialist equipment – just a printer, and a pen (or they can be done online). This makes them cheaper than many moderated settings, which require a lab decked out with recording equipment.

Problem with surveys

Surveys sound great, don’t they? Cheap, easy, and they give some hard numbers. However, there are a number of problems with surveys, and one key issue that prevents them being suitable for user experience analysis.

First of all, it’s easy for the data from surveys to be misrepresented (either unintentionally or to further a top secret agenda!). Without hard evidence, such as watching (and recording) an individual player of the game, the analysis becomes reduced to which level ‘scores better’, regardless of the intricacies of the play test. Minor issues become lost within the overarching ‘score’.
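To make the ‘lost within the score’ point concrete, here is a minimal sketch using invented ratings (the level names, the numbers and the `low_cutoff` threshold are all assumptions for illustration, not data from a real play test). Two levels get the same average, yet one of them clearly went badly for a sizeable group of players.

```python
from statistics import mean, pstdev

# Hypothetical 1-10 "fun" ratings from ten players per level.
# These numbers are invented purely to illustrate the point.
ratings = {
    "level_a": [5, 6, 7, 6, 6, 5, 7, 6, 6, 6],
    "level_b": [9, 9, 9, 9, 9, 9, 2, 2, 1, 1],
}


def summarise(scores, low_cutoff=3):
    """Summarise ratings: the average, the spread, and how many were very low."""
    return {
        "mean": mean(scores),
        "spread": round(pstdev(scores), 2),
        "low_scores": sum(1 for s in scores if s <= low_cutoff),
    }


for level, scores in ratings.items():
    print(level, summarise(scores))
```

On the graph, both levels would show the same bar. Only the spread and the count of very low ratings hint that four players had a poor time on the second level, and even then the numbers cannot say why.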

Much more importantly, the fundamental problem with attempting to understand user experience with a survey is that they log opinions, and not behaviour. People are (sometimes?) stupid, and don’t know what they think. So a player who has had a positive experience throughout a level, and got stuck near the end, will often be left thinking poorly of the entire level. And without an independent observer to monitor, their in-game opinions are lost, or forgotten. Just like I cannot tell how bad my singing is, a player is too close to the subject matter to gain a full understanding of it.

Guitar Hero Fail

It’s pretty bad...

Essentially, surveys introduce a layer of abstraction from the game that is difficult for a player to follow. It is difficult for them to recognise what parts of a game made it fun, and which parts frustrated them, and it often takes someone else to spot these patterns.

Pride, and psychology can also be a contributing factor – players who have needed 10 attempts to complete a section will still say it was “easy” after finally completing it – psychologically they will often believe it as well, since they have felt the satisfaction of completing the task. Other times they will be too proud to say the section was too difficult, and lie. Again, this rich data is lost through a survey.
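As a rough sketch of what an observer adds, the snippet below compares what players said with what a moderator saw. Everything in it is hypothetical: the `SectionResult` record, the player names, the `attempt_threshold` of 5 and the data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SectionResult:
    player: str
    attempts: int   # counted by the moderator while watching the session
    reported: str   # what the player said afterwards: "easy", "ok" or "hard"


def mismatches(results, attempt_threshold=5):
    """Return players who called a section 'easy' despite many observed attempts."""
    return [
        r.player
        for r in results
        if r.reported == "easy" and r.attempts >= attempt_threshold
    ]


# Invented observations from one section of a play test.
results = [
    SectionResult("player_1", attempts=10, reported="easy"),
    SectionResult("player_2", attempts=1, reported="easy"),
    SectionResult("player_3", attempts=7, reported="hard"),
]

print(mismatches(results))
```

A survey alone would only contain the `reported` column, so the first player’s ten attempts would never show up.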

What should be used instead?

To gain a truer understanding of the user experience (or player experience) of participants when testing a system, or a game, surveys are therefore inadequate. Instead, a moderated task-based analysis session, which is recorded for later analysis, will give a truer understanding of how the participant found the system, and their true experience, unaltered by their own perceptions. I’ve written about recording these sessions before, and will discuss them further in the future.

As we have seen, surveys are cheap and easy, and hence should not be disregarded entirely. However they should not be used exclusively, as they can miss key user experience findings, and require users to know themselves, and their feelings, extensively.

23
Jun

Remote Research – Book Review

Remote Research is a new book by Nate Bolt and Tony Tulathimutte, who have worked with the UX agency Bolt | Peters on a wide range of studies, with clients such as Wikipedia and Electronic Arts (I recommend watching the funny out-takes of Spore user testing).
Their new book sums up their experiences with performing remote research (Tony has previously discussed this subject on this blog, in the comments here), and gives clear instructions on how others can perform a wide range of usability and user experience studies with people who are physically distant, by using the internet.

Remote Research

Don't judge it by its cover...

Why would you consider remote research?

Written by advocates of remote research, the book highlights many of the potential advantages that remote research gives compared to a more traditional lab based study. These advantages are fleshed out throughout the book through testimonies of experts who have experience in this field, who offer real world examples to emphasise these points.

Some key advantages are:

  • Access to a geographically diverse user base. Unlike traditional research, where a moderator would have to be in the same physical location as the subjects, remote research allows a study to be run with anyone who has a high speed internet connection, widely expanding the potential study-group.
  • Easy to let stakeholders get involved. Because the research session is being broadcast over the internet, it’s possible to allow stakeholders (e.g. executives and designers) to view the session, and give (moderated) input. This of course increases their engagement with the process, and the session itself serves as the ‘evidence’ for any conclusions derived from the research.
  • Natural browsing environment. The validity of the research can be improved, not only because you are allowing the user to perform the task in a familiar environment (their own home computer), but also some recruitment methods allow you to capture a user performing a task they have selected. For example, recruiting a user who came to the site to buy trousers, for a task based on buying trousers, would provide more accurate results than asking someone to pretend to buy trousers…
  • Cheaper (debatably). Not having to pay for travel can keep costs down, however other costs, such as incentives, will still be required, as well as paying for the software.

The remote research book doesn’t advocate killing off lab tests though – instead, it recognises that there are cases when the lab is still appropriate, such as when privacy is a concern. The book also features Andy Budd’s defence of the lab, which argues that remote research fails to pick up aspects of non-verbal behaviour, as well as arguing that remote research doesn’t just remove a selection bias (geography), since it also adds another (internet speed and technical ability). It’s brave of the book to include the case against remote research, and helps project a more trustworthy and reliable image for the book itself.

How to do remote research

The ‘meat’ of the book is the sections dedicated to how-to guides on the different forms of remote research. The book contains step-by-step instructions on performing moderated or un-moderated research, and includes key topics such as recruitment (and live recruiting), card sorts, and lots of handy hints – such as using IM clients as a chat room for multiple observers to automatically share and timestamp notes.

The book doesn’t just cover basic topics – it goes on to develop novel approaches to user research, such as using ‘reverse screen sharing’ to protect confidential software or data, and using mobile web to gain a new understanding of time-dependent information, outside of the traditional moderated setting.

It also extends the remits of remote research – it doesn’t have to just be websites, but can include doodles or sketches, as well as developing ideas for automatic research with analytics.

Chat Roulette

Another sort of remote research?

Conclusion

Remote Research is one of the easiest to read UX books I’ve reviewed. Like many Rosenfeld publications, it is laid out well, without appearing dense with text, and has a friendly tone throughout. The book can be likened to Krug’s writing in its style, and presentation.

The book is also practical and realistic, and deals with real world issues, like ‘fakers’ (who can be outed by using open ended questions to discover motives), legal issues, and common challenges such as reluctant stakeholders.

Most importantly for the practical UX practitioner, the book is not dogmatic. This is especially evident in the last chapter which admits that usability shouldn’t be the exclusive goal of product design, and needs to be coupled with initiative, and innovation to develop great things.

Overall this book is a great introduction, and how-to guide to the growing field of remote research, and will be an important tool for anyone trying to keep up to date with the latest research methods.

22
Mar

Quantifying the unquantifiable – Expert Evaluations

At a recent UXBrighton talk, iCrossing presented an interesting idea about applying metrics to expert evaluation. This is a potentially controversial topic, yet has numerous benefits if it can successfully make qualitative data quantitative (and turn impressions and thoughts into numbers). I’ve outlined the method, and my thoughts on the issues around this.

The UXBrighton event was presented in a new format as a series of short talks, from Harry Brignull’s tips on time stamping notes, to Danny Hope’s templates for understanding user roles. Also interesting was a talk on using Google Analytics, although the length of the talk meant that the topic could only be skimmed, which was disappointing as I’m an analytics fan. The most interesting idea presented was iCrossing’s presentation on “The iCrossing Connected Brand index: how to measure a brand’s effectiveness online”, given by Ifraz Mughal.

Expert Evaluation

As I’ve mentioned before an expert evaluation is a useful tool for getting an insight into potential usability and user experience issues on a website, or game, with limited resources. Although it can never replace running tests with real users, it can provide a quick approximation, and help highlight the biggest issues.

The ‘method’ for an expert evaluation is simple. Get an expert to look at the site, or game, and tell the client what they think. Job done.

scientist with test tube

My expert eye tells me you need smarter users...

However, an expert evaluation can only ever be subjective, and this is its biggest weakness. A client can look at your page full of recommendations, and dismiss it as the opinion of one person. There’s no easy way to see progress with changes, and a comparison with other sites can only ever be abstract.

Quantifying an Expert Evaluation

iCrossing’s solution is to quantify their expert evaluation. As part of their ‘Connected Brand Index’ idea, they rate their clients’ sites (and competitors’) on UX-centric areas such as “usefulness”, “usability” and “desirability”.

A traditional expert evaluation would give a qualitative rating, and give examples to back this up, e.g. “Poor – little emphasis, and diffused calls to action”. Instead iCrossing will give the site a score, on a scale of -2 to 2 (2 being very good). This of course can be backed up with examples in a more in-depth report.

kittens in a cup

after the first few pages, the report can just be pictures of kittens. No-one reads that far.

The advantages:

There are numerous reasons why a client would prefer a scored ‘rating’, rather than comments.

  • A ‘score’ makes it easy to benchmark, and compare your own scores against competitors. By dividing the expert evaluation into separate topics, and scoring each, a finely grained comparison can be made, and communicated.
  • Similarly, a score makes it easy for a client to see progress. If they scored -1 before hiring you, and 1 after, your work can be justified (as long as no-one questions who is doing the scoring!)
  • Because this produces a concrete score, clients will be able to handle and communicate the data. Graphs can be made, which wouldn’t be possible for subjective comments. These can be invaluable for justifying and communicating with managers and project sponsors, who do not need to see the details, just get a high-level overview.
  • This expert evaluation can be encompassed as one aspect of a larger ‘score’ given to websites, or games. This is the idea behind iCrossing’s connected brands index.
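A minimal sketch of the kind of benchmarking these points describe, assuming the -2 to 2 scale and the three areas mentioned above. The site names, the scores and the helper functions (`validate`, `overall`, `gaps`) are made up for illustration; this is not iCrossing’s actual index or weighting.

```python
from statistics import fmean

SCALE = range(-2, 3)  # the -2 to 2 scale, 2 being very good

# Hypothetical expert scores per area; invented for illustration only.
scores = {
    "client_before": {"usefulness": -1, "usability": -1, "desirability": 0},
    "client_after": {"usefulness": 1, "usability": 1, "desirability": 1},
    "competitor": {"usefulness": 1, "usability": 0, "desirability": 2},
}


def validate(site_scores):
    """Reject any score that falls outside the -2 to 2 scale."""
    for area, score in site_scores.items():
        if score not in SCALE:
            raise ValueError(f"{area}: {score} is outside the -2 to 2 scale")


def overall(site_scores):
    """Unweighted average across areas (the equal weighting is an assumption)."""
    validate(site_scores)
    return fmean(site_scores.values())


def gaps(site, benchmark):
    """Per-area difference: positive means `site` scores higher than `benchmark`."""
    return {area: site[area] - benchmark[area] for area in site}


print(overall(scores["client_before"]), overall(scores["client_after"]))
print(gaps(scores["client_after"], scores["competitor"]))
```

The per-area gaps are what make the comparison ‘finely grained’: here the client would be ahead on usability but behind on desirability, which a single overall number would hide.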

Conclusions:

There is an argument this can be seen as a bit of a scam. Giving arbitrary numbers to your opinions doesn’t make them any less subjective. This method of presenting the data could be misleading if presented incorrectly, and the client should be made aware of the method behind the score system. This could become an issue when running comparative studies before and after your work, since you’d be biased towards giving the site a better score after you’ve worked on it.

The point of this method is to aid communication with the client, and give them data in a format that is useful to them. As I discussed in the review of Selling Usability, management and non-technical people would typically much rather see pretty graphs, and statistics, than a list of comments. This method helps manage client expectations, and gives them what they want.

To make the method more valid, it would be useful to perform a study to ensure the method is sound. Perhaps get a wide range of experts to independently rate a wide range of websites on this scale, and note the correlations between the scores. It’d be a first step in countering complaints that this method is still inherently subjective, and help make an art into a science.
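One way such a study could be analysed is sketched below. It uses a plain Pearson correlation between each pair of experts, which is my choice of statistic rather than anything iCrossing described, and the expert names and ratings are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(ratings):
    """Correlation for every pair of experts, keyed by the pair of names."""
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical -2 to 2 scores given by three experts to the same five websites.
ratings = {
    "expert_a": [-2, -1, 0, 1, 2],
    "expert_b": [-1, -1, 0, 2, 2],
    "expert_c": [2, 1, 0, -1, -2],
}

for pair, r in pairwise_agreement(ratings).items():
    print(pair, round(r, 2))
```

Consistently high correlations would suggest the experts are measuring something shared; a pair like the first and third experts here, who rank the sites in opposite order, would suggest the scores are mostly personal taste.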

8
Mar

Watching ‘average users’: Word

It’s easy to forget how useful it is to watch less technical people use some common programs, and how helpful observation is as a tool to understand the ‘average’ user. I recently watched someone using MS Word (2003 I think), and it was…enlightening. They made a large number of ‘errors’ when using MS Word, but as we know as usability specialists, it’s not the user that creates errors – the software does.

The task was relatively simple – design some worksheets, including textboxes, and pictures, and lay them out in an eye-pleasing manner. I’m sure there are many more appropriate packages to make this in than Word, but it was the user’s software of choice, due to familiarity, and the task shouldn’t be beyond MS Word. I observed, and let them lead the interaction, but advised when they asked for help (naughty I know, but it wasn’t a formal lab setting!)

Muppets - Beakers Lab

The lab was busy that day anyway...

How my ‘less-technical user’ used Word:

I noted down (obviously away from the user) some of the more ‘interesting’ characteristics of how they used Word.

  • Used the ‘cut’ function as a ‘delete’ (with no understanding of how it links to paste). Taken out of context from “cut and paste”, ‘cut’ would more likely imply removing or ending something, and so this mistake is understandable. Incidentally this method has some pluses. I still don’t know how to remove a table easily (not just the information within it), and cut seems to do this.
  • No knowledge of the alignment tools, and so using spaces as a method to align text to the center or right. Obviously ran into problems when editing the text later, as changes would make the text run over the end of the line, ruining the formatting.
  • Drew horizontal lines, across the page (i.e. a space to write in your own answer) with –‘s. Seems a pretty effective method, even though I’m sure Word has its own way of doing this. Is there a better way of doing it?
  • Displayed difficulty moving images in Word. Is it right that you have to click on an image twice to move it? The first click just gives you resize options, which confused the user.
  • Had difficulty with resizing objects. What happens if you make an image so big that it falls off the edge of the paper, and you cannot see the border to make it small again? I guess you could format picture, and manually change the size, but this is an entirely different method of resizing, and isn’t cognitively related to the standard way.
  • Constant (constant!) rewriting of words, when Word auto-capitalised or auto-formatted them in an undesired way (which was seemingly every autoformat). The user had to delete the word, and re-write it each time.

What could Word do to improve?

This immediately throws up some questions about how Word was developed. It’s clear that the tools available, such as the alignment, or horizontal lines, are not making their functionality transparent to new users. It wasn’t clear to my user that they existed, or how they should be functioning. Obviously just having the icon on the toolbar isn’t enough, and this should be rethought.

This was also the case with image manipulation. The functions that the user needed do exist in Word (i.e. resizing, moving), but are modal in nature, and so are difficult to find, and don’t offer a consistent user experience to someone who is not familiar with Word’s nuances.

It’s also clear with auto format in particular that the system isn’t adapting to the user’s needs. The constant changes that Word was making to the user’s document, which were then undone each time, only created a large degree of frustration in the user. The software should be learning how the user wants auto format to work, and adjusting to their preference. In this user’s case, it was causing trouble, and should have turned itself off (or at least given the option).

Clippy

What they need is some sort of helpful assistant

What should we learn from this?

It occurred to me that these issues were not unique to the user I watched, since I encounter similar problems with Word. The difference is I’ve had enough familiarity to learn the workarounds, or solutions to these problems that Word throws at you. For example, it’s an unthinking reaction to press Ctrl+Z after Word auto-formats things incorrectly. My user just hadn’t used the program for long enough to train that reaction, and so Word’s error became more of a big deal.

It’s important when considering usability to realise that users aren’t just like you. If you are in a position to make a difference with usability, it’s very likely you are not an ‘average user’, and as such it’s difficult to comprehend how ‘average users’ use software.

‘Average users’ are not stupid. They are your mum, and just don’t have the time, or effort to put into learning these workarounds, or making them second nature. The solution, rather than ‘educating’ users, is to make the programs better; make programmers understand who their users are, and how they use the programs. And make them program for the ‘average’ users, rather than the power users. And that is the point of usability.