Posts Tagged ‘quantitative’



23
Jun

Remote Research – Book Review

Remote Research is a new book by Nate Bolt and Tony Tulathimutte, who have worked with the UX agency Bolt | Peters on a wide range of studies, with clients such as Wikipedia and Electronic Arts (I recommend watching the funny out-takes of Spore user testing).
Their new book sums up their experiences with performing remote research (Tony has previously discussed this subject on this blog, in the comments here), and gives clear instructions on how others can perform a wide range of usability and user experience studies with people who are physically distant, by using the internet.

Remote Research

Don't judge it by its cover...

Why would you consider remote research?

Written by advocates of remote research, the book highlights many of the potential advantages that remote research has over a more traditional lab-based study. These advantages are fleshed out throughout the book through testimonies from experts with experience in this field, who offer real-world examples to emphasise these points.

Some key advantages are:

  • Access to a geographically diverse user base. Unlike traditional research, where a moderator would have to be in the same physical location as the subjects, remote research allows a study to be run with anyone who has a high speed internet connection, widely expanding the potential study-group.
  • Easy to let stakeholders get involved. Because the research session is being broadcast over the internet, it’s possible to allow stakeholders (e.g. executives and designers) to view the session, and give (moderated) input. This of course increases their engagement with the process, and the session itself becomes the ‘evidence’ for any conclusions derived from the research.
  • Natural browsing environment. The validity of the research can be improved, not only because you are allowing the user to perform the task in a familiar environment (their own home computer), but also some recruitment methods allow you to capture a user performing a task they have selected. For example, recruiting a user who came to the site to buy trousers, for a task based on buying trousers, would provide more accurate results than asking someone to pretend to buy trousers…
  • Cheaper (debatably). Not having to pay for travel can keep costs down. However, other costs, such as participant incentives and the software itself, still have to be paid.

The remote research book doesn’t advocate killing off lab tests though – instead, it recognises that there are cases when the lab is still appropriate, such as when privacy is a concern. The book also features Andy Budd’s defence of the lab, which argues that remote research fails to pick up aspects of non-verbal behaviour, as well as arguing that remote research doesn’t just remove a selection bias (geography), since it also adds another (internet speed and technical ability). It’s brave of the book to include the case against remote research, and helps project a more trustworthy and reliable image for the book itself.

How to do remote research

The ‘meat’ of the book is the sections dedicated to how-to guides on the different forms of remote research. The book contains step-by-step instructions on performing moderated or un-moderated research, and includes key topics such as recruitment (and live recruiting), card sorts, and lots of handy hints – such as using IM clients as a chat room for multiple observers to automatically share and timestamp notes.

The book doesn’t just cover basic topics – it goes on to develop novel approaches to user research, such as using ‘reverse screen sharing’ to protect confidential software or data, and using the mobile web to gain a new understanding of time-dependent information, outside of the traditional moderated setting.

It also extends the remit of remote research – it doesn’t have to be limited to websites, but can include doodles or sketches, as well as developing ideas for automatic research with analytics.

Chat Roulette

Another sort of remote research?

Conclusion

Remote Research is one of the easiest-to-read UX books I’ve reviewed. Like many Rosenfeld publications, it is laid out well, without appearing dense with text, and has a friendly tone throughout. The book can be likened to Krug’s writing in its style and presentation.

The book is also practical and realistic, and deals with real world issues, like ‘fakers’ (who can be outed by using open ended questions to discover motives), legal issues, and common challenges such as reluctant stakeholders.

Most importantly for the practical UX practitioner, the book is not dogmatic. This is especially evident in the last chapter which admits that usability shouldn’t be the exclusive goal of product design, and needs to be coupled with initiative, and innovation to develop great things.

Overall, this book is a great introduction and how-to guide to the growing field of remote research, and will be an important tool for anyone trying to keep up to date with the latest research methods.

20
Apr

Understanding players through biometrics

Last week UXBrighton hosted an event focused on Biometrics, which featured an interesting presentation by Vertical Slice.  Pejman Mirza-Babaei presented his PhD research on the application of biometrics to help understand a player’s experience when playing games. This was presented as a ‘guerrilla’ method, since it was a speedy and rough implementation, not a definitive and comprehensive methodology.

We’ll be looking at what biometric research is, how it can be applied to games research, and the problems that became apparent with this method.

What is biometric research?

Biometrics is traditionally an automated way of recognising, or recording, people’s physiological data or characteristics. To apply this to video games, Vertical Slice took readings by hooking players up to machines which record their heart rate, brainwaves, or galvanic skin response (…how sweaty their skin is, presumably). It’s proposed that there is some correlation between how players’ bodies react and how they are feeling – such as how a player’s heart will beat faster while fighting Gunther Hermann’s Skull Gun or scoring a Tetris.

Pejman Mirza-Babaei has been investigating how this can be applied to games research. Working with Vertical Slice, he is interested in measuring the player experience – how to know when players are having fun, or becoming frustrated – and so has been performing studies to see how feasible it is to measure this with biometric data. By having players play either Haze or Modern Warfare 2 while hooked up to these machines, it may be possible to gain a greater insight into the player’s thoughts, and how they feel when playing.

Clockwork Orange

And in an unobtrusive way...

What did biometric research show?

When playing the games, the player’s heart rate and GSR readings (that sweatiness measure) were recorded along with a video of the player, and of their screen. At the simplest level, the biometric readings showed when the player’s heart rate went up. The researcher would then conduct an interview after the gaming session, and ask the player to explain why their heart rate went up at those points.

We saw examples of these spikes when the player enjoyed, or was frustrated by, a task (such as using a machine gun, or getting stuck looking for a vehicle), and were given the player’s justifications for feeling like this.
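To make the ‘find the spikes, then ask about them’ idea concrete, here is a minimal sketch of how moments of raised heart rate could be flagged for the follow-up interview. This is not the method Vertical Slice presented: the readings, the `find_spikes` name and the 1.5 standard deviation cutoff are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_spikes(samples, threshold_sd=1.5):
    """Return the (seconds, bpm) samples that sit well above the session average.

    `threshold_sd` is an arbitrary, illustrative cutoff, not a value from the talk.
    """
    if len(samples) < 2:
        return []  # not enough data to estimate a baseline
    rates = [bpm for _, bpm in samples]
    cutoff = mean(rates) + threshold_sd * stdev(rates)
    return [(seconds, bpm) for seconds, bpm in samples if bpm > cutoff]


# Hypothetical readings: (seconds into the session, beats per minute)
session = [(0, 72), (10, 74), (20, 73), (30, 75),
           (40, 110), (50, 76), (60, 74), (70, 73)]

for seconds, bpm in find_spikes(session):
    print(f"Ask the player about {seconds}s into the video ({bpm} bpm)")
```

The output is only a list of timestamps. It says nothing about whether the player was delighted or furious at that moment, which is why the interview is still needed.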

The most obvious advantage of this method is that it allows a more natural environment to be created for the player. Biometrics doesn’t require distracting the player by asking them to think aloud, or interrupting their game with questions, yet it still gives a degree of insight into how they are feeling. A more natural gameplay experience can therefore be achieved without stopping useful data from being gathered.

Problems with biometric research on games

However, some limitations on the application of this technology became obvious through the presentation. Biometric data (in its current form) doesn’t give any insight into why the player’s heart rate has spiked, just that it has. This problem is exacerbated by the fact that the readings only move along a single scale – there is no way to distinguish between stress and happiness (or any other reasons a heart rate can spike).

Exciting Vehicles

such as exciting vehicles

Because of this, biometric readings have to be backed up by another method, to give some understanding as to why the heart rate spikes at certain moments. Traditional UX methods, such as a post-test interview, are therefore needed in addition to biometric readings. However, this reintroduces traditional UX problems. A player may not be able to accurately remember why they felt excited at a certain moment and, as pointed out by Sam Nixon, may simply justify their opinion by what they see on screen.

For example, the player may explain a heart rate spike caused by audio cues as being caused by the enemy visible on screen when the clip is shown later, missing the real reason for their excitement.

Conclusion

So biometric readings alone cannot tell us what a player is thinking. Hence they cannot (currently) be a replacement for traditional UX methods.

What biometric readings can do is aid the application of current UX methodologies. When combined with tools such as think aloud, or interviews, they can add weight to the findings. For a think aloud, they can tell you which parts of the game particularly affected the player, and hence which comments to pay attention to. Similarly with interviews, biometric research can pinpoint the areas that the player should be asked about. When used in combination with typical UX tools, biometric research can be justified and have some understanding applied to its findings.

There is amazing potential in the application of biometric data to games. Currently, the ‘AI director’ in Left 4 Dead controls the game based on how the player is doing – giving fewer zombies to fight if the player is doing poorly, or making the game harder, and giving the player some nasty surprises, if they are doing well. Imagine if a system like this could take biometric data into account, and change the game experience based on how the player was feeling. Vertical Slice have begun to show us the potential of this technology, and I feel we’re at the start of an exciting journey.

22
Mar

Quantifying the unquantifiable – Expert Evaluations

At a recent UXBrighton talk, iCrossing presented an interesting idea about applying metrics to expert evaluation. This is a potentially controversial topic, yet has numerous benefits if it can successfully make qualitative data quantitative (and turn impressions and thoughts into numbers). I’ve outlined the method, and my thoughts on the issues around this.

The UXBrighton event was presented in a new format as a series of short talks, from Harry Brignull’s tips on time stamping notes, to Danny Hope’s templates for understanding user roles. Also interesting was a talk on using Google Analytics, although the length of the talk meant that topic could only be skimmed, which was disappointing as I’m an analytics fan. The most interesting idea presented was iCrossing’s presentation on “The iCrossing Connected Brand index: how to measure a brand’s effectiveness online”, given by Ifraz Mughal.

Expert Evaluation

As I’ve mentioned before, an expert evaluation is a useful tool for getting an insight into potential usability and user experience issues on a website, or game, with limited resources. Although it can never replace running tests with real users, it can provide a quick approximation, and help highlight the biggest issues.

The ‘method’ for an expert evaluation is simple. Get an expert to look at the site, or game, and tell the client what they think. Job done.

scientist with test tube

My expert eye tells me you need smarter users...

However, an expert evaluation can only ever be subjective, and this is its biggest weakness. A client can look at your page full of recommendations, and dismiss it as the opinion of one person. There’s no easy way to see progress with changes, and a comparison with other sites can only ever be abstract.

Quantifying an Expert Evaluation

iCrossing’s solution is to quantify their expert evaluation. As part of their ‘Connected Brand Index’ idea, they rate their clients’ sites (and competitors’) on UX-centric areas such as “usefulness”, “usability” and “desirability”.

A traditional expert evaluation would give a qualitative rating, and give examples to back this up, e.g. “Poor – little emphasis, and diffuse calls to action”. Instead iCrossing will give the site a score, on a scale of -2 to 2 (2 being very good). This of course can be backed up with examples in a more in-depth report.

kittens in a cup

After the first few pages, the report can just be pictures of kittens. No-one reads that far.

The advantages:

There are numerous reasons why a client would prefer a scored ‘rating’, rather than comments.

  • A ‘score’ makes it easy to benchmark, and compare your own scores against competitors. By dividing the expert evaluation into separate topics, and scoring each, a finely grained comparison can be made, and communicated.
  • Similarly, a score makes it easy for a client to see progress. If they scored -1 before hiring you, and 1 after, your work can be justified (as long as no-one questions who is doing the scoring!)
  • Because this produces a concrete score, clients will be able to handle and communicate the data. Graphs can be made, which wouldn’t be possible for subjective comments. These can be invaluable for justifying and communicating with managers and project sponsors, who do not need to see the details, just get a high-level overview.
  • This expert evaluation can be encompassed as one aspect of a larger ‘score’ given to websites, or games. This is the idea behind iCrossing’s Connected Brand Index.
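As a rough illustration of the benchmarking described in the list above, here is a small sketch that scores a client and a competitor on the three areas mentioned earlier and reports the gap in each. The scores are made up, the helper names are hypothetical, and treating the overall score as a plain average is my own assumption; the talk did not say how iCrossing combines the areas.

```python
from statistics import mean

# The talk described a scale of -2 to 2, with 2 being very good.
VALID_SCORES = range(-2, 3)


def overall(scores):
    """Average the per-area scores after checking they are on the scale.

    Averaging is an assumption made for this sketch only.
    """
    for area, value in scores.items():
        if value not in VALID_SCORES:
            raise ValueError(f"{area}: {value} is outside the -2 to 2 scale")
    return mean(list(scores.values()))


def compare(ours, theirs):
    """Per-area difference; positive means we scored higher than them."""
    return {area: ours[area] - theirs[area] for area in ours if area in theirs}


# Hypothetical scores from a single expert
client = {"usefulness": 1, "usability": -1, "desirability": 0}
competitor = {"usefulness": 0, "usability": 1, "desirability": 2}

print("Client overall:", overall(client))
print("Competitor overall:", overall(competitor))
print("Gap by area:", compare(client, competitor))
```

The per-area gaps are the sort of thing that drops straight into a bar chart for a project sponsor, which is most of the appeal of scoring in the first place.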

Conclusions:

There is an argument this can be seen as a bit of a scam. Giving arbitrary numbers to your opinions doesn’t make them any less subjective. This method of presenting the data could be misleading if presented incorrectly, and the client should be made aware of the method behind the score system. This could become an issue when running comparative studies before and after your work, since you’d be biased towards giving the site a better score after you’ve worked on it.

The point of this method is to aid communication with the client, and give them data in a format that is useful to them. As I discussed in the review of Selling Usability, management and non-technical people would typically much rather see pretty graphs, and statistics, than a list of comments. This method helps manage client expectations, and gives them what they want.

To make the method more valid, it would be useful to perform a study to ensure the method is sound. Perhaps get a wide range of experts to independently rate a wide range of websites on this scale, and note the correlations between the scores. It’d be a first step in countering complaints that this method is still inherently subjective, and help make an art into a science.
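A minimal sketch of what ‘noting the correlations’ could look like, assuming each expert scores the same five sites on the -2 to 2 scale. The ratings are invented, and Pearson correlation is just one possible choice of agreement measure; a proper study might prefer a dedicated inter-rater statistic.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists.

    Returns None when it is undefined, i.e. when one rater gave
    every site the same score.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(ratings):
    """Correlation for every pair of raters, keyed by (rater, rater)."""
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical data: each list holds one expert's scores for the same five sites
ratings = {
    "expert_a": [2, 1, -1, 0, -2],
    "expert_b": [2, 0, -1, 1, -2],
    "expert_c": [-1, 2, 0, -2, 1],
}

for pair, r in pairwise_agreement(ratings).items():
    print(pair, "undefined" if r is None else round(r, 2))
```

In this made-up data, experts A and B largely agree while expert C does not. High correlations across many experts would suggest the scale measures something shared; low ones would suggest each score is still one person’s opinion.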

16
Feb

A Terrible User Experience & how to fix it – Zoomerang.com

When running a website, it’s important to make sure that the user can achieve their goal with minimal fuss. This is especially important if you are selling, or trying to sell, a commercial service. I recently had to use Zoomerang.com, a survey site, and had a few notes about the user experience. As you’ll remember, I don’t rant often…

I’m in the process of designing a GPS game, and am currently discovering the functional requirements for the project. As UX practitioners, we know that involving the user is of critical importance at this stage, hence we designed a questionnaire to establish people’s experience and perception of GPS games, and what they’d like a GPS game to be like. (linked here)

When at university, our internet access goes through a proxy server, which blocks unsuitable content. For some reason, this includes surveymonkey.com, a site I’ve used a few times in the past to construct online surveys. Interestingly, the ‘site blocked’ dialogue said “for survey sites, try zoomerang.com”. However, when I search for some hardcore action, it never gives me alternate suggestions for that. Have I uncovered a conspiracy? Nonetheless, I followed the link.

And so I ended up on zoomerang.com. To be fair, there is one key advantage to zoomerang which immediately put me in a good mood. On surveymonkey, for a free account, you are limited to ten questions. On Zoomerang, you can ask 30 questions before you have to pay. This meant we didn’t have to redo, or concatenate our questions, and made me smile inside.

smiley eye

Pictured: an inside smile

Problems with Zoomerang.com

This goodwill was short-lived once I tried to use the site to implement my questionnaire. Here’s why:

  1. The workflow isn’t clear when making a survey, and so I entirely missed the step where you add your questions. Clicking through the process actually caused me to publish a blank questionnaire. Which wouldn’t be a problem, except…
  2. …You can’t edit an existing survey. Once it’s published, you cannot add/remove/change questions. Surveymonkey allows this. So I was stuck with my blank survey, and had to start again from scratch.
  3. Having figured out how to add questions, I got started, and selected “insert question”. It added a header, which then had to be changed to the “question” type. I guessed that was because it was my first item, but no, it always defaults to inserting a header (odd, since you’d only need 1 per page, whereas you’d need multiple questions).
  4. So I finally got to add a question, and this is when the terribleness of the design struck me. I selected a question where a radio button would select from a number of answers, and typed in my list of 15 or so alternative answers into a rich text field. I hit submit, and … got an error, saying “answers can only be 1000 characters, including HTML”, and even worse…
  5. …It deleted the data I had entered in that field. All 15 answers. This is a critical failure of any system, since the data a user inputs should be considered sacred.
  6. There was no counter telling me how many characters I had entered, so I had to retry a few times. Eventually I realised that I could only enter 5 potential one word answers before it’d error that I was over 1000 characters. That had to be a mistake? I investigated further…
  7. …Looking at the HTML, it turned out that the rich text editor was writing rubbish HTML. At the start of each answer, it’d add needless style tags, often multiple times. Here’s an example of the HTML it generated for my one word answer “complicated”:

    <p><span style="font-family: Arial; color: #000000; font-size: small;"><span style="font-family: Arial; color: #000000; font-size: small;"><span style="font-family: Arial; color: #000000; font-size: small;">Complicated</span></span></span></p>
  8. …no wonder it was hitting the character limit after 4 or 5 words. I had to manually enter the HTML for all the possible answers, just so I could get round this.
  9. My last fault with zoomerang.com is just a suspicion. I look after my email accounts, and so have never received spam in my current primary address. After signing up for zoomerang last week, I received my first random spam email. Might just be a coincidence, but I didn’t sign up for anything else that week!
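
To put a number on the character-limit problem from points 4 to 8, here is a small sketch that rebuilds markup like the example above and counts how many answers fit under the 1000 character limit. It assumes every answer gets wrapped in three nested styled spans like my example, and that the limit applies to the combined markup of all the answers; both are guesses from what I saw, and the function names are my own.

```python
STYLE = "font-family: Arial; color: #000000; font-size: small;"
LIMIT = 1000  # the limit quoted in the error message, "including HTML"


def wrap(answer, depth=3):
    """Rebuild markup like the editor's output: nested styled spans in a paragraph."""
    open_span = f'<span style="{STYLE}">'
    return "<p>" + open_span * depth + answer + "</span>" * depth + "</p>"


def answers_that_fit(answers, limit=LIMIT):
    """Count how many answers fit before the combined markup passes the limit."""
    total = 0
    for count, answer in enumerate(answers):
        total += len(wrap(answer))
        if total > limit:
            return count
    return len(answers)


markup = wrap("Complicated")
print(len("Complicated"), "characters typed,", len(markup), "characters stored")
print(answers_that_fit(["Complicated"] * 15), "of 15 answers fit")
```

Under those assumptions only four answers fit, which matches what I saw, even though fifteen such answers as plain text would come to just 165 characters.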
Code HTML Guy

I had to call this guy to fix my survey

How to fix zoomerang.com

To improve their user experience, they should look at red-routing the goals the user needs to achieve:

  1. Make the progression through survey design clearer, highlighting which step questions are added in
  2. Also make it clear how far through the design process you are, and what steps remain
  3. Restrict what the user can do, so they cannot post a blank questionnaire. It’s obvious if they are about to do this that they’ve made a mistake, so tell them!
  4. Don’t make question types default to “header”. Surely users will only use this type once at most, whereas they’re going to have more than one question on the questionnaire. Make it default to that!
  5. Fix the WYSIWYG code generator, so that the user doesn’t have to manually code the answers in HTML. A lot of users would get stuck at this point!
  6. Don’t send me spam!

And what can you do, until these fixes are made? Use surveymonkey.com. Or, if you’ve found anything better, leave a comment and let me know!

25
Jan

The Likert scale – Or “How I learnt to stop worrying, and ‘strongly enjoy’ the bomb”.

As a practitioner of usability or user experience, a common way that you will attempt to investigate the perceptions of a user (or player, or customer) is through designing and implementing a survey. In designing a survey, it’s important to consider the format that questions come in, especially with common question types such as “How frustrating did you find this level?” Today we’ll look at one of the most common question formats, the Likert scale, and the implications that using it has on your studies.

What is the Likert scale?

Let’s start with an example.

Most people have seen a Likert scale before. Do you agree with this statement?

  • Strongly agree
  • Agree
  • Neither agree nor disagree
  • Disagree
  • Strongly disagree

And the responses should be balanced... unless you have an agenda

Often used to gauge opinions, they are especially important for people involved with measuring usability or player experience, as they can help quantify subjective things like a user’s experiences. They are usually in the form of a statement, followed by a selection of responses that indicate how far someone agrees with the statement. They can often be used to quantify things like ease-of-use, or fun, which would be impossible to quantify through other methods. Hence they are particularly important for us, since user experience is essentially abstract.
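As a small sketch of what ‘quantifying’ means in practice, the responses can be coded as numbers and summarised. The response data below is invented, the 1 to 5 coding is a common convention rather than something fixed by the scale, and I’ve summarised with the median and the counts because the options are ordered categories rather than true measurements.

```python
from collections import Counter
from statistics import median

# Conventional coding for a five point scale (an assumption for this sketch)
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def summarise(responses):
    """Return the median code and how many people chose each option."""
    codes = [CODES[response] for response in responses]
    counts = Counter(responses)
    return {
        "median": median(codes),
        "counts": {label: counts.get(label, 0) for label in CODES},
    }


# Hypothetical answers to "I found this level frustrating"
responses = ["agree", "agree", "strongly agree", "neutral", "disagree", "agree"]

summary = summarise(responses)
print("Median response code:", summary["median"])
for label, count in summary["counts"].items():
    print(f"{label:>17}: {'#' * count}")
```

Printing the counts alongside the median keeps the shape of the responses visible, which matters when opinion is split between the two ends of the scale.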

Different kinds of Likert scales.

The essential question when it comes to implementing a Likert scale, is how many responses to offer.

‘Forced Choice’ scales are those which have an even number of options. Essentially this means missing out the ‘neither agree nor disagree’ option, and forcing the participant to make a selection (see what they did with the name? very clever!). This would be done to force participants to show an opinion, but there are dangers inherent with this. Forcing a response may give a larger degree of ‘static’ in the responses, reducing their accuracy, since the responses may not map to their opinions. People who don’t agree or disagree may not be happy about being forced to give an opinion, reducing their likelihood to answer later questions accurately. However, if your aim is to support a conclusion that people do/do-not like a system, you may be willing to risk these to prove your point when designing the survey.

Forced choice means it’s hard to tell who is neutral, and who doesn’t want to participate

If you choose to use a scale with an odd number of options, there are a few issues that should be kept in mind when deciding between a five- or seven-point scale. The most obvious difference is that a seven-point scale allows a finer grain of responses to be analysed, as it can represent a wider range of views. Also, take into account that it’s been shown that participants shy away from the ‘edges’, the extreme like and dislike options offered. This means a five-point scale will likely only get responses in the ‘slightly’ columns from all except the most ardent fanatics. Again, you have to consider whether a wider range of responses is useful to the topic you are exploring.

Should you use a Likert scale?

Ultimately, if you are trying to track opinions, a Likert scale is a good method of accessing this data. There is no all-encompassing correct answer for which scale is appropriate; the context of use and what you want to find out will both affect this. As long as you keep in mind that not only the phrasing of the question, but also the range and number of responses you offer, will affect the results, and anticipate this effect, you can’t really go wrong. Happy surveying!