A common question for user researchers is how many participants are required to find ‘true’ results. Clients can say “but that was only one user” as a reason to disregard a finding. A key part of our role is to explain that one user can be very significant: in a 5-user study, that one user represents 20% of the sample.
However, this doesn’t mean that one user is always significant for every question. In this post I will look at how opinion findings differ from usability findings, the risks to be aware of when reporting them, and some best practices when working with opinions.
Usability vs. Opinion
With usability issues (whether a participant could understand or perform an action), the literature is very definitive. Jakob Nielsen explained in 2000 that 5 users will find around 85% of usability issues (assuming each issue has, on average, a 31% chance of being encountered by any single user), and has since reiterated that number in subsequent blog posts. However, not all issues are usability issues.
Opinions are what players thought about the game, or an aspect of the game, for example:
- “I liked this game”
- “My favourite level was the swamp world”
- “The character sounds stupid”
- And more…
The difference between opinions and usability issues, for us as researchers, is that opinions are not cumulative and can conflict: one player can like something another player hates. This makes it difficult to see where the consensus lies.
The dangers of reporting opinions
Because we typically test with 5 users, we have very little insight into how representative the opinions we uncover are of the opinions of the wider audience. For example, we might have just happened to pick the only 5 people in the world who liked level 1 best, whereas everyone else likes level 2.
This means we are restricted in how we can report opinion findings in the following ways.
- We cannot confidently prioritise or weight the opinions
- We cannot say which opinions are “true”
- We cannot disregard any of the opinions either
So at best, all we can do is present the opinions, unfiltered, grouped by subject. However, even this can be misleading, because we do not know whether we saw all of the opinions the full audience would have expressed. Returning to our example: the team may cut level 2 because we reported that participants liked level 1 best, when in fact, had we asked every player in the world, level 2 might have come out on top.
What’s the solution?
Testing with a larger audience helps researchers ensure that they are hearing all of the opinions that could be expressed about a game. Researchers can then get greater confidence in the range of opinions expressed, though it would still be risky to prioritise or quantify them without huge sample groups.
A statistical tool that can help researchers with opinions is the adjusted Wald interval (a confidence interval for a proportion). Based on your sample size, it estimates the range within which the true value is likely to lie for the wider population. This allows us to quantify qualitative opinions and see how representative each one may be of a wider audience, based on how many people expressed it. This online tool does all the hard work for you, and can help you learn, for example, that if 1 out of 20 people expressed an opinion, somewhere between 0% and 25% of the wider audience may share it.
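If you prefer to run the numbers yourself, here is a minimal sketch of the adjusted Wald (Agresti-Coull) calculation in Python. The function name and structure are my own; the method simply adds a few pseudo-observations before applying the standard Wald formula, which makes it behave much better with the small samples typical of user research.

```python
from math import sqrt

def adjusted_wald_interval(successes, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion.

    Adds z^2/2 pseudo-successes and z^2 pseudo-trials, then applies the
    standard Wald formula. z=1.96 gives a 95% confidence interval.
    """
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1], since a proportion cannot fall outside that range
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 1 of 20 participants expressed the opinion:
low, high = adjusted_wald_interval(1, 20)
print(f"95% CI: {low:.0%} to {high:.0%}")  # → 95% CI: 0% to 25%
```

This reproduces the 0%–25% range quoted above for 1 out of 20 participants, and makes it easy to explore how the interval narrows as the sample grows.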
Even with these tools in mind, we still need to be very careful about how opinions are reported, due to the risk of misunderstanding. The client may want to use the numbers to quantify or prioritise issues (e.g. “15/20 people said this, so it must be true”), and this is risky, because the opinions we heard may not represent the wider audience’s true feelings. As such, it is important to educate your clients about this risk, or even to not report opinion data at all, because it can be (and often is) misleading.