Evaluation: Thinking Aloud

Empirical Evaluation Script

    Hi, I'm Lorenz and I welcome you to evaluation practice!


    Now we get to the information you need to conduct an evaluation yourselves. First, we will talk about empirical evaluation. Here you can see what we mean by empirical evaluation in contrast to analytical evaluation: empirical evaluations involve users, while analytical ones work with experts. Several methods count as empirical evaluations, for example focus groups, interviews, field studies and experiments. For detailed information, check the literature link provided below this video.


    This is the empirical cycle based on A. D. de Groot. Normally, we start with an observation, generating, for example, qualitative data that can help us form hypotheses. In the induction phase we find explanations for the observed phenomena and formulate the hypotheses. In the deduction phase we think of ways and methods to test our hypotheses, i.e. we deduce consequences of the hypotheses as testable predictions. Then we test the hypotheses and collect new empirical material, which we evaluate and interpret afterwards. The evaluation can lead to a theory, which should be the most reasonable explanation for the phenomena observed in the experiment.


    You might wonder what I mean by "experiment". An experiment is a method to test hypotheses, to gather new information, or to discover new knowledge. Using experiments, we can investigate the correlation or relationship between different variables. A usability test, for example, is applied experimentation.
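
    To make the idea of relating variables concrete, here is a minimal sketch in Python. Everything in it is our own illustration rather than part of the lecture: the numbers are made up, and the pearson helper simply implements the standard Pearson correlation formula, here relating task time and error count across five hypothetical participants.

```python
# correlation_sketch.py -- illustrative only: the numbers are made up.
# Computes the Pearson correlation between two variables measured per
# participant, e.g. task time (seconds) and number of errors.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

task_times = [95, 120, 80, 150, 110]  # hypothetical seconds per participant
error_counts = [2, 4, 1, 6, 3]        # hypothetical errors per participant
print(f"r = {pearson(task_times, error_counts):.2f}")  # near +1: strong positive relation
```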


    One form of experiment is the task-based experiment. Before applying the method, we have to define a task that the participants will perform. The task we choose depends on the research question we try to answer by means of the experiment. We can measure objective data, for example how many errors are made or how long the task takes, or we can measure subjective data, such as how satisfied the users were while performing the task. Even though we define the task the participants have to perform, this is a fairly natural kind of testing, since the tasks are usually designed to closely resemble tasks that real users would perform while using the system. One downside of the task-based experiment is that it requires the parts of the system used for the task to be implemented, or at least a wizard-of-oz setup.
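
    As a sketch of how such objective measurements could be captured in practice, the following Python helper is our own illustration (the file name, function names and CSV layout are assumptions, not a prescribed tool). It times a task run and appends the duration and the number of errors the experimenter observed to a CSV file:

```python
# task_log.py -- illustrative sketch for logging task-based measurements.
import csv
import time

def record_task(participant_id, task_name, run_task):
    """Time one task run, then append participant, task, duration, errors."""
    start = time.monotonic()
    errors = run_task()  # the task callback returns the observed error count
    duration = time.monotonic() - start
    with open("measurements.csv", "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([participant_id, task_name, f"{duration:.1f}", errors])
    return duration, errors

def observed_run():
    # Hypothetical manual protocol: the experimenter presses Enter when the
    # participant finishes and then types in the number of errors observed.
    input("Participant performs the task; press Enter when finished...")
    return int(input("Errors observed: "))

record_task("P01", "book-a-flight", observed_run)
```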


    We can use a wizard-of-oz setup in an experiment when we want to simulate a degree of maturity that a prototype does not yet have. For example, we can simulate an intelligent system with speech output even though we have not implemented it yet. The test persons in such an experiment think they are interacting with, for example, an AI, but in reality they interact with another person or an expert acting as the system. In this setup, it is important to choose tasks that a human can perform sufficiently well in place of a computer. An additional caveat of this method is that we have to be careful with interpretation, since the wizard might perform better than an AI would, for example in speech recognition. This could lead to unrealistic expectations of the system's capabilities.
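
    As a purely illustrative sketch of the technical side of such a setup (the host name, port and plain-text message format are our own assumptions), the participant could chat with the supposed assistant on one machine while the hidden wizard types every reply on another:

```python
# wizard.py -- runs on the wizard's (hidden) machine.
# Every "system" reply the participant sees is typed live by a human here.
import socket

PORT = 5000  # hypothetical port; any free port works

with socket.create_server(("0.0.0.0", PORT)) as server:
    print(f"Waiting for the participant on port {PORT} ...")
    conn, addr = server.accept()
    with conn:
        print(f"Participant connected from {addr}.")
        while True:
            data = conn.recv(4096)
            if not data:  # participant closed the chat
                break
            print("PARTICIPANT:", data.decode("utf-8"))
            reply = input("WIZARD> ")  # the human plays the 'AI' here
            conn.sendall(reply.encode("utf-8"))
```

```python
# participant.py -- what the participant runs; looks like a chat with an AI.
import socket

# "wizard-host.local" is a placeholder for the wizard machine's address.
with socket.create_connection(("wizard-host.local", 5000)) as sock:
    while True:
        sock.sendall(input("You: ").encode("utf-8"))
        print("Assistant:", sock.recv(4096).decode("utf-8"))
```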


    Interviews are another good way to gain insights from different people on different topics. For example, we could interview users about the clarity of a software's functions or presentation, or about how they like interacting with the software. Moreover, we can find out which difficulties users face while using the system. But we can also interview, for example, stakeholders to find out what they consider important. For further information on interviews, we recommend you take another look at the analysis section.


    Field studies are another way of collecting data. We conduct them when we want to observe users in their natural environment using our system, to find out which problems they face during regular usage under realistic conditions. Another interesting use case for field studies is observing users performing a task before we have even built a prototype, in order to generate ideas and identify opportunities for new technology. The information we gather can be used for defining requirements and therefore lays a foundation for the user-centered design process.


    One form of qualitative evaluation is the focus group discussion, also called a focus group interview. We have already talked about this in the analysis section, but here on this slide you can see two important facts about the method. By showing a group of users a prototype, we can learn their opinions of it, and this can be done early in the design process. By interacting with each other, the users might recognize positive or negative aspects of the prototype that they might not have noticed alone. But this group dynamic can also influence individual opinions, which should be kept in mind when applying this method.


Run a thinking-aloud experiment, following Nielsen's (1994) Discount Usability Engineering paradigm; a small group of 2-5 persons is sufficient. An important rule is established as principle 10 in Pereyra's (2015) 10 steps to engaging user experience: "Don't grade your own homework". In other words, involve users as subjects you have not asked yet!


Record the evaluation in a suitable manner and discuss usability issues and possible improvements.
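
One lightweight way to record the session, assuming an observer takes notes on a laptop while the participant thinks aloud, is a timestamped note logger such as this illustrative Python snippet (the file name and tab-separated format are our own choice; an audio or screen recording would complement it):

```python
# notes_logger.py -- illustrative timestamped note-taker for a session.
# Writes one "elapsed-seconds<TAB>note" line per observation.
import time

start = time.monotonic()
with open("thinking-aloud-notes.tsv", "a", encoding="utf-8") as log:
    print("Type an observation and press Enter (empty line quits).")
    while True:
        note = input("> ")
        if not note:
            break
        elapsed = time.monotonic() - start
        log.write(f"{elapsed:7.1f}\t{note}\n")
        log.flush()  # keep notes safe even if the session is interrupted
```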

Further reading

Guerrilla HCI: Using discount usability engineering to penetrate the intimidation barrier by Jakob Nielsen (1994), paper

10 steps to engaging user experience by Irene Pereyra (2015), paper

The 20 UX tips you need to know by Jamie Shanks, online article

Task 4

Upload a short usability report including your conclusions (not more than 1 page) to the folder iteration-1/thinking-aloud on the master branch of your GitHub repository.