Evaluation: Quantitative Evaluation

Quantitative Evaluation Script

    Now, depending on the type of data that you gather, we generally distinguish between qualitative and quantitative evaluations. Qualitative evaluations, as you can probably guess, focus on qualitative data, which is used to generate possibilities for improvement, whereas quantitative data is usually acquired to rate a system's performance.

    In the following we will take an in-depth look at quantitative evaluations. Everything regarding qualitative evaluation can be found on the website.


    There are many ways to gather quantitative data.

    On this slide you can see an overview of quantitative metrics, clustered into task-related, error-related, session-related and engagement-related metrics. A simple example of the first, task-related, is measuring how long a user needs to complete a task. For an error-related metric, we could count how many mistakes the user makes while completing the task. A session-related metric would be, for example, the time a user spends on different screens within the app, and an example of an engagement-related metric could be the number of users an app has.
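As a concrete sketch, the task- and error-related metrics mentioned above can be computed from logged session data. All names and values below are purely illustrative:

```python
from statistics import mean

# Hypothetical usability-test log: per participant, the time needed to
# complete the task (task-related) and the mistakes made (error-related).
sessions = [
    {"user": "P1", "task_time_s": 48.2, "errors": 1},
    {"user": "P2", "task_time_s": 61.5, "errors": 3},
    {"user": "P3", "task_time_s": 39.8, "errors": 0},
]

mean_time = mean(s["task_time_s"] for s in sessions)   # task-related metric
mean_errors = mean(s["errors"] for s in sessions)      # error-related metric

print(f"mean task time: {mean_time:.1f} s, mean errors: {mean_errors:.2f}")
# prints "mean task time: 49.8 s, mean errors: 1.33"
```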


    We can gather quantitative data by using questionnaires. They are an easy way to gather a large amount of data, and since they are, or should be, standardized and validated, they can be analyzed more easily than qualitative data. Specifically, by asking closed questions, meaning that we define the answer options, we can interpret the data without much further processing. But, as you can imagine, questionnaires also have downsides.

    That said, the questionnaire should be standardized and validated, because if it is not, we need experts to interpret the outcomes. And even if a questionnaire meets the highest scientific standards, we can never be certain that the study's participants gave honest answers. If no standardized questionnaire is available to answer our questions, we can design our own questionnaire. But there are a few things to consider.


    We should avoid suggestive questions; otherwise we might influence the users. We should also avoid ambiguous questions, for example by including units in response choices regarding frequencies. Furthermore, the answer choices should contain all relevant options, or a choice labeled, for example, “other”. Another factor to keep in mind is response bias. Biases can occur depending on demographics or the wording of questions. Some of the most common biases include the tendency to say “yes” or to say “no”, and the tendency to give extreme or rather neutral answers. Another bias is that study participants may deny undesirable traits and behavior when giving answers. Also, the order of the questions can affect the answers in various ways.

    Some ways to minimize these biases are: avoid emotionally charged words, assure the anonymity of participants, prevent participants from becoming bored by not making questionnaires too long, show a progress tracker, and make the question formats rich in variety.


    In summary, as you can see, designing your own questionnaire is rather cumbersome. That is why we suggest that you use standard questionnaires, if possible. For the different constructs we want to investigate, we can use different questionnaires. Here you can find a few examples. We suggest that you take a look at the different measures to get an overview and an idea of which questionnaires you could use for evaluating your prototype in this course.


    Thanks for your attention, and see you next time!

Experimental requirements: Hypotheses, test cases and metrics

Read chapter six ("Usability Testing") in Nielsen's Usability Engineering (1993). Prepare requirements for a quantitative and summative evaluation.

Hypotheses

State goals and derive hypotheses that you want to test (e.g. "completing task x with the prototype is faster than with off-the-shelf software", "the prototype achieves at least a rating of x", "the performance for fulfilling the task will be x %" etc.). The hypotheses should be in line with your functional and usability requirements. A good introduction on creating theories and deriving hypotheses is given in Field (2009).
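A directional hypothesis such as "completing task x with the prototype is faster than with off-the-shelf software" can later be tested statistically. As a minimal sketch (the sample values are made up), Welch's t statistic compares two independent samples of task times; in practice you would also obtain a p-value from a statistics package, e.g. `scipy.stats.ttest_ind(..., equal_var=False)`:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with possibly
    unequal variances, e.g. task times with the prototype vs.
    off-the-shelf software."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical task completion times in seconds.
prototype = [42.0, 51.0, 39.0, 47.0]
off_the_shelf = [58.0, 63.0, 55.0, 61.0]

t = welch_t(prototype, off_the_shelf)  # negative t: prototype sample is faster
```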


You must be able to explain why you chose a certain hypothesis (including any predefined values). The easiest way to do so is by referencing relevant literature.

Test cases

Derive the test cases (tasks) from your use cases. Subjects will need to complete the test cases in order to reject or support your hypotheses (some hints on test scenarios are given on usability.gov).

Metrics

Read Nielsen's (2012) post on User Satisfaction vs. Performance Metrics and refer to "Performance Measurement" (Nielsen, 1993, chapter 6.7). Define objective measures (obtained by logging parameters) such as reaction time, time to perform a task, number of errors, number of interaction steps, etc. Additionally, define subjective measures. Design or use an existing short questionnaire that uses scales to analyze the product (a list of questionnaires can be found on usabilitynet.org, another list of usability and user experience surveys is available from unige.ch, further questionnaires can be found on measuringu.com).
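If you pick the SUS (Brooke, 1996) as your subjective measure, its scoring rule is simple to implement: each of the ten items is rated on a 1–5 scale, positively worded (odd-numbered) items contribute (rating − 1), negatively worded (even-numbered) items contribute (5 − rating), and the sum is multiplied by 2.5 to yield a 0–100 score. A sketch:

```python
def sus_score(responses):
    """System Usability Scale score (Brooke, 1996) for one participant.

    `responses` is a list of ten ratings on a 1-5 scale, in item order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5  # scale the 0-40 sum to 0-100

print(sus_score([3] * 10))  # prints "50.0"
```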

Experimental design: Plan and prepare a quantitative (summative) evaluation

Please operationalize your hypotheses using the metrics. Define quantitative criteria for assessing your prototype subjectively and objectively.
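One way to operationalize hypotheses is to write each one down as an explicit metric plus a pass criterion. The threshold values below are placeholders to illustrate the idea (except that 68 is the often-cited average SUS score); derive yours from your requirements and the literature:

```python
# Illustrative operationalization: metric name -> pass criterion.
criteria = {
    "mean_task_time_s": lambda v: v <= 60.0,  # objective: task time
    "error_rate":       lambda v: v <= 0.10,  # objective: errors per task
    "mean_sus":         lambda v: v >= 68.0,  # subjective: SUS average
}

# Hypothetical measurements from the summative evaluation.
measured = {"mean_task_time_s": 49.8, "error_rate": 0.05, "mean_sus": 72.5}

results = {name: passed(measured[name]) for name, passed in criteria.items()}
print(results)
# prints "{'mean_task_time_s': True, 'error_rate': True, 'mean_sus': True}"
```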


  • Prepare the experimental setup for your summative evaluation.

  • Design the experimental procedure.

  • Set up the data acquisition (direct in-app logging, observation, recording, ...).

  • Test the setup, procedure and data acquisition.
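For direct in-app logging, a minimal timestamped event log is often enough to derive task time, error counts and interaction steps afterwards. This is an illustrative sketch (class and event names are made up), not a prescribed design:

```python
import csv
import time

class InteractionLogger:
    """Minimal in-app event log for objective measures."""

    def __init__(self):
        self.events = []  # list of (timestamp, event_name) tuples

    def log(self, event, timestamp=None):
        # Default to wall-clock time; an explicit timestamp eases testing.
        self.events.append((time.time() if timestamp is None else timestamp, event))

    def task_time(self, start_event="task_start", end_event="task_end"):
        # Time to perform a task: last end event minus first start event.
        start = next(t for t, e in self.events if e == start_event)
        end = next(t for t, e in reversed(self.events) if e == end_event)
        return end - start

    def count(self, event):
        # Number of occurrences, e.g. errors or interaction steps.
        return sum(1 for _, e in self.events if e == event)

    def save(self, path):
        # Persist as CSV for later analysis.
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(self.events)
```

Logging raw timestamped events rather than precomputed metrics keeps the analysis flexible: the same log yields task times, error counts and interaction-step counts.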

Further reading

Usability Engineering Chapter 6: Usability Testing by Jakob Nielsen, book chapter

Discovering statistics using IBM SPSS statistics by Andy Field, book

Mobile Device Testing by usability.gov, online article

User Satisfaction vs. Performance Metrics by Jakob Nielsen, online article

Questionnaires

Usability and user experience surveys by edutech wiki, website

Various UX questionnaires by MeasuringU, website

SUS by Brooke (1996)

ISO 9241-10 by Prümper (1997)

AttrakDiff 2 by Hassenzahl, Burmester & Koller (2003)

UEQ by Schrepp, Hinderks & Thomaschewski (2017)

Emotrack by Garcia & Hammond (2016)

LEMtool by Huisman et al. (2013)

Task 19

Summarize hypotheses, test cases and metrics in a brief report in the folder iteration-4/preparation on the GitHub master branch.


Upload a photo of the setup, as well as a brief listing of the quantitative criteria and the procedure to a folder iteration-4/preparation on the GitHub master branch.