29 July 2017
Evaluation of Test Quality and Constructing Performance-Based Assessments
Part 1: Evaluation of Test Quality (Bas Hemker)
In order to function well, tests need to be of good quality. In this course, many facets of test evaluation are discussed. First, an overview of evaluation systems is given, showing their similarities and differences. Second, important indicators of test quality are reviewed that can be found in most, if not all, review systems. These include quality of the norms, reliability, and validity. Finally, we focus in more detail on topics related to the investigation and evaluation of test quality in practice.
The use of quality criteria for tests is important in at least two ways. First, they can guide test development in order to obtain the best possible measurement instruments. Second, they can be used as an instrument for internal and external audits.
There are a considerable number of tools to assess the quality of tests. They can be distinguished as guidelines, standards, reviews, and evaluation systems. Special attention will be given to a number of systems that have a sizeable Dutch influence. Many of the tools to assess the quality of tests consider the same kinds of threats to the quality of a test or its use. In some cases these are presented as criteria on which tests can be evaluated, as in the Dutch COTAN system. In other cases the evaluation is structured around key issues, as in the American AERA/APA/NCME (2014) Standards (foundations, operations, applications). The quality of tests can also be viewed entirely from a validity standpoint.
In the last section of the course, many practical issues involving the evaluation of test quality are discussed. For example, when evaluating the equivalence of paper-and-pencil tests and computer-based tests, what research is required? What kinds of norms are relevant for which goals? How do we evaluate differential item functioning (DIF)? When is DIF a problem, and does the absence of DIF mean there are no issues related to it? What measures of reliability can we use, and when should we use which measure? What are relevant alternatives to single measures of reliability? How do we evaluate construct validity? What are validity arguments?
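To give a concrete flavor of one such reliability measure, the sketch below computes Cronbach's alpha, a widely used internal-consistency coefficient, from a small made-up score matrix. The function name and the data are illustrative only and do not come from the course materials.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a k-item test.

    scores: list of rows, one per examinee, each row holding k item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 4 examinees on 3 dichotomous items (1 = correct)
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(data), 3))  # prints 0.75
```

Note that alpha is only a lower bound on reliability under fairly strong assumptions, which is exactly why the course asks about alternatives to single reliability coefficients.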
Part 2: Constructing Performance-Based Assessments (Carol Myford)
The goal of this training is to help participants become competent in devising and scoring performance-based assessments. These are assessments in which individuals demonstrate their abilities to use their knowledge and skills by creating a product, carrying out a process, or engaging in a performance. Examples of processes (or performances) might include giving oral presentations, carrying out a complex procedure, conducting an experiment, taking part in a debate, holding an interview, engaging in a dialogue, participating in an interactive simulation, repairing a piece of equipment, performing on an instrument or in a play, and so on. Examples of products might include lab reports, posters, videos, spreadsheets, term papers, audio recordings, drawings, models, brochures, business plans, and so on.
Participants will learn how to design the specifications for a performance-based task and will discuss different types of performance-based assessments, along with their strengths and limitations. They will also identify and define the criteria that they want to use to evaluate performance. Participants will learn how to turn criteria into checklists and various types of rating scales (i.e., numerical, graphic, descriptive graphic) and rubrics (i.e., holistic, analytic, generic, task-specific) that are useful for both formative (informal) and summative (formal) assessment purposes.
Part 1: Bas Hemker (Evaluation of Test Quality, https://ioe.hse.ru/monitoring/Hemker)
Part 2: Carol Myford (Constructing Performance-Based Assessments, http://education.uic.edu/content/carol-myford-phd)
People working with measurement in the social sciences;
Master's and Ph.D. students.
EUR 670: Includes all materials, meals, accommodation and transportation from Moscow and back.
For several participants selected through a competition, participation is free (including meals and accommodation). NRU HSE provides 10 grants covering 100% of the participation fee, plus 5 additional grants covering 50% of the fee.