This study aims to investigate the performance of test equating methods extended to mixed-format tests within the framework of item response theory (IRT). Observed score equating for mixed-format tests using a simple-structure multidimensional IRT framework. Simple-structure multidimensional item response theory. Mixed-format tests are often considered superior to tests containing only multiple-choice (MC) items, although the use of multiple item formats leads to measurement challenges in the context of equating conducted under the common-item nonequivalent groups (CINEG) design. One challenge for mixed-format test equating using IRT methods with the CINEG design is how to extend traditional IRT equating procedures, originally developed for single-format tests, to mixed-format tests. The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving the equity property in mixed-format test equating. Frontiers: Practical consequences of item response theory. Mixed-format tests, containing both multiple-choice and free-response items, are widely and increasingly used in many large-scale testing programs (Kolen). The R package plink has been developed to facilitate the linking of mixed-format tests for multiple groups under a common-item design using unidimensional and multidimensional IRT.
The flexMIRT IRT software package fits a variety of unidimensional and multidimensional item response theory models (also known as item factor analysis models) to single-level and multilevel data in any number of groups. Combinations of different item formats often allow for the measurement of a broader set of skills than the use of a single format. Equating mixed-format tests with format-representative and non-representative common items. IRT scale linking methods for mixed-format tests (ACT Research Report 2004-5). As stated in the preface of the first volume, beginning in 2007 and continuing through 2011, with funding from the College Board. The impact of test dimensionality, common-item set format. New material includes model determination in log-linear smoothing, in-depth presentation of chained linear and equipercentile equating, equating criteria, test scoring, and a new section on scores for mixed-format tests. Data were simulated according to a two-dimensional noncompensatory IRT model for both equivalent and nonequivalent groups designs. In the present study, such a test is referred to as a mixed-format test. Equating for long-term scale maintenance of mixed format.
Psychometric properties of raw and scale scores on mixed. This function conducts separate calibration of unidimensional or multidimensional IRT single-format or mixed-format item parameters for multiple groups. Few, if any, studies to date have focused on test-level misfit with mixed-format test data, which is a typical case in operational assessment programs nowadays. This paper compares three methods of item calibration (concurrent calibration, separate calibration with linking, and fixed item parameter calibration) that are frequently used for linking item parameters to a base scale.
This function conducts IRT true score and observed score equating for unidimensional single-format or mixed-format item parameters for two or more groups. The common-item nonequivalent groups equating design was used in this study. Equating of mixed-format tests under a CINEG design can be influenced by factors such as attributes of the test, the common-item set, and the examinees. Simple interface to the estimation and plotting of IRT models. Several methods have been developed to conduct equating. Effect of noncompensatory multidimensionality on. Effects of test dimensionality and common-item sets.
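As a rough illustration of the IRT true score equating mentioned above, the sketch below finds the ability at which the Form X test characteristic curve equals a given true score, then evaluates the Form Y curve at that ability. It assumes dichotomous items under a 3PL model with parameters already on a common scale; the function names, item values, and the bisection search are illustrative assumptions, not the interface of plink or any other package, and polytomous items in a real mixed-format test would need their own category response functions.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D is the usual scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def theta_for_true_score(tau, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Invert the (monotone increasing) TCC by bisection.

    True scores at or below the sum of the guessing parameters, or at or
    above the number of items, have no corresponding theta.
    """
    floor = sum(c for _, _, c in items)
    if not floor < tau < len(items):
        raise ValueError("true score outside the range attainable under the model")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < tau:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def true_score_equate(tau_x, items_x, items_y):
    """Form Y true score equivalent of a Form X true score."""
    theta = theta_for_true_score(tau_x, items_x)
    return tcc(theta, items_y)

# Hypothetical (a, b, c) item parameters for two short forms.
form_x = [(1.0, -0.5, 0.20), (1.2, 0.0, 0.20), (0.8, 0.7, 0.25)]
form_y = [(1.0, 0.0, 0.20), (1.2, 0.5, 0.20), (0.8, 1.2, 0.25)]
equated = true_score_equate(2.0, form_x, form_y)
```

Because Form Y in this made-up example is uniformly harder than Form X, a Form X true score of 2.0 maps to a lower Form Y true score.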
The package also includes functions for importing item and/or ability parameters from common IRT software, conducting IRT true score and observed score equating, and plotting item response curves/surfaces, vector plots, information plots, and comparison plots for examining parameter drift. Effects of test dimensionality and common-item sets, by Yi Cao. Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2008. Examining two strategies to link mixed-format tests using.
An alternative to the trend scoring method for adjusting scoring shifts in mixed-format tests. Test equating with constructed-response items and mixed-format tests. Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are used in many testing programs.
The book is appealing to anyone interested in the topic of equating, scaling, and linking. Evaluating equating properties for mixed-format tests. The computer programs listed below can be used to conduct many of the equating analyses described in Kolen and Brennan (2004).
Mixed-format test equating, DRUM, University of Maryland. The traditional linking method often applied to linking test forms. Test equating methods are used with many standardized tests in education and psychology to ensure that scores from multiple test forms can be used interchangeably. Linking mixed-format tests using IRT-based methods in R. Mixed-format tests, University of Iowa College of Education. Provides a simple common interface to the estimation of item parameters in IRT models for binary responses with three different programs (ICL, BILOG-MG, and ltm), and a variety of functions useful with IRT models. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. Paper presented at the annual meeting of the National Council on Measurement in Education.
Items that appear in both forms of a test (such as forms A and B) are referred to as anchor (common) items. Thus, demand has grown in the field for a more generalized and powerful computer program for various uses in research and test development, and as a result, a Windows application. Item response theory (IRT) true score equating (TSE) and IRT observed score equating (OSE) methods were used under common-item nonequivalent groups. Evaluating equating properties for mixed-format tests, by Yi He. An abstract of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of the University of Iowa, May 2011. Bifactor MIRT observed score equating for mixed-format tests. Using data simulated with empirical parameters from a statewide testing program, Baldwin and Baldwin studied the effect of anchor test length on the recovery of item parameters and increase in ability across four administrations in a mixed. As mentioned above, IRT equating procedures have been well developed for single-format tests. Test equating secures the comparability of test scores across different test administrations/forms. Application that implements IRT scaling and equating (computer program). Chapter 3 is also a simulation study that compares the equating of mixed-format tests using common-item sets that consist solely of MC items to common-item sets that contain both MC and FR items. This option is particularly useful when a mixed-format test form is to be simulated. Practical consequences of item response theory model. A mixed-format test is a test containing a mixture of different item formats.
IRT scale linking methods for mixed-format tests. Introduction: A test containing a mixture of different item formats is often used in both classroom and large-scale assessments. Test score equating is used to compare scores from different test forms. In addition to statistical procedures, successful equating, scaling, and linking involve many aspects of testing, including procedures to develop tests, to administer and score tests, and to interpret. Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. The proposed method is a modification of the traditional common-item nonequivalent groups design. A comparison of IRT observed score kernel equating and. The unidimensional methods include the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods.
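The two moment-based methods named above can be sketched in a few lines. The example assumes the usual linear scale transformation, in which abilities and difficulties map as A·x + B and discriminations map as a / A, and it uses only the common items' discrimination (a) and difficulty (b) estimates from the two calibrations. The function names and parameter values are hypothetical; for polytomous items in a mixed-format test, category location parameters would typically be transformed in the same way as b, which this sketch does not show.

```python
from statistics import mean, pstdev

def mean_sigma(b_new, b_base):
    """Mean/sigma: match the mean and SD of the common-item difficulties."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def mean_mean(a_new, b_new, a_base, b_base):
    """Mean/mean: slope from mean discriminations, intercept from mean difficulties."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B

def rescale(a, b, A, B):
    """Place new-form item parameters on the base scale."""
    return [ai / A for ai in a], [A * bi + B for bi in b]

# Hypothetical common-item estimates; the base-scale values are generated
# from the new-form values with A = 1.2 and B = -0.5, so both methods
# should recover those constants.
a_new = [0.8, 1.0, 1.2, 1.5]
b_new = [-1.0, 0.0, 1.0, 2.0]
a_base = [ai / 1.2 for ai in a_new]
b_base = [1.2 * bi - 0.5 for bi in b_new]

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, b_new, a_base, b_base)
a_linked, b_linked = rescale(a_new, b_new, A_ms, B_ms)
```

With real estimates the two methods generally give different constants, because estimation error affects the a and b parameters differently; the characteristic curve methods (Haebara, Stocking-Lord) use the full item or test response functions instead of moments.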
This book provides an introduction to test equating, scaling, and linking, including those concepts and practical issues that are critical for developers and all other testing professionals. When you specify the EMPIRICAL option, PROC MIXED adjusts all standard errors and test statistics involving the fixed-effects parameters. An item response theory-based equating method is proposed for the long-term scale maintenance of a mixed-format test consisting of constructed-response items and multiple-choice items. The impact of equating method and format representation of. Concurrent and separate calibrations were implemented using BILOG-MG.
Computer programs, College of Education, University of Iowa. Test Equating, Scaling, and Linking: Methods and Practices is a welcome update to a book that has become a classic in equating and linking. Windows software that generates IRT parameters and. This study provides new evidence on the performance of different IRT models in equating tests. Investigating different item response models in equating the. In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can.
This article reports the results of a simulation study comparing the performance of separate and concurrent estimation of a unidimensional item response theory (IRT) model applied to multidimensional noncompensatory data. Windows PC console and graphical user interface (GUI) versions and Macintosh OS 9 console and OS X GUI versions are available for at least some of the. IRTEQ: Windows application that implements IRT scaling and. Comparison of test equating methods based on item response.
This paper illustrates that the psychometric properties of scores and scales that are used with mixed. The R package plink has been developed to facilitate the linking of mixed-format tests for multiple groups under a common-item design using unidimensional and multidimensional IRT-based methods. Data sets from this book are included with some of the programs. The new edition of Test Equating, Scaling, and Linking. This introduction to the R package plink is a slightly modified version of Weeks (2010), published in the Journal of Statistical Software. In the third edition, each chapter contains a reference list, rather than having a single reference list at the end of the volume. An R package for linking mixed-format tests using IRT. The noncommercial software R is used throughout the book to illustrate how to perform different equating methods when score data are collected under different data collection designs, such as the equivalent groups design, single group design, counterbalanced design, and nonequivalent. The use of multiple formats presents a number of measurement challenges, one of which is how to adequately. Chapter 5 examines the influence of IRT calibration programs on IRT equating results for mixed-format tests using some of the same data sets used in Chapter 4. Document resume: Li, Yuan H.; Lissitz, Robert W.; Yang, Yu Nu. Estimating IRT equating coefficients for mixed-format tests.
For practitioners, the book provides a splendid introduction to the topics considered. The program adopts a matrix-sample external anchor equating design and employs mixed-format test data which contain dichotomously scored. A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the nonequivalent groups common-item design under IRT. Laboratory of Psychometric and Evaluative Research Report 406. For example, available software cannot handle all the popular IRT models being applied to test data, and cannot handle some of the popular equating designs. This function supports all item response models available in plink with the exception of the multiple-choice model. One common equating design used in linking or equating tests from year to year is item response theory (IRT) scaling using a nonequivalent, common-item equating design.
Today, IRT-based linking is the most commonly used approach for developing vertical scales, and it is increasingly used for equating, particularly in the development of calibrated item banks. An evaluation of linking methods in the presence of year to. There has been a steady increase in the use of mixed-format tests, that is, tests. In this design, two different groups take tests A and B. Perhaps most often, equating occurs in the context of the nonequivalent groups with anchor test (NEAT) design, in which a set. The formats of items in a mixed-format test are usually categorized into two classes. Jun 01, 2011: This paper illustrates that the psychometric properties of scores and scales that are used with mixed. One of these issues in linking mixed-format tests is score comparability across test administration years.
Test scaling is the process of developing score scales that are used when scores on standardized tests are reported. Misfit in the context of test equating with mixed-format test data. With the help of the IRTEQ (Han, 2007) test equating program, equation equity. The Stocking and Lord (1983) characteristic curve method of parameter linking was used in conjunction. Comparison of IRT linking and equating methods with mixed-format tests.
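The idea behind the Stocking and Lord characteristic curve method can be sketched as follows: choose the slope A and intercept B that minimize the squared difference between the common items' test characteristic curves on the base scale and on the transformed new scale. The sketch assumes 2PL dichotomous items, an evenly spaced ability grid, and a simple compass (pattern) search standing in for a proper numerical optimizer; all names and values are hypothetical, and operational software handles polytomous items, weighting of the ability points, and optimization differently.

```python
import math

def p2pl(theta, a, b, D=1.7):
    """2PL probability of a correct response, written to avoid overflow."""
    z = D * a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tcc(theta, items):
    """Test characteristic curve over (a, b) item pairs."""
    return sum(p2pl(theta, a, b) for a, b in items)

def sl_loss(A, B, new_items, base_items, thetas):
    """Sum of squared TCC differences after mapping new items to the base scale."""
    mapped = [(a / A, A * b + B) for a, b in new_items]
    return sum((tcc(t, base_items) - tcc(t, mapped)) ** 2 for t in thetas)

def stocking_lord(new_items, base_items, tol=1e-6, max_iter=100000):
    """Minimize the criterion with a compass search started at A = 1, B = 0."""
    thetas = [-4.0 + 0.5 * k for k in range(17)]  # grid from -4 to 4
    A, B, step = 1.0, 0.0, 0.5
    best = sl_loss(A, B, new_items, base_items, thetas)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_A, cand_B = A + dA, B + dB
            if cand_A <= 0:  # the slope must stay positive
                continue
            loss = sl_loss(cand_A, cand_B, new_items, base_items, thetas)
            if loss < best:
                A, B, best, improved = cand_A, cand_B, loss, True
        if not improved:
            step /= 2.0  # no move helped, so refine the step size
    return A, B

# Hypothetical common items; the base-scale parameters are generated with
# A = 1.2 and B = -0.5, so the search should land close to those values.
new_items = [(0.8, -1.0), (1.0, 0.0), (1.2, 1.0), (1.5, 2.0)]
base_items = [(a / 1.2, 1.2 * b - 0.5) for a, b in new_items]
A_hat, B_hat = stocking_lord(new_items, base_items)
```

The Haebara method differs only in the criterion: it sums squared differences item by item rather than squaring the difference between the summed curves.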
Investigation of IRT parameter recovery and classification. Comparison of item response theory test equating methods for. Psychometric properties with a primary focus on equating, Volume 1. The results from the study conducted by Donoghue (1994) indicated that, on average. The effect of mini and midi anchor tests on test equating.
Practical consequences of item response theory model misfit. Psychometric properties with a primary focus on equating, Vol. Center for Advanced Studies in Measurement and Assessment, University of.