Poster session: Thursday, February 4, 12:30 pm-1:30 pm (PST)
Title: Rasch modelling of learning patterns: getting more from repeated measure
Author(s): Denis Federiakin
Walstad and Wagner (2016) described an elegant approach for disaggregating value-added test scores when assessing learning outcomes. Their idea is based on comparing dichotomous item scores in the pre- and post-test to capture one of four learning patterns: zero learning (incorrect-incorrect), positive learning (incorrect-correct), negative learning (correct-incorrect), and retained learning (correct-correct). However, their analysis was grounded in Classical Test Theory, which (i) requires administration of the same item set in the pre- and post-test, and (ii) severely limits possibilities for skill-wise analysis as opposed to item-wise or test-wise analysis. Moreover, identification of learning patterns requires comparability of items across the pre- and post-test, which has not been taken into account in the literature to date. In this presentation, we use simulated datasets to illustrate the advantages of Item Response Theory (IRT) modelling of learning patterns. In particular, we use a multidimensional Rasch model and the Random Weights LLTM (Rijmen & De Boeck, 2002) to model test-wise and skill-wise learning patterns, respectively. We discuss the interpretation of the proposed modelling setup, as well as the possibilities and disadvantages of the IRTree (De Boeck & Partchev, 2012) family in modelling learning patterns. We also discuss how an IRT setup can help identify learning patterns across different but equitable pre- and post-tests.
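As a minimal illustration of the four learning patterns described in this abstract, the sketch below classifies paired dichotomous item scores (1 = correct, 0 = incorrect). The function names and the response vectors are hypothetical and are not taken from the presentation; the IRT models themselves are not implemented here.

```python
from collections import Counter

# Map (pre, post) dichotomous item scores to the four learning patterns.
# 1 = correct, 0 = incorrect.
PATTERNS = {
    (0, 0): "zero",      # incorrect-incorrect
    (0, 1): "positive",  # incorrect-correct
    (1, 0): "negative",  # correct-incorrect
    (1, 1): "retained",  # correct-correct
}


def classify(pre, post):
    """Return the learning pattern for one item given pre/post scores."""
    return PATTERNS[(pre, post)]


def pattern_counts(pre_scores, post_scores):
    """Count learning patterns across paired item scores for one examinee."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post score vectors must have equal length")
    return Counter(classify(a, b) for a, b in zip(pre_scores, post_scores))


# Hypothetical responses of one examinee to five items (illustrative only).
pre = [0, 0, 1, 1, 0]
post = [0, 1, 0, 1, 1]
counts = pattern_counts(pre, post)
print(dict(counts))
```

Note that this item-wise tabulation assumes the same items appear in both administrations, which is the restriction the abstract attributes to the Classical Test Theory approach.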
Title: Using generalizability & Rasch Measurement Theory to Ensure Rigorous Measurement in an International Development Education Evaluation
Author(s): Louise Marie Bahry
Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which was dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches, Generalizability Theory (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982), to analyze data from math and literacy assessments and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of the Congo. These approaches allow the researcher to identify and select pertinent facets and examine them in relation to one another, attributing smaller or larger sources of variability to a particular facet; using both provides additional insight for instrument development and validation efforts.
Title: Measuring Teacher Competency in Error Analysis: Instrument Development
Author(s): Rebecca McNeil
Preparing teachers for effective mathematics instruction is vital to producing high-quality teachers and improving student learning, and error analysis is a pedagogical process that proves instrumental to making such improvements (McGuire, 2003; Morris et al., 2009). However, there is a lack of evidence that teachers are skilled in performing systematic error analysis: pre-service teachers hold many of the same misconceptions that students hold and tend to focus on correcting factual errors rather than addressing conceptual or procedural knowledge (Riccomini, 2005; Ryan & McCrae, 2005; Woodward, Baxter, & Robinson, 1999). As there are few measures to date that focus on mathematics teachers’ competency in error analysis (TCEA), this study aims to develop an instrument for assessing this particular construct. The development of this instrument follows the four building blocks approach (Wilson, 2005) and is informed by multiple error analysis frameworks (Lannin, Townsend, & Barker, 2006; McGuire, 2003). This research is conducted with the intent of offering a valid and reliable instrument that can be used both as a diagnostic assessment and as a learning tool for the training and professional development of pre-service and novice teachers; thus, empirical results of a pilot test of the instrument are also discussed.
Title: The DRDP Large Scale Assessment: A Case Study
Author(s): Joshua Sussman
This talk will explore issues related to the ongoing construction and implementation of the DRDP assessment. The DRDP is the current, 3rd generation, version. It is an observational assessment that infant/toddler, preschool, and kindergarten teachers use to assess about 300,000 children yearly in five U.S. states. This presentation describes the DRDP as a large scale, state-supported assessment of early childhood development from a test developer’s (psychometrician’s) perspective. The assessment is described, and then three issues related to DRDP implementation are discussed: organizational issues, technical issues, and political issues. The discussion will reflect ongoing psychometric development and the use of the DRDP results for both formative and summative assessment. Exploring certain achievements and lessons learned may offer general insights about test construction and use in educational settings.
Title: A Systematic Review of Teachers’ Attributions Toward Students with Disabilities: Integrating the Social and Medical Model of Disability with Attribution Theory
Author(s): Corrine Aramburo; Renee Starowicz
The current review provides an overview of 15 published articles on special and general educators’ causal attributions for the academic success or failure of a student with a disability. The social and medical model of disability was seen as the underlying orientation for much of a teacher’s attribution. This paper then argues that an integration of attribution theory and the social and medical model would yield a more comprehensive understanding of how teachers perceive the academic success or failure of students with intellectual disabilities. The development of a construct map using teacher discourse excerpts from the literature review provided an exploration of a new conceptualization of the social-medical model of disability. The construct map integrates the social-medical model of disability, orienting it as an ordinal, unidimensional model that considers how teachers attribute success or failure to a student with a disability via four teaching values and practices: academic expectations for a child with a disability, responsibility, relationship, and skills or pedagogical practices. The implications of this model and future research are discussed.
Title: Evaluation and redesign of multistage adaptive testing in an English language test
Author(s): Kyoungwon Bishop; Daeryong Seo; Sakine Gocer-Sahin
Evaluating the current language MST design for ACCESS in order to advance a new MST design will not only help in understanding WIDA’s test design but also offer practical ways to identify limitations in other MST designs. The current study will also help discover how MST can maximize the advantages of CAT.
Title: Development of a Measure for Self-Directed Reasoning from Evidence
Author(s): Allison Bradford
This poster summarizes work in progress on developing an instrument to evaluate students’ ability to reason with evidence in a self-directed manner, which includes students’ ability to organize evidence, critically evaluate resources, and integrate ideas to develop a conclusion. The poster shares the initial construct conception and instrument development process. Further, results are included from a pilot study in which 50 sixth-grade science students engaged with the instrument after an online inquiry science unit. An initial set of 30 items was tested, but 13 were removed from analysis for poor performance. The current instrument returned an EAP reliability of 0.851 and a WLE reliability of 0.846, indicating relatively good separation. Other evidence for reliability and validity is discussed. The implications for construct revision, continued item development, and instrument use are considered.
Title: Reliability and Validity Evidence of the Humor as an Acceptingly Helpful Attitude Measure (HAHA) Using Item Response Theory
Author(s): Bunyong Dejanipont; Weeraphat Suksiri; Mark Wilson
Improved mental health outcomes, lowered stress levels, and positive social interactions are some of the positive effects of humor training. In particular, perceptions of a potentially stressful situation or negative feelings can be changed and regulated by humorously framing the situation. Therefore, this humorous attitude is an important and useful mindset; however, many instruments intended to measure humorous attitudes have not addressed their limitations, such as using subjective response options and oversimplifying definitions of humor.
We used item response theory to conceptualize the Humor as an Acceptingly Helpful Attitude (HAHA) construct, along with its construct map, and to create the HAHA Measure, which measures a good-natured and humorous attitude that accepts the imperfections and incompetence of self and others while seeing some humorous aspect and feeling the need to alleviate a personal or interpersonal stressor arising from the imperfections or incompetence. Theoretically, the HAHA construct map has four levels: Not engaging (the lowest level), Lacking, Developing, and Having HAHA (the highest level). Those who have such a good-natured and humorous attitude are characterized as being at the highest level of the construct. Notably, the HAHA Measure is the first measure designed to measure respondents’ helpful humor toward persons (whether self or others) who are most likely to be affected by an imperfection-related stressor; the measure was also designed to consider who is present with the respondents during the potentially stressful situation and whether it is happening at the moment or happened in the past.
We developed 28 situational judgment items (in English) in selected-response format according to the HAHA construct levels and information from focus groups, interviews, and direct observations. The items were administered to students (n=65) who were fluent in English and enrolled at the university level in the United States and Thailand. Data were analyzed using Rasch measurement theory through the UC Berkeley BASS system.
Results support the HAHA Measure’s evidence of reliability and validity. The separation reliability coefficient (r=.90) supports internal consistency evidence, as the measure accounts for about 90% of the variance in the estimated respondent humorous attitudes. The alternate forms reliability coefficient with Spearman-Brown adjustment (r=.88) supports the consistency of the measure’s items across different circumstances. Content validity evidence was established based on the (a) definition of HAHA and its construct map, (b) item design, and (c) outcome space. Twenty-one participants took part in exit interviews and generally reported no confusion in understanding the situational prompts and items, supporting process validity evidence. Based on the Wright map, we found that the items generally discriminate respondent humorous attitudes into the four construct map levels, as in the theoretical construct map, with an increment of .5 logits in item difficulty. A comparison of the expected and estimated item difficulties indicates that the estimated item difficulties increased in accordance with the expected item difficulty order. Therefore, the results from the Wright map and the comparison support internal structure validity evidence at the instrument level. In general, the items were consistent with the overall measure, as the majority of estimated mean respondent locations increased as the score category increased. Results of the item fit analysis support the use of the partial credit Rasch model to analyze these data, as all estimated infit values for item step locations for the HAHA data were within acceptable bounds. Consequently, the results support internal structure validity evidence at the item level. The DIF results support the conclusion that the items of the HAHA Measure show no DIF in relation to gender.
Based on the psychometric properties of the HAHA Measure, the measure is potentially useful for humor research to better understand the humorous attitude. Nonetheless, further studies are needed to revise the measure and draw a stronger conclusion about persons’ humorous attitude.
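The Spearman-Brown adjustment mentioned in this abstract can be sketched as follows. This is the standard prophecy formula only; the abstract does not state how the alternate forms were built, so the lengthening factor `k = 2` and the input correlation below are hypothetical values chosen for illustration, not figures from the study.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test is lengthened by a factor of k.

    r is the observed correlation between the shorter forms.
    With k=2 this is the usual adjustment for two half-length forms.
    """
    return k * r / (1.0 + (k - 1.0) * r)


# Hypothetical correlation between two half-length forms (illustrative only).
half_form_r = 0.60
adjusted_r = spearman_brown(half_form_r)
print(round(adjusted_r, 2))
```

With `k = 1` the formula returns the input unchanged, which is a quick sanity check on any implementation.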
Title: Analysis of the Intensive Parenting Questionnaire Using Rasch Modeling
Author(s): Courtney Donovan; Shani O'Brien; Lisa Forbes; Margaret Lamar
This report describes the analysis of the Intensive Parenting Attitudes Questionnaire (IPAQ) using Rasch modeling via the Winsteps software (Linacre, 2019). The IPAQ was developed in 2013 and subsequently modified and revalidated in 2017 using Classical Test Theory techniques (Liss, Schiffrin, Mackintosh, Miles-McLean, & Erchull, 2013; Loyal, Dallay, & Rascle, 2017). The current sample includes 525 mothers from across the United States of America. Although the instrument originally demonstrated five factors (Liss et al., 2013), our data supported three factors: Rewarding Parenting (the fulfilling aspects of parenting and the importance of intellectual stimulation, education, and play), Motherhood (mothers as better parents than fathers), and Challenges (the drawbacks of parenting). Three Rasch models are presented. We recommend a modified 4-point scale and note DIF on all but one item.
Title: On the Axiom of Local Independence
Author(s): Ernesto San Martin
In this talk, I intend to discuss three topics related to the Axiom of Local Independence:
1. The Rasch model is specified under a fixed-effect set-up. We show that the Axiom of Local Independence is not part of the model specification.
2. Local independence makes sense in Lord’s approach to IRT models. Nevertheless, the question concerns its meaning: we show that local independence is an identification restriction which ensures that the empirical Bayes representation of the latent variable is meaningful, provided the axiom is written in a minimal form. This leads us to consider measurement from a geometric perspective (Greek tradition), not from an arithmetic perspective (Arabic tradition).
3. Finally, we explore what happens if local independence is not written in a minimal form and, therefore, we look for partial identification results.
Title: Exploring the Psychometric Properties of the Self-Efficacy for High School Students
Author(s): Yuan Ge; Stefanie Wind
In previous studies, researchers have focused on the development and interpretation of measurement tools related to self-efficacy. However, researchers have seldom investigated whether these instruments demonstrate acceptable psychometric properties, including similar item interpretations between subgroups of respondents. The purpose of this study is to explore the extent to which a self-efficacy measure has a consistent interpretation for two self-reported gender subgroups. The researchers used Rasch analysis to offer guidance on the design of self-efficacy related surveys and questionnaires. Results indicated gender differences in certain self-efficacy items. Suggestions are also provided for instruction and for enhancing the self-efficacy of future students.
Keywords: Self-efficacy; Gender difference; Rasch measurement; Differential item functioning
Title: A Hierarchy of Construct Theories: Their Focus and Manifestations
Author(s): Jeanette Melin; William Fisher; Leslie Pendrill
Explanatory and predictive construct theories enable more fit-for-purpose, better targeted, and better administered measures. The significance of extending and adapting traditional metrological concepts and methods from the physical sciences towards social, psychological and health measurements is a growing topic of current focus in the scientific literature. Construct specification equations (CSEs) provide the highest level of construct theory in social, psychological and health measurements and resemble ‘recipes for certified reference materials’ for traceability in chemistry. In this work we elaborate on construct theories for both item attributes and person characteristics as means of developing qualitative, ordinal, and confirmatory theories hand in hand with a quantitative theory, en route to experimentally validated unit standards.
Title: A Simulation Study: Comparing FDR Correction Methods in Using Rasch Trees Modeling for DIF Detection
Author(s): Chunling Niu; Michael Toland; David Duber; Nan Li
Compared to other measurement invariance tests within the Rasch framework, Rasch trees (RT), based on model-based recursive partitioning, can detect DIF resulting from multiple covariates among non-pre-specified groups. However, since the recursive partitioning steps involve multiple testing with multiple covariates, inflation of Type I errors needs to be controlled by adjusting the raw p-values. Presently, Bonferroni correction is used as the default method; yet previous simulation studies show that it can be unnecessarily conservative. This simulation study therefore examines the comparative effects of six FDR p-value correction methods on the performance (i.e., Type I error, Type II error, and power) of RT modeling in detecting DIF simultaneously with multiple covariates. Preliminary results show that the Benjamini & Hochberg method demonstrates a low Type I error rate, the lowest Type II error rate, and the highest power in RT DIF detection.
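To make the contrast between the two corrections concrete, the sketch below computes Bonferroni and Benjamini-Hochberg adjusted p-values in plain Python. It is not the Rasch tree procedure and does not reproduce the simulation; the raw p-values and the 0.05 threshold are hypothetical and chosen only to show how the more conservative Bonferroni adjustment can flag fewer tests.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (step-up) adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests (illustrative only).
raw = [0.001, 0.012, 0.02, 0.03, 0.60]
alpha = 0.05
n_bonf = sum(p < alpha for p in bonferroni(raw))
n_bh = sum(p < alpha for p in benjamini_hochberg(raw))
print(n_bonf, n_bh)
```

The two methods control different error rates (family-wise error for Bonferroni, false discovery rate for Benjamini-Hochberg), so a larger number of rejections is expected rather than surprising.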
Title: Explanatory Item Response Modeling of a Reading Comprehension Assessment
Author(s): Sunhi Park
This research examined the association of textual and person factors with item difficulty and different responses on a reading comprehension assessment, the Multiple-choice Online Causal Coherence Assessment (MOCCA). Under explanatory item response modeling, the linear logistic test model was applied to examine the effects of textual factors on the item difficulty of MOCCA; text length, word knowledge, background knowledge, and goal identification were significant predictors of item difficulty. Latent regression analysis was applied to examine the effects of person characteristics; socioeconomic status, special education status, and EL status had significant relationships with MOCCA responses.
DIF analysis was conducted to detect whether any items functioned differently for ELs and non-ELs on MOCCA. Twelve items were flagged as showing DIF and were explored by means of the textual features used as predictors in the linear logistic test modeling.
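The core idea of the linear logistic test model referred to in this abstract is that an item's difficulty is a weighted sum of its features, which then enters the usual Rasch response probability. The sketch below illustrates only that structure; the feature coding and weights are hypothetical and are not estimates from the MOCCA analysis.

```python
import math


def lltm_difficulty(features, weights):
    """Item difficulty as a weighted sum of item features (LLTM structure)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have equal length")
    return sum(q * eta for q, eta in zip(features, weights))


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical weights for three textual features (illustrative only).
weights = [0.5, -0.3, 0.2]
easy_item = [0.0, 1.0, 0.0]   # hypothetical feature coding
hard_item = [1.0, 0.0, 1.0]   # hypothetical feature coding

b_easy = lltm_difficulty(easy_item, weights)
b_hard = lltm_difficulty(hard_item, weights)
theta = 0.0  # hypothetical person ability in logits
print(rasch_probability(theta, b_easy) > rasch_probability(theta, b_hard))
```

A person whose ability equals an item's difficulty has a success probability of exactly 0.5, which is a convenient check on the two functions.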
Title: An Investigation of the Combined Effects of Parent Involvement, Language Use and SES in Predicting HS Graduation for ELL Students
Author(s): Nathaniel Shannon; Qingzhou Shi; Honorine Ntoh Yuh
In the United States, students identified as English Language Learners (ELLs) are increasingly falling behind in their likelihood to graduate from high school (HS). These students, typically non-native English speakers, are placed in remedial classes that are generally focused on improving language skills rather than learning standard grade-level material. This results in decreased learning objectives and graduation rates for ELLs. While there is ample work investigating the role of socioeconomic status (SES) in predicting HS graduation among ELLs, there is less research investigating the role of parent involvement and students’ use of language at home and in school. Using binary logistic regression, we will investigate the predictive power of parental involvement, language use at home (English/non-English), the type of school (public/private), and SES in predicting HS graduation among ELLs. The results of this study have implications for determining how to most efficiently allocate resources to the ELL population.
Title: Network Analysis of Didactic Examination Outcomes: Interrelationships of Accuracy, Response Time, Pace, and Fluency (Speed-Accuracy Tradeoff)
Author(s): James Thompson
With the advent of computerized testing, ordinary didactic exams can capture both answer accuracy and answer response time. From this raw data, person ability, person speed, question difficulty, question time intensity, pace, fluency (speed-accuracy tradeoff), person skill for fluency, and question load for fluency can be derived. It would be useful to have a comprehensive view of the interrelationships of these observables. This proposal suggests that pairwise partial correlations between the observables can be considered to form a network. This network was predictive of the constituent variables at both the population and person levels. Interestingly, conventional person variables were not required for these predictions. Question difficulty was important to both global network strength and global structure impact. Question time intensity was also important in global strength while fluency was a major contributor to structure impact.
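As a small illustration of the partial correlations this abstract builds its network from, the sketch below computes a first-order partial correlation between two variables while controlling for a third. Network analyses typically condition each pair on all remaining variables; the three-variable formula and the correlation values here are simplifying assumptions for illustration, not the author's procedure or data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical pairwise correlations among three observables
# (e.g. accuracy, response time, pace); illustrative values only.
edge_weight = partial_correlation(r_xy=0.5, r_xz=0.5, r_yz=0.5)
print(round(edge_weight, 3))
```

When the third variable is uncorrelated with both others, the partial correlation reduces to the ordinary correlation.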
Title: Multidimensional Rasch Analysis for Validating a Measure of Mathematical Proficiency through Digital Technology
Author(s): Putcharee Junpeng; Metta Marwiang; Samruan Chinjunthuk; Prapawadee Suwannatrai; Nuchwana Luanganggoon; Kanokporn Chanayota; Jenrop Krotha; Keow Ngang Tang; Mark Wilson
This study aimed to validate a measure of mathematical proficiency (MP) in the Number and Algebra strand for 1,504 Thai seventh-grade students through digital technology. A construct modeling approach and a design-based research method were adopted to create a tool consisting of four components: register system, input data, process system, and diagnostic feedback report. The researchers employed a multidimensional extension of the Rasch model to evaluate its quality. The MRCMLM was used to examine the internal structure by comparing model fit, to check whether the two-dimensional MP measure fits better than the one-dimensional one. A Wright map was used to support validation of the tool. The low standard error of measurement and the acceptable infit and outfit mean values would determine whether the digital technology measures the multiple proficiencies with accuracy, consistency, and stability.