A semiannual overview of recent research contributions in cognitive diagnosis, forced-choice formats, and methodological advances in psychology and psychometrics. Each paper is summarized in plain language with key insights highlighted for a broad academic and applied audience.


📚 2025 (second half) Publications Summary

🔹 Cognitive Diagnosis and Forced-Choice Models

🧾 Escudero, Vázquez-Lira, Leenen, & Sorrel (2026)

“Issues and possible solutions in cognitive diagnosis modeling applications: The case of a large-scale educational assessment in Mexico”
Annals of Psychology

This study examines the practical challenges that arise when applying cognitive diagnosis models (CDMs) to real data, using a large-scale educational assessment of high-school teachers in Mexico. By identifying common problems encountered in empirical applications, the authors propose methodological and applied solutions that integrate psychometric analyses with expert judgment.

Key points:

  • Applies CDMs to a real large-scale teacher assessment, addressing the gap between theory and empirical use.
  • Identifies five major issues (e.g., Q-matrix identifiability, misspecification, model fit ambiguity, dimensionality, latent class extremity).
  • Proposes practical solutions combining statistical tools in R with content-expert review.
  • Shows that mixed and higher-order CDMs can better capture the underlying structure of complex assessments.
  • Highlights the importance of strong theoretical foundations and careful test design for valid CDM applications.

🧾 Nájera, Abad, Chiu, & Sorrel (2026)

“Variable-length cognitive diagnostic computerized adaptive testing in small-scale assessments”
Journal of Educational and Behavioral Statistics

This study proposes and evaluates Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) procedures specifically designed for small-sample assessments, where traditional parametric CDM-based methods (e.g., DINA) tend to overfit and overestimate reliability.

Innovations:

  • Introduces R-DINA–based CD-CAT methods for small-sample settings, allowing variable-length adaptive testing.
  • Shows that R-GDI and R-NPS outperform traditional DINA-based CD-CAT in posterior probability recovery, classification accuracy, and item bank usage.
  • Finds that calibration-free approaches work well overall but may overestimate reliability with low-quality items; Bayes modal estimation reduces early stopping.
  • Provides practical guidance for formative assessment: start with calibration-free methods and move to calibrated R-DINA procedures as data accumulate.
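
For readers unfamiliar with the DINA model that these CD-CAT procedures build on, the following is a minimal Python sketch of its item response rule and of the posterior over attribute profiles that an adaptive test updates after each answer. It illustrates the standard DINA model only, not the R-DINA, R-GDI, or R-NPS procedures from the paper. The function names, the Q-matrix, and the slip and guess values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def dina_prob(alpha, q_row, slip, guess):
    """P(correct response | attribute profile) under the DINA model."""
    # The examinee is in the "mastery" group for this item only if every
    # attribute the item requires (q = 1) has been mastered (alpha = 1).
    masters_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if masters_all else guess


def posterior(responses, q_matrix, slips, guesses):
    """Posterior over all attribute profiles, assuming a uniform prior."""
    n_attr = len(q_matrix[0])
    profiles = list(product((0, 1), repeat=n_attr))
    weights = []
    for alpha in profiles:
        likelihood = 1.0
        for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
            p = dina_prob(alpha, q_row, s, g)
            likelihood *= p if x == 1 else 1.0 - p
        weights.append(likelihood)
    total = sum(weights)
    return {alpha: w / total for alpha, w in zip(profiles, weights)}


# Hypothetical three-item bank measuring two attributes.
q_matrix = [(1, 0), (0, 1), (1, 1)]
post = posterior([1, 0, 0], q_matrix, slips=[0.1] * 3, guesses=[0.2] * 3)
best_profile = max(post, key=post.get)
```

In a variable-length CD-CAT, a stopping rule is typically based on this posterior, for example ending the test once the largest posterior probability passes a chosen threshold. That threshold, and the rule itself, are assumptions of this sketch rather than details taken from the paper.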

🔹 Questionnaire Design and Response Bias

🧾 Graña, Kreitchmann, Sorrel, Garrido, & Abad (2026)

“Dimensionality assessment in forced-choice questionnaires: First steps toward an exploratory framework”
Educational and Psychological Measurement

This paper examines how to accurately assess dimensionality in forced-choice (FC) questionnaires, a key challenge due to ipsativity and the inherently multidimensional nature of FC blocks. Through a large Monte Carlo simulation, the authors evaluate common dimensionality detection methods under realistic FC design conditions.

Contributions:

  • Shows that Parallel Analysis (PA) and the Maximal Kaiser Criterion (MKC) outperform other methods in recovering the true number of dimensions in FC data.
  • Demonstrates that test design matters: including heteropolar or unidimensional blocks and increasing test length substantially improves dimensionality recovery.
  • Provides practical guidance for building an exploratory framework for FC questionnaires, complementing existing confirmatory IRT approaches.

🔹 Predictive Modeling in Psychology

🧾 Iglesias, Sorrel, & Olmos (2026)

“Evaluating the performance of R-Squared measures in multilevel models”
Multivariate Behavioral Research

This paper evaluates how different R² measures for multilevel models (MLMs) perform in finite samples, focusing on the integrative framework proposed by Rights and Sterba. Using extensive Monte Carlo simulations, it examines how sample size, model complexity, ICC, and estimation method (ML vs. REML) affect bias and accuracy.

Contributions:

  • Shows that R² estimates for level-2 effects are especially sensitive to the number of clusters and model complexity.
  • Demonstrates that unbiased parameter estimates do not guarantee unbiased MLM R² values.
  • Provides practical guidance on when MLM R² measures can be interpreted reliably in applied research.
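
As a rough illustration of what an MLM R² is built from, the sketch below computes variance-explained proportions for the simplest case, a random-intercept model, from its variance components. This corresponds to the familiar marginal and conditional R²; the Rights and Sterba framework splits the explained variance further (for example by level and by random slopes), which is not reproduced here. The function name and the input values are hypothetical.

```python
def random_intercept_r2(var_fixed, var_intercept, var_residual):
    """Variance-explained summaries for a random-intercept model.

    var_fixed:     variance of the fixed-effect predictions
    var_intercept: variance of the random intercepts (level 2)
    var_residual:  level-1 residual variance
    """
    total = var_fixed + var_intercept + var_residual
    return {
        # Share of the unexplained variance that lies between clusters.
        "icc": var_intercept / (var_intercept + var_residual),
        # Variance explained by the fixed effects alone.
        "marginal": var_fixed / total,
        # Variance explained by fixed effects plus random intercepts.
        "conditional": (var_fixed + var_intercept) / total,
    }


# Hypothetical variance components, as might be read off a fitted model.
summary = random_intercept_r2(var_fixed=2.0, var_intercept=1.0, var_residual=1.0)
```

Because each proportion is a ratio of estimated variance components, finite-sample error in any component carries over into the R² value, which is consistent with the paper's point that level-2 measures depend heavily on the number of clusters.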

📚 2025 (first half) Publications Summary

🔹 Cognitive Diagnosis and Forced-Choice Models

🧾 Nájera, Ma, Sorrel, & Abad (2025b)

“Assessing item-level fit for the sequential G-DINA model”
Behaviormetrika

This paper addresses a gap in diagnostic classification modeling: how to assess whether each item in a test fits the assumed model when responses are graded or sequential (e.g., multi-step open-ended tasks). The authors adapt three fit indices from classical test theory—the chi-squared statistic, likelihood-ratio statistic, and power-divergence index—to work with the Sequential G-DINA model.

Key points:

  • The model handles multi-category (polytomous) responses that depend on a latent sequence of cognitive steps.
  • Fit statistics are computed using posterior pseudo-counts and tested via parametric bootstrap.
  • Simulation results show the proposed methods are conservative but powerful when detecting major misspecifications.

🧾 Nájera, Kreitchmann, Escudero, Abad, de la Torre, & Sorrel (2025a)

“A general diagnostic modelling framework for forced-choice assessments”
British Journal of Mathematical and Statistical Psychology

This paper proposes an extension of cognitive diagnosis models to handle forced-choice (FC) formats, which are used to reduce response biases (e.g., social desirability). It adapts the G-DINA model to handle paired statements, allowing each to measure a different latent trait.

Innovations:

  • Provides a general model for binary forced-choice blocks that improves on Huang’s (2023) FC-DCM by allowing more flexible response patterns.
  • Accommodates heteropolar and homopolar blocks, enabling normative interpretation of traits.
  • Supports practical implementation with Q-matrix design guidelines, Bayesian estimation, and software integration via the GDINA R package.

🔹 Questionnaire Design and Response Bias

🧾 Graña, Kreitchmann, Abad, & Sorrel (2024)

“Equally vs. unequally keyed blocks in forced-choice questionnaires: Implications on validity and reliability”
Journal of Personality Assessment

This experimental study compares equally-keyed (homopolar) and unequally-keyed (heteropolar) blocks in FC questionnaires measuring the Big Five. Using an IRT-based model (MUPP-2PL), the authors assess how item keying direction affects reliability, criterion validity, and ipsativity.

Findings:

  • No consistent psychometric advantage for heteropolar blocks.
  • Slight increases in reliability and validity for specific traits, but small overall effect sizes.
  • Practical difficulties in constructing heteropolar blocks with matched social desirability ratings.
  • Recommendations: prefer equally-keyed designs unless strong justification exists for heteropolar use.

🔹 Predictive Modeling in Psychology

🧾 Iglesias, Sorrel, & Olmos (2025)

“Cross-validation and predictive metrics in psychological research: Do not leave out the leave-one-out”
Behavior Research Methods

This methodological paper critiques standard practices for estimating predictive accuracy in regression models. It proposes a reformulated leave-one-out (LOO) cross-validation approach that computes the out-of-sample R² from a pooled error term (PRESS/MST), addressing known biases in conventional CV implementations.

Contributions:

  • Shows that LOO offers more stable and less biased R² estimates than 5-fold or 10-fold CV, especially in small samples.
  • Implements methods in the R package OutR2.
  • Simulations and real data (Many Labs Replication Project) confirm the robustness of the approach.
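
The idea of a pooled LOO R² can be sketched in a few lines of Python for simple linear regression. Each observation is predicted by a model fitted without it, the squared prediction errors are pooled into PRESS, and that total is compared against a baseline. This is not the OutR2 implementation, and the paper's exact PRESS/MST formulation is not reproduced: as an assumption, the baseline here is the LOO error of an intercept-only model. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_r2(xs, ys):
    """Pooled leave-one-out R²: 1 - PRESS / (LOO error of the mean model)."""
    press = 0.0       # pooled squared LOO errors of the regression model
    null_press = 0.0  # pooled squared LOO errors of the intercept-only model
    for i in range(len(xs)):
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        a, b = fit_line(x_train, y_train)
        press += (ys[i] - (a + b * xs[i])) ** 2
        null_press += (ys[i] - sum(y_train) / len(y_train)) ** 2
    return 1.0 - press / null_press


# Hypothetical data: a noisy linear relation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.3]
r2_out = loo_r2(xs, ys)
```

Pooling the errors before forming the ratio yields one R² for the whole sample rather than an average of per-fold R² values. With small folds, per-fold values are unstable, which is one reason a pooled LOO estimate can be more stable than 5-fold or 10-fold averages.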

🔗 Closing Thoughts

These works share a common goal: enhancing the precision, interpretability, and fairness of psychological measurement, whether by improving test models, detecting fit issues, or refining how predictions are evaluated. Each contribution balances rigorous methodology with clear application potential in educational, clinical, and organizational contexts.

All articles are available upon request or via journal links. Summaries prepared with AI assistance by M. A. Sorrel. For collaboration, contact .