Validity Directions

I’ve been spending more time trying to figure out what problems need to be solved in validity assessment. If we don’t know the problems, we aren’t going to work toward the solutions, and we may not even recognize them when we see them.

As best I can tell right now, there are three fundamental problems in validity detection efforts.

First, we have no agreed-upon interpretation of effect metrics. To determine whether a theorized domain of over-reporting is more or less associated with a given scale, we seem mostly to eyeball it. For instance, the RBS scale is theoretically more associated with cognitive complaints than with psychological complaints. When we compare Cohen’s d or Hedges’ g effect sizes between RBS and F/Fp/etc., however, there are no strict guidelines for when those effects differ enough to support that hypothesis. It could be some Fisher transformation, but then, in all fairness, I’m not sure we should expect medium-sized differences between effects, so what is reasonable?
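
To make the ambiguity concrete, here is a minimal Python sketch of one conventional option: estimate the sampling variance of each d and z-test the difference. The numbers are hypothetical (not drawn from any study), and the sketch treats the two effects as independent, which is a simplification given that scales scored on the same participants are correlated. Even if one adopts this approach, it still does not say how large a difference should count as theoretically meaningful.

    import math

    def var_d(d, n1, n2):
        # Approximate sampling variance of Cohen's d for two independent groups
        # (standard meta-analytic formula).
        return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

    def compare_effects(d1, n1a, n1b, d2, n2a, n2b):
        # z-test for the difference between two independent effect sizes.
        # A real comparison of correlated scales from the same sample would
        # also need the covariance between the two d values, ignored here.
        se_diff = math.sqrt(var_d(d1, n1a, n1b) + var_d(d2, n2a, n2b))
        return (d1 - d2) / se_diff

    # Hypothetical values only: d = 1.20 for RBS, d = 0.90 for F, each from
    # 60 presumed feigners vs. 80 presumed honest responders.
    z = compare_effects(1.20, 60, 80, 0.90, 60, 80)
    print(f"z = {z:.2f}")  # |z| > 1.96 would suggest the effects differ at p < .05

With these made-up inputs, z comes out around 1.16, so even a d difference of 0.30 would not be "significant" at typical sample sizes, which is exactly why eyeballing feels unsatisfying.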

Second, we do not understand the mechanisms of variation that explain why some experimentally designed feigning studies produce vastly different results than other experimental failure groups (e.g., Morris et al., 2021; Whitman et al., 2020). This variation suggests that we do not understand enough about the motivational and valence factors underlying performance under experimental conditions, even when the effects themselves do not generalize. That gap makes it difficult to draw conclusions about those same valence and motivational factors, which are present in different ways within the clinical samples to which experimental studies should generalize. Thus, the whole validity-cannot-exceed-the-square-root-of-reliability issue underlies, at least potentially, some of the validity problems we see in ecologically valid samples.
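
For readers who have not run into that shorthand, the classical psychometric bound it refers to is (in LaTeX notation):

    r_{xy} \le \sqrt{r_{xx'}\, r_{yy'}} \le \sqrt{r_{xx'}}

That is, a scale's correlation with any criterion is capped by the square root of its own reliability (and of the criterion's reliability), so noisy measurement on either side limits how well validity indicators can ever predict invalid responding in applied samples.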

Third, we have yet to explore all possible iterations of validity detection strategies, or to integrate them with one another, even among those we already know exist (see Rogers’ descriptions of the various detection strategies). There has been a lot of growth in this area over the last 10 years or so, but there is much left to consider. Studies remain overly simplistic and have not evolved in design sophistication, which is likely part of the problem.

If we do not understand the why, and have not assessed the how very well, when it comes to the question of why scores on validity scales change, then we remain in the infancy of validity detection. We must solve first-factor elevation issues, prove (where applicable) our theories of response styles, and deal with the cyclical relationship between stress and invalidity.

Obolsky, M. A., Resch, Z. J., Fellin, T. J., Cerny, B. M., Khan, H., Bing-Canar, H., … & Soble, J. R. (2023). Concordance of performance and symptom validity tests within an electrical injury sample. Psychological Injury and Law, 16(1), 73-82.

This recent article by Obolsky et al. highlights my concerns. When you look at their patterns of effects across the SVT and PVT groups, we see again the problem that SVTs are not distinct in their prediction of outcome (see also Shura et al., 2022).

Shura, R. D., Ingram, P. B., Miskey, H. M., Martindale, S. L., Rowland, J. A., & Armistead-Jehle, P. (2022). Validation of the Personality Assessment Inventory (PAI) Cognitive Bias (CBS) and Cognitive Bias Scale of Scales (CB-SOS) in a post-deployment veteran sample. The Clinical Neuropsychologist, 1-18.

Until we refine what we are predicting (our criterion) and are able to measure it well, it’s unclear to me how well we can lean on, or support, existing theories of validity with existing data. The trends are right (in general, things look pretty good), but even just comparing PVT versus SVT (not even getting into the issues specific to a given measure), we don’t have the refined measurement we need to maximize predictive capacity. Said another way, some of these scales (NIM, F, CBS, RBS, etc.) measure over-reporting by the way we describe them, but it is their NPP and specificity that are highest. Given what they actually measure, then, we are capturing engaged responding but not necessarily identifying who is over-reporting; we don’t tend to catch them (Ingram & Ternes, 2016; Sharf et al., 2017).
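
As a rough illustration of that asymmetry, here is a hypothetical classification table for a single over-reporting scale at a conservative cutoff in a low-base-rate sample (the counts are made up, not drawn from any of the studies cited): specificity and NPP come out high almost automatically, while sensitivity and PPP stay modest, which is the "we clear engaged responders but rarely catch over-reporters" pattern.

    # Hypothetical counts only: 20 over-reporters and 280 engaged responders.
    true_pos, false_neg = 8, 12     # over-reporters flagged vs. missed
    false_pos, true_neg = 14, 266   # engaged responders flagged vs. cleared

    sensitivity = true_pos / (true_pos + false_neg)   # 0.40: most over-reporters are missed
    specificity = true_neg / (true_neg + false_pos)   # 0.95: engaged responders rarely flagged
    ppv = true_pos / (true_pos + false_pos)           # 0.36: an elevation is often a false alarm
    npv = true_neg / (true_neg + false_neg)           # 0.96: a non-elevation is trustworthy

    print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
          f"PPV={ppv:.2f}, NPV={npv:.2f}")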

