Personality Assessment Inventory (PAI) Over-reporting Scale Effectiveness

Another first-author pub by the most excellent Nicole Morris!

Nicole did some excellent work building on the limited literature on PAI validity scales, evaluating their effectiveness in a military sample seen within a neuropsychology clinic. We used performance validity tests (PVTs; the MSVT and NV-MSVT) as the criterion for our group comparisons. Relatively little prior work has examined some of the PAI validity scales (see McCredie & Morey, 2018), so expanding this literature for one of the most popular and widely used personality measures (Ingram et al., 2019, 2022; Mihura et al., 2017; Wright et al., 2017) is critical. I’ve reproduced the classification accuracy statistics below for ease. The entire paper may be downloaded HERE.
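If you want to compute these kinds of statistics on your own data, here is a minimal Python sketch (mine, not the paper's analysis code) showing how sensitivity, specificity, PPP, and NPP fall out of a 2x2 table of scale elevations against PVT pass/fail status; the cut score and variable names are placeholders.

```python
import numpy as np

def classification_stats(scale_scores, pvt_failed, cut):
    """Sensitivity, specificity, PPP, and NPP for a validity scale at a given cut.

    scale_scores: array of validity-scale T-scores
    pvt_failed:   boolean array, True = failed PVT (criterion group)
    cut:          T-score at or above which the scale is called "elevated"
    """
    scale_scores = np.asarray(scale_scores, dtype=float)
    pvt_failed = np.asarray(pvt_failed, dtype=bool)
    elevated = scale_scores >= cut

    tp = np.sum(elevated & pvt_failed)      # elevated and failed PVT
    fp = np.sum(elevated & ~pvt_failed)     # elevated but passed PVT
    fn = np.sum(~elevated & pvt_failed)     # not elevated but failed PVT
    tn = np.sum(~elevated & ~pvt_failed)    # not elevated and passed PVT

    return {
        "sensitivity": tp / (tp + fn),  # proportion of PVT failures flagged
        "specificity": tn / (tn + fp),  # proportion of PVT passers not flagged
        "PPP": tp / (tp + fp),          # positive predictive power
        "NPP": tn / (tn + fn),          # negative predictive power
    }
```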

Undergraduate Research: MMPI-3 scales are similar in-person and virtual

One of my fantastic undergraduates conducted a study using existing MMPI-3 study data (Morris et al., 2021; Reeves et al., 2022) to compare the effectiveness of the over-reporting scales across in-person and virtual administrations. Given the guidelines put out about telehealth assessment (Corey & Ben-Porath, 2020) and the expanding research comparing virtual and in-person psychological interventions, we expected the scales to perform equivalently across formats. Indeed, that is what we found. One implication of our findings is that future meta-analyses of the MMPI-3 validity scales will likely not need to consider this element of study design as a potential moderator of scale effectiveness.
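For readers curious how a format comparison like this could be scripted, here is a hedged sketch (not the poster's actual analysis) that computes a scale's AUC separately for each administration format; the variable names and grouping labels are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_format(scores, over_reporting, fmt):
    """AUC of one over-reporting scale, split by administration format.

    scores:          scale T-scores
    over_reporting:  1 = over-reporting/feigning condition, 0 = comparison
    fmt:             array of labels such as "in-person" / "virtual"
    """
    scores = np.asarray(scores, dtype=float)
    over_reporting = np.asarray(over_reporting)
    fmt = np.asarray(fmt)

    # One AUC per format; similar values would suggest comparable effectiveness
    return {f: roc_auc_score(over_reporting[fmt == f], scores[fmt == f])
            for f in np.unique(fmt)}
```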

Click HERE to download a copy of the study poster.

Webinar on Internship: Get the Materials here!

Today (in about 30 minutes from my writing this) I’m going to be presenting to APA Division 12 (Clinical Psychology)’s Section for Students and Early Career Professionals (formally called Section Ten). I’m super excited to help demystify the internship process and help applicants maximize their success and desired career trajectories. I also want to make sure the materials from the talk are available.

Click the slide below to download a PowerPoint version of the talk.

After the talk, I will also be updating this post to include a video of the talk. Stay tuned!

Assessment competency: What is it, who has it, and how do we move the needle for master’s-level providers in psych science?

Over the last few years I’ve started to delve into competency research, particularly around psychological assessment. In brief, research over the last two decades has clearly detailed that students do not receive sufficient training in psychological assessment to effectively and efficiently conduct the higher-order conceptual tasks associated with diagnosis and behavior prediction via a finalized integrative report. A recent predoctoral internship match survey also highlights this insufficient training: the median number of assessment hours is 100, while the median number of therapy hours is over 600.

My work has also shown how poorly perceived competence relates to performance-based competence (Ingram et al., 2019, 2020), and that wanting to engage in assessment is predicted by perceived competency (Preprint: Bergquist et al., 2019). Some pre-doctoral training programs don’t even expect this core competency (Ingram et al., 2021). Research on developing competency in assessment also lags far behind research on psychotherapy training. Taken together, the research creates a cohesive message: doctoral psychologists are not sufficiently trained in this core component of our identity and professional practice.

At the same time, APA is adapting to the times and pushing an accreditation framework for master’s programs in health service psychology (some states have LPA licenses, but no specific psychology accreditation standards on which to rely). This is a good move by APA and helps ensure clinical psychological practice has a foothold in the discussion of what constitutes good treatment. We need this seat at the table; this is the newest iteration of the old battle that led to psychologists being able to do therapy, over the resistance of psychiatrists at the time.

The problem is this: if doctoral psychologists are not competent at the end of their programs, how can master’s-level providers be expected to be? This is a big problem and a major question. Recently, a set of proposed guidance was released and opened for public comment. The purpose of this guidance was to set up the scope of practice for these providers. I had the opportunity to help prepare a set of shared comments from APA Division 12 (Clinical Psychology)’s Section for Assessment Psychology. Below (click to download a draft of the document and HERE for the final version) are some takeaways, from my perspective and contribution to the document, about how to make sure assessment conducted by the MA provider is good practice, not just practice:

  • Restrict scope of practice within psychological assessment
  • Ensure strong conceptual understanding of diagnostic, psychometric, and socio-cultural theory necessary to effectively produce integrative interpretations
  • Specify the type of training sequence required, similar to the explicitness of doctoral program requirements
  • Require specific training, including supervised applied practica
  • Expand research on training and competency development in assessment

Clinical Judgement in Suicide Assessment: Using outpatient Veteran intakes to test the viability of high impact decision making

Suicide is a low-frequency, high-impact behavior. By extension, assessment of suicide risk is a critical component of effective mental health intervention. Research on suicide has expanded (see Bryan’s [2021] Rethinking Suicide: Why Prevention Fails and How We Can Do Better, for instance) to suggest that future prediction of risk is also more complicated than the historical linear pathway we have ascribed to it (ideation leads to planning, and planning leads to attempts). In short, suicide is a high-impact behavior that we are poorly able to predict.

Further complicating the assessment of suicide risk is the robust research base that has found clinical judgement to be limited. Consistently, clinical decision making underperforms relative to statistical/actuarial methods (see Meehl, 1954, or Ægisdóttir et al., 2006, for a meta-analytic review). Such findings reflect that we are poor prognosticators of future behavior based on our understanding of past behavior.

We wondered about the over-simplification of this type of prediction. In short, are all tasks (regardless of seriousness) equally poorly predicted by clinical judgement, or are all actuarial measures created equal within the framework of measurement-based care? (Meehl’s original work was in support of the MMPI, rather than the brief and less robustly validated measures that dominate clinical monitoring.) Thus, our study looks at suicide risk in a sample of outpatient mental health Veteran patients, including both agreement between actuarial measures and clinical judgement at intake and risk assessment over time. A big shoutout to Keegan Deihl (a recently adopted grad student) and Tristan Herring (undergrad lab member) for their work on this project, which is being presented at the 2021 Combat PTSD Conference.
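As a rough illustration of the agreement piece, here is a tiny sketch using Cohen’s kappa, a chance-corrected index of agreement between clinician and actuarial risk calls; the data below are made up and the dichotomous coding is an assumption for illustration, not our study’s actual coding scheme.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical dichotomous risk calls (1 = elevated risk, 0 = not elevated)
# for the same intakes: one set from clinician judgement, one from an
# actuarial/measurement-based cut score.
clinician_calls = [1, 0, 0, 1, 0, 0, 1, 0]
actuarial_calls = [1, 0, 1, 1, 0, 0, 0, 0]

# Cohen's kappa: agreement between the two sources, corrected for chance
kappa = cohen_kappa_score(clinician_calls, actuarial_calls)
print(f"kappa = {kappa:.2f}")
```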

Click here to download the poster PDF.

Click here to see the pre-recorded video presented to the 2021 Combat PTSD Conference (with presentation by PATS’ very own Tristan Herring)

As an aside on this project, I’m super proud of Tristan. He has been with the lab for a little under a year as an undergraduate research assistant, and he is a rock star on this poster and on another paper currently submitted for publication. He has tackled learning confusion matrices and classification statistics that have played a critical role in both projects. Grad programs, you can have him next year, if you’re lucky! A big hats off to Keegan as well. This is my first project working with him as my graduate student and I’m looking forward to more.

Sharing Research: Internship Application Process and Outcomes

I’m excited that the work coming out of the PATS lab is reaching the folks who can use it. Earlier this year, I published an article looking at internship site competitiveness for health service psychology graduate students from the perspective of Directors of Training. Next month, I will be presenting to the Section for Students and Early Career Professionals in the Society of Clinical Psychology (APA Division 12) as part of a webinar about this work. I can’t wait!

The Personality Assessment Inventory’s Cognitive Bias Scale: Validation of the Scale-of-Scales Formats in a Military Sample

The Cognitive Bias Scale (CBS) was recently developed for the PAI (see Gaasedelen et al., 2019). The rationale behind the CBS’s development was that the PAI lacked any over-reporting indicators assessing cognitive performance, while other personality inventories (e.g., the MMPI-2-RF and MMPI-3) have such measures (e.g., the Response Bias Scale [RBS]). Using methodology similar to that used to develop the RBS (items were identified based on failed PVT performance and then combined into a scale), Gaasedelen and colleagues created a new validity scale for the PAI using a mixed neuropsychological sample. Subsequently, Armistead-Jehle, Ingram, and Morris replicated the scale in a military sample. In both cases, the CBS worked well for identifying those with concurrently failed PVTs. Check out the link above to see the article by myself and Nicole Morris of the PATS lab.

Subsequently, Boress et al. (2021) examined alternative formats for the CBS using the same mixed clinical sample of patients on which the original CBS was developed. They created three distinct scales, called scale-of-scales versions (CBS-SOS) because they use scale T-scores rather than item-level responses. In their paper, each CBS-SOS performed well and provided support for the scale-level versions of the CBS (AUCs ranging from .72 to .75 for CBS-SOS-1 through CBS-SOS-3, respectively).

CBS-SOS Calculation Formulas

CBS-SOS-1 = (NIM + SOM + DEP + ANX + SCZ + SUI) / 6

CBS-SOS-2 = [(NIM*.015246) + (SOM*.033504) + (ANX*.017804) + (DEP*.010947) + (SCZ*-.002386) + (SUI*-.006888)] / 6

CBS-SOS-3 = (NIM + SCZ + SOM-C + SOM-S + DEP-P + ANX-P + PAR-R) / 7
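If you would rather script these composites than hand-calculate them, a minimal sketch (mine, not from any of the papers) might look like this; it simply transcribes the formulas above, so double-check against Boress et al. (2021) before applying it to real protocols.

```python
def cbs_sos_scores(t):
    """Compute the three CBS-SOS composites from PAI scale/subscale T-scores.

    t is a dict of T-scores keyed by scale name (e.g., t["NIM"], t["SOM-C"]).
    The arithmetic follows the formulas reproduced above.
    """
    sos1 = (t["NIM"] + t["SOM"] + t["DEP"] + t["ANX"] + t["SCZ"] + t["SUI"]) / 6

    sos2 = (t["NIM"] * .015246 + t["SOM"] * .033504 + t["ANX"] * .017804 +
            t["DEP"] * .010947 + t["SCZ"] * -.002386 + t["SUI"] * -.006888) / 6

    sos3 = (t["NIM"] + t["SCZ"] + t["SOM-C"] + t["SOM-S"] +
            t["DEP-P"] + t["ANX-P"] + t["PAR-R"]) / 7

    return {"CBS-SOS-1": sos1, "CBS-SOS-2": sos2, "CBS-SOS-3": sos3}
```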

As a follow-up to their work, we once again examined these CBS-derived values within a military sample and contrasted their performance with the CBS scale (a comparison not done in the CBS-SOS validation paper). This work is being presented at this year’s National Academy of Neuropsychology (NAN) conference and is being written up for publication now [Click to Download the Poster]. Here is what we (Armistead-Jehle, Ingram, & Morris) found:

In case you are wondering why there are no sensitivity, specificity, PPP, and NPP values for CBS-SOS-2, that was an intentional decision at the time of this poster draft, given that its performance is not distinct from the other scale-of-scales forms. Given its more demanding calculation, I opted not to include it. I feel guilty about excluding it, so I will likely calculate it for the final poster (and certainly will for the paper we are writing up this fall on this project).
  • AUC values for each of the CBS-SOS scales (and the CBS) approximated large effects and offered roughly the same overall classification value (~.70)
  • CBS is highly correlated with all forms of the CBS-SOS (.83 to .84)
  • Mean differences are medium in effect (Cohen, 1988) for the CBS and CBS-SOS scales, with effects all ranging from .72 to .75 in magnitude. The CBS-SOS-1 and CBS-SOS-3 had mean differences which were clinically meaningful (i.e., T-score difference of 5+ points)
  • Cut values are slightly different in a military sample than in the non-military mixed neuropsychological sample on which the CBS-SOS formats were initially validated. Sensitivity took the largest dive, with values for CBS-SOS-1 and CBS-SOS-3 below .05 when specificity was set at a .90 threshold (see the sketch just below this list for how cut-based values like these are derived)
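For transparency about how those cut-based numbers are generated, here is a generic sketch (placeholder inputs and function names, not the study code) that finds the lowest cut reaching a target specificity of .90 and reports sensitivity at that cut.

```python
import numpy as np

def sensitivity_at_specificity(scores, pvt_failed, target_spec=0.90):
    """Find the lowest cut whose specificity is at least target_spec, then
    report sensitivity at that cut (scores at/above the cut count as positive).
    """
    scores = np.asarray(scores, dtype=float)
    failed = np.asarray(pvt_failed, dtype=bool)

    for cut in np.unique(scores):               # candidate cuts, ascending
        positive = scores >= cut
        spec = np.mean(~positive[~failed])      # passers correctly not flagged
        if spec >= target_spec:
            sens = np.mean(positive[failed])    # failures correctly flagged
            return cut, sens, spec
    return None                                 # no cut reaches the target
```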

MMPI-3 Over-reporting Scales: A Simulation Study of PTSD, Mild TBI, and Comorbid Presentation

New Paper Alert, with not 1 but THREE of my advisees!

Download the paper HERE

Study Context

The MMPI-3 is the latest revision in the MMPI family of instruments and includes updated norms and scale revisions. Among the MMPI-3’s revisions are changes to several of the well-validated MMPI-2-RF over-reporting scales: three scales include new or reworked items, in addition to the renorming process. In this study (just accepted for publication in The Clinical Neuropsychologist), we examine how effective the over-reporting scales were in a symptom-coached simulation design.

We picked a four-condition design (PTSD, mTBI, comorbid PTSD+mTBI, plus a control condition) because validity scales on the MMPI are designed to detect different symptom sets of invalid responding (e.g., infrequent psychopathology on F and Fp, or infrequent somatic/neurological concerns on Fs, FBS, or RBS). PTSD offers a largely internalizing symptom set, while TBI is largely somatic/cognitively focused. Few studies have evaluated comorbid conditions in validity scale feigning, and symptom sets have previously moderated scale effectiveness (both in simulation designs and in meta-analytic reviews). Given how frequently PTSD and mTBI overlap in military/Veteran samples, this provided a great context in which to examine the MMPI-3’s scale utility. We coached participants on symptoms via a brief verbal description and a written symptom description derived from Wikipedia for each condition.

Results

Across the four conditions, the scales showed effect sizes relative to control that were similar to those in other studies of symptom validity test effectiveness (e.g., other over-reporting scales on measures like the MMPI-2, MMPI-2-RF, or PAI), with d ~ 1.0 to 1.5, but negligible effects between diagnostic conditions. Our effects differ from the other simulation study out there (Whitman et al., 2021); however, ours are closer to what one would expect. In our original submission we reported both the MMPI-2-RF and the MMPI-3, as well as incremental analyses, but our final report includes only the MMPI-3. I’ve provided both sets of analyses below. Results are similar across instruments, which isn’t surprising given the correlation between the scales. In general, using the MMPI-3 over-reporting scales at their recommended cut scores means you can be confident that those who invalidate are most likely exaggerating or misrepresenting symptoms, but you may miss many others who are misrepresenting their symptoms.
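For anyone who wants to anchor what d ~ 1.0 to 1.5 means in practice, here is the standard pooled-SD Cohen’s d computation (a generic sketch, not our analysis script): a d of 1.0 means the feigning group’s mean sits a full pooled standard deviation above the control group’s mean.

```python
import numpy as np

def cohens_d(feigning_scores, control_scores):
    """Cohen's d for a validity scale: feigning condition vs. control."""
    a = np.asarray(feigning_scores, dtype=float)
    b = np.asarray(control_scores, dtype=float)

    # Pooled standard deviation across the two groups
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) +
                         (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2))

    return (a.mean() - b.mean()) / pooled_sd
```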

We also calculated sensitivity and specificity across a variety of scale cut scores and, in general, scale performance was consistent with past work on broadband validity scales (high specificity, low sensitivity, and mean scores below recommended cut values).

The lack of effect between diagnoses with dissimilar symptom sets (e.g., somatic/cognitive versus psychopathology) was unexpected given past findings on moderation. Likewise, FBS was distinct in its performance relative to the other scales: people elevated it less and rarely invalidated on it. This is curious since it was designed for head-injury litigants. Our FBS findings may reflect the simulation study design; however, since it is the only scale with that pattern of results, further research on why this may have occurred is warranted. Replication with military and Veteran samples is also needed, given the high relevance to referral concerns.

Service Era – Does it Matter for Assessment?

New Paper Alert!

Service era is a huge part of how military members and Veterans identify themselves, and wartime and homecoming experiences vary across eras. When it comes to psychological assessment, the question is whether these variations in experience become important considerations for ensuring culturally competent and responsible assessment practice. This has been investigated a little, but the results have been interpreted as contradictory because of the overlap in (and inherent revisions between) the instruments compared.

A brief history: Glenn et al. (2002) started asking this question with the MMPI-2. They compared response endorsement in a sample of Veterans receiving PTSD Clinical Team (PCT) outpatient care (Vietnam vs. Gulf 1) and concluded that the wartime experience was different and important. Conversely, Ingram et al. (2020) used the MMPI-2-RF in a nationally drawn PCT sample and found no differences (Vietnam vs. Gulf 1+2). I suggested that Glenn’s differences may be a function of measurement error and scale quality, given the scale changes in the MMPI-2-RF. Why did I combine Gulf 1+2, given that the VA considers them different eras, you may ask? Because of how the data were gathered: service era was classified as it is reported within the electronic medical record system. A definite shortcoming when looking at eras. So studies on service era assessment have (i) excluded Post-9/11 because of study age (Glenn) and (ii) combined Gulf 1 and Post-9/11 (Gulf 2) into a single sample, despite the substantial variation in service experience.

These studies have also focused only on the MMPI (MMPI-2/MMPI-2-RF). While popular, the PAI is equally widely used (Ingram et al., in press; Wright et al., 2017) and doesn’t carry the same issue of scale revisions across versions as a potential explanation for findings (see Ingram et al., 2020). So what did we do about this shortage in the literature? We sampled Veterans from PTSD Clinical Teams (PCTs) and compared Vets from all three eras (Vietnam, Gulf, and Post-9/11) on the PAI, after controlling for Gender and Combat Exposure Severity. And what did we find?

These results are a (non-comprehensive) sample of the scales analyzed; the differences interpreted were statistically significant and also clinically meaningful (i.e., greater than a medium effect / 5 T-points). We didn’t have item-level data, so we couldn’t evaluate some aspects in question. It’s important to note the high frequency of mean scores at or above T70, the PAI cut-score for clinical severity, a pattern more pronounced in the Vietnam and Post-9/11 eras.
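For readers curious how “comparing eras after controlling for Gender and Combat Exposure Severity” is typically set up, here is a generic ANCOVA-style sketch in Python; the data frame, column names, and values below are placeholders for illustration, not our dataset or analysis code.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per Veteran with a PAI scale T-score,
# service era, gender, and combat exposure severity (all values made up).
df = pd.DataFrame({
    "pai_t":           [68, 72, 55, 61, 80, 49, 70, 66],
    "service_era":     ["Vietnam", "Gulf", "Post-9/11", "Vietnam",
                        "Gulf", "Post-9/11", "Vietnam", "Gulf"],
    "gender":          ["M", "M", "F", "M", "F", "M", "F", "M"],
    "combat_exposure": [20, 14, 9, 25, 17, 5, 22, 11],
})

# ANCOVA-style model: era differences after adjusting for gender and
# combat exposure severity.
model = smf.ols("pai_t ~ C(service_era) + C(gender) + combat_exposure",
                data=df).fit()

# Type II ANOVA table: the C(service_era) row tests the adjusted era effect.
print(sm.stats.anova_lm(model, typ=2))
```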