Clinical Judgement in Suicide Assessment: Using outpatient Veteran intakes to test the viability of high-impact decision making

Suicide is a low-frequency, high-impact behavior. By extension, assessment of suicide risk is a critical component of effective mental health intervention. Research on suicide has expanded (see Bryan's [2021] Rethinking Suicide: Why Prevention Fails, and How We Can Do Better, for instance) to suggest that prediction of future risk is also more complicated than the historical linear pathway we prescribe (ideation leads to planning, and planning leads to attempts). In short, suicide is a high-impact behavior which we are poorly able to predict.

Further complicating the assessment of suicide risk is the robust research base which has found clinical judgement to be limited. Consistently, clinical decision making underperforms relative to statistical/actuarial methods (see Meehl, 1954, or Ægisdóttir et al., 2006, for a meta-analytic review). Such findings suggest that we are poor prognosticators of future behavior based on our understanding of past behavior.

We wondered whether this type of prediction has been over-simplified. In short, are all tasks (regardless of seriousness) equally poorly predicted by clinical judgement? And are all actuarial measures created equal within the framework of measurement-based care? (Meehl's original work was in support of the MMPI, rather than the brief and less robustly validated measures which dominate clinical monitoring.) Thus, our study looks at suicide risk in a sample of outpatient mental health Veteran patients, examining both agreement between actuarial measures and clinical judgement at intake and risk assessment over time. Big shoutout to Keegan Deihl (a recently adopted grad student) and Tristan Herring (undergrad lab member) for their work on this project, which is being presented at the 2021 Combat PTSD Conference.

Click here to download the poster PDF.

Click here to see the pre-recorded video presented to the 2021 Combat PTSD Conference (with presentation by PATS’ very own Tristan Herring)

As an aside on this project, I’m super proud of Tristan. He has been with the lab for a little under a year as an undergraduate research assistant and he is a rock star on this poster and one other paper currently submitted for publication. He has tackled learning confusion matrices and classification statistics that have played a critical role in both projects. Grad programs, you can have him next year – if you’re lucky! A big hats off to Keegan as well. This is my first project working with him as my graduate student and I’m looking forward to more.

Sharing Research: Internship Application Process and Outcomes

I’m excited that the work out of the PATS lab is reaching the folks who can use it. Earlier this year, I published an article looking at internship site competitiveness for health service psychology graduate students from the perspective of Directors of Training. Next month, I will be presenting to the section for students and early career professionals in Society of Clinical Psychology (APA Division 12) as part of a webinar about this work. I can’t wait!

The Personality Assessment Inventory’s Cognitive Bias Scale: Validation of the Scale of Scales Formats in a Military Sample

The Cognitive Bias Scale (CBS) was recently developed for the PAI (see Gaasedelen et al., 2019). The rationale behind the CBS’s development was that the PAI lacked any over-reporting indicators which assessed cognitive performance, while other personality inventories (i.e., MMPI-2-RF and MMPI-3) had such measures (e.g., the Response Bias Scale [RBS]). Using methodology similar to that used in developing the RBS (items were identified based on failed PVT performance and then combined into a scale), Gaasedelen and colleagues created a new validity scale for the PAI using a mixed neuropsychological sample. Subsequently, Armistead-Jehle, Ingram, and Morris replicated the scale in a military sample. In both cases, the CBS worked well for identifying those with concurrently failed PVTs. Check out the link above to see the article by myself and Nicole Morris of the PATS lab.

Subsequently, Boress et al. (2021) examined alternative formats for the CBS using the same mixed clinical sample of patients on which the original CBS was calculated. They created three distinct scales, called scale of scales (CBS-SOS) versions because of their use of scale T-scores rather than item-level responses. In their paper, each CBS-SOS performed well and provided support for the scale-level versions of the CBS (AUCs ranging from .72 to .75 for CBS-SOS-1 through CBS-SOS-3, respectively).

CBS-SOS Calculation Formulas

CBS-SOS-1 = (NIM + SOM + DEP + ANX + SCZ + SUI) / 6

CBS-SOS-2 = [(NIM * .015246) + (SOM * .033504) + (ANX * .017804) + (DEP * .010947) + (SCZ * -.002386) + (SUI * -.006888)] / 6

CBS-SOS-3 = (NIM + SCZ + SOM-C + SOM-S + DEP-P + ANX-P + PAR-R) / 7
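For readers who want to work through the arithmetic, here is a minimal Python sketch of the three formulas above. The scale abbreviations follow the standard PAI scale/subscale names used in the formulas; the example profile (all T-scores set to 60) is purely hypothetical, not drawn from any real protocol.

```python
# Sketch: computing the three CBS-SOS composites from PAI T-scores.
# Formulas are transcribed from the post; the profile below is made up.

def cbs_sos_1(t):
    """Unweighted mean of six full-scale T-scores."""
    return (t["NIM"] + t["SOM"] + t["DEP"] + t["ANX"] + t["SCZ"] + t["SUI"]) / 6

def cbs_sos_2(t):
    """Regression-weighted composite (weights from Boress et al., 2021)."""
    return (t["NIM"] * .015246 + t["SOM"] * .033504 + t["ANX"] * .017804
            + t["DEP"] * .010947 + t["SCZ"] * -.002386 + t["SUI"] * -.006888) / 6

def cbs_sos_3(t):
    """Unweighted mean of seven scale/subscale T-scores."""
    return (t["NIM"] + t["SCZ"] + t["SOM-C"] + t["SOM-S"]
            + t["DEP-P"] + t["ANX-P"] + t["PAR-R"]) / 7

# Hypothetical profile (every scale at T = 60) just to show the calls.
profile = {k: 60 for k in
           ["NIM", "SOM", "DEP", "ANX", "SCZ", "SUI",
            "SOM-C", "SOM-S", "DEP-P", "ANX-P", "PAR-R"]}
print(cbs_sos_1(profile))  # 60.0
print(cbs_sos_3(profile))  # 60.0
```

Note that CBS-SOS-1 and CBS-SOS-3 stay on the familiar T-score metric, while the weighted CBS-SOS-2 lands on its own much smaller scale, so its cut values are not directly comparable to the other two.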

As a follow-up to their work, we once again examined these CBS-derived values within a military sample and contrasted their performance against the CBS scale (a comparison not done in the CBS-SOS validation paper). This work is being presented at this year’s National Academy of Neuropsychology (NAN) conference and is being written up for publication now [Click to Download the Poster]. Here is what we (Armistead-Jehle, Ingram, & Morris) found:

In case you are wondering why there are no sensitivity, specificity, PPP, and NPP values for CBS-SOS-2, that was an intentional decision at the time of this poster draft, given that its performance is not unique from the other scale of scales forms. Given the more demanding calculation, I opted not to include it. I feel guilty about excluding it, so I will likely calculate it for the final poster (and certainly will for the paper we are writing up this fall on this project).
  • AUC values for each of the CBS-SOS scales (and the CBS) approximated large effects and offered approximately the same overall classification value (AUC ~ .70)
  • The CBS is highly correlated with all forms of the CBS-SOS (.83 to .84)
  • Mean differences are medium in effect (Cohen, 1988) for the CBS and CBS-SOS scales, with effects ranging from .72 to .75 in magnitude. The CBS-SOS-1 and CBS-SOS-3 had mean differences which were clinically meaningful (i.e., a T-score difference of 5+ points)
  • Cut values are slightly different in a military sample than in the non-military mixed neuropsychological sample on which the CBS-SOS formats were initially validated. Sensitivity took the largest dive, with values for CBS-SOS-1 and CBS-SOS-3 below .05 when specificity was set at a .90 threshold
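The last bullet reflects the standard practice of fixing specificity at a floor (here, .90) and reading off the sensitivity the resulting cut score yields. As a rough sketch of that procedure, with entirely made-up scores and a simple "flag at or above the cut" rule as an assumption:

```python
# Sketch: find the lowest cut score holding specificity >= .90, then
# report the sensitivity it yields. All scores below are fabricated
# for illustration; they are not poster data.

def sensitivity_at_specificity(valid_scores, invalid_scores, min_spec=0.90):
    """Scan candidate cuts; return (cut, sensitivity, specificity)."""
    for cut in sorted(set(valid_scores + invalid_scores)):
        # "Positive" = score at or above the cut (flagged as invalid)
        spec = sum(s < cut for s in valid_scores) / len(valid_scores)
        sens = sum(s >= cut for s in invalid_scores) / len(invalid_scores)
        if spec >= min_spec:
            return cut, sens, spec  # lowest cut meeting the floor
    return None

valid = [48, 50, 52, 55, 57, 60, 61, 63, 65, 70]     # passed PVTs
invalid = [58, 62, 66, 68, 71, 74, 75, 78, 80, 83]   # failed PVTs
cut, sens, spec = sensitivity_at_specificity(valid, invalid)
# With these lists: cut = 66, sensitivity = .80, specificity = .90
```

When the two groups overlap heavily (as validity scales often do in real samples), forcing specificity to .90 can push sensitivity very low, which is exactly the pattern described in the bullet above.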

MMPI-3 Over-reporting Scales: A Simulation Study of PTSD, mild TBI, and co-morbid Presentation

New Paper Alert, with not 1 but THREE of my advisees!

Download the paper HERE

Study Context

The MMPI-3 is the latest revision in the MMPI family of instruments, and includes updated norms and scale revisions. Included within the revisions on the MMPI-3 are changes to several of the well-validated MMPI-2-RF over-reporting scales. Three scales include new or reworked items, in addition to the renorming process. In this study (just accepted for publication in The Clinical Neuropsychologist) we examine how effective these scales were in a symptom-coached simulation design.

We picked a four-condition design (control, PTSD, mTBI, and comorbid PTSD+mTBI) since validity scales on the MMPI are designed to detect different symptom sets of invalid responding (e.g., infrequent psychopathology on F and Fp, or infrequent somatic/neurological concerns on Fs, FBS, or RBS). PTSD offers a largely internalizing pathology symptom set, while TBI is largely somatic/cognitively focused. Few studies have evaluated comorbid conditions in validity scale feigning, and symptom sets have previously moderated scale effectiveness (both in simulation designs and in meta-analytic reviews). Given the high frequency of PTSD and mTBI overlap in military/veteran samples, this provided a great context for us to examine the MMPI-3’s scale utility. We coached participants on symptoms via a brief verbal description and a written symptom description derived from Wikipedia for each condition.

Results

Across the four conditions, the scales had effect sizes similar to those in other studies of symptom validity test effectiveness (e.g., other over-reporting scales on measures like the MMPI-2, MMPI-2-RF, or PAI) when compared to control (d ~ 1.0 to 1.5), but negligible effects between diagnostic conditions. Our effects are different from the other simulation study out there (Whitman et al., 2021); however, ours are closer to what one would expect. In our original paper we reported both the MMPI-2-RF and the MMPI-3, as well as incremental analyses, but our final report includes only the MMPI-3. I’ve provided both analyses below. Results are similar across instruments, which isn’t surprising given the correlation between the scales. In general, using the MMPI-3 over-reporting scales at their recommended cut scores means that you can be confident that those invalidating are most likely exaggerating or misrepresenting symptoms, but you may miss many others who are misrepresenting their symptoms.
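For context on the d ~ 1.0 to 1.5 range, here is a quick sketch of the pooled-SD Cohen's d calculation (Cohen, 1988) underlying those group comparisons. The two score lists are hypothetical illustrations, not study data.

```python
# Sketch: Cohen's d (pooled-SD standardized mean difference) for a
# simulator-vs-control comparison. Scores below are made up.
from statistics import mean, stdev

def cohens_d(a, b):
    """d = (mean(a) - mean(b)) / pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled

sim = [50, 56, 62, 68, 74]   # hypothetical simulator validity T-scores
ctrl = [38, 44, 50, 56, 62]  # hypothetical control T-scores
print(round(cohens_d(sim, ctrl), 2))  # 1.26, within the d ~ 1.0-1.5 range
```

A d of 1.0 means the group means differ by one pooled standard deviation, so simulator and control distributions still overlap considerably, which is part of why mean-level separation does not guarantee good individual classification.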

We also calculated sensitivity and specificity across a variety of scale cut scores and, in general, scale performance was consistent with past work on broadband validity scales (high specificity, low sensitivity, and mean scores below recommended cut values).

The lack of effect between diagnoses with dissimilar symptom sets (e.g., somatic/cognitive versus psychopathology) was unexpected given past studies on moderation. Likewise, FBS was distinct in its performance relative to the other scales – people elevated less on it and rarely invalidated it. This is curious since it is designed for head injury litigants. Our FBS findings may reflect the simulation study design; however, since it is the only scale with that pattern of results, further research on why this may have occurred is also warranted. Replication with military and veteran samples, given the high relevance to referral concerns, is also needed.

Service Era – Does it Matter for Assessment?

New Paper Alert!

Service era is a huge part of how military members and Veterans identify themselves, and it varies with their wartime and homecoming experiences. When it comes to psychological assessment, the question is whether these variations in experience become important considerations for ensuring culturally competent and responsible assessment practices. This has been investigated a little, but the results have been interpreted as contradictory due to the overlap in (and inherent revisions between) the instruments compared.

A brief history: Glenn et al. (2002) started asking this question with the MMPI-2. They compared response endorsement in a sample (Vietnam vs. Gulf 1) of those receiving PTSD Clinical Team (PCT) outpatient care and concluded that the wartime experience was different and important. Conversely, Ingram et al. (2020) used the MMPI-2-RF in a nationally drawn PCT sample and found no differences (Vietnam vs. Gulf 1+2). I found Glenn’s differences may be a function of measurement error and scale quality, given the scale changes in the MMPI-2-RF. Why did I combine Gulf 1+2 given that they are considered different eras by the VA, you may ask? Because of how the data were gathered, service era was classified as it is reported within the electronic medical record system – a definite shortcoming when looking at eras. So studies on service era assessment have: (i) excluded Post-9/11 because of study age (Glenn) and (ii) combined Gulf 1 and Post-9/11 (Gulf 2) into a single sample, despite the substantial variation in service experience.

They have also focused only on the MMPI (2/2-RF). While popular, the PAI is equally widely used (Ingram et al., in press; Wright et al., 2017) and doesn’t have the same problem of scale version revisions as a potential explanation for findings (see Ingram et al., 2020). So what did we do about this shortage in the literature? We sampled Veterans from PTSD Clinical Teams (PCTs) and compared Vets from all three eras (Vietnam, Gulf, and Post-9/11) on the PAI, after controlling for gender and combat exposure severity. And what did we find?

These results are a (non-comprehensive) sample of the scales analyzed; the differences interpreted were statistically significant – and also clinically meaningful (i.e., greater than a medium effect / 5 T-points). We didn’t have item-level data, so we couldn’t evaluate some aspects in question. It’s important to note the high frequency of mean scores at or above T70, which is the PAI cut-score for clinical severity – a frequency more pronounced in the Vietnam and Post-9/11 groups.

Emma (PATS lab undergrad RA) presented on the MMPI-3’s EAT scale at her first national conference!

Emma did such a fantastic job during her presentation. The blitz talk was 3 minutes and she was presenting to the most knowledgeable group of individuals about psychological assessment, and the MMPI specifically, that I can imagine being assembled. A talk of this quality is evidence not only of how great a student Emma is, but also of how great a mentor she had in Nicole Morris. Great job to both of them.

Not only am I excited about how great her talk was, but it also got me even more excited for July when we are going to work on putting together some papers based on this project. Want to see a sneak peek of the first paper – an expanded interpretation of the MMPI-3 EAT scale in college students? Check out her presentation below (Click to download the PowerPoint).

A study of the APPIC Match

Click here to download the pre-Print

Two years ago I started working with Dr. Adam Schmidt on another training-related study, just accepted for publication in the Journal of Clinical Psychology. This time, we were curious about factors leading to successful internship application and match at various sites. There has been some work on this in the past (of which Ginkel et al., 2010 is the most recent), but research on training in health service psychology is limited. Studies on internship, for instance, have frequently lumped sites together (VA, Medical Center, College Counseling Centers) despite those sites being extremely distinct in their needs and goals of treatment. As such, we examined what Training Directors value (e.g., believe leads to better outcomes) during the interview offer stage and the applicant ranking stage. We compared across site types defined by APPIC using responses from 186 (~30%) training directors. Below we have the pre- and post-interview criteria.

Here are a few standout points for us:

(1) Publications are valued less than conference presentations (WHAT?) for interviews, and research is valued minimally in general, perhaps reflecting a research-practice divide. A low value on research is also associated with less emphasis on ESTs, which has some unique implications for the professional development of future psychologists (see APA, 2006).

(2) Differences in criteria are frequently on things which take longer to become involved with (research output, or assessment given the year in program for intellectual/personality coursework, etc.), meaning that for maximum flexibility in training those things must be started earlier. Said another way, it will be difficult to shift towards an internship that values those things later (such as an AMC), while it would be easier to shift away. This has implications for program priorities in the timing of offerings and progression.

(3) At the ranking determination stage, people value in-person interview performance A LOT. Even attendance at an in-person interview varies between sites. This is strange given that in-person interviews are notoriously bad predictors of subsequent work behaviors. With the virtual interviews this last year during COVID-19, this begs for some interesting and important follow-up studies. The potential impact on trainees is huge.

We also asked the question “what is fit?” and used qualitative methods to identify themes, examining how they differed between sites and where listed themes intersected. So what is fit? Training directors tend to identify three patterns of characteristics, and how these differ should direct trainees about what they should emphasize, and when.

Copyright APA 2021

Graduation Weekend for the PATS lab

Chris Estes, Brittney Golden [Grad Student], Brittany Leva, Dani Taylor, Nicole Morris [Grad Student], Me (Top Row)
Tristan Herring, Mia Chu, Kassidy Wilson, and Will Derrick (Bottom Row)

WOW! I can’t believe so many of the awesome undergrads are leaving the lab all at once (Kassidy, Brittany, Dani, and Will… and Liz Morger left us in December). We had such a huge group for the last two years and they all helped the lab feel like a little family. Now, suddenly, HALF are gone this year! As sad as it is to see them go, I’m excited to see what they do next and can’t wait to hear about all the successes to come. It was also super awesome to get to meet the families of the graduating seniors. PATS alumni will be spread all around the country – watch out!

A few members couldn’t make it to the cookout this year sadly, and they were missed. For those starting in the lab next year, know that we will definitely do this again!

Incoming PATS lab members!

I’m thrilled to have Megan (left) and Bryce (right) joining the PATS lab here at Texas Tech this fall. Megan is interested in assessment of psychopathology and will be focusing on projects related to the PAI and MMPI, and the implementation of contemporary models of diagnoses. Bryce’s passions revolve around military and veteran mental health service and will be helping with projects involving those individuals. I can’t wait to see them in Lubbock, and to start working with them on the projects that match their interests.

A related thought: It was such a remarkable year for applicants, and although I’m happy to have these two joining me, I also want to say that there were so many other amazing applicants who I was unable to interview or accept. I’m proud of the PATS lab, as well as each of those who sought out a place in our lab – even if they didn’t ultimately wind up with us.