Are methamphetamine users compulsive? Faulty reinforcement learning, not inflexibility, underlies decision making in people with methamphetamine use disorder

Robinson, Alex H.; Perales López, José César; Volpe, Isabelle; Chong, Trevor T-J.; Verdejo-García, Antonio

Cognitive inflexibility; Compulsivity; Methamphetamine use disorder; Reinforcement learning; Reversal learning

Methamphetamine use disorder involves continued use of the drug despite negative consequences. Such ‘compulsivity’ can be measured by reversal learning tasks, which involve participants learning action-outcome task contingencies (acquisition-contingency) and then updating their behaviour when the contingencies change (reversal). Using these paradigms, animal models suggest that people with methamphetamine use disorder (PwMUD) may struggle to avoid repeating actions that were previously rewarded but are now punished (inflexibility). However, difficulties in learning task contingencies (reinforcement learning) may offer an alternative explanation, with meaningful treatment implications. We aimed to disentangle inflexibility and reinforcement learning deficits in 35 PwMUD and 32 controls with similar sociodemographic characteristics, using novel trial-by-trial analyses on a probabilistic reversal learning task. Inflexibility was defined as (a) weaker reversal phase performance, compared with the acquisition-contingency phases, and (b) persistence with the same choice despite repeated punishments. Conversely, reinforcement learning deficits were defined as (a) poor performance across both acquisition-contingency and reversal phases and (b) inconsistent postfeedback behaviour (i.e., switching after reward). Compared with controls, PwMUD exhibited weaker learning (odds ratio [OR] = 0.69, 95% confidence interval [CI] [0.63–0.77], p < .001), though no greater accuracy reduction during reversal. Furthermore, PwMUD were more likely to switch responses after one reward/punishment (OR = 0.83, 95% CI [0.77–0.89], p < .001; OR = 0.82, 95% CI [0.72–0.93], p = .002) but just as likely to switch after repeated punishments (OR = 1.03, 95% CI [0.73–1.45], p = .853). These results indicate that PwMUD's reversal learning deficits are driven by weaker reinforcement learning, not inflexibility.

2024-01-30T10:55:05Z
2024-01-30T10:55:05Z
2021-01
journal article
https://hdl.handle.net/10481/87616
https://doi.org/10.1111/adb.12999
open access
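The postfeedback measures named in the abstract (switching after a reward, after a single punishment, and after repeated punishments) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `switch_rates`, the input layout, and the `run_length` threshold of two consecutive punishments on the same option are all assumptions made for illustration, since the abstract does not specify how "repeated" was operationalised.

```python
from typing import Dict, List, Optional


def switch_rates(
    choices: List[int],
    rewarded: List[bool],
    run_length: int = 2,  # assumed threshold for "repeated" punishments
) -> Dict[str, Optional[float]]:
    """Proportion of trials followed by a switch, split by preceding feedback.

    choices[t]  : option chosen on trial t (hypothetical coding, e.g. 0 or 1)
    rewarded[t] : True if trial t was rewarded, False if punished
    """
    # Each entry holds [number of switches, number of opportunities].
    counts = {
        "after_reward": [0, 0],
        "after_one_punishment": [0, 0],
        "after_repeated_punishments": [0, 0],
    }
    punish_run = 0  # consecutive punished trials on the same option

    # The last trial has no following choice, so it is not scored.
    for t in range(len(choices) - 1):
        if rewarded[t]:
            punish_run = 0
            key = "after_reward"
        else:
            same_option_punished_before = (
                t > 0 and choices[t] == choices[t - 1] and not rewarded[t - 1]
            )
            punish_run = punish_run + 1 if same_option_punished_before else 1
            key = (
                "after_repeated_punishments"
                if punish_run >= run_length
                else "after_one_punishment"
            )
        switched = choices[t + 1] != choices[t]
        counts[key][0] += int(switched)
        counts[key][1] += 1

    # None marks a category with no opportunities in this sequence.
    return {k: (s / n if n else None) for k, (s, n) in counts.items()}
```

Under these assumptions, a low switch rate after repeated punishments would correspond to the abstract's definition of inflexibility, while a high switch rate after reward would correspond to inconsistent postfeedback behaviour. Group comparisons such as the reported odds ratios would require a separate statistical model, which this sketch does not attempt.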