Abstract
Identifying predictors of treatment response to repetitive transcranial magnetic stimulation (rTMS) remain elusive in treatment-resistant depression (TRD). Leveraging electronic medical records (EMR), this retrospective cohort study applied supervised machine learning (ML) to sociodemographic, clinical, and treatment-related data to predict depressive symptom response (>50% reduction on PHQ-9) and remission (PHQ-9 < 5) following rTMS in 232 patients with TRD (mean age: 54.5, 63.4% women) treated at the University of California, San Diego Interventional Psychiatry Program between 2017 and 2023. ML models were internally validated using nested cross-validation and Shapley values were calculated to quantify contributions of each feature to response prediction. The best-fit models proved reasonably accurate at discriminating treatment responders (Area under the curve (AUC): 0.689 [0.638, 0.740], p < 0.01) and remitters (AUC 0.745 [0.692, 0.797], p < 0.01), though only the response model was well-calibrated. Both models were associated with significant net benefits, indicating their potential utility for clinical decision-making. Shapley values revealed that patients with comorbid anxiety, obesity, concurrent benzodiazepine or antipsychotic use, and more chronic TRD were less likely to respond or remit following rTMS. Patients with trauma and former tobacco users were more likely to respond. Furthermore, delivery of intermittent theta burst stimulation and more rTMS sessions were associated with superior outcomes. These findings highlight the potential of ML-guided techniques to guide clinical decision-making for rTMS treatment in patients with TRD to optimize therapeutic outcomes.
Similar content being viewed by others
Introduction
Major depressive disorder (MDD) is a prevalent and debilitating condition characterized by persistent feelings of sadness, hopelessness, and loss of interest or pleasure in daily activities [1]. An estimated 322 million individuals worldwide meet criteria for MDD, representing a substantial public health challenge impacting quality-of-life, productivity, and overall well-being [2]. While various pharmacological, psychotherapeutic, and neuromodulatory treatments are available, 30–40% of individuals with MDD experience treatment-resistant depression (TRD), defined as non-response to two or more trials of first-line interventions [3]. Because MDD is a highly heterogeneous disorder, with 277 possible symptom combinations meeting DSM-5 criteria [4], it is possible that different subtypes or symptom clusters of MDD may respond differentially to various treatments [5]. Therefore, aligning optimal treatment strategies to symptom presentations may be valuable for addressing TRD. This first requires understanding the mechanisms and predictors of treatment response in MDD, as this knowledge can inform the development of more personalized treatments that are optimal for different individuals based on their unique symptom profiles.
Repetitive transcranial magnetic stimulation (rTMS) is an established clinical intervention for the treatment of TRD, with response rates of approximately 40–60%, providing a valuable treatment option for patients not responding to first line psychotherapy and pharmacotherapy [6, 7]. rTMS involves the application of repetitive magnetic pulses to the cortex, with the dorsolateral prefrontal cortex (DLPFC) a common target for MDD due to its role in mood regulation, cognitive control, executive functioning, and emotion processing [8]. The DLPFC is also implicated in the pathophysiology of depression due to its connections with the limbic system, while acute elevations in mood following rTMS provide further evidence of DLPFC dysfunction in MDD [9,10,11,12]. Ultimately, rTMS is thought to exert its therapeutic effects through modulation of cortical excitability, neurotransmitter systems, and neuroplastic mechanisms [13], but understanding the predictors of treatment response is crucial for tailoring treatment strategies and enhancing clinical outcomes in individuals with TRD.
In recent years, advances in machine learning (ML) have provided new opportunities for leveraging large-scale datasets, such as electronic medical records (EMR), to improve healthcare delivery and inform clinical decision-making [14]. For instance, ML prediction models can be used as structured tools to more accurately stratify patients into subgroups or individualized treatment pathways [15]. In psychiatry, particularly in TRD and rTMS, ML can reveal latent multifactorial relationships that conventional methods might overlook. Recent studies utilizing large clinical registries have demonstrated the effectiveness of predictive and inferential models of treatment response and remission, highlighting factors such as the influence of treatment protocols, patient demographics, and psychiatric comorbidities on outcomes [16, 17]. Despite the rapid growth of ML in psychiatric research, the generalizability and clinical utility of many ML models published to date remain poor [18]. Motivated by the potential of ML to enhance understanding of rTMS treatment for patients with TRD, this study performed a comprehensive retrospective analysis to investigate demographic, clinical, and treatment-related factors associated with treatment response and remission.
This retrospective approach offers several strengths, including a moderately sized sample, the ability to capture real-world clinical practices, and identify long-term treatment outcomes [19] without the resource demands of a clinical trial. Specifically, this study explores 1) demographic and clinical characteristics of TRD patients receiving rTMS treatment, 2) various protocols and parameters utilized in rTMS administration, and 3) pre-treatment and treatment-related factors predictive of treatment response or remission. This paper aims to address the following questions: whether ML can predict response or remission following rTMS based on pre-treatment patient features and treatment-related factors better than chance; which features distinguish individuals responding favorably to rTMS from those who do not; and whether the ML models may offer clinical benefit. In generating ML models to predict rTMS treatment response and identifying the features that drive both positive and negative outcomes, this study aims to advance understanding of TRD, optimize treatment strategies, and ultimately improve patient outcomes.
Materials and methods
Design
A retrospective study was conducted, including EMR (Epic, Verona, WI, USA) of TRD patients, collected from the clinical database of the UC San Diego Interventional Psychiatry Program (UCSD-IPP). All contributing procedures comply with the ethical standards of the relevant national and international committees on human subjects’ research and the Helsinki Declaration and were approved by the UCSD Human Research Protections Program.
Patients
Included patients were prescribed rTMS treatment from UCSD psychiatrists credentialed in delivering rTMS. rTMS was initiated in outpatients who met criteria for TRD, did not present with any treatment contraindications, had adequate insurance coverage, and provided consent. As illustrated in Fig. 1, EMR records for 419 patients seeking rTMS in 2017–2023 at UCSD were screened for the following inclusion criteria: (i) primary diagnosis of MDD; (ii) aged 18 years and older; (iii) completion of at least 15 sessions by the date of final PHQ-9; (iv) clinically meaningful depressive symptoms, as indicated by PHQ-9 total score >4; (v) failure to respond to at least 2 antidepressant pharmacotherapy trials; and (vi) completion of clinical evaluations prior to and following the acute treatment phase.
A total of 246 patients began a rTMS treatment course and 239 completed at least 15 sessions before their final PHQ-9 assessment. Two participants had baseline PHQ-9 scores <5, indicating no clinically significant depressive symptoms, and were excluded from analyses. Five patients were excluded for not having received an MDD diagnosis, yielding a final analytical sample of 232 MDD patients. All patients had previously failed to respond to at least two antidepressant treatments and 90% reported having an MDD diagnosis >10 years.
Treatment procedure
All rTMS treatment was performed in the UCSD-IPP clinic by technicians trained in protocol administration with oversight by an attending psychiatrist. Overseeing psychiatrists reviewed the treatment plans after each session. rTMS was delivered using either a Magventure MagPro with a B70 coil or a BrainsWay H1-coil system. For all patients, the resting motor threshold (rMT) was identified by finding the motor hotspot and utilizing the lowest level of stimulation intensity to elicit movement in the opposite abductor pollicis brevis (APB) muscle on 50% of stimulation attempts. Once rMT was obtained, treatment session parameters were determined. For Magventure treatments, the coil position was localized using the Beam F3 method, and treatment was delivered per standard practice [20]. Patients who were initiated on the iTBS protocol (at 120% rMT), involving 600 total pulses in bursts of 3 with 8-s inter-train-intervals, were transitioned to either BiTBS (90% rMT; 3600 total pulses, 1800 delivered via 3-iTBS and 1800 delivered cTBS 120% rMT protocol to the left and right hemispheres, respectively) or 3-iTBS (iTBS delivered three times consecutively, per Li et al. [21] if there was deemed poor efficacy or issues with tolerability). High-frequency left (HFL; 19 min) and low-frequency right (LFR; 10 min) protocols were used less commonly, both at 120% rMT. The obsessive-compulsive disorder (OCD) protocol uses a B70 coil and delivers 2000 pulses per session via 50, 2-s trains to dorsomedial prefrontal cortex (DMPFC), at 100% rMT of the foot with the Magventure B70 coil. For the BrainsWay H1-coil system either iTBS or the standard FDA-approved 18 Hz protocol was used at 120% rMT. The end of each patient’s acute treatment period was clinician-determined and coincided with reduced frequency of rTMS treatment to <2 sessions per week. Sequencing of TMS protocols across all patients included in the analysis is depicted in Supplementary Fig. 1.
Clinical assessment
The PHQ-9 is a nine-item self-report scale used to assess the presence and severity of depressive symptoms based on the Diagnostic and Statistical Manual (DSM) criteria for a Major Depressive Episode (MDE). Questions are on a four-point scale related to the frequency of symptoms presented on each item, including “not at all,” “several days,” “more than half the days,” and “nearly every day.” Total scores are calculated by adding scores to all nine items (total range: 0–27), and previously established cutoff scores for severity are: 0–4 (no symptoms), 5–9 (mild symptoms), 10–14 (moderate symptoms), 15–19 (moderately severe symptoms), and 20–27 (severe symptoms). The PHQ-9 was administered pre-treatment and at rTMS visits throughout the acute treatment phase to monitor changes in depressive symptoms. Only pre-treatment and post-acute treatment phase scores were used in the present analyses.
Predictive features
Pre-treatment features
A total of 32 EMR-derived features were extracted from patient records, including: age, gender, ethnicity, religious affiliation, educational attainment, employment status, relationship status, body mass index (BMI) category (obesity: ≥ 30 kg/m2; overweight: 25 ≤ kg/m2 < 30; normal 18.5 < kg/m2 < 25), alcohol, tobacco, and cannabis use (each operationalized as 3-level factor: current, former, or never), family history of mood disorders, 1st degree relatives with mood or psychiatric disorder diagnoses, prior ECT treatment, prior rTMS treatment, history of psychological trauma, non-suicidal self-injury (NSSI), history of suicidal ideation (SI) or suicide attempts, psychiatric hospitalizations, and duration of current MDE. Psychiatric comorbidities were also recorded, including anxiety disorders, trauma-related disorders, substance use disorders, OCD, somatic symptom and related disorders, personality disorders, autism spectrum disorders, eating disorders, and neurocognitive disorders. Comorbid diagnoses were extracted from physician notes. Psychiatric medication use, including benzodiazepines, non-benzodiazepine sedatives, anxiolytics, antipsychotics, and psychostimulants, was recorded, and antidepressant medication burden was translated into daily defined dose (DDD) equivalents. A detailed description of the coding for each drug category can be found in the Supplement. Certain continuous variables were categorized to align with clinically relevant thresholds, enhancing interpretability in the context of TRD treatment. Age was grouped into age brackets (<45, 45–64, ≥65 years), PHQ-9 scores were categorized by established severity levels, BMI was grouped by WHO classifications (normal, overweight, obese), and MDE duration was categorized as <6 months, 6–12 months, and >12 months to capture differences in chronicity. While this approach may reduce statistical power, it provides a meaningful, clinically interpretable framework for evaluating treatment outcomes. Features were extracted by manual search through individual patient’s EMR. Variables were primarily collected via initial treatment consultation note completed by the prescribing psychiatrist. In the case where specific data points were not immediately available in the consultation note, searches were conducted to identify the nearest timepoint in which that information was available.
Treatment-related features
To determine the role of treatment-related factors in depression outcomes, the following five parameters were recorded: protocol type (deep TMS, iTBS, bilateral TBS, or others that included HFL, LFR, 3-iTBS, or OCD protocols); number of total rTMS sessions; cumulative number of pulses received throughout the treatment course; and, whether patients switched rTMS protocols during treatment.
Statistical analysis
R statistical software (v.4.3.1) was used for data analysis. Independent sample t-tests and chi-square tests were used to compare demographic and clinical characteristics between rTMS non-responders and responders, defined as patients with ≥50% decrease in post-treatment PHQ-9 scores from pre-treatment scores. Pairwise post-hoc tests for factors with more than two levels were false discovery rate (FDR)-corrected. To mitigate low variance within categorical variables, variables were combined and/or operationalized as follows: rTMS protocols: 18 Hz deep TMS (dTMS), BiTBS, or iTBS, made up the majority (86.6%) of rTMS protocols in the sample; therefore, other protocols, including 3-iTBS, 3-iTBS/OCD, OCD, and HFL were combined into a single factor level (“Other”). Race/ethnicity: patients were coded as non-Hispanic, white, or “Other”. Psychiatric comorbidities: patients were coded with 1 if they had at least one comorbid diagnosis for each individual comorbidity of interest. Duration of current MDE: patients were coded as <6 months, 6–12 months, or >12 months in duration. Patients were binarized indicating completion of a 4-year degree or not. Relationship status was coded as: currently single/unpartnered, partnered/married, or separated/divorced/widowed. Non- antidepressant psychiatric medication use was coded 1 or 0 for each drug category.
To identify pre-treatment patient characteristics and treatment-associated features that best discriminate response and remission groups, a series of classification models were trained using the caret package: (i) regularized logistic regression (elastic net regression; ENet); (ii) support vector machine (SVM) with radial basis function; (iii) random forest (RF); and (iv) stochastic gradient boosting machines (GBM). All features were filtered for near-zero-variance (<5%) and linear combinations. Nested cross-validation was used for hyperparameter tuning and model performance evaluation using nestedcv with 10 inner folds and 20 outer folds. RF parameters optimized in the nested grid search included mtry (number of variables to split), splitrule (quality of each node split), and min.node.size (minimum number of samples in a node), and GBM parameters included interaction.depth (highest interaction per tree), n.minobsinnode (minimum number of samples in a node to split), n.trees (total number of trees), and shrinkage (learning rate), whereas the default tuning grids in caret were applied for ENet and SVM. To mitigate class imbalance in the dataset, Synthetic Minority Over-sampling TEchnique (SMOTE) was applied to equalize the class ratio (1:1) [22]. Tree-based classification models, such as gradient boosted trees and random forest, detect interactions by optimizing decision tree paths in which the influence of a given variable is conditioned on the values of preceding variables in the path. It is possible to visualize these relationships from the global model via SHAP interaction values.
Feature selection via a univariate filter was performed on each outer fold prior to hyperparameter tuning, and a Relief-based algorithm (ReliefFbestK in CORElearn package; k = 10) was applied in parallel for comparison with the univariate filter. Generally, performance was superior for models trained after univariate filtering (Supplementary Fig. 2), with 10 features selected on average (range: 7–13) for all models. Optimization criteria for each model’s hyperparameter tuning procedure were receiver operating characteristic area under the curve (ROC AUC). Continuous features were standardized, and categorical features were one-hot encoded, with the first level dropped as reference. Missing variables were treated as missing completely at random and nonparametric random forest-based imputation was performed using missForest.
Generalization performance and 95% confidence intervals were generated by averaging AUC across the best tune of all outer folds. Predicted probabilities across all outer folds were merged to generate additional performance metrics (F-score, sensitivity, specificity) and models with the best performance for an outcome were selected for downstream analysis. Significance of AUC values were determined by permutation testing (B = 99) of class labels, re-running the procedure for each model, and calculating the proportion of permuted models with AUC values greater than the observed data. Alpha level was set at 0.05. The contributions of predictive features were determined using SHapley Additive exPlanations (SHAP) analysis in the fastshap R package, and directionality of associations between features and outcome variables was ascertained by visualizing individual SHAP values and their dependencies with the shapviz package in R. SHAP values explain the output of machine learning models by quantifying the contribution of each feature to a specific prediction, helping to identify which factors positively or negatively influence the model’s outcome.
Using the best performing models, the predicted probabilities of treatment response and remission were derived. Model calibration was assessed with the Hosmer-Lemeshow (HL) goodness-of-fit test by binning predicted probabilities into deciles. Probability estimates were updated using beta regression, which has been shown to outperform other calibration methods [23], and recalibrated probabilities were reassessed using HL test. Decision curve analysis (DCA) was applied using dcurves to assess the clinical utility of prediction models through estimation of net benefit across thresholds of predicted risks using recalibrated probabilities from both “response” and “remission” classification models. Net benefit is equivalent to the percentage of individuals appropriately treated (“true positives”, responders and remitters) minus a weighted percentage of those inappropriately treated (“false positives”).
Results
Patient and treatment-related characteristics
Sociodemographic and clinical characteristics of patients, and treatment-related characteristics included in the analysis are outlined in Table 1. Across all patients, post-treatment PHQ-9 scores were significantly reduced compared to pre-treatment values (mean difference = −7.56, t231 = −17.6, d = 1.16, 95% CI = 0.99–1.32). Eighteen patients had baseline PHQ-9 scores < 10, indicative of minimal depressive symptoms, and response rates within this subgroup were lower than patients with PHQ-9 ≥ 10 at baseline (38.9 vs. 43.9%) but did not significantly differ (X2 = 0.17, p = 0.68). In total, 101 patients (44%) responded to rTMS, (mean difference = −13.0, t100 = −29.4, d = 2.94, 95% CI = 2.48–3.39; Fig. 2) and 53 patients (23%) remitted. The average patient age was 55.1 ± 16.9 and 54.1 ± 17.0 years in the responder and non-responder groups, respectively. Patient sex and ethnicity did not differ between groups; however, responders were more likely to have been former smokers than non-responders, and less likely to have never used alcohol. Most patients were naïve to rTMS (88.4%) and ECT (83.3%) treatment, and the majority (60%) had MDEs lasting longer than 12 months at treatment outset. Response and remission rates for patients in this study with a history of rTMS were 48.1 and 18.5%, respectively. Relative to patients without a history of rTMS, there were no differences with response to response (X2 = 0.09, df = 1, p = 0.76) or remission (X2 = 0.11, df = 1, p = 0.74). Baseline PHQ-9 scores did not differ between responders (17.9 ± 5.4) and non-responders (17.6 ± 5.3). Responders were more likely to have a history of trauma and to be diagnosed with trauma disorder compared to non-responders, but were less likely to have an anxiety disorder or be prescribed benzodiazepines or antipsychotic medications. The ratio of patients receiving sequential iTBS-BiTBS to iTBS-only treatment was higher among non-responders than responders (padj = 0.04), and non-responders received fewer rTMS sessions on average compared to responders.
Each point represents an individual patient’s PHQ-9 score at baseline and the conclusion of their acute treatment phase, grouped by response status. Horizontal dashed line at PHQ-9 = 5 indicates the threshold for depression remission. Each point represents an individual patient’s PHQ-9 score at baseline and the conclusion of their acute treatment phase, grouped by response status. Horizontal dashed line at PHQ-9 = 5 indicates the threshold for depression remission.
Classification results
Treatment response
Across the four models tested, the SVM model predicted treatment response significantly above chance (p < 0.01; Supplementary Fig. 3) with modest discrimination performance (mean AUC = 0.689, 95% CI = 0.638–0.740), outperforming RF (0.654, 0.594–0.715), ENet (0.668, 0.616–0.719), and GBM (0.617, 0.563–0.671) models (Supplementary Fig. 2). Thus, the SVM model was evaluated further. The sensitivity and specificity of the model to discriminate treatment responders from non-responders were 72.3 and 53.4%, respectively, with an F-score of 0.64. Misclassification rates varied across rTMS protocols: BiTBS had a misclassification rate of 27.8%, iTBS 47.8%, dTMS 41.7%, and Other protocols 41.9%. Following recalibration, observed and predicted probabilities of treatment response were similar (HL test: \({\chi }_{8}^{2}\) = 5.51, p = 0.70, RMSE = 1.73; Supplementary Fig. 4). In descending order, the most important variables in prediction of treatment response were anxiety disorder, former tobacco use, obesity, trauma disorder, antipsychotic use, benzodiazepine use, iTBS-BiTBS protocol, >12-month duration of current MDE, history of trauma, number of rTMS sessions, and iTBS protocol (Fig. 3A). Partial dependence plots illustrating the relationship between the top-3 features, the total number of rTMS sessions, and treatment response are shown in Fig. 3B, and the expected response probability for a representative patient is shown in Supplementary Fig. 5). DCA indicated that for threshold probabilities between 25 and 55%, corresponding to a number needed-to-treat (NNT) between 4 and 2 respectively, the net benefit of applying the model to aid in determining whether a patient should undergo rTMS is higher than either a “treat all” or “treat none” approach (Fig. 4A).
A Ranked feature importance for treatment response classification model. Grey bars indicate negative association with treatment response (worse outcome), and black bars indicate positive association (better outcome). B Partial dependence plots illustrating directional associations between selected features and treatment response. Larger positive SHAP values indicate greater probabilities of treatment response, whereas more negative values indicate lower probabilities of response.
Decision curve analysis for recalibrated (A) treatment response and (B) depression remission models. Threshold probability represents the point at which a positive response is equally valued to avoiding unnecessary treatment. Decision curves for each model are shown in blue, compared to a “treat all” approach shown in red and a “treat none” approach shown in green.
Depression remission
Across the four models tested, the GBM model predicted remission following rTMS treatment significantly above chance (p < 0.01; Supplementary Fig. 3) with good discrimination performance (mean AUC = 0.745, 95% CI = 0.692–0.797), outperforming RF (0.735, 0.680–0.791), ENet (0.740, 0.668–0.813), and SVM (0.683, 0.627–0.738) models (Supplementary Fig. 2). Thus, the GBM model was evaluated further. The sensitivity and specificity of the model to discriminate remitters from non-remitters were 58.5 and 77.7%, respectively, with an F-score of 0.50. Misclassification rates for remission also differed by rTMS protocol: BiTBS had a rate of 33.3%, iTBS 30.4%, dTMS 23.8%, and Other protocols 29.0%. Following recalibration, the observed and predicted probabilities of treatment response remained dissimilar (HL test: \({\chi }_{8}^{2}\) = 19.1, p = 0.01, RMSE = 2.73; Supplementary Fig. 4). In descending order, the most important variables in the prediction of depression remission were pre-treatment PHQ-9 score, obesity, antipsychotic use, current MDE greater than 12 months, benzodiazepine use, anxiety, prior ECT, and iTBS-BiTBS protocol (Fig. 5A). Partial dependence plots illustrating the relationship between the top-four features and treatment response are shown in Fig. 5B. DCA indicated that for threshold probabilities between 15 and 40%, corresponding to a NNT between approximately 7 and 3, respectively, using the model to determine whether a patient should undergo rTMS yielded greater net benefit than employing either a “treat all” or “treat none” strategy (Fig. 4).
Discussion
Findings presented in this study underscore the substantial variability in depression responses and remission, despite rTMS efficacy, and the promise of ML to predict treatment outcomes using EMR data alone. Importantly, both ML models for response and remission identified psychiatric comorbidities, depressive episode duration, and specific rTMS protocols as predictive features, with trauma history also associated with response outcomes.
Predictive features of treatment outcomes
Past research investigating the relationship between psychiatric comorbidities and treatment response has yielded valuable insights into the complex interplay between mental health conditions and their respective treatments [24]. This literature presents mixed results regarding impact of trauma history on rTMS. Experiences of childhood trauma and PTSD diagnosis have been reported to negatively affect rTMS outcomes in MDD patients [25, 26] or to have no significant impact [27]. Positive findings, as presented here, may point to a consequence of trauma-induced neuroplastic changes that sensitize the brain to neurostimulation [28]. For example, the therapeutic impact of rTMS could be modulated by individual differences in neural circuitry affected by trauma, which are not always uniformly engaged by standard rTMS protocols, such as 10 Hz rTMS to the left DLPFC as done in Hu et al. [26] and Yesavage et al. [25].
Former tobacco use was identified as a positive predictor of treatment response. Past literature on tobacco use in rTMS outcomes is mixed [29, 30], but nicotine’s complex neurobiological effects may enhance efficacy. However, the lack of association with current tobacco use suggests that other unaccounted factors may be influencing this finding, or it may be a statistical anomaly. Further investigation is needed to determine whether former tobacco use truly enhances rTMS response, if it serves as a proxy for another predictor, or if it is not directly related to treatment outcomes. Given the current findings, and coupled with trauma history, these associations suggest past experiences and lifestyle modifications may prime the brain’s receptivity to rTMS.
Features including comorbid anxiety disorder, concomitant benzodiazepine or antipsychotic medications, and a longer duration of current MDE were linked to lower probabilities of treatment response and remission. Anxiety is highly comorbid with depression, and this specific comorbidity may reflect distinct neurobiological underpinnings, affecting the response to rTMS [31, 32]. For instance, hyperactivity in the amygdala and its connectivity with the PFC, can be more pronounced in individuals with comorbid anxiety [33]. An extensive body of research supports that comorbid anxiety can complicate the course of depression, often leading to a more chronic and treatment-resistant condition [34].
Concurrent use of benzodiazepines or antipsychotics was associated with less favorable responses. These medications exert sedative effects and reduce cortical excitability, potentially impeding operating mechanisms of rTMS. Benzodiazepines, for instance, enhance GABAergic inhibition, which may diminish excitatory processes [33]. Antipsychotics, particularly those blocking dopamine receptors, have been shown to attenuate rTMS outcomes [34], potentially disrupting the dopaminergic modulation crucial for rTMS-induced neuroplasticity. These findings suggest patients prescribed benzodiazepines or antipsychotics may require adjusted protocols, perhaps with increased intensity or frequency to overcome medication-induced reductions in cortical excitability, or timing treatment sessions relative to medication dosing.
The observation that longer depressive episodes correlated with poorer outcomes is consistent research indicating depression chronicity can negatively impact the efficacy of various treatments. Chronic depression is associated with more severe neurobiological changes, including alterations in the hippocampus and PFC [35]. Earlier intervention might enhance response by addressing such changes before they become entrenched. Fava (2003) highlighted the predictive role of untreated depression duration in treatment resistance [36] and underscoring the possibility that adjusting standard protocols, or combining with other interventions, may increase efficacy for these patients.
The observed absence of sociodemographic variables, such as age, sex, and ethnicity, impacting outcomes aligns with past studies [37, 38], yet diverges from others. For example, Hanlon and colleagues (2022) found younger age and female sex were associated with better rTMS response rates, suggesting demographic characteristics may influence treatment success [39]. Slan and colleagues (2024) recently found iTBS was more effective for males, while 10 Hz favored females, and rTMS overall being most effective in patients over 50, particularly females [40]. The contrasting evidence indicates a complex relationship between sociodemographic and rTMS efficacy, warranting further investigation to clarify their roles in treatment personalization and optimization.
Accrued effects from patient history, such as trauma and anxiety, and concurrent benzodiazepine use may influence rTMS outcomes. The potential cumulative effect of these factors on response could be linked to their disparate impact on brain neuroplasticity and neurotransmission. For instance, a history of trauma and relatively lower anxiety might enhance receptivity to rTMS via induced neuroplastic changes, whereas the suppressive effect of benzodiazepines on cortical excitability could counteract these benefits, requiring adjustments in treatment protocols to optimize efficacy. Recent studies demonstrate that while rTMS produces generalized improvement across depressive symptoms, anxiety-related symptoms are less responsive, with outcomes influenced more by baseline symptom profiles than protocol variations [41, 42]. Further investigation into the interaction of these factors may provide deeper insights into personalized rTMS strategies for patients with complex clinical profiles.
Regarding treatment-related factors, the number of rTMS sessions was a positive predictor, with SHAP values indicating a positive linear dose-response relationship. This relationship aligns with notions that additional sessions sustain antidepressant effects, possibly through cumulative neuroplastic changes. However, this is not consistently upheld, as a meta-analysis reported higher doses of rTMS, measured by total pulses, were not always associated with depression improvement [38]. Adherence to the iTBS protocol was also among the strongest predictors of response. Notably, iTBS demonstrated superiority over iTBS-BiTBS in our sample, challenging some of the existing literature that suggests bilateral approaches can be advantageous for certain patient populations [43]. Despite this, the current findings likely relate to iTBS-BiTBS patients experiencing greater treatment-resistance at baseline, with providers transitioning protocols from unilateral to bilateral in cases of inadequate symptom relief.
Machine learning, electronic medical records, and clinical decision-making
Recent reviews of ML prediction models in psychiatric research have found that despite relatively ‘good’ discrimination performance (AUC~0.65–0.80), most studies had methodological weaknesses that limited their generalizability and utility in clinical decision-making [18, 44]. While AUC values below 0.7 are generally considered modest, they can still offer meaningful insights in complex psychiatric conditions like TRD. These models help identify relevant predictors and inform future strategies for refining and validating predictive approaches [44, 45]. The authors recommended more comprehensive reporting of ML methods and results, more robust validation methods, model calibration, and formal assessment of clinical utility, such as DCA [44, 46, 47]. In our analyses, DCA demonstrated that the response and remission models outperformed “treat all” and “treat none” strategies within probability threshold ranges of 25–55 and 15–40%, or NNT between 2 and 4 patients, and 3 and 7 patients, respectively. In other words, the NNT results suggest that for every 2–4 patients treated based on the response model, or every 3–7 patients treated based on the remission model, one additional patient would experience a superior outcome from rTMS compared to treating all patients or none. While immediately applying this analysis to real-world clinical-decision making is not warranted at this time, we included it here to advocate for the use of DCA in precision psychiatry prediction and prognostic models, aiming to illustrate its potential future utility. Both models indicate net clinical benefit within their respective threshold ranges, though more refined models incorporating additional patient features, such as neurobiological markers, polygenic risk scores, and psychosocial determinants, may confer additional benefit. Considering the use of internal cross-validation and novel methods, this model is best positioned as a preliminary predictive tool, offering foundational insights that can inform the development of more refined predictive models with future external validation. Despite demonstrating good discrimination performance, the remission model failed recalibration, as evidenced by a significant Hosmer-Lemeshow test (p = 0.01), indicating a mismatch between predicted and observed probabilities. This lack of calibration reduces the model’s reliability in clinical settings, as it may overestimate or underestimate remission probabilities, ultimately limiting its immediate utility for decision-making.
The implications of these findings are multi-fold. Firstly, they re-affirm the clinical utility of rTMS in managing TRD, especially when considering individual patient histories and comorbidities. Secondly, these results lend support for the predictive power of ML algorithms derived from EMR for refining treatment approaches, suggesting that these tools could become integral to psychiatric precision medicine. Future research should aim to validate these predictive models in diverse clinical settings, ensuring that findings are replicable and robust.
Limitations
The retrospective study design provides a robust dataset but holds inherent limitations. Firstly, while our tree-based models implicitly account for interaction effects, we did not explicitly analyze these due to sample size constraints, which may have limited our ability to detect and interpret nuanced interactions among predictors. Secondly, this study is susceptible to selection bias since it involves patients who initiated and completed treatment, which may exclude those with different response trajectories, possibly skewing findings towards patients more likely to complete treatment. Moreover, the sample in this study was predominantly highly educated and non-Hispanic White, limiting generalization to other demographics and pointing more towards the need to address disparities in access to TMS [48]. Third, information bias may arise from EMR data availability, accuracy, and completeness, with additional variability in the timing of data collection vis-à-vis treatment, possibly creating discrepancies between recorded and actual patient conditions. Reliance on EMR-extracted data presents challenges particularly in terms of incomplete entries, inconsistent documentation, and potential coding errors, which may affect the robustness of the model’s predictions. Further, our predictive models rely on several key assumptions. Tree-based models inherently capture non-linear relationships and interactions among predictors, eliminating the need for explicit interaction terms. Missing data were addressed through single imputation, assuming that data were missing at random. Overfitting was mitigated through feature selection, with checks for highly correlated predictors during preprocessing. These assumptions and limitations underscore the preliminary nature of this model, which serves as a foundation for future refinements. Future research should address these limitations by implementing prospective designs, standardizing rTMS protocols, timely data collection, and diverse populations.
Conclusion
This study elucidates predictors of treatment response to rTMS in TRD patients and highlights the importance of treatment approaches tailored to individual characteristics. By leveraging EMR and ML techniques, this study identified demographic, clinical, and treatment-related factors associated with outcomes, providing valuable insights for clinical practice and research.
Data availability
The code used for the analysis conducted is available from the corresponding author upon reasonable request.
Change history
05 June 2025
The Funding information section was missing from this article and should have read ‘Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number K23MH138746. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.’
References
American Psychiatric Association, American Psychiatric Association, editors. Diagnostic and statistical manual of mental disorders: DSM-5. 5th ed. Washington, D.C: American Psychiatric Association; 2013. p. 947.
(WHO) WHO. Mental disorders [Internet]. https://www.who.int/news-room/fact-sheets/detail/mental-disorders.
Zhdanava M, Pilon D, Ghelerter I, Chow W, Joshi K, Lefebvre P, et al. The prevalence and national burden of treatment-resistant depression and major depressive disorder in the United States. J Clin Psychiatry. 2021;82:20m13699.
Zimmerman M, Ellison W, Young D, Chelminski I, Dalrymple K. How many different ways do patients meet the diagnostic criteria for major depressive disorder? Compr Psychiatry. 2015;56:29–34.
Lynch CJ, Gunning FM, Liston C. Causes and consequences of diagnostic heterogeneity in depression: paths to discovering novel biological depression subtypes. Biol Psychiatry. 2020;88:83–94.
Benster LL, Weissman CR, Stolz LA, Daskalakis ZJ, Appelbaum LG. Pre-clinical indications of brain stimulation treatments for non-affective psychiatric disorders, a status update. Transl Psychiatry. 2023;13:1–12.
Aaronson S. Overview of a large transcranial magnetic stimulation patient outcomes database. Brain Stimul. 2021;14:1717.
Fitzgerald PB, Oxley TJ, Laird AR, Kulkarni J, Egan GF, Daskalakis ZJ. An analysis of functional neuroimaging studies of dorsolateral prefrontal cortical activity in depression. Psychiatry Res. 2006;148:33–45.
Mayberg HS. Limbic-cortical dysregulation: a proposed model of depression. J Neuropsychiatry Clin Neurosci. 1997;9:471–81.
George MS, Lisanby SH, Avery D, McDonald WM, Durkalski V, Pavlicova M, et al. Daily left prefrontal transcranial magnetic stimulation therapy for major depressive disorder: a sham-controlled randomized trial. Arch Gen Psychiatry. 2010;67:507–16.
Cole EJ, Stimpson KH, Bentzley BS, Gulser M, Cherian K, Tischler C, et al. Stanford accelerated intelligent neuromodulation therapy for treatment-resistant depression. Am J Psychiatry. 2020;177:716–26.
Chung SW, Hoy KE, Fitzgerald PB. Theta-burst stimulation: a new form of TMS treatment for depression? Depress Anxiety. 2015;32:182–92.
Peng Z, Zhou C, Xue S, Bai J, Yu S, Li X, et al. Mechanism of repetitive transcranial magnetic stimulation for depression. Shanghai Arch Psychiatry. 2018;30:84–92.
Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319:1317–8.
Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380:1347–58.
Sackeim HA, Aaronson ST, Carpenter LL, Hutton TM, Mina M, Pages K, et al. Clinical outcomes in a large registry of patients with major depressive disorder treated with transcranial magnetic stimulation. J Affect Disord. 2020;277:65–74.
Blumberger DM, Mulsant BH, Thorpe KE, McClintock SM, Konstantinou GN, Lee HH, et al. Effectiveness of standard sequential bilateral repetitive transcranial magnetic stimulation vs bilateral theta burst stimulation in older adults with depression: the FOUR-D randomized noninferiority clinical trial. JAMA Psychiatry. 2022;79:1065–73.
Meehan AJ, Lewis SJ, Fazel S, Fusar-Poli P, Steyerberg EW, Stahl D, et al. Clinical prediction models in psychiatry: a systematic review of two decades of progress and challenges. Mol Psychiatry. 2022;27:2700–8.
Rassen JA, Bartels DB, Schneeweiss S, Patrick AR, Murk W. Measuring prevalence and incidence of chronic conditions in claims and electronic health record databases. Clin Epidemiol. 2018;11:1–15.
Beam W, Borckardt JJ, Reeves ST, George MS. An efficient and accurate new method for locating the F3 position for prefrontal TMS applications. Brain Stimul. 2009;2:50–4.
Li CT, Chen MH, Juan CH, Huang HH, Chen LF, Hsieh JC, et al. Efficacy of prefrontal theta-burst stimulation in refractory depression: a randomized sham-controlled study. Brain. 2014;137:2088–98.
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–57.
Ojeda FM, Jansen ML, Thiéry A, Blankenberg S, Weimar C, Schmid M, et al. Calibrating machine learning approaches for probability estimation: a comprehensive comparison. Stat Med. 2023;42:5451–78.
Krishnan KRR. Comorbidity and depression treatment. Biol Psychiatry. 2003;53:701–6.
Yesavage JA, Fairchild JK, Mi Z, Biswas K, Davis-Karim A, Phibbs CS, et al. Effect of repetitive transcranial magnetic stimulation on treatment-resistant major depression in US Veterans. JAMA Psychiatry. 2018;75:884–93.
Hu YT, Hu XW, Han JF, Zhang JF, Wang YY, Wolff A, et al. Childhood trauma mediates repetitive transcranial magnetic stimulation efficacy in major depressive disorder. Eur Arch Psychiatry Clin Neurosci. 2021;271:1255–63.
Ng E, Wong EHY, Lipsman N, Nestor SM, Giacobbe P. Adverse childhood experiences and repetitive transcranial magnetic stimulation outcomes for depression. J Affect Disord. 2023;320:716–24.
Karsen EF, Watts BV, Holtzheimer PE. Review of the effectiveness of transcranial magnetic stimulation for post-traumatic stress disorder. Brain Stimul. 2014;7:151–7.
Abdelrahman AA, Noaman M, Fawzy M, Moheb A, Karim AA, Khedr EM. A double-blind randomized clinical trial of high frequency rTMS over the DLPFC on nicotine dependence, anxiety and depression. Sci Rep. 2021;11:1640.
Poulet E, Galvao F, Haffen E, Szekely D, Brault C, Haesebaert F, et al. Effects of smoking status and MADRS retardation factor on response to low frequency repetitive transcranial magnetic stimulation for depression. Eur Psychiatry. 2016;38:40–4.
Shin LM, Liberzon I. The neurocircuitry of fear, stress, and anxiety disorders. Neuropsychopharmacology. 2010;35:169–91.
Comorbid anxiety and depression in adults: epidemiology, clinical manifestations, and diagnosis - UpToDate [Internet]. https://www.uptodate.com/contents/comorbid-anxiety-and-depression-in-adults-epidemiology-clinical-manifestations-and-diagnosis.
Wilson S, Olsen S, Sullivan C, Cooper D, Somez I, Widge AS, et al. Concurrent benzodiazepine use and TMS clinical outcomes. Biol Psychiatry. 2021;89:S288.
Hebel T, Abdelnaim M, Deppe M, Langguth B, Schecklmann M. Attenuation of antidepressive effects of transcranial magnetic stimulation in patients whose medication includes drugs for psychosis. J Psychopharmacol. 2020;34:1119–24.
Sheline YI, Gado MH, Kraemer HC. Untreated depression and hippocampal volume loss. Am J Psychiatry. 2003;160:1516–8.
Fava M. Diagnosis and definition of treatment-resistant depression. Biol Psychiatry. 2003;53:649–59.
Leuchter M, Citrenbaum C, Wilson A, Tibbe T, Jackson N, Krantz D, et al. The effect of older age on outcomes of rTMS treatment for treatment-resistant depression. Int Psychogeriatr. 2024;36:1070–5.
Hanlon CA, McCalley DM. Sex/gender as a factor that influences transcranial magnetic stimulation treatment outcome: three potential biological explanations. Front Psychiatry. 2022;13:869070.
Yu CL, Kao YC, Thompson T, Brunoni AR, Hsu CW, Carvalho AF, et al. The association of total pulses with the efficacy of repetitive transcranial magnetic stimulation for treatment-resistant major depression: a dose-response meta-analysis. Asian J Psychiatr. 2024;92:103891.
Slan AR, Citrenbaum C, Corlier J, Ngo D, Vince-Cruz N, Jackson NJ, et al. The role of sex and age in the differential efficacy of 10 Hz and intermittent theta-burst (iTBS) repetitive transcranial magnetic stimulation (rTMS) treatment of major depressive disorder (MDD). J Affect Disord. 2024;366:106–12.
Hutton T, Aaronson ST, Carpenter LL, Pages K, Rush AJ, Sackheim HA. The profile of symptom change with transcranial magnetic stimulation for major depressive disorder. Transcranial Magnetic Stimulation. 2024;1:100074.
Kaster TS, Downar J, Vila-Rodriguez F, Baribeau DA, Thorpe KE, Daskalakis ZJ, et al. Differential symptom cluster responses to repetitive transcranial magnetic stimulation treatment in depression. EClinicalMedicine. 2022;2:101765.
Elnazali M, Veerakumar A, Blair M, Pearce EL, Kim N, Sebastian S, et al. Unilateral and bilateral theta burst stimulation for treatment-resistant depression: follow up on a naturalistic observation study. J Psychiatr Res. 2024;180:387–93.
Nie Z, Vairavan S, Narayan VA, Ye J, Li QS. Predictive modeling of treatment resistant depression using data from STAR*D and an independent clinical study. PLoS ONE. 2018;13:e0197268.
Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology. 2024;49:285–90.
Benrimoh DA, Armstrong C, Mehltretter J, Fratila R, Perlman K, Israel S, et al. Development and validation of a deep-learning model for differential treatment benefit prediction for adults with major depressive disorder deployed in the artificial intelligence in depression medication enhancement (AIDME) study. ArXiv, abs/2406.04993.
Vickers AJ, Calster BV, Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016;352:i6.
Nagra DS, Stolz LA, Weissman CR, Appelbaum LG. Access to interventional psychiatric treatments in the United States: disparities and proposed solutions. Glob Health Econ Sustain. 2024;2:2456.
Funding
This research was funded by the University of California San Diego Health Sciences Research Award #RG114131 to author LGA, R25 MH101072 to author KT, and generous support from the Kreutzkamp Family Foundation. Research reported in this publication was supported by the National Institute of Mental Health of the National Institutes of Health under Award Number K23MH138746. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Authors and Affiliations
Contributions
LLB, JNK, CRW, FS, ZJD, and LGA conceived and designed the study. LLB, JNK, FS, KT, HA, VT, NS, and LS were responsible for data collection and curation. JNK, LLB, and LGA performed and analyzed statistical analyses and machine learning modeling. LLB, JNK, CRW, MP, ZJD, and LGA interpreted the results and provided clinical insights. All authors contributed to manuscript revisions and provided critical feedback. All authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Benster, L.L., Weissman, C.R., Suprani, F. et al. Predictive modeling of response to repetitive transcranial magnetic stimulation in treatment-resistant depression. Transl Psychiatry 15, 160 (2025). https://doi.org/10.1038/s41398-025-03380-w
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41398-025-03380-w