Introduction

Depression is marked by persistent and profound emotional distress, and it has become an increasingly serious global mental health issue. It leads to a significant reduction in physical functioning and quality of life, while contributing to a rise in disease incidence and mortality rates. In 2017, approximately 17.3 million adults in the United States, aged 18 and older, experienced at least one major depressive episode, representing a prevalence of around 7.1%1. According to the World Health Organization’s (WHO) 2017 report, “Depression and Other Common Mental Disorders: Global Health Estimates”, there were 322 million individuals worldwide living with depression. Nearly half of these cases are concentrated in Southeast Asia and the Western Pacific regions, including countries like China and India. In China, depressive disorders have been identified as the second leading cause of years lived with disability (YLDs)2. The prevalence of depression varies by age, with rates peaking in the elderly. Among women aged 55–74, the estimated prevalence exceeds 7.5%.

Given the low cure rates and the limited effectiveness of current treatments, identifying risk factors for depression is crucial, because early prevention and intervention can slow the progression of the disorder. Various factors have been linked to depression, including age, gender, occupation, and lifestyle3. As a leading cause of disability worldwide, depression has also been linked to various environmental factors, including exposure to heavy metals.

Human health, including the occurrence of depression, is often influenced by the combined impact of multiple metals. However, most studies have focused solely on specific metal exposures4,5,6, utilizing traditional statistical or ML analyses7,8,9. To more effectively explore the relationship between depression and heavy metal exposure, a novel analytical approach is required that accounts for the combined effects of multiple metal exposures.

Traditional methods for disease identification impose numerous stringent requirements on dataset preparation. At the same time, advances in computer science and the growing volume of available information increasingly challenge researchers to uncover hidden insights from big data10. Although machine learning (ML) models often behave as black boxes, they impose fewer preprocessing requirements and can therefore analyze large volumes of information more readily. This capability supports hazard identification and other health-related decision-making processes11.

Recent research using machine learning techniques has provided new insights into the relationship between heavy metals and depression, particularly among aging populations. Here, we summarize findings from the past 5 years on the role of heavy metals in the etiology of depression. Xia et al.12 used machine learning algorithms to analyze data from the National Health and Nutrition Examination Survey (NHANES) 2017–2018 and found significant associations between depression and specific heavy metals: cadmium (Cd), ethyl mercury (EtHg), and mercury (Hg) were particularly associated with depression, with Cd and EtHg showing positive associations and Hg a negative one. This study adds to a broader literature on the environmental determinants of mental health. For example, Berk et al.13 found associations between persistent organic pollutants and depressive symptoms, while Scinicariello et al.3 reported links between hearing loss and depression, which may be influenced by heavy metal exposure. The neurotoxic effects of Cd are well established, with mechanisms including oxidative stress and interference with essential minerals such as zinc and calcium14. Mercury’s role in neuropsychiatric disorders is also recognized, with EtHg and inorganic mercury species of particular concern because of their ability to cross the blood-brain barrier15. The inverse relationship between Hg and depression reported by Xia et al. is intriguing and suggests a complex interplay between environmental toxins and mental health. It aligns with other studies suggesting that fish consumption protects against depression, likely because the nutritional benefits of omega-3 fatty acids outweigh the risks of mercury exposure1.

In this study, we analyzed NHANES data (2013–March 2020) to explore the relationship between depression and heavy metal exposure. We developed five ML models to identify depression based on heavy metal exposure and compared their performance metrics. The best-performing model was further optimized using a genetic algorithm (GA) to improve its performance. In addition, we applied model-interpretation techniques used in electronic medical record (EMR) mining, namely SHapley Additive exPlanations (SHAP)16 and Local Interpretable Model-agnostic Explanations (LIME)17, to evaluate the contribution of heavy metals to the identification of depression. This approach highlights potential associations that could inform future research, guide epidemiological investigations, and improve understanding of early risk factors for intervention.

Methods

Participants

The US NHANES surveys the US population using various survey strategies to collect demographic, dietary, examination, laboratory, and questionnaire data. All data are publicly available on the US Centers for Disease Control and Prevention website (https://www.cdc.gov/nchs/nhanes). Our study sample comprised three consecutive NHANES cycles from 2013 to March 2020.

The inclusion criteria were as follows: participants had to be over 20 years old, have completed both blood and urine tests for heavy metals, and have answered the NHANES questionnaire items on depression status. Participants with more than 10% missing data or any contradictory information were excluded. In total, 19,368 participants were included in the final analysis.
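The inclusion and exclusion rules above can be expressed as a short pandas filter. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the NHANES components (demographics, blood and urine metal results already combined into one table, and the depression outcome) have been loaded into DataFrames keyed by the respondent sequence number `SEQN`, uses `RIDAGEYR` (age in years at screening) for the age rule, reads "over 20 years old" as 20 or older, and treats inner joins as the test for completed blood tests, urine tests, and questionnaire. The function name and `max_missing` parameter are hypothetical, and the "contradictory information" rule is not modeled.

```python
import pandas as pd


def select_participants(demo: pd.DataFrame, metals: pd.DataFrame,
                        outcome: pd.DataFrame, max_missing: float = 0.10) -> pd.DataFrame:
    """Hypothetical helper: merge NHANES components on SEQN and apply the study filters."""
    # Inner joins keep only participants present in every component
    # (demographics, heavy metal measurements, and the depression outcome).
    merged = (demo.merge(metals, on="SEQN", how="inner")
                  .merge(outcome, on="SEQN", how="inner"))
    # Inclusion: adults aged 20 years or older (assumed reading of "over 20").
    merged = merged[merged["RIDAGEYR"] >= 20]
    # Exclusion: participants with more than `max_missing` of their values missing.
    missing_share = merged.drop(columns="SEQN").isna().mean(axis=1)
    return merged[missing_share <= max_missing].reset_index(drop=True)
```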

Data collection

Demographic characteristics of the study participants

Participants’ demographic and related characteristics were obtained from NHANES, including gender, age (in years at screening), race/Hispanic origin (including non-Hispanic Asian), education level (college or above, high school or equivalent, and less than high school), poverty-to-income ratio (PIR; ≤ 1, 1–4, and ≥ 4)18, and body mass index (BMI, kg/m²).

Heavy metals

Our analysis incorporated the urinary and blood concentrations of 16 heavy metals. The National Center for Environmental Health implemented strict quality control protocols to ensure accurate detection of all heavy metal levels19.

Outcome ascertainment

Since the 2013–2014 data release cycle, major depressive disorder diagnosed by physicians has been coded in NHANES as F32.9 or F33.9, following the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10)20.
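As an illustration of how the binary outcome could be derived from these codes, the sketch below flags a participant when F32.9 or F33.9 appears in any code column. It rests on assumptions not stated in the text: the code column names are hypothetical, codes are assumed to be stored as strings with or without the dot, and a participant may have several records, so flags are aggregated per `SEQN`.

```python
import pandas as pd

# ICD-10 codes used for major depressive disorder in this study, written without the dot.
DEPRESSION_CODES = ["F329", "F339"]


def flag_depression(records: pd.DataFrame, code_columns: list) -> pd.Series:
    """Hypothetical helper: return a 0/1 depression flag per SEQN."""
    # Normalize codes: treat missing as empty, upper-case, drop dots and spaces.
    normalized = records[code_columns].apply(
        lambda col: col.fillna("").astype(str).str.upper()
                       .str.replace(".", "", regex=False).str.strip()
    )
    has_code = normalized.isin(DEPRESSION_CODES).any(axis=1)
    # A participant counts as a case if any of their records carries one of the codes.
    return has_code.groupby(records["SEQN"]).any().astype(int)
```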

Pre-processing of features

We selected 22 variables (referred to as features in machine learning), comprising 19 continuous and 3 categorical variables. After splitting the data into training and test sets, all preprocessing steps were fitted on the training set only, keeping the test set independent and preventing data leakage. Data with a missing rate of 10% or higher were excluded. Missing values were imputed with the median for continuous variables, the mode for unordered categorical variables, and nearest-neighbor values for ordinal categorical variables. Features were standardized with StandardScaler, and categorical variables were one-hot encoded21. Principal Component Analysis (PCA) and the SelectKBest (SKB) algorithm were used for feature extraction and selection, respectively22, and variables contributing little to the model were removed during preprocessing to reduce overfitting.

Feature selection and preprocessing were conducted strictly within the training data; the test set was not used at any stage of these steps.
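The preprocessing described above maps naturally onto a scikit-learn `ColumnTransformer` that is fitted on the training set and then only applied to the test set. This is a minimal sketch, not the authors' implementation: the column names are placeholders, and "nearest-neighbor values" for ordinal variables is interpreted (as an assumption) as `KNNImputer(n_neighbors=1)` using the continuous columns as extra distance features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def _keep_last_columns(X, n):
    """Keep the last n columns (the ordinal variables) after joint KNN imputation."""
    return np.asarray(X)[:, -n:]


def build_preprocessor(continuous, nominal, ordinal):
    """Median / mode / nearest-neighbour imputation, scaling, and one-hot encoding."""
    continuous_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    nominal_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    # Ordinal codes take the value of the single nearest neighbour, with the
    # continuous columns used as extra distance features, then are one-hot encoded.
    ordinal_pipe = Pipeline([
        ("impute", KNNImputer(n_neighbors=1)),
        ("select", FunctionTransformer(_keep_last_columns, kw_args={"n": len(ordinal)})),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(
        [("cont", continuous_pipe, continuous),
         ("nom", nominal_pipe, nominal),
         ("ord", ordinal_pipe, continuous + ordinal)],
        sparse_threshold=0,  # always return a dense array
    )
```

Usage: call `pre.fit_transform(train_df)` once, then `pre.transform(test_df)`; the test set never influences the medians, modes, neighbours, or scaling parameters.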

Model establishment

Repeated K-fold cross-validation was applied to the training set to construct and evaluate the models23. We employed five ML algorithms commonly used in EMR mining8,9,24, namely Deep Neural Networks (DNN), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), Decision Tree (DT), and eXtreme Gradient Boosting (XGB), to build models that identify depression based on heavy metal exposure. Each algorithm has distinct characteristics. DNN: typically achieves high accuracy and is straightforward to train, but has strong black-box characteristics that make its decision-making difficult to understand25. SVM: robust to data variations and capable of handling nonlinear, high-dimensional datasets26. GNB: performs well on small datasets, supports multiclass classification, and is suitable for incremental training, although it can be affected by noise and redundant features27,28. DT: supports visual analysis and is easy to understand and interpret, but is prone to overfitting29. XGB: an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable30; however, it has numerous parameters that must be tuned for optimal performance31. To mitigate class imbalance, we used the models’ built-in class-weighting options, which assign greater weight to the minority class and improve the models’ ability to detect cases of depression.

Each algorithm’s mean performance was first evaluated on the training set using K-fold cross-validation, with hyperparameters tuned to achieve the most stable performance on the validation folds (derived from the training set). The most effective algorithm was then selected based on its performance on the independent test set. Next, we used a Genetic Algorithm (GA) to fine-tune the parameters of the selected model, addressing its need for extensive parameter adjustment. SHAP and LIME were then applied to interpret the model32 by highlighting the variables most relevant to identifying depression among participants from 2013 to March 2020. SHAP provided a global interpretation of the model, whereas LIME provided local interpretations of individual predictions.
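The GA tuning step can be sketched as follows. The paper does not report its GA settings, so everything here is an assumption: a discrete search space, uniform crossover, per-gene mutation, truncation selection from the top half, and elitism, with cross-validated ROC AUC as the fitness. `GradientBoostingClassifier` again stands in for XGBoost, and the parameter grid is purely illustrative.

```python
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def genetic_search(space, fitness, population_size=8, generations=5,
                   mutation_rate=0.2, elite=2, seed=0):
    """Minimal genetic algorithm over a discrete hyperparameter space.

    space:   dict mapping parameter name -> list of candidate values
    fitness: callable taking a parameter dict and returning a score to maximize
    """
    rng = random.Random(seed)
    names = list(space)
    cache = {}

    def score(ind):
        # Memoize so each distinct parameter combination is evaluated only once.
        key = tuple(ind[n] for n in names)
        if key not in cache:
            cache[key] = fitness(ind)
        return cache[key]

    def crossover(a, b):
        # Uniform crossover: each gene is copied from either parent with p = 0.5.
        return {n: (a if rng.random() < 0.5 else b)[n] for n in names}

    def mutate(ind):
        # Each gene is resampled from its candidate list with probability mutation_rate.
        return {n: rng.choice(space[n]) if rng.random() < mutation_rate else v
                for n, v in ind.items()}

    population = [{n: rng.choice(space[n]) for n in names} for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = ranked[:elite]                                # elitism
        parents_pool = ranked[:max(2, population_size // 2)]     # truncation selection
        while len(next_gen) < population_size:
            a, b = rng.sample(parents_pool, 2)
            next_gen.append(mutate(crossover(a, b)))
        population = next_gen
    best = max(population, key=score)
    return best, score(best)


def tune_gradient_boosting(X, y, seed=0):
    """Example fitness: mean 3-fold ROC AUC of a gradient boosting classifier."""
    space = {"max_depth": [2, 3, 4],
             "learning_rate": [0.05, 0.1, 0.2],
             "n_estimators": [50, 100]}
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)

    def cv_auc(params):
        model = GradientBoostingClassifier(random_state=seed, **params)
        return cross_val_score(model, X, y, scoring="roc_auc", cv=cv).mean()

    return genetic_search(space, cv_auc, population_size=6, generations=3, seed=seed)


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
    best_params, best_auc = tune_gradient_boosting(X, y)
    print(best_params, round(best_auc, 3))
```

Elitism keeps the best parameter set found so far in every generation, so the reported score never decreases from one generation to the next.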

Statistical analysis

Continuous variables were presented as medians with interquartile ranges, and categorical variables as counts with percentages. The chi-square test was used to compare characteristics between groups. Heavy metal concentrations were expressed as geometric means with geometric standard deviations. Trends across the three data release cycles (2013–March 2020) were assessed using the Mann-Kendall test.
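For reference, the geometric summaries and the Mann-Kendall test can be computed as below. This is a minimal sketch of the textbook formulas (the paper does not state which Mann-Kendall variant or software was used): S counts concordant minus discordant pairs in time order, its variance is [n(n−1)(2n+5) − Σt(t−1)(2t+5)]/18 with t the sizes of tied groups, and Z applies a continuity correction of 1 before a two-sided normal p-value.

```python
import math
from itertools import combinations

import numpy as np
from scipy.stats import norm


def geometric_mean_sd(values):
    """Geometric mean and geometric standard deviation of positive values."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(logs.mean())), float(np.exp(logs.std(ddof=1)))


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test using the normal approximation.

    Returns (S, Z, p). The variance includes the usual correction for tied values,
    and Z applies a continuity correction of 1.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    # S counts concordant minus discordant pairs in time order.
    s = float(sum(np.sign(x[j] - x[i]) for i, j in combinations(range(n), 2)))
    _, t = np.unique(x, return_counts=True)
    var_s = (n * (n - 1) * (2 * n + 5) - np.sum(t * (t - 1) * (2 * t + 5))) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * norm.sf(abs(z))
    return s, float(z), float(p)
```

For example, a strictly increasing series of 10 values gives S = 45 and Var(S) = 10·9·25/18 = 125.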

Model performance was evaluated for each model across repeated K-fold cross-validation using the average area under the receiver operating characteristic curve (AAUC)35 with its 95% confidence interval (95% CI), the best AUC (BAUC), the average precision score (APS), and the average recall, F1 score, accuracy, Brier score loss, cross-entropy loss, Jaccard index, and Cohen’s kappa. Together, these metrics are better suited to imbalanced datasets than accuracy alone and give a more comprehensive picture of how well a model identifies cases of depression.
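Why accuracy alone is not informative here follows from simple arithmetic: with 555 cases among 19,368 participants, a classifier that labels everyone as non-depressed already reaches an accuracy of 1 − 555/19,368 ≈ 0.971. The sketch below computes the listed metrics for one validation fold with scikit-learn and summarizes fold scores with a t-based 95% CI. The 0.5 decision threshold and the t-interval (which treats repeated-CV folds as independent, an approximation) are our assumptions; the paper does not state how its CI was obtained. BAUC would be the maximum `auc` across folds.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import (accuracy_score, average_precision_score, brier_score_loss,
                             cohen_kappa_score, f1_score, jaccard_score, log_loss,
                             recall_score, roc_auc_score)


def fold_metrics(y_true, y_prob, threshold=0.5):
    """Per-fold metrics from predicted probabilities of the positive (depression) class."""
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "aps": average_precision_score(y_true, y_prob),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_true, y_pred),
        "brier": brier_score_loss(y_true, y_prob),
        "cross_entropy": log_loss(y_true, y_prob),
        "jaccard": jaccard_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }


def mean_with_ci(values, level=0.95):
    """Mean across folds with a t-based confidence interval (treats folds as independent)."""
    v = np.asarray(values, dtype=float)
    half = stats.t.ppf((1 + level) / 2, df=v.size - 1) * v.std(ddof=1) / np.sqrt(v.size)
    return float(v.mean()), float(v.mean() - half), float(v.mean() + half)
```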

All analyses were conducted in Python 3.9.7, with most modeling and evaluation implemented using the scikit-learn library. Statistical significance was set at P < 0.05. An overview of the methodology is presented in Fig. 1.

Fig. 1

Overview plot.

Results

Participants’ demographic characteristics

The characteristics of the study participants are presented in Table 1. Of the 19,368 individuals included in the analysis, 555 were diagnosed with major depressive disorder. The cohort included 9,397 men (48.5%), and the median age was 57 years (interquartile range, 33–69). Participants with major depressive disorder were more likely to be women, younger, and non-Hispanic white, and to have a higher BMI (all P < 0.05).

Table 1 The study participants’ characteristics in NHANES (2013–2020.3).

Heavy metals’ concentrations

The heavy metal concentrations in urine and blood for each data release cycle are described in Table 2. Across the data release cycles, significant trends were observed for Barium, Cadmium, Cobalt, Cesium, Manganese, Lead, Antimony, Tin, Thallium, and Tungsten in urine, as well as Lead, Cadmium, Mercury, Selenium, and Manganese in blood (all P for trend < 0.05).

Table 2 Mean values of heavy metal concentration by each NHANES (2013–2020.3) data release cycle.

Models’ preprocessing

In the feature selection process, PCA indicated that at least 18 principal components were needed to retain over 90% of the original variance. SKB feature scores ranged from 0.01 to 1083.44, and we selected the top 18 features by score for the ML models. The five ML algorithms were then trained on the NHANES data using repeated K-fold cross-validation.
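The two feature-selection checks can be sketched with scikit-learn as follows. This is an illustration under assumptions: the paper does not name the SKB scoring function, so `f_classif` (ANOVA F-score) is used here, and both functions should be applied to standardized training data only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif


def components_for_variance(X, target=0.90):
    """Smallest number of principal components whose cumulative explained variance exceeds target."""
    cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cumulative, target, side="right") + 1)


def top_k_features(X, y, names, k):
    """Rank features by their univariate ANOVA F-score and keep the top k."""
    scores = SelectKBest(score_func=f_classif, k=k).fit(X, y).scores_
    order = np.argsort(scores)[::-1][:k]
    return [names[i] for i in order], scores[order]
```

In the study's setting, `components_for_variance` returning 18 would motivate calling `top_k_features(..., k=18)`.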

Models’ performance

The XGB model performed best, with an AAUC of 0.686 (95% CI: 0.68–0.69), a BAUC of 0.942, and an APS of 0.062, all significantly higher than those of the other four models (P < 0.05). To further improve AAUC and APS for depression identification, we tuned its parameters with a GA, and the resulting GA-XGB model achieved the best performance. The receiver operating characteristic (ROC) curves and precision-recall curves for all six ML models, including GA-XGB, are displayed in Fig. 2. The accuracies for identifying depression were 97.2% for DNN, 97.2% for SVM, 93.6% for DT, 97.1% for XGB, and 97.4% for GA-XGB.

Fig. 2

Best receiver operating characteristic (ROC) curves and precision-recall curves for each model.

Models’ comparison

Table 3 compares the performance of the ML models on AAUC, BAUC, APS, and average recall, F1 score, accuracy, Brier score loss, cross-entropy loss, Jaccard index, and Cohen’s kappa. Among the five base models, XGB achieved the highest scores on 6 of the 10 performance indicators, indicating the strongest performance in depression identification. We then optimized the XGB parameters with a GA, as shown in the rightmost column of Table 3. The GA-XGB model achieved the highest scores on 7 of the 10 indicators, with an AAUC of 0.669 (95% CI: 0.663–0.676), a BAUC of 0.97, and an APS of 0.068.

Table 3 Comparison of ML models’ performance.

Feature importance visualization

SHAP and LIME were used to visualize the influence of features on depression identification by the GA-XGB model. The SHAP and LIME summary plot (Fig. 3) shows the impact of each selected feature on the model’s identification of depression.
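To make concrete what a SHAP value measures, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features. It is a didactic illustration, not the computation used in the study: absent features are replaced by a fixed baseline vector (one common choice of value function, assumed here), and the cost grows as 2^m in the number of features m. For tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` computes Shapley values efficiently without this enumeration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of one prediction, with absent features set to a baseline.

    predict:  callable mapping a 2-D array (n_samples, n_features) to 1-D outputs
    x:        the instance to explain (1-D array)
    baseline: reference values used for features outside a coalition (1-D array)
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    m = x.size

    def value(coalition):
        # Model output when only the features in `coalition` take their observed values.
        z = baseline.copy()
        idx = list(coalition)
        z[idx] = x[idx]
        return float(predict(z[None, :])[0])

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi
```

By construction the values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property SHAP summary plots rely on.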

Fig. 3

The SHAP&LIME-GA-XGB summary plot.

The SHAP value plot on the left side of Fig. 3 shows, at the global level, that Cadmium in blood (20.636) positively influenced the model, whereas Barium (−30.558), Thallium (−11.242), Tin (−12.339), Manganese (−17.385), Antimony (−19.088), Lead (−23.989), and Tungsten (−21.126) in urine, and Lead (−111.499), Cadmium (−35.003), Mercury (−70.835), Selenium (−10.389), and Manganese (−16.206) in blood negatively influenced the model. In addition, the SHAP and LIME summary plot, together with statistical tests, shows that female sex, younger age, non-Hispanic ethnicity, and lower BMI are associated with a higher risk of depression. The SHAP interaction value plot (upper right of Fig. 3) shows the interactions between key features. The LIME plot (lower right of Fig. 3) shows local feature importance for a single sample (the 14,000th sample). SHAP values thus quantify each feature’s contribution to the model’s identification of depression.
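The local LIME explanation can be understood through the simplified sketch below, which perturbs one instance, weights the perturbed samples by their proximity, and fits a weighted ridge regression whose coefficients serve as local feature importances. This is a LIME-style approximation, not the `lime` package itself: the package samples perturbations from training-data statistics and by default discretizes tabular features, whereas here Gaussian noise with a user-supplied per-feature `scale` is assumed. The exponential kernel and the default width of 0.75·√(number of features) follow what we understand to be the package's tabular defaults.

```python
import numpy as np
from sklearn.linear_model import Ridge


def lime_style_explanation(predict_proba, x, scale, n_samples=2000, kernel_width=None, seed=0):
    """Fit a locally weighted linear surrogate around one instance (LIME-style sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.size)
    # Perturb the instance with Gaussian noise scaled per feature.
    Z = x + rng.normal(size=(n_samples, x.size)) * scale
    # Distances in standardized units, turned into exponential-kernel weights.
    d = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
    # Surrogate target: predicted probability of the positive (depression) class.
    target = predict_proba(Z)[:, 1]
    surrogate = Ridge(alpha=1.0).fit((Z - x) / scale, target, sample_weight=weights)
    # Coefficients are local effects per one-`scale` change in each feature.
    return surrogate.coef_
```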

Prediction interpretation

In the SHAP decision plot on the right side of Fig. 4, each line represents an individual participant, and all lines converge at a single point (0.971). Features are arranged in descending order of their impact on the observations. The tree plot on the left side of Fig. 4 shows one of the base trees in the model’s ensemble, illustrating the decision logic used for discrimination.
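As we understand the SHAP decision plot, each participant’s line starts from the model’s common base value and adds that participant’s SHAP values one feature at a time, from the least to the most important feature, ending at the participant’s model output; this is why all lines meet at one point. The sketch below reproduces that trajectory for one participant; the function name is hypothetical and plotting is omitted.

```python
import numpy as np


def decision_path(base_value, shap_values, feature_names):
    """Cumulative model output for one sample, as traced by a SHAP decision plot."""
    shap_values = np.asarray(shap_values, dtype=float)
    # Least important feature first, matching the bottom-to-top order of the plot.
    order = np.argsort(np.abs(shap_values))
    cumulative = base_value + np.cumsum(shap_values[order])
    return [feature_names[i] for i in order], cumulative
```

The last element of `cumulative` equals the base value plus the sum of all SHAP values, that is, the model output for this sample.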

Fig. 4

The SHAP-GA-XGB decision plot.

Discussion

In this study, we developed an ML strategy to identify depression in the 2013–2020.3 NHANES data, focusing on its relationship with heavy metal exposure. The GA-XGB model was selected for its superior performance among the five ML algorithms tested, achieving an average AUC of 0.959 and an accuracy of 0.968. To improve interpretability, we combined the game-theoretic SHAP method with LIME, enabling feature interpretation at both global and local scales through summary and decision plots. Our findings suggest that the SHAP & LIME-enhanced GA-XGB model has promising potential for identifying depression associated with heavy metal exposure.

This research builds on previous studies that used ML algorithms for disease prediction, which highlight the advantages of advanced classification techniques for improving prediction accuracy. ML, a branch of artificial intelligence, uses mathematical algorithms to detect patterns in diverse datasets and thereby aids decision-making34. However, the complexity of ML algorithms often limits their interpretability, which makes them difficult to apply effectively in medical decision-making35.

Our SHAP & LIME-GA-XGB model uses multi-source NHANES data, including demographic, examination, laboratory, and questionnaire data, and therefore requires no additional data collection. Since 2013, heavy metal exposure in the United States has received considerable attention36, coinciding with the adoption of ICD-10 for recording disease data in NHANES39. We analyzed extensive data, particularly the concentrations of heavy metals in participants’ urine and blood samples. The GA-XGB model demonstrated high efficiency and outperformed the other five ML models tested in classification robustness, with repeated K-fold cross-validation used to reduce the risk of overfitting38. SHAP and LIME analyses further enhanced the interpretability of the GA-XGB model by showing the importance of individual features in identifying depression.

The SHAP findings were consistent with those of previous studies, which primarily investigated the impact of heavy metal exposure on depression. The relationship between cadmium exposure and depression is especially notable. Cybulska39 and Buser40 found that higher blood cadmium levels were associated with an increased risk of depressive symptoms. In contrast, Gao41 and Rhee42 found that lower serum uric acid levels, which can be influenced by cadmium exposure, were associated with depression. These findings suggest a complex relationship between cadmium exposure, depression, and potential moderating factors.

Future research should focus on monitoring and analyzing key features to help experts draw informed conclusions, rather than relying solely on algorithmic predictions. Expanding the dataset and incorporating clinical expertise could further enhance the model’s validity and performance43.

Limitations

Our study has several limitations. First, owing to computational constraints, we could not explore other potentially dynamic correlations within the limited dataset. Second, depression diagnoses were self-reported in the NHANES questionnaire; although they follow ICD-10 standards, they may introduce information bias. Third, the strict inclusion criteria excluded many participants with missing data, which may have introduced bias. Fourth, the complexity of the model’s interpretation may affect the reproducibility of our findings. Fifth, although integrating machine learning into epidemiological research offers a powerful tool for identifying patterns in large and complex datasets, the cross-sectional design of NHANES precludes causal inference, and our focus on heavy metals means that other potential risk factors for depression were not considered. Finally, although feature selection and preprocessing were conducted strictly within the training data and the test set was kept entirely independent, feature selection was not embedded within each cross-validation fold, which may slightly overestimate performance during internal validation. Our approach improves processing consistency and efficiency, but embedding feature selection within folds would be more rigorous, and future studies could adopt this workflow to further enhance robustness. Nevertheless, the final performance evaluation relied exclusively on the independent test set, which mitigates the risk of data leakage and supports generalizability.
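The more rigorous workflow mentioned above, in which preprocessing and feature selection are refit inside every cross-validation fold, can be sketched with a scikit-learn `Pipeline`. This is an illustration under assumptions: `k=18` mirrors the number of features retained in this study, `f_classif` is an assumed SKB scoring function, and `GradientBoostingClassifier` stands in for XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cv_auc_with_fold_selection(X, y, k=18, n_splits=5, n_repeats=2, seed=0):
    """ROC AUC per fold, with imputation, scaling, and SelectKBest refit inside every fold."""
    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=k)),
        ("model", GradientBoostingClassifier(random_state=seed)),
    ])
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    # cross_val_score clones and refits the whole pipeline on each training fold,
    # so no validation-fold data reaches the imputer, scaler, or selector.
    return cross_val_score(pipeline, X, y, scoring="roc_auc", cv=cv)


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=25, weights=[0.9, 0.1], random_state=0)
    scores = cv_auc_with_fold_selection(X, y)
    print(f"mean AUC {scores.mean():.3f} over {scores.size} folds")
```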

Conclusion

In this study of US NHANES 2013–2020.3 participants, the SHAP&LIME-GA-XGB model was identified as an interpretable, efficient, and robust machine learning model for detecting depression based on heavy metal exposure. Cadmium in blood contributed positively to the identification of depression, whereas Barium, Thallium, Tin, Manganese, Antimony, Lead, and Tungsten in urine, and Lead, Cadmium, Mercury, Selenium, and Manganese in blood contributed negatively.