Introduction

Obstructive sleep apnea (OSA) is a disorder characterized by obstructive apneas and hypopneas caused by repetitive collapse of the upper airway during sleep [1]. OSA is often associated with excessive daytime sleepiness, impaired daytime function, metabolic dysfunction, and an increased risk of cardiovascular disease and mortality [2]. Several studies have shown significant associations between OSA and metabolic abnormalities, and OSA has been found to cause or worsen various metabolic disorders. Diabetes mellitus (DM) is a common metabolic disorder [3]. In recent years, studies worldwide have extensively investigated the mechanisms underlying the association between OSA and DM. OSA-related factors such as increased sympathetic nervous activity, intermittent hypoxia, hypothalamic-pituitary-adrenal axis dysfunction, systemic inflammation, and altered adipocytokine levels contribute to heightened insulin resistance [4]. Additionally, autonomic dysfunction in diabetic patients is linked to increased central and decreased peripheral chemoreceptor sensitivity to CO2 [5]. Only about 30% of these patients may exhibit OSA, without accompanying periodic breathing or central sleep apnea. Therefore, a deeper exploration of the pathophysiological mechanisms linking OSA and DM is needed to guide clinical management and treatment [6].

In 2008, the International Diabetes Federation underscored the close relationship between OSA and DM. Studies have shown that 30% of OSA patients also have diabetes, and the prevalence of OSA among individuals with DM can reach 80% [7]. Because the two conditions jointly damage target organs, treatment becomes more complex, and monotherapy often fails to achieve satisfactory outcomes [8, 9]. Over the last decade, it has been recognized that OSA is very common in patients with DM, and that metabolic disorders such as insulin resistance, abnormal glucose tolerance, and DM are also common in patients with OSA [10]. The intermittent hypoxemia and repetitive arousals of OSA trigger a series of pathophysiological events, including activation of the sympathetic nervous system, increased oxidative stress, alterations in hypothalamic-pituitary-adrenal function, and release of inflammatory adipocytokines [11]. These changes disturb normal glucose homeostasis and may increase the risk of developing DM [12, 13]. Conversely, DM causes abnormalities in ventilatory and upper airway neural control, as well as peripheral neuropathy, which accelerate the progression of OSA [14]. Furthermore, the oxidative stress, activated inflammatory pathways, and aberrant autonomic nervous system activity linked to DM may exacerbate OSA [15]. Consequently, a more thorough diagnostic approach is crucial for the early detection of DM in patients with OSA.

Machine learning is a young, rapidly developing field that can handle massive, complex, and heterogeneous data [16]. In recent studies, machine learning has provided significant insights into sleep research, neurophysiology, and the diagnosis and treatment of diseases [17], and our capacity to identify relevant features in large, high-dimensional gene expression data has steadily improved. In this work, we obtained two OSA datasets and two DM datasets from the Gene Expression Omnibus (GEO) database and applied a range of integrated bioinformatics methods to identify the key genes and putative pathways of OSA-associated DM. Furthermore, we verified the expression pattern of the hub gene (STK17A) identified among the OSA-associated pathogenic genes and used machine learning to construct a nomogram diagnostic model for OSA based on this gene. Finally, we investigated the immune cell signature of OSA to identify the relationship between the hub gene and the immune microenvironment.

Materials and methods

Microarray data

The two OSA datasets (GSE135917 and GSE75097) and the two DM datasets (GSE41762 and GSE25724) were obtained from the NCBI Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).

Microarray data processing

After the data for each disease had been prepared, the limma R package, which is used to identify differentially expressed genes (DEGs), was used to correct, normalize, and log2-transform the raw microarray data from the GSE135917 and GSE41762 datasets. DEGs between the experimental and control groups were then screened separately for OSA and DM, using cutoffs of P < 0.05 and |log2 fold change (FC)| > 0.585.
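As a minimal sketch of the screening step (not the authors' code), the Python snippet below applies the stated cutoffs to per-gene results of the kind limma's `topTable()` reports (a log2 fold change and a P value per gene). The input dictionary format and the gene names are hypothetical. A cutoff of 0.585 on |log2 FC| corresponds to a fold change of about 1.5.

```python
import math

# Cutoffs stated in the text; 0.585 is approximately log2(1.5).
P_CUTOFF = 0.05
LOG2FC_CUTOFF = 0.585


def screen_degs(results):
    """Split genes into up- and down-regulated DEG sets.

    `results` maps gene -> (log2FC, p_value); this input format is a
    hypothetical export of per-gene differential expression results.
    """
    up, down = set(), set()
    for gene, (log2fc, p) in results.items():
        if p < P_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF:
            (up if log2fc > 0 else down).add(gene)
    return up, down


example = {
    "GENE_A": (1.20, 0.001),   # passes both cutoffs, up-regulated
    "GENE_B": (-0.80, 0.020),  # passes both cutoffs, down-regulated
    "GENE_C": (0.50, 0.001),   # fold change too small
    "GENE_D": (2.00, 0.300),   # not significant
}
up, down = screen_degs(example)
print(sorted(up), sorted(down))   # ['GENE_A'] ['GENE_B']
print(round(math.log2(1.5), 3))   # 0.585
```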

Gene Ontology (GO) enrichment analysis of differential genes

The GO database classifies gene functions into three categories: biological processes, cellular components, and molecular functions. GO enrichment analysis therefore helps define the roles of genes and proteins by testing whether particular GO terms are over-represented among the differentially expressed genes.
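The section does not name the enrichment tool. Over-representation of a GO term among the shared DEGs is commonly assessed with a one-sided hypergeometric test (the approach used by tools such as clusterProfiler's `enrichGO`). The sketch below computes that P value with the Python standard library, assuming a user-chosen background gene universe; the counts in the example are hypothetical. In practice, the resulting P values are also adjusted for multiple testing across GO terms.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for GO over-representation.

    N -- genes in the background universe
    K -- background genes annotated to the GO term
    n -- genes in the query list (e.g. the shared DEGs)
    k -- query genes annotated to the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this division


# Hypothetical counts: 3 of 32 shared DEGs hit a term annotated to 40 of 18,000 genes.
print(hypergeom_enrichment_p(k=3, n=32, K=40, N=18000))
```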

Gene set enrichment analysis (GSEA)

GSEA uses a list of genes ranked by their association with a phenotype to evaluate how the genes of a predefined gene set are distributed within that list. This clarifies whether the genes in the set are significantly concentrated at the top or bottom of the ranking, and hence whether they are enriched in biologically relevant processes.
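To make the distribution trend concrete, the sketch below computes the GSEA running-sum enrichment score as described by Subramanian et al. (2005): walking down the ranked list, the sum steps up at genes in the set (weighted by their ranking metric) and steps down at genes outside it, and the enrichment score is the largest deviation from zero. This is an illustrative reimplementation, not the software used in the study; permutation-based significance testing is omitted, and the gene names are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted GSEA running-sum enrichment score.

    ranked   -- list of (gene, metric) pairs sorted by metric, highest first
    gene_set -- set of gene names defining the pathway
    p        -- weight exponent; p = 1 is the standard GSEA weighting
    """
    hits = [m for g, m in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_norm = sum(abs(m) ** p for m in hits)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    running, es = 0.0, 0.0
    for gene, metric in ranked:
        if gene in gene_set:
            running += abs(metric) ** p / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss                 # step down at a non-member
        if abs(running) > abs(es):
            es = running                            # keep the largest deviation
    return es


ranked = [("GENE_A", 3.0), ("GENE_B", 2.0), ("GENE_C", 1.0), ("GENE_D", -1.0)]
print(enrichment_score(ranked, {"GENE_A", "GENE_B"}))  # 1.0: set sits at the top
print(enrichment_score(ranked, {"GENE_C", "GENE_D"}))  # -1.0: set sits at the bottom
```

A positive score indicates that the set is concentrated among the highest-ranked genes, and a negative score that it is concentrated among the lowest-ranked ones.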

Machine learning

Four machine learning techniques were employed to further narrow down the candidate genes linking OSA and DM. Random Forest (RF) places few restrictions on the input variables, achieves high accuracy, sensitivity, and specificity, is well suited to continuous variables, and consistently produces reliable predictions. Support Vector Machine Recursive Feature Elimination (SVM-RFE) is based on the SVM maximum-margin principle: in each iteration, an SVM model is trained on the current feature set, each feature is scored according to its contribution to the model, the features are ranked in descending order of score, and the lowest-ranked feature is eliminated. eXtreme Gradient Boosting (XGB) is an ensemble learning model based on gradient boosting. It builds a strong classifier by iteratively training weak classifiers and using gradient descent to optimize the loss function, and it handles structured and large-scale datasets accurately and efficiently. The generalized linear model (GLM) is a widely used parametric statistical framework that extends linear regression to outcomes that are not normally distributed. Through a link function, a GLM can relate continuous or categorical predictors to a dependent variable that is binary, multinomial, or a count.
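The elimination loop described above can be sketched without external libraries. The example below is an assumption-laden illustration, not the study's implementation: it trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss (a simple stand-in solver), then repeatedly removes the feature with the smallest squared weight, which is the ranking criterion of Guyon et al.'s SVM-RFE. The toy data and the feature names are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) by full-batch subgradient descent on the regularized hinge loss.

    X: list of samples (lists of floats); y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, names):
    """Rank features from most to least important by recursive elimination."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Drop the surviving feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # Features eliminated last are the most informative.
    return [names[j] for j in reversed(eliminated)]


# Toy data (hypothetical): only "gene_1" separates the two classes.
X = [[2.0, 0.0, 0.1], [1.5, 0.0, -0.1], [-2.0, 0.0, 0.1], [-1.5, 0.0, -0.1]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y, ["gene_1", "gene_2", "gene_3"]))  # "gene_1" is ranked first
```

Removing one feature per round keeps the sketch simple; with thousands of genes, implementations usually drop a fraction of the features per iteration to save time.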

Construction of Nomograms and evaluation of diagnostic marker prediction models

A nomogram is an effective tool for integrating multiple indicators to predict the occurrence and progression of diseases. Here, a nomogram was constructed with the R package 'rms' based on the hub gene. Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was calculated, to assess the performance of the hub gene and the nomogram in diagnosing OSA. Finally, a calibration curve and decision curve analysis (DCA) were used to evaluate the predictive performance of the nomogram for OSA-related DM and to determine whether decision-making based on the nomogram benefits diagnosis.
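The AUC summarizing an ROC curve equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The short sketch below computes the AUC directly from that definition. It is illustrative only, not the R code used in the study, and the example scores are made up.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up scores, e.g. hub-gene expression (labels: 1 = OSA, 0 = control).
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for GEO cohorts of tens of samples.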

Immune infiltration analysis

The degree of immune cell infiltration in the OSA gene expression profiles was estimated with CIBERSORT. The proportions of 22 immune cell types were compared between the OSA and non-OSA groups using the Wilcoxon test, with P < 0.05 considered statistically significant. Finally, Spearman's rank correlation analysis was used to assess the correlation between the expression of the diagnostic biomarkers and the proportions of infiltrating immune cells, again with P < 0.05 considered statistically significant.
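CIBERSORT itself is not reimplemented here. The sketch below illustrates only the Spearman step: expression values and immune cell proportions are converted to ranks (tied values receive their average rank), and the Pearson correlation of the two rank vectors is computed. The P value is omitted, and the variable names and values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman's rho is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical values: STK17A expression and one cell type's estimated fraction.
stk17a = [5.1, 6.3, 4.8, 7.0, 6.1]
fraction = [0.02, 0.05, 0.01, 0.07, 0.04]
print(spearman_rho(stk17a, fraction))  # perfectly monotone toy data, so rho is 1 (up to rounding)
```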

Results

Identification of differential genes

In GSE135917, 446 DEGs were identified between OSA samples and normal controls, comprising 263 up-regulated and 183 down-regulated genes. In GSE41762, 2506 DEGs, comprising 1152 up-regulated and 1354 down-regulated genes, were identified between DM samples and normal controls. The overall distribution of the two datasets and their DEGs is shown by principal component analysis plots, volcano plots, and heat maps (Fig. 1.A-F). The two datasets shared 32 overlapping DEGs, of which 22 were up-regulated and 10 were down-regulated (Fig. 1.G-H).
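Assuming, as the separate up- and down-regulated Venn diagrams suggest, that a gene counts as overlapping only when it changes in the same direction in both datasets, the overlap step can be sketched with set intersections. The gene names below are hypothetical.

```python
def concordant_overlap(up_a, down_a, up_b, down_b):
    """Genes up-regulated in both datasets, and genes down-regulated in both."""
    return up_a & up_b, down_a & down_b


# Hypothetical DEG sets for illustration only.
osa_up, osa_down = {"G1", "G2", "G3"}, {"G4", "G5"}
dm_up, dm_down = {"G1", "G3", "G6"}, {"G2", "G5"}

shared_up, shared_down = concordant_overlap(osa_up, osa_down, dm_up, dm_down)
# G2 changes in opposite directions, so it is not counted as shared.
print(sorted(shared_up), sorted(shared_down))  # ['G1', 'G3'] ['G5']
```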

Fig. 1

Identification of differentially expressed genes. (A-C) Principal component analysis, volcano plot, and heat map of differentially expressed genes in GSE135917. (D-F) Principal component analysis, volcano plot, and heat map of differentially expressed genes in GSE41762. (G) 10 down-regulated overlapping genes. (H) 22 up-regulated overlapping genes

Functional enrichment of the shared disease-causing genes in OSA-related DM

To gain a deeper understanding of the roles and specific mechanisms of the causative genes, we performed GO functional enrichment analysis on the DEGs shared by the two diseases. In the biological process (BP) category, most of the shared disease-causing genes in OSA-associated DM were enriched in negative regulation of neurogenesis and positive regulation of embryonic growth. In the cellular component (CC) category, most of the genes were located in the exocytic vesicle and synaptic vesicle membranes. In the molecular function (MF) category, the most significant term was growth factor receptor binding (Fig. 2).

Fig. 2

Functional enrichment analysis of differentially expressed genes. (A) GO Circle. (B) GO Bar chart

Screening pivotal genes with diagnostic value by machine learning

Four well-established machine learning models, Random Forest (RF), Support Vector Machine (SVM), Generalized Linear Model (GLM), and eXtreme Gradient Boosting (XGB), were applied to the OSA and DM datasets to identify genes related to OSA-associated DM with high diagnostic potential. The feature genes of each model were ranked based on root mean square error (RMSE) (Fig. 3.A, E), and in the OSA dataset the RF and SVM models produced relatively low residuals (Fig. 3.B-C). In addition, the discriminative performance of the four algorithms on the test set was evaluated with ROC curves using five-fold cross-validation (Fig. 3.D). Based on these results, the five most important variables of the SVM model (STK17A, SNORD115_32, FGF9, CRLF3, and MMP7) were selected for further exploration in the OSA dataset. In the DM dataset, the XGB model had the lowest residuals (Fig. 3.F-G), and the four models achieved relatively high areas under the ROC curve (AUC) (Fig. 3.H). The top five feature genes of the XGB model (HDDC2, STK17A, LDAF1, PARP12, and NPR3) were therefore identified as predictive genes.
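Five-fold cross-validation splits the samples into five folds, trains on four, and tests on the fifth, rotating until each fold has served once as the test set. The sketch below generates such splits. Stratifying by class, so that every fold keeps a similar case/control ratio, is an assumption here (the text does not say whether the folds were stratified), as are the function name, the seed, and the toy labels.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal samples round-robin into folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


labels = [0] * 10 + [1] * 5  # hypothetical: 10 controls, 5 cases
for train, test in stratified_kfold(labels):
    print(len(train), len(test), sum(labels[i] for i in test))  # 12 3 1 for every fold
```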

Fig. 3

Machine learning model construction. (A) Important features in the OSA dataset for the RF, SVM, GLM, and XGB models. (B) Inverse cumulative distribution of residuals in the OSA dataset. (C) Cumulative residual distribution in the OSA dataset. (D) Receiver operating characteristic (ROC) analysis of the OSA dataset. (E) Important features in the DM dataset for the RF, SVM, GLM, and XGB models. (F) Inverse cumulative distribution of residuals in the DM dataset. (G) Cumulative residual distribution in the DM dataset. (H) ROC analysis of the DM dataset

We next evaluated the five-gene SVM and XGB diagnostic models in the validation datasets. The ROC curves showed good performance of the five-gene diagnostic models, with an AUC of 1 in GSE25724 (Fig. 4.A) and 0.917 in GSE75097 (Fig. 4.B). These results suggest that the five-gene prediction models are workable, although the small sample sizes of the datasets may limit their generalizability, and further testing in a larger, independent cohort is necessary to confirm their validity. Intersecting the five candidate genes from the SVM model with the five from the XGB model identified STK17A as the only gene common to both (Fig. 4.C).
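The Venn-diagram step reduces to an intersection of the two top-five lists reported for the SVM (OSA) and XGB (DM) models. The sketch below reproduces it using the gene names given in the text.

```python
# Top-five feature genes as reported in the Results.
svm_top5_osa = ["STK17A", "SNORD115_32", "FGF9", "CRLF3", "MMP7"]
xgb_top5_dm = ["HDDC2", "STK17A", "LDAF1", "PARP12", "NPR3"]

core = sorted(set(svm_top5_osa) & set(xgb_top5_dm))
print(core)  # ['STK17A']
```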

Fig. 4

Validation of machine learning models and the acquisition of core genes. (A) Analysis of diagnostic models based on 5 genes in GSE75097. (B) Analysis of diagnostic models based on 5 genes in GSE25724. (C) Venn diagram shows the numbers of overlapping genes

Construction of a diagnostic model for OSA-related DM

To improve diagnostic and predictive performance, we constructed a nomogram based on the hub gene STK17A using logistic regression (Fig. 5.A). The calibration curve showed that the predictions of the nomogram were close to those of the ideal model (Fig. 5.B). In addition, DCA indicated that decision-making based on the nomogram may benefit the diagnosis of OSA-related DM (Fig. 5.C).
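Decision curve analysis plots the net benefit of acting on a model's predictions across threshold probabilities pt. A common definition (Vickers and Elkin, 2006) is NB = TP/n - (FP/n) * pt/(1 - pt), which is compared against the "treat all" strategy and the "treat none" strategy (NB = 0). The sketch below computes both quantities at chosen thresholds; the predicted probabilities are hypothetical, and this is not the code behind Fig. 5.C.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - fp / n * odds


def treat_all_net_benefit(labels, threshold):
    """Reference strategy that treats everyone (the 'treat none' curve is 0)."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks from a one-gene logistic model.
probs = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]
for pt in (0.1, 0.25, 0.5, 0.75):
    print(pt, round(net_benefit(probs, labels, pt), 3),
          round(treat_all_net_benefit(labels, pt), 3))
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies.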

Fig. 5

Construction and validation of nomograms. (A) Nomogram used to predict the risk of OSA. (B) Calibration curve. (C) DCA

Characterization of core genes

We used GSEA to investigate the pathways associated with STK17A expression (Fig. 6.A-B). The results suggested that the complement and coagulation cascades and the ncRNA metabolic process might be important in the onset of OSA. Furthermore, we analyzed immune cell infiltration in the OSA group and the healthy control group. CIBERSORT analysis showed that the proportions of the 22 immune cell types differed significantly between the OSA and normal control groups. The OSA group had greater proportions of memory B cells, CD8 T cells, M0 macrophages, and mast cells, whereas activated B cells, plasma cells, CD4 memory T cells, regulatory T cells, activated NK cells, monocytes, and M2 macrophages were negatively correlated with OSA (Fig. 6.C). Correlation analysis showed that STK17A expression, together with neutrophils and plasma cells, was associated with immune cell accumulation in OSA (Fig. 6.D). STK17A expression was higher in both OSA and DM than in the respective controls (Fig. 6.E-F).

Fig. 6

GSEA and immune infiltration analysis. (A-B) GSEA of pathways associated with STK17A expression. (C) Violin plots show differences in the infiltration of 22 types of immune cells between OSA and healthy controls. (D) Correlation analysis between STK17A and 22 kinds of immune cells. (E) The expression levels of STK17A in OSA. (F) The expression levels of STK17A in DM

Discussion

OSA is a common clinical condition that is often misdiagnosed, and it is becoming a serious public health concern [18]. DM appears to be an independent risk factor for the development and progression of OSA. With advances in computational biology and high-throughput sequencing, numerous studies have used machine learning techniques to make predictions from gene expression profiles. Because any single algorithm may be biased, in this study we combined gene expression profiles with a consensus of machine learning algorithms and carried forward the genes identified by the more accurate algorithm to the next stage of the investigation. We then performed external validation to evaluate the viability of the diagnostic models in independent datasets. These findings suggest that genes selected through a combination of techniques can provide insights into disease modifiers and diagnostic features.

We first screened for DEGs shared by the two diseases and identified 32 overlapping genes, on which we performed a series of bioinformatics analyses to clarify their roles in OSA and DM. GO enrichment analysis showed that these DEGs were significantly enriched in, among other terms, processes related to the regulation of neurogenesis. Novel DNA and RNA chemical modifications have recently been found to be involved in regulating the mammalian central nervous system [19]; for example, miRNAs promote neural progenitor cell proliferation by targeting phosphatase and tensin homolog (PTEN) [20], and lncRNA depletion leads to a decrease in newborn neurons [21]. Mammalian neurons depend strictly on glucose as their main energy source, and energy metabolism is tightly regulated during neuronal differentiation and degeneration. Mitochondria play a key role in cytoskeletal remodeling, axon growth, and dendritic and synaptic activity during neurodevelopment and adult neurogenesis [22], and aerobic glycolysis in neurons also sustains axon elongation and synaptogenesis. Research on lncRNAs in OSA is still at an early stage, and more studies are needed. lncRNAs have been shown to be closely associated with abnormal blood glucose levels and insulin resistance in diabetic patients and are thought to be important players in the development of diabetes and its complications [23]. MicroRNAs (miRNAs) are important regulators of cell function that can decrease gene expression. Santamaria-Martos et al. reported that patients with OSA have a dysregulated miRNA profile compared with individuals without OSA [24], which highlights the significance of miRNAs and their regulatory mechanisms. Changes in miRNA expression could dysregulate important genes and pathways and thereby accelerate the onset and progression of OSA.

MiRNAs also play key roles in the progression of DM and its complications [25], mainly by contributing to pancreatic β-cell damage and insulin resistance. On this basis, we hypothesize that negative regulation of the nervous system may impair the central control of respiration and upper airway neural reflexes, thereby promoting sleep apnea and also contributing to the development and progression of DM.

Chronic intermittent hypoxemia increases oxidative stress in OSA patients by enhancing the production of reactive oxygen species and disturbing the oxidant/antioxidant balance [26]. Oxidative stress is involved throughout the development of diabetes and its complications: hyperglycemia can promote free radical production through multiple pathways, and oxidative stress in turn contributes to decreased pancreatic β-cell function and insulin resistance. STK17A (serine/threonine kinase 17A) is involved in the positive regulation of the DNA damage response and apoptotic processes [27] and in the regulation of reactive oxygen species (ROS) metabolic processes. Excessive accumulation of ROS can raise oxidative stress and cause oxidative modification of specific DNA, proteins, and lipid metabolites, which in turn causes cellular damage and promotes disease progression. In our study, we found that this gene is common to both OSA and DM, and we suggest that the mechanism by which DM affects OSA may be related to oxidative stress. We then constructed a nomogram model for OSA diagnosis using STK17A, which showed high predictive power in our datasets. According to the DCA, patients could benefit from the nomogram across high-risk thresholds from 0 to 1.

GSEA of STK17A showed that it is closely related to the ncRNA metabolic process, the complement and coagulation cascades, and other pathways.

Transfer RNAs (tRNAs) are non-coding RNA molecules essential for protein synthesis. After transcription, they undergo extensive modification that enhances their folding, stability, and function, and tRNA modifications can either protect tRNAs from cleavage or induce their cleavage into repressive small ncRNAs [28]. Non-coding RNAs (ncRNAs) include long non-coding RNAs (lncRNAs) and small non-coding RNAs such as miRNAs, both of which are involved in gene regulation and are components of the epigenome [29]. This is consistent with the findings described above and suggests that STK17A may influence the development of OSA-related DM.

The complement and coagulation cascades play a crucial role in maintaining immune homeostasis. Peripheral blood levels of complement C3 are elevated in patients with OSA, indicating tissue damage and inflammatory responses [30]. C3 is the most important component of the complement system and an intermediate link in its activation pathways, which culminate in the formation of the membrane attack complex and cell lysis [31]. Studies have shown that elevated C3 in patients with OSA may be associated with glucose metabolism [32]. This further suggests that the immune system plays an important role in the development and progression of DM and OSA.

To better understand the relationship between immune function and the disease, we comprehensively examined immune cell infiltration in OSA. In our study, the proportion of infiltrating CD8 T cells was significantly higher in the OSA group than in the control group, and the proportion of CD4 T cells was slightly lower. The primary function of CD4+ T cells is to enhance and amplify the immune response by secreting lymphokines. In contrast, CD8+ T cells suppress the functions of T and B lymphocytes, dampening the immune response to maintain immune balance [33]. Some studies have shown that patients with OSA have significantly more peripheral blood CD8+ cells, fewer CD4+ cells, and a lower CD4+/CD8+ ratio, whereas other studies report increases in both CD4+ and CD8+ cell counts, so the findings vary. Nevertheless, immune function in patients with OSA appears to be impaired [34]. Immune cell infiltration in OSA is strongly linked to the hub gene STK17A, suggesting that potential biomarkers may interact with immunological pathways to exacerbate OSA. A thorough understanding of the immunological pathways linked to OSA is therefore crucial for developing novel prognostic or diagnostic biomarkers and therapeutic targets. Meanwhile, we found that STK17A expression was increased in both OSA and DM, suggesting that STK17A may promote disease progression and that inhibiting its expression could be a therapeutic strategy for OSA and DM. Interestingly, previous studies have shown that targeting and regulating STK17A can inhibit the proliferation, invasion, and migration of cervical cancer cells [35]. These findings suggest that STK17A may be a viable drug target for treating OSA and DM. We hypothesize that interventions targeting STK17A could affect disease onset and progression in patients with OSA and DM and ultimately improve their prognosis.

Our study has some limitations. The expression level of STK17A requires further verification by Western blotting or immunohistochemistry. In addition, because of the limited sample size, the nomogram model requires further validation before clinical application.

Conclusion

In summary, our study identifies STK17A as a key candidate diagnostic biomarker for the shared molecular pathways between OSA and DM. The genes involved are linked to oxidative stress and neuroregulation, suggesting potential therapeutic targets, and immune cell analysis suggests that STK17A plays a complex role in both conditions. These insights enhance our understanding of the two diseases and offer valuable directions for future research and clinical use.