1 Introduction

Lung adenocarcinoma (LUAD) is the most common histological subtype of non-small cell lung cancer (NSCLC), accounting for 38.5% of all lung cancer cases [1]. The high morbidity and fatality rates of LUAD make it a global public health concern [2]. Statistics show that LUAD morbidity and mortality rates are increasing annually worldwide; thus, it poses risks to the health of patients and increases the economic burden on patients [3]. In recent years, the subtype of lung adenocarcinoma (LUAD) has made remarkable progress in the fields of molecular biology, genomics, and clinical medicine. Among these, the most significant typing model is the molecular subtype, such as those based on driver gene mutations (EGFR, KRAS, ALK, ROS1, RET, MET, HER2). Studies have shown that incorporating co-mutated genes such as TP53, STK11, and KEAP1 can further refine these subtypes and influence immunotherapy resistance [4]. Additionally, classification by immune infiltration has been proposed: based on PD-L1 expression, T cell infiltration levels, and macrophage polarization states, tumors are divided into “hot tumors” (with abundant immune infiltration) and “cold tumors” (with immune rejection or desertification), to guide the selection of immune checkpoint inhibitors (such as PD-1/PD-L1 inhibitors) [5]. Metabolic subtypes: Classifications based on glycolysis, fatty acid oxidation, or glutamine metabolic characteristics provide evidence for metabolic targeted therapy (e.g., IDO1 inhibitors, glutaminase inhibitors). AI-based multi-omics model: By integrating genomic, transcriptomic, epigenomic, and proteomic data, deep learning has been used to classify LUAD into four subtypes with significantly different prognoses [6]. In light of the various LUAD classifications, clinical studies are conducted [7, 8].

Each LUAD subtype has different transcriptional profiles and gene epigenetic changes, which can lead to more refined predictions of prognosis and treatment response LUAD. However, there are still difficulties and challenges in LUAD subtyping and prognosis research, and different disciplines need to be combined to further strengthen this research. In particular, combining multiple disciplines, such as bioinformatics and molecular biology, may promote a deeper understanding of the molecular mechanisms of LUAD occurrence and development.

Tumor metabolism is an important area of current tumor research; fatty acid metabolism research has attracted our attention [9]. Fatty acid metabolism is a crucial physiological process of high growth and proliferation in tumor cells. Fatty acids are among the primary energy sources for tumor cell growth and metastasis [10]. During the growth and division of tumor cells, glucose is mainly used to generate ATP, but when the tumor grows to a specific scale, due to factors such as changes in the microenvironment and insufficient blood supply, other energy sources must be found, and fatty acid metabolism will be fully utilized [11]. There is an important relationship between fatty acid metabolism and immune evasion by tumor cells. Fatty acids regulate the immune response, the expression of inflammatory mediators, and apoptosis and are closely related to the immune escape of tumors [12, 13]. Studies have shown that abnormal fatty acid metabolism can promote the immune escape of tumor cells and regulate immune responses in the tumor microenvironment. Targeting of abnormal fatty acid metabolism may represent a new tumor immunotherapy strategy [14, 15].

Continuous science and technology innovation has led to new ideas in tumor research. High-throughput sequencing and bioinformatics technology have become important means of antitumor research, as they can be used to analyze the molecular mechanism of tumor occurrence and development, to define molecular subtypes, and to predict patient prognosis and treatment effects. Their advantages include high efficiency, accuracy, and comprehensiveness related to the processing of large datasets and the quick location of gene and short sequence information [16, 17]. Tumor metabolism is one of the key processes of tumor progression, and immunotherapy has become a key topic in anticancer research. Studies have shown an intrinsic link between tumor metabolism and the immunotherapy response [18]. Bioinformatics techniques can be used to analyze the enrichment of different metabolic pathways and immune-related genes, and these pathways are associated with different subtypes, highlighting the complex characteristics of metabolic regulation and the immune response. The integration of tumor metabolism, immunotherapy, and bioinformatics research is expected to provide new ideas and strategies for personalized treatment and guide clinical treatment decision making.

In this study, using the transcriptome and clinical data of LUAD patients in the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, fatty acid-related subtypes were identified and prognostic models were constructed; moreover, and the role of fatty acid metabolism-related genes (FAMGs) in LUAD was explored.

2 Methods

2.1 Data download

The transcriptome data and clinical data of LUAD patients were downloaded from the TCGA database (https://portal.gdc.cancer.gov/) (541 LUAD patients and 59 normal controls) and the GEO database (https://www.ncbi.nlm.nih.gov/geo/) (GSE50081, 181 LUAD patients). We integrated transcriptome data and clinical data for survival and prognosis analysis. In the subsequent subtyping analysis process, all populations were divided into a training set and a validation set at a ratio of 1:1. A total of 309 FAMGs were collected from the literature for subsequent subtyping and prognosis analyses (Supplementary Table 1).

2.2 Prognostic genes and typing analysis

The “limma” package was used to extract the expression of fatty acid-related genes and screen differentially expressed genes (|LogFC|> 1, p < 0.05). According to the expression of the target gene, the patients were divided into high-risk and low-risk groups, and the survival differences between the two groups were analyzed. Survival analysis was performed using the “survival” and “survminer” packages. Genes with differences in survival between high-risk and low-risk groups were prognostic genes. Correlation tests were performed on prognostic genes, and co-expression analysis of prognostic genes was performed using the ‘igraph’, ‘psych’, ‘reshape2’ and ‘RcolorBrewer’ packages to establish a prognostic network diagram (p < 0.01).

Cluster analysis of LUAD patients was performed based on the expression of fatty acid-related genes. The ‘ConsensusClusterPlus’ package was used to categorize LUAD patients during the analysis process. The optimal number of clusters (k) was selected according to the corresponding algorithm. The best clustering results were comprehensively evaluated using delta area plot, tracling plot, consensus cumulative distribution plot (CDF plot), and PCA plot. After the classification was completed, survival analysis was performed on all subtypes to compare whether there was a difference in prognosis between each subtype. Heat maps were used to show the expression of fatty acid genes in the three categorizations. The “survival” and “survminer” packages were used to draw the survival curves of the categorizations, and the “pheatmap” package was used to draw the gene expression heat maps of the categorizations.

2.3 Enrichment analysis

To understand the functions and pathways associated with gene enrichment among different subtypes, we performed GSVA on the basis of the gene expression results. First, a score for each pathway is obtained from the gene expression data. The subtype and pathway scoring results were subsequently combined to obtain the active pathways between each subtype. Pathways were then differentially analyzed to identify significant differentially enriched pathways for each subtype. The 20 most significant pathways are shown in a heatmap. ssGSEA scores were determined according to the expression data, samples were annotated according to subtype for further analysis. We then analyzed whether there were differences in the degree of immune cell infiltration among the different subtypes. GSVA analysis and immune cell infiltration analysis were performed using the ‘GSEABase’ ‘GSVA’ package.

Differential gene expression analysis was performed on different subtypes to obtain differentially expressed genes between subtypes. The intersection of differentially expressed genes was then taken. GO function analysis and KEGG enrichment analysis were subsequently performed on these intersecting genes. The relationships between the expression of the intersecting genes and disease risk were analyzed through the DisGeNET database. The “limma” package was used to screen differentially expressed genes (|LogFC|> 1, p < 0.05). Then the intersection of differentially expressed genes was taken. GO functional analysis and KEGG enrichment analysis were performed on these intersection genes. GO and KEGG enrichment analysis were completed using ‘clusterProfiler’ ‘org,Hs,eg,db’ ‘enrichplot’. The association between intersection genes and diseases was analyzed using the DisGeNET database.

2.4 Establishment of the risk score model

To understand the relationship between these genes and prognosis, we constructed a LASSO regression model based on LASSO regression and gene expression levels and clinical data. The significant genes in the LASSO regression were subsequently selected according to the point with the slightest cross-validation error to construct the prognostic model. The data for constructing the risk score prognostic model are from the training group. Patients were then divided into high- and low-risk groups on the basis of the median of all risk scores in the training group. Patients in the test group were divided into high- and low-risk groups according to the median value of the training group. The risk score prognostic model was constructed using ‘survival’, ‘caret’, ‘glmnet’, and ‘survminer’.

We next analyzed whether there was a difference in the survival of patients between the high- and low-risk groups. ROC curves were used to evaluate the ability of the model to distinguish patients according to prognosis. According to the expression of LASSO, significant genes in the high- and low-risk groups were observed, and whether the patients changed with the risk score was the proportion of patients who died in the total population. The ROC curve plot was constructed using the ‘timeROC’ package.

Finally, the clinical information and risk score were combined to construct a nomogram for easy use by clinicians. The nomogram can predict patients’ 1-, 3-, and 5-year survival probability. The stability of the nomogram was assessed using DCA curves. Nomogram construction used the ‘survival’, ‘regplot’ and ‘rms’ packages.

2.5 Immunoassay

To explore the LUAD tumor microenvironment, we analyzed the proportions of of immune cells in each sample. Then, according to the content of immune cells and gene expression, we analyzed the correlations between the proportions of immune cells and the expression levels of key genes via LASSO regression; we also analyzed the correlations between the proportions of immune cells and the risk score. Finally, we analyzed the proportion of stem cells in each sample and the correlation of the index with the risk score. Immune cell infiltration analysis was performed using the ‘CIBERSORT’ package.

2.6 Cell line

A549 and 16HBE cells were purchased from the American Type Culture Collection (ATCC, USA). They were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (Gibco, USA), which contained 10% fetal bovine serum (FBS) (Gibco, USA). The culture conditions were 37 °C and 5% CO2.

2.7 Quantitative real-time PCR (qPCR)

We extracted total RNA from A549 and 16HBE cells using TRIpure Reagent (BioTeke, China) according to the manufacturer’s instructions. Total RNA was transcribed into cDNA using cDNA synthesis reagent (Yeasen, China). The qPCR operations were performed with a qPCR mix kit (Bimake, China) and a QuantStudio 3 real-time fluorescence quantitative PCR system (RT‒PCR‒Q3 system) (Applied Biosystems, USA). Fold changes were calculated using the ΔΔCt method, and ACTB mRNA was used as a reference. The primers used were purchased from Sangon Bioengineering Co., Ltd. (Shanghai). The primer sequences are shown in Supplement Table 2.

2.8 Western blotting

The cells were prepared according to the experimental design. Protein extraction, quantification, electrophoresis, membrane transfer, and antibody incubation were conducted following the published experimental protocols [19]. The following antibodies were used in the experiments: primary antibody (RPS4Y1, Proteintech; α-Tubulin, Proteintech) overnight and secondary antibody (rabbit, Proteintech).

2.9 Cell viability assay

The cells were inoculated in 96-well plates at 2000 cells per well and incubated in incubators for 0 h, 24 h, 48 h, or 72 h. A mixture of 110 μL of medium/CCK-8 (Selleck, USA) (10:1) was added to each well, and the OD values at 450 nm were detected after incubation for 2 h.

2.10 Colony formation assay

The cells were inoculated in 6-well plates at 1000 cells per well, shaken well, and incubated in an incubator until blatant cell clones were visible to the naked eye. The culture medium was removed, the cells were washed twice with PBS, fixed with paraformaldehyde (Xavier Biotechnology Co., Ltd., China) for 20 min, washed twice with PBS, and stained with crystal violet (Beyotime, China) for 15 min; then, the stain was removed and the cells were washed. After the wells were dried, pictures were taken, and ImageJ software was used for quantitative analysis.

2.11 Migration assay

After the cells were digested and centrifuged, 1000 cells/100 μL were inoculated into the upper chamber of the Transwell system, and 500 μL of medium containing 30% serum was added to the lower chamber. After 24 h, the upper chamber was removed, washed twice with PBS, fixed with paraformaldehyde (Xavier Biotechnology Co., Ltd., China) for 20 min, washed twice with PBS, stained with crystal violet (Beyotime, China) for 15 min, and then washed with water. The samples were dried at room temperature and photographed, and ImageJ software was used for quantitative analysis.

2.12 Flow apoptosis assay

Each group of cells was digested and collected, and the supernatant was removed via centrifugation at 1000 rpm for 5 min. With 1 mL of 1 × binding buffer suspension, 100 μL of the cell suspension was transferred to a 1.5 mL centrifuge tube. For each tube sample, 2 μL of FITC Annexin V dye (Beyotime, China) was added, and 1 μL of PI L dye (Beyotime, China) was added. The samples were then gently mixed, and incubated in the dark at room temperature for 15 min. Flow cytometry data were analyzed with FlowJo_V10 software.

2.13 Drug sensitivity analysis

To identify drugs likely to be effective for high- and low-risk patients and improve their outcomes, we performed drug sensitivity analysis on high- and low-risk patients. Boxplots are also used to show the drugs to which patients in the high- and low-risk groups are sensitive. Drug susceptibility analysis was done using the ‘pRRophetic’ package.

2.14 Statistical methods

All analyses in this study were performed using the R language. The “limma” package was used to extract the expression levels of FAMGs for differential analysis. The “survival” and “survminer” packages were used for survival analysis. The ‘graph’, ‘psych’, ‘reshape2’, and ‘RcolorBrewer’ packages were used to perform coexpression analysis of prognostic genes to establish a prognostic network map. LUAD patients were subtyped using the ‘ConsensusClusterPlus’ package. GSVA and immune cell infiltration analyses were performed using the ‘GSEABase’ ‘GSVA’ package. GO functional analysis was performed using ‘clusterProfiler’ ‘org, Hs, e.g., db’ ‘enrichplot’. To construct a prognostic risk score model, ‘survival’, ‘caret’, ‘glmnet’, and ‘survminer’ were used. ‘timeROC’ was used to construct a ROC curve. Nomograms were constructed using the ‘survival’ ‘regplot’ ‘rms’ package. Immune cell infiltration assays were performed using the ‘prgTME44.CIBERSORT. R’ package. Drug susceptibility analysis was performed using the ‘pRRophetic’ package. The Wilcoxon test was utilized to screen for differentially expressed genes (DEGs). The unsupervised clustering method known as Consensus Clustering was applied for cluster classification. Cox regression and Kaplan–Meier (K-M) analysis were employed for prognostic analysis. K-M analysis and the two-sided log-rank test were used to compare the overall survival (OS) of patients across different subtypes. The prognostic model was constructed through initial screening with univariate Cox regression, followed by modeling using Lasso regression. Univariate and multivariate Cox regression were utilized to assess the independent prognostic value of the model. The Mann–Whitney test was applied to compare immune cell infiltration and immune pathways between the two groups.

Unpaired Student’s t tests were used to compare differences between two groups, whereas one-way ANOVA was used to compare differences among multiple groups. GraphPad Prism 9.5 was used to analyze the results. Statistical significance was determined when the p value was less than 0.05, with the asterisk denoting the level of difference.

3 Results

3.1 Typing LUAD via prognosis-associated FAMGs

First, we screened 31 FAMGs associated with LUAD prognosis from the literature (the genes are listed in Table S1). These genes included 16 low-risk-related genes (CYP2U1, ALOX15, HMGCS2, ADH1 A, CEL, HSD17B3, ALDH2, LTC4S, ADH1B, INMT, HTGDS, CYP4B1, ACS1, ACADSB, GCDH, ALOXL) and 15 high-risk-related genes (ACOT7, LOXL3, PTGES, NBN, MIF, ELOVL6, VNN1, ALDOA, SMS, LDHA, ACOT11, YWHAH, PTGES3, HCCS, and PPARD). Among the high-risk genes, SMS, ALDOA, MIF, LDHA, NBN, ACOT7, and LOXE3 were strongly correlated with prognosis. There were positive coexpression relationships between LDHA and ALDOA, LDHA and SMS, and between ALDOA and MIF (Fig. 1). Furthermore, according to the expression levels of 31 FAMGs, the LUAD cohort was divided into a high-risk group and a low-risk group. We compared survival times between the two groups to determine the relationships between genes and prognosis (Fig. S1).

Fig. 1
figure 1

Prognosis network diagram of 31 fatty acid-related genes associated with LUAD prognosis. Each dot represents a fatty acid-related gene. The color of the right semicircle of the dot is related to risk, with red indicating high risk and blue indicating low risk. The larger the point, the stronger the correlation with prognosis. A line between two genes indicates that there is a co-expression relationship between the genes, a red line indicates a positive correlation and a blue line indicates a negative correlation

On the basis of the expression levels of 31 FAMGs, we used cluster analysis to divide LUAD into 3 subtypes (subtype A, subtype B, and subtype C). The clustering variable k was increased from 2 to 10, and finally, k = 3 was identified as the optimal number of clusters, at which point the correlation within the group was the highest (Fig. 2A–D). The results of the PCA revealed that the survival of patients with the 3 subtypes differed; this subtype system divided the LUAD cohort into 3 groups according to prognosis, and subtype distinguished prognosis better than traditional parameters (Fig. 2E). Subsequent survival analysis revealed that subtype B patients had a higher 10-year survival rate than patients with subtype A and subtype C disease (Fig. 2F). The heatmap shows the expression of 31 FAMGs in the 3 subtypes (Fig. 2G).

Fig. 2
figure 2

Subtyping of LUAD according to the expression of fatty acid-related genes. AE LUAD was divided into three subtypes according to the degree of expression of 31 fatty acid-related genes. E For the PCA analysis, green indicates samples of subtype A and samples of subtype B, and red indicates samples of subtype C. F The survival of the three subtypes is different. The abscissa represents time, and the ordinate represents the survival rate. G The expression of 31 fatty acid genes in the three subtypes. Red indicates up-regulated expression, and blue indicates down-regulated expression

3.2 Functional enrichment analysis of differentially expressed genes

The results of GSVA revealed that in the subtype A vs. the subtype B, the pathways of base-excision repair, homologous recombination, and the cell cycle were more active in the A subtype. In contrast, the fatty acid metabolism, tyrosine metabolism, and histidine metabolism pathways were more active in the subtype B (Fig. 3A). In general, the activity of each pathway, such as tryptophan metabolism and beta-alanine metabolism, was greater in subtype C than in subtype A (Fig. 3B). B Subtype vs. C subtype, asthma pathways, viral myocarditis, and Fc epsilon RI signaling were more active in subtype B. In contrast, the pathways of the cell cycle, phenylalanine metabolism, and glutathione metabolism were more active in subtype C (Fig. 3C). In general, the differentially expressed genes (DEGs) among the subtypes were enriched mainly in the fatty acid and amino acid metabolism pathways. In addition, we used ssGSEA to analyze the level of immune cell infiltration among the 3 subtypes. The infiltration level of MDScnA was the highest of the 3 subtypes, and the infiltration level of neutrophils was the lowest. Overall, the subtype B had greater immune cell infiltration than subtypes A and C (Fig. 3D).

Fig. 3
figure 3

GSVA enrichment analysis and immune cell infiltration analysis. In the A, B, and C heatmaps, the abscissa represents the sample, and the ordinate represents the pathway’s name. Red indicates high expression, and blue indicates low expression. A KEGG pathways of differential gene enrichment between A and B subtypes. B KEGG pathways of differential gene enrichment between A and C subtypes. C KEGG pathways of differential gene enrichment between B and C subtypes. D Immune cell infiltration analysis between the three assays. The abscissa indicates the type of immune cells, and the ordinate indicates the scoring of immune cells. ***p < 0.001, **p < 0.01, *p < 0.05

For the DEGs of the 3 subtypes, we used a Venn diagram to determine the intersection. We identified a total of 46 common DEGs (co-DEGs) (Fig. 4A). The results of the KEGG enrichment analysis revealed that the DEGs were enriched in 8 pathways, including olefinic compound metabolic processes and peptide metabolic processes (Fig. 4B). The results of the GO enrichment analysis revealed that the DEGs were enriched mainly in the olefinic compound metabolic process at the BP level, the DEGs were enriched mainly in the endocytic vesicle at the CC level, and the DEGs were enriched mainly in oxidoreductase activity at the MF level (Fig. 4C). In addition, we analyzed the associations between DEGs and diseases using the DisGeNET database and found that chronic sinusitis had the strongest correlation with the levels of the intersecting DEGs (Fig. 4D).

Fig. 4
figure 4

Enrichment analysis of differential genes among the three subtypes. There are 46 differential gene intersections among the three types. A KEGG analysis of differential genes through Metascape database. B GO analysis chart. C Analyze the association between intersection genes and diseases through the DisGeNET database

3.3 Establishment of the risk score model and survival curve analysis

On the basis of the 46 co-DEGs, we used LASSO-Cox regression to screen 5 key FAMGs to construct the risk score model (Fig. S2 A, B) with the following formula:

$$\begin{aligned} {\text{Risk score}} =\ & \left( { - 0.{11}*SCGB3A2\,{\text{exp}}.} \right) + \left( { - 0.0{8}*PGC\,{\text{exp}}.} \right) \\ & + \left( { - 0.{22}*ADH7\,{\text{exp}}.} \right) + \left( {0.0{6}*RPS4Y1\,{\text{exp}}.} \right) + \left( {0.0{8}*KRT6A\,{\text{exp}}.} \right) \\ \end{aligned}$$

After the model was constructed according to the median patient risk score, the cohort of LUAD patients was divided into high- and low-risk groups (Fig. S3B). Moreover, the differentially expressed FAMGs between the high- and low-risk groups were identified (Fig. S2 C). The risk heatmap shows the expression of key genes in the high- and low-risk groups. The expression of RPS4Y1 and KRT6 A, which are risk-conferring genes, gradually increased with increasing risk score, whereas the expression of SCGB3 A2, PGC, and ADH7, which are protective genes, gradually decreased with increasing risk score (Fig. S3 A). The risk score survival‒death plots revealed that as the patient’s risk score increased, the number of deaths also increased (Fig. S3 C). The analysis results of the model were also verified in the training set cohort (Fig. S3D, E, F) and test set cohort (Fig. S3G, H, I).

In all populations, training sets, and test sets, the results revealed differences in survival time between the high- and low-risk score groups (p < 0.05), and the survival time of the low-risk group was longer. Accordingly, the model distinguished between high-risk patients and low-risk patients better than traditional parameters (Fig. 5A–C). Time-dependent ROC curve analysis was used to assess the specificity and sensitivity of the prognostic models. The 1-, 3-, and 5-year AUCs of the entire cohort were 0.662, 0.637, and 0.642, respectively (Fig. 5D); the 1-, 3- and 5-year AUCs of the training cohort were 0.685, 0.660 and 0.668, respectively (Fig. 5E); and the 1-, 3- and 5-year AUCs of the test cohort were 0.641, 0.633 and 0.614, respectively (Fig. 5F). The prognostic efficacy of the model was verified.

Fig. 5
figure 5

Survival curve and ROC curve of risk score predicting prognosis. AC Survival analysis of risk score in all populations, train groups, and test groups of LUAD patients. The abscissa represents time, and the ordinate represents the survival rate. p < 0.05 indicates that the survival of high and low-risk groups differs, indicating that the model can distinguish patients in high and low-risk groups. DF ROC curve of risk score predicting 1-, 3-, and 5-year survival of LUAD patients. The abscissa represents the false positive rate, represented by ‘1-specificity’. The vertical axis represents the valid positive rate, expressed as ‘sensitivity’

3.4 Nomogram based on the risk score

Next, we constructed a nomogram based on the risk score to predict the likelihood of certain outcomes in LUAD patients. Because the number of patients with N3 stage disease was too small, the results may have been affected. Therefore, the constructed nomogram eliminates the N3 stage. As shown in Fig. 6A, the red dots in the nomogram represent the predictors (gender, age, stage, risk score) and the predictive value of the outcome event for one of the patients. The individual scores of each predictor are marked with red dots on the top “Points” scale line. The individual scores are as follows: female: 26 points; age: 23 points; M1 stage: 65 points; stage I: 22; low-risk score: 0 points; N2 stage: 54 points; and T2 stage: 46 points. The total score is marked on the “Total points” scale line. The total score is 26 + 23 + 65 + 22 + 0 + 54 + 46.5 = 236. After the total score is obtained, the red arrows correspond to the survival rates on the 1-, 3-, and 5-year scale lines. The 1-year survival rate was 0.775, the 3-year survival rate was 0.441, and the 5-year survival rate was 0.187. The results of the calibration curves show that the actual curves predicted at 1, 3, and 5 years are relatively close to the ideal curves, indicating that the deviation between the model prediction results and actual results is small and that the model effect is favorable (Fig. 6B).

Fig. 6
figure 6

A The nomogram depicts a predictive model that incorporates clinical characteristics (such as gender, age, TNM, and stage) and risk stratification (high/low). Each variable is assigned a corresponding point value, as indicated by the red dotted alignment line (e.g., low risk equals 0 points; T2 stage equals 47 points). The bottom axes show the predicted probabilities of 1-year (0.775), 3-year (0.411), and 5-year (0.187) survival for a representative patient with a total score of approximately 236 points, illustrating the model’s clinical utility in prognosticating long-term outcomes. B The calibration plot assesses the agreement between predicted and observed survival outcomes, with distinct colored lines representing 1-year, 3-year, and 5-year survival rates respectively (for instance, the red line signifies the 5-year survival rate in this example). The x-axis presents model-predicted survival probabilities, and the y-axis illustrates the actual observed survival rates. The extent to which each time-specific curve aligns with the 45-degree reference line (the ideal prediction line) indicates the model's calibration accuracy—the closer a curve is to the reference line, the more precise the predictions

3.5 Immune cell correlation analysis

To understand the tumor immune microenvironment (TIME), we subsequently analyzed the correlation between five key FAMGs and immune cells. Overall, KRT6 A, SCGB3 A2, and PGC are more related to immune cells; the levels of all three genes are significantly correlated with the proportions of multiple immune cells. The strongest positive correlation was between SCGB3 A2 and resting mast cells (p < 0.001), and the strongest negative correlation was between SCGB3 A2 and M0 macrophages (p < 0.001) (Fig. 7). These results indicated that the risk score was significantly correlated with immune cells. These findings may also indicate that these key FAMGs affect the immune microenvironment of AML patients. We also investigated the correlation of the proportions of the top 15 immune cell types with the risk score; the proportions of 7 immune cell types (M0 macrophages; M1 macrophages; activated mast cells; CD4 + T cells; resting NK cells; neutrophils; and CD8 + T cells) were positively correlated with the risk score (p < 0.05), and those of 8 immune cell types (dendritic cells activated; resting dendritic cells; native B cells; monocytes; resting mast cells; resting CD4 + memory T cells; resting T cells; gamma delta T cells; and focal helper T cells) were negatively correlated with the risk score (p < 0.05) (Fig. S4 A‒O). In addition, we further analyzed the number of stem cells in each sample and explored the correlation between the stem cell index and the risk score. The risk score was positively correlated with the stem cell index (R = 0.24, p < 0.05). The higher the risk score was, the greater the stem cell content was (Fig. S4P). These findings may help predict the response to immunotherapy targeting stem cells and determine which patients will respond to such treatments.

Fig. 7
figure 7

Correlation of five key genes with 22 types of immune cells. Red indicates a positive correlation, and blue indicates a negative correlation. *** p < 0.001, ** p < 0.01, * p < 0.05

3.6 Verification of the expression levels of 5 key FAMGs

We performed RT‒qPCR analysis of A549 and 16HBE cells. The results revealed that RPS4Y1 expression was markedly greater in a lung cancer cell line (A549) than in a normal cell line (16HBE), and the expression of KRT6 A, SCGB3 A2, ADH7, and PGC was significantly lower in a lung cancer cell line (A549) (Fig. 8A). Western blotting also verified that RPS4Y1 was highly expressed in tumor cells (Fig. 8B). We subsequently knocked down RPS4Y1 for relevant experimental verification (Fig. 8C, D). The results revealed that RPS4Y1 knockdown significantly inhibited the growth rate of A549 cells (Fig. 8E), reduced colony formation (Fig. 8F), and weakened their ability to migrate (Fig. 8G). Moreover, the flow cytometry results revealed that knocking down RPS4Y1 expression increased the apoptosis of A549 cells (Fig. 8H). These results suggest that RPS4Y1 may play an important role in the development of LUAD.

Fig. 8
figure 8

A Validation of mRNA expression levels of signature genes in lung cancer cell lines. The expression of KRT6 A, RPS4Y1, SCGB3 A2, ADH7, and PGC were detected by Real-Time PCR in lung cancer cell lines. B The protein levels of RPS4Y1 in 16HBE and A549 cells were detected using western blotting. C, D The mRNA and protein levels of RPS4Y1 in A549 cells were detected using qRT-PCR or western blotting after shRPS4Y1. E CCK8 was used to detect the proliferation of A549 cells after shRPS4Y1. F A colony formation assay of A549 cells after shRPS4Y1. G The migration cell numbers of A459 after shRPS4Y1. H Flow cytometry analyses detected the apoptosis rate of A549 cells after shRPS4Y1. All data are shown as mean ± standard deviation (SD). *p < 0.05, **p < 0.01, *** p < 0.001, **** p < 0.0001vs CON group

3.7 Drug sensitivity analysis

Methods for predicting drugs likely to be effective to high- and low-risk patients are needed to improve patient outcomes and facilitated personalized treatment. We predicted drugs likely to be effective for two groups of patients and screened 6 drugs from them. The results revealed that the three drugs likely to be effective for high-risk patients were AKT inhibitors: lapatinib and nutlin.3a (Supplementary Fig. 5 A, B, C); the three drugs likely to be effective for low-risk patients were bortezomib, bosutinib, and camptothecin (Fig. S5D, E, F).

4 Discussion

In this study, based on 31 FAMGs, cluster analysis was used to divide the LUAD cohort into 3 subtypes. Survival differed among patients with different subtypes. The subtype DEGs were enriched mainly in the amino acid metabolism pathway. A total of 46 co-DEGs were subsequently screened on the basis of the DEGs of the three subtypes. On this basis, five key FAMGs were screened by LASSO-Cox regression to construct a risk score model. The risk score can distinguish high-risk patients from low-risk patients better than traditional parameters, and low-risk patients survive longer than high-risk patients. The ROC curve and nomogram verified the prognostic efficacy of the risk score. In addition, we found a significant correlation between the risk score and the proportions of immune cells. These findings indicate that expression of 5 key FAMGs is correlated with TIME features to some extent. Stem cell index analysis revealed that the expression of FAMGs, the active state of fatty acid metabolism, was related to the intensity of the TIME.

Risk scoring models have a wide potential for clinical application. Depending on risk stratification, patients can be treated with different care strategies and dynamic monitoring, such as adjuvant chemotherapy tailored to breast cancer based on the Oncotype DX® recurrence score. However, the application of risk scoring models also requires more efficient allocation of resources to enable close monitoring of high-risk patients. In addition, the risk score can be used for early prevention, such as identifying lifestyle interventions in people with prediabetes to delay the onset of diabetes. Risk scoring models also play an important role in guiding personalized treatment. For example, in early-stage breast cancer, a lower Oncotype DX score may save patients from unnecessary chemotherapy [20]. Genetic markers in risk models also predict drug efficacy (e.g., HLA-B*57:01 screening to avoid abacavir hypersensitivity in HIV [21]), or to guide drug dosage (e.g., DOACs in atrial fibrillation patients with CKD [22]). However, the specific application of the risk score model requires more trials in high-risk patients and more data to improve the validity of the model’s prediction. Our experimental validation confirms the functional roles of model genes: knockdown of RPS4Y1 suppressed tumorigenesis, while downregulation of SCGB3 A2 in LUAD cells underscores its protective role. The risk score correlates with immune infiltration (e.g., Macrophages M0/M1) and stemness, further supporting its utility in guiding immunotherapy. For example, high-risk patients may benefit from combined STAT3/MYC inhibitors and PD-1 blockade, whereas low-risk patients could respond to metabolic modulators (e.g., PPARγ agonists) or proteasome inhibitors. Future work should focus on prospective validation of these stratified approaches.

Tumor classification is closely related to prognosis. Different gene expression profiles can reflect different molecular subtypes of LUAD, and the survival and prognosis of these subtypes also differ [23, 24]. One group constructed a prognostic model of LUAD dependent on N6-methyladenosine (m6 A)-related long noncoding RNAs (lncRNAs). Unlike our study, this study revealed m6 A-related lncRNAs through coexpression [25]. In addition, another group constructed a prognostic model of LUAD on the basis of metabolism-related genes (MRGs). In six lung cancer cell lines, the expression of proteins and mRNAs of key MRGs was verified [26]. The above studies have shown that the features and expression profiles of LUAD patients different among different LUAD subtypes. LUAD subtype can also be used to predict patients’ prognoses and provide more accurate guidance for clinical treatment.

This study identified five key prognostic genes, including RPS4Y1 and KRT6 A, which are risk-conferring genes, and SCGB3 A2, PGC, and ADH7, which are protective genes. Previous studies have shown that keratin 6 A (KRT6 A) is associated with various cancers, such as lung cancer [27], bladder cancer [28] and colon cancer [29]. KRT6 A can regulate glucose-6-phosphate dehydrogenase (G6PD) through c-MYC/MYCN to promote lung cancer cell proliferation and invasion [27]. Elevated KRT6 A expression can be a risk factor for predicting the clinical outcome of patients with NSCLC [27]. Studies have shown that high RPS4Y1 expression is associated with the regulation of the p38 MAPK and STAT3 signaling pathways [30]. In this study, we found a significant correlation with LUAD prognosis. Secretoglobin family 3 A member 2 (SCGB3 A2) is a multifunctional cytokine-like molecule highly expressed in airway club cells [31]. It can improve allergic airway inflammation and pulmonary fibrosis and promote bronchial branching and proliferation during lung development [31, 32]. Its expression levels are sequentially increased in the plasma of healthy subjects, patients with pneumonia, patients with stage I-II adenocarcinoma, and patients with stage III-IV adenocarcinoma [33]. SCGB3 A2 can also participate in pyroptosis induction and exert anticancer effects on lung and colon cancer cell lines. The results obtained by TCGA data analysis revealed that LUAD patients with SCGB3 A2 upregulation had a better survival rate [34, 35]. Similarly, this study revealed that SCGB3 A2 is a protective gene in terms of LUAD prognosis, and its expression level gradually decreased with increasing risk score in the prognostic model. The full name of PGC is the peroxisome proliferator-activated receptor γ coactivator. Current research focuses on the PGC-1α transcription factor. PGC-1α can upregulate the expression of multiple genes involved in the tricarboxylic acid cycle and the mitochondrial fatty acid oxidation pathway, thereby regulating lipid metabolism and long-chain fatty acid oxidation, and is a key metabolic regulator of many critical organs [36]. Many studies have shown that PGC-1α-induced mitochondrial biosynthesis is critical in tumorigenesis and migration/invasion [37, 38]. In different tumors, there is heterogeneity in the expression of PGC-1α [38,39,40,41]. Low PGC-1α levels are associated with breast cancer and a poor prognosis [42]. Alcohol dehydrogenase 7 (ADH7) is a member of the alcohol dehydrogenase family. In the process of exploring the molecular mechanism between miR-136-5p and lung squamous cell cancer (LUSC), one group found that miR-136-5p is significantly downregulated in LUSC patients can target UGT1 A7 and ADH7 and participate in ascorbic acid and aldonic acid metabolism, pentose and glucuronic acid interconversion and retinol metabolism. High expression of UGT1 A7 and ADH7 was associated with better overall survival and disease-free survival in LUSC patients [43]. In addition, another study revealed that ADH7 was enriched in the retinoic acid metabolism and retinol metabolism pathways. It was significantly correlated with the cervical cancer survival rate. Its expression level gradually increases with patient age [44]. The above studies revealed that these five fatty acid metabolism genes may be closely related to LUAD prognosis.

The enrichment analysis revealed that the DEGs among the subtypes were enriched mainly in the fatty acid and amino acid metabolism pathways. Overall, compared with subtypes A and C, subtype B had a greater level of immune cell infiltration. Survival analysis revealed that the subtype B had the highest survival rate. These findings indicate that the high survival rate may be related to high immune cell infiltration levels. The survival analysis results of the high-risk and low-risk groups distinguished by the model in this study demonstrated that the low-risk group survived longer; the stem cell index analysis revealed that the stem cell content was positively correlated with the risk score. Therefore, the group with a high risk score may have a lower survival rate but may be more responsive to immunotherapy targeting stem cells. Li J et al. constructed a prognostic model for LUAD patients on the basis of lipid metabolism-related genes and reported that the high-risk score group had poorer OS and that DEGs were significantly enriched in glycerophospholipid metabolism, fatty acid metabolic processes, and eicosanoid signaling [45]. Tang B et al. constructed a prognostic model for LUAD patients using metabolism-related genes (MRGs) to predict the response to LUAD immunotherapy, and the survival rate of the MRG-high subgroup distinguished by the model was lower than that of the MRG-low subgroup. The MRG-high subgroup showed high immune cell infiltration and a better response to immune checkpoint inhibitors [46]. However, one group developed a risk score based on tumor microenvironment (TME)-related genes to predict the prognosis of LUAD patients; the results revealed that high TME risk was negatively correlated with OS and immune cell abundance. A low TME risk score may be associated with longer OS and a better response to immunotherapy [47]. These studies indicate that FAMGs may regulate LUAD metabolism, affecting immune cell infiltration and the tumor response to immunotherapy. The drugs used for drug sensitivity analysis are mainly kinase, proteasome, and topoisomerase inhibitors, most of which have been applied to lung cancer and other cancers [48,49,50,51]. These findings suggest that these drugs may affect tumor prognosis by affecting tumor fatty acid metabolism.

Fatty acid metabolism can participate in cancer metabolic reprogramming and promote tumor occurrence, development, and drug resistance [52]. In LUAD, colon cancer-associated transcript 1 (CCAT1) promotes tumor cell proliferation and angiogenesis by regulating fatty acid metabolism [53]. Fatty acid binding protein 5 (FABP5) is a regulatory hub for lipid metabolism and tumor progression in LUAD. Its high expression is associated with poor prognosis in LUAD patients [54]. Anlotinib is a vascular endothelial growth factor receptor (VEGFR) inhibitor widely used in advanced lung cancer. Anlotinib can significantly reduce the expression of proteins related to lipid metabolism, inhibiting LUAD [55]. In addition, studies have shown that abnormalities in the TME and nutrient competition signaling pathways can lead to phenotypic reprogramming of fatty acid metabolism and functional changes in tumor-infiltrating immune cells, thereby affecting the therapeutic effects of cancer immunotherapy [56]. T cells are a vital part of the antitumor immune response, and fatty acid metabolism is important for the proliferation and cell fate of T cells in the TME. In gastric adenocarcinoma, CD8 + tissue-resident memory T cells (Trm) rely on fatty acid oxidation (FAO) to generate energy, and when deprived of fatty acids, this leads to Trm cell death. Moreover, the content of Trm cells in patients with metastatic cancer is low. A high proportion of Trm cells is related to a better prognosis for gastric adenocarcinoma [57]. Tumor-associated macrophages (TAMs) can be divided into two subpopulations (M1 and M2) and exhibit distinct metabolic patterns that maintain their functions during tumorigenesis and progression. M1 macrophages kill tumor cells through direct cytotoxicity and antibody-dependent cell-mediated cytotoxicity (ADCC), while M2 macrophages inhibit antitumor immunity and promote tumor cell proliferation and metastasis [58]. A study revealed that TAMs in human and mouse tumor tissues are rich in high levels of lipids. Blocking CD36, knocking out CD36, inhibiting fatty acid oxidation in macrophages, or inhibiting lipid uptake can prevent TAM production, thereby eliminating their tumor-promoting functions in vitro and in vivo [59]. Dendritic cells (DCs) are critical for initiating and maintaining immune responses in cancer, and fatty acid metabolic reprogramming affects DC function. The elevated FAO and abnormal accumulation of fatty acids induced by tumor cells in the TME impair the antigen presentation ability of conventional DCs (cDCs), thereby preventing cDCs from activating T cells to induce antitumor responses [56]. NK cells have powerful anti-infection and antitumor functions and are the key cells involved in tumor immunity monitoring. However, the accumulation of excess fatty acids in NK cells inhibits their cytotoxic function [60]. These findings indicate that fatty acid metabolism is inextricably linked with immune cells and the tumor immune microenvironment. Our risk score was positively correlated with the proportions of immune cells. This finding also explains, to a certain extent, why the active state of fatty acid metabolism is related to the phenotype of the immune microenvironment.

The main shortcomings of this study include the following three points: (1) The sample size of the datasets are small, and more samples need to be added to include more groups of people and races. (2) There was a ack of experimental verification, and the internal mechanism of key genes predicting LUAD patient prognosis is still unclear. (3) Some studies have shown that fatty acid genes are also related to the prognosis of many other diseases. However, the model has not yet been extrapolated to other diseases.

5 Conclusion

In this study, LUAD patients were classified into different subtype groups on the basis of the FAMG expression data. These subtypes have different biological characteristics, molecular mechanisms, and clinical manifestations and are significant indicators of prognosis and treatment response. Key FAMGs were also screened in this study and used to construct a prognostic model that can better group patients according to high or low risk better than traditional parameters; thus, this model plays a positive role in LUAD subtyping and prognosis evaluation.