A novel composite model for distinguishing benign and malignant pulmonary nodules

Zhang, Lei; Xu, Yanhui; Lou, Qinqin; Chen, Fangfang; Li, Fang; Chai, Kun; Gao, Junshun; Tong, Mingjie; Ma, Yan; Xia, Lilong; Zhao, Kaixiang; Gao, Junli; Zhu, Xinhai

doi:10.1007/s10238-025-01672-5

Research
Open access
Published: 14 May 2025

Volume 25, article number 159, (2025)

Clinical and Experimental Medicine

Lei Zhang¹,
Yanhui Xu¹,
Qinqin Lou²,
Fangfang Chen²,
Fang Li²,
Kun Chai²,
Junshun Gao²,
Mingjie Tong²,
Yan Ma¹,
Lilong Xia¹,
Kaixiang Zhao¹,
Junli Gao² &
…
Xinhai Zhu¹

Abstract

Previous studies have demonstrated that a four-protein marker panel (4MP), consisting of Pro-SFTPB, CA125, Cyfra21-1, and CEA, can help distinguish benign from malignant lung nodules. This study aimed to improve the 4MP’s performance by combining it with clinical characteristics and low-dose chest computed tomography (LDCT) features. The study involved 380 patients with pulmonary nodules, of whom 91 were diagnosed with benign nodules and 289 with early-stage lung cancer by postoperative histopathology. Serum levels of Pro-SFTPB, CA125, Cyfra21-1, and CEA were assessed using an immunofluorescence assay. Clinical features were selected using the LassoCV method. A new diagnostic model was developed using logistic regression, incorporating 4MP, clinical characteristics, and LDCT features. The model’s diagnostic performance was compared with that of the lung cancer biomarker panel (LCBP) nodule risk model and evaluated by sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). The AUC of the 4MP model for distinguishing benign from malignant pulmonary nodules was 0.612. Seven factors were selected from the patients’ clinical information and the CT features of their nodules. The composite model (4MP + age + gender + BMI + family history of cancer + nodule size + nodule margin + nodule density) achieved an AUC of 0.808 and performed best for small nodules (AUC = 0.835 for nodules ≤ 6 mm). Furthermore, within the same validation cohort, the composite model (AUC = 0.680) outperformed the LCBP nodule risk model (AUC = 0.599). The novel composite model accurately diagnoses malignant pulmonary nodules, especially small ones, helping to stratify patients by lung cancer risk.

Introduction

Lung cancer is the second most prevalent cancer worldwide and a leading cause of cancer-related mortality [1]. In China, lung cancer has the highest morbidity and mortality rates of any cancer, with more than 600,000 deaths annually [2, 3]. The prognosis of lung cancer depends predominantly on the stage at which the disease is diagnosed [4, 5]. For patients diagnosed with early-stage (stage IA) non-small cell lung cancer (NSCLC) who are candidates for surgical resection followed by adjuvant therapy, the 5-year survival rate ranges from 80 to 93% [6]. For patients diagnosed with stage IV lung cancer, however, the 1-year survival rate remains below 20%, underscoring the importance of early diagnosis for prognosis [6]. Unfortunately, only about 16% of patients are diagnosed at an early stage of lung cancer [4]. Improving the diagnostic rate of early-stage lung cancer is therefore crucial for the effective and curative treatment of these patients.

Chest computed tomography (CT) screening plays a significant role in the early detection of lung cancer [7]. Two large randomized controlled trials, the Nederlands-Leuvens Longkanker Screenings ONderzoek trial (NELSON) and the National Lung Screening Trial (NLST), have shown that lung cancer screening with low-dose computed tomography (LDCT) is associated with reduced mortality [8, 9]. A study conducted in China demonstrated that LDCT screening resulted in a 31% reduction in lung cancer mortality [10]. With the widespread adoption of CT screening, pulmonary nodules are increasingly identified as incidental findings [11]. Distinguishing benign from malignant pulmonary nodules is a central objective in the management of these patients [12, 13]. Despite the increasing use of LDCT for lung cancer screening, its efficacy is compromised by a high false-positive rate: in the NLST, only 3.8% of positive results were ultimately confirmed as lung cancer [8]. The differentiation between benign and malignant solid pulmonary nodules is typically based on an analysis of each patient’s clinical data, CT findings, and tumor biomarker levels [14]. For instance, the Mayo model incorporates three lung nodule features (spiculation, diameter, and upper lobe location) and three clinical characteristics (cigarette-smoking status, age, and history of cancer) [15]. In recent years, studies have suggested that blood tumor biomarkers may play a significant role in the management of patients with indeterminate lung nodules [16]. Yang et al. evaluated the lung cancer biomarker panel (LCBP) in a Chinese cohort and subsequently developed the LCBP nodule risk model, which incorporates patients’ clinical characteristics (age, sex, smoking status), CT features of the nodule (diameter, spiculation), and four blood biomarkers: pro-gastrin-releasing peptide (Pro-GRP), squamous cell carcinoma antigen (SCC), cytokeratin-19 fragment (Cyfra21-1), and carcinoembryonic antigen (CEA) [17]. Hou et al. demonstrated that CEA, CYFRA21-1, and CT radiological scores are significant predictors of malignant lung nodules [18]. Previous studies have indicated that a panel of four marker proteins (4MP), comprising cancer antigen 125 (CA125), CEA, the precursor form of surfactant protein B (Pro-SFTPB), and Cyfra21-1, significantly improves lung cancer risk assessment [19]. In a prior single-center study, we found that 4MP exhibited excellent efficacy in distinguishing between benign and malignant nodules [20]. Nevertheless, the role of 4MP in differentiating early-stage lung cancer from benign nodules among Chinese patients requires additional validation. We therefore enrolled a new study cohort to further validate 4MP for this purpose in a Chinese population.

In this study, we examined the role of 4MP detection, clinical characteristics, and CT features of nodules in differentiating benign lung nodules from early-stage lung cancer. To improve the performance of the 4MP model, we devised a composite model that integrates 4MP, clinical characteristics, and CT features for the differential diagnosis of benign and malignant lung nodules in Chinese patients. The findings provide a more robust basis for the early diagnosis of lung cancer in the Chinese population.

Materials and methods

Study subjects

A cohort of 380 patients, who presented at Zhejiang Hospital between March 2021 and April 2024 with an initial diagnosis of pulmonary nodules, was selected for this study. The nodules were classified as either benign or malignant based on postoperative pathological findings. All patients in the cohort were required to meet the following criteria: (a) absence of extrathoracic malignant tumors; and (b) no history of chemotherapy or radiotherapy treatment within the 6 months preceding the study. Clinical data, including age, gender, body mass index (BMI), smoking history, alcohol consumption history, personal and family history of cancer, and tumor grade, were collected from all participants.

The study design is depicted in Fig. 1. A total of 380 patients were randomly allocated to a training cohort (n = 228) and a validation cohort (n = 152) in a 6:4 ratio. The training cohort included 55 patients with benign nodules and 173 patients with lung cancer. The validation cohort comprised 36 patients with benign nodules and 116 patients with lung cancer. This work conformed to the ethical guidelines of the Declaration of Helsinki and was approved by the Medical Ethics Committee of Zhejiang Hospital (approval No. 2021-141 K).
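
The 6:4 allocation can be illustrated with a short sketch. The text does not state whether randomization was stratified by diagnosis, so the stratification, the function name stratified_split, and the seed are assumptions made for illustration only. Under that assumption, rounding 60% of each diagnostic group reproduces the reported cohort sizes.

```python
import random


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Split indices into training and validation sets, class by class.

    Hypothetical helper: the paper reports a random 6:4 split but does not
    say that it was stratified by diagnosis.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# 0 = benign nodule (91 patients), 1 = early-stage lung cancer (289 patients)
labels = [0] * 91 + [1] * 289
train_idx, valid_idx = stratified_split(labels)

# (benign, cancer) counts in each cohort
train_counts = (sum(labels[i] == 0 for i in train_idx),
                sum(labels[i] == 1 for i in train_idx))
valid_counts = (sum(labels[i] == 0 for i in valid_idx),
                sum(labels[i] == 1 for i in valid_idx))
print(train_counts, valid_counts)
```

A stratified split keeps the benign-to-malignant ratio nearly identical in both cohorts, which matters here because benign nodules make up only about a quarter of the sample.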

Nodule assessment and histological diagnosis

Since LDCT provides a better evaluation of the morphologic features of pulmonary nodules, all patients underwent LDCT examination using the multiple contiguous sequential axial imaging procedure through the thorax. The PneuView system (Myrian, Paris, France) was used to analyze the features of nodules. Three radiologists specializing in thoracic imaging conducted a retrospective analysis of all CT characteristics of nodules, including size, number, margin, density, and shape, ultimately reaching a consensus. The postoperative histopathological examination serves as the gold standard for diagnosing both benign and malignant lesions. Histological assessments were conducted by a minimum of two independent histologists.

Detection of serum biomarker levels

Blood samples were collected from all patients when pulmonary nodules were first detected. Serum levels of CA125 (2K45.77, Abbott Laboratories), CEA (7K68.74, Abbott Laboratories), and Cyfra21-1 (2P55.74, Abbott Laboratories) were measured by immunofluorescence assay on the ARCHITECT i2000SR platform (Abbott Laboratories). The serum Pro-SFTPB level was measured using an immunofluorescence assay kit (2024060601, Cosmos Wisdom) on the SMART 500S platform (KEYSMILE).

Statistical analysis

All statistical analyses were performed using SPSS 26. Normally distributed measures were expressed as the mean ± standard deviation (SD) and compared between groups using the independent-samples t-test; measures that were not normally distributed were expressed as M (P₂₅, P₇₅) and compared between groups using the Mann–Whitney U test. The χ² test was used to compare count data between groups. Clinical features were selected using the LassoCV method with fivefold cross-validation, and the Lasso regression model was then used to analyze the importance of the selected features. Receiver operating characteristic (ROC) curves were plotted with MedCalc 16.8.4 to analyze the performance of 4MP, clinical data, and CT imaging features in differentiating benign from malignant lung nodules. P < 0.05 was considered statistically significant.
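
Feature selection by LASSO keeps only the variables whose coefficients remain non-zero under an L1 penalty. A minimal sketch of that mechanism, using coordinate descent with soft-thresholding, is given below. The toy data, the feature names, and the fixed penalty alpha are hypothetical. The study chose the penalty by fivefold cross-validation (LassoCV), which this sketch does not reproduce.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimise (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|), no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # association of feature j with the residual ignoring j
            for i in range(n):
                partial_fit = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial_fit)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale if scale > 0 else 0.0
    return w


# Hypothetical centred toy data: the first column drives y, the second barely does.
feature_names = ["informative", "weak"]
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1.05, 0.95, -0.95, -1.05]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)
```

In this toy example the unpenalized coefficient of the weak feature (0.05) is smaller than the penalty (0.1), so the L1 term sets it exactly to zero and only the informative feature is retained.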

Results

Subject characteristics

The training cohort comprised 55 patients with benign nodules and 173 patients with early-stage lung cancer. The patients’ clinical characteristics (age, gender, BMI, drinking history, smoking history, family and personal history of cancer) and the CT features of their nodules (size, number, margin, density, and shape) are detailed in Table 1. In the training cohort, nodule size, shape, margin, and density differed significantly between the two groups (P < 0.05). In contrast, age, BMI, gender, smoking history, drinking history, personal and family history of cancer, and nodule number (coded as 1 for a single pulmonary nodule and ≥ 2 for multiple pulmonary nodules) did not differ significantly between the benign nodule group and the early-stage lung cancer group (P > 0.05).

Table 1 The characteristics of patients in the training and validation cohort

Analysis of serum biomarker levels

To investigate whether 4MP levels differed between patients with benign nodules and patients with early-stage lung cancer, we compared the levels of the four markers in the two groups. In the training cohort, the CA125 level differed significantly between the benign nodule group and the early-stage lung cancer group (P < 0.05). Serum Pro-SFTPB, Cyfra21-1, and CEA levels were higher in the early-stage lung cancer group than in the benign nodule group, but the differences were not significant (Fig. 2A–D).

Diagnosis performance of the 4MP detection

We evaluated the efficacy of serum 4MP detection in distinguishing benign nodules from early-stage lung cancer in the training cohort, which yielded an AUC of 0.612 (Fig. 3A). We also analyzed, separately, the diagnostic performance of clinical characteristics (age, gender, BMI, smoking history, drinking history, personal and family history of cancer) and CT features of nodules (size, number, margin, density, and shape) (Fig. 3A). The AUC values for clinical characteristics and CT features of nodules were 0.628 and 0.726, respectively (Table 2). Interestingly, clinical characteristics had high specificity (78.18%) but low sensitivity (48.55%), whereas CT features of the nodule and 4MP had higher sensitivity (79.19% and 65.90%, respectively) but lower specificity (58.18% and 54.55%, respectively) (Table 2). Next, the validation cohort (n = 152), containing 36 patients with benign nodules and 116 patients with early-stage lung cancer, was used to validate the performance of 4MP, clinical characteristics, and CT features of nodules. In the validation cohort, the AUC values for 4MP, CT features of nodules, and clinical characteristics were 0.686, 0.678, and 0.701, respectively (Fig. 3B, Table 2). These results suggest that combining patient clinical characteristics and CT features of nodules with serum 4MP detection may improve its performance.
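
Sensitivity and specificity follow directly from the counts of correctly and incorrectly classified patients. The sketch below shows the arithmetic for the training cohort (173 cancers, 55 benign nodules). The individual counts are not reported in the text; they are back-calculated here from the published percentages and cohort sizes and should be read as an assumption, not as data from Table 2.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Back-calculated (assumed) counts: (true_pos, false_neg, true_neg, false_pos).
# Each row sums to 173 cancers and 55 benign nodules.
assumed_counts = {
    "clinical characteristics": (84, 89, 43, 12),
    "CT features": (137, 36, 32, 23),
    "4MP": (114, 59, 30, 25),
}

results = {}
for model, counts in assumed_counts.items():
    sens, spec = sensitivity_specificity(*counts)
    results[model] = (f"{100 * sens:.2f}", f"{100 * spec:.2f}")
    print(f"{model}: sensitivity {results[model][0]}%, "
          f"specificity {results[model][1]}%")
```

With only 55 benign nodules in the training cohort, each misclassified benign patient shifts specificity by almost two percentage points, which is worth keeping in mind when comparing the models.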

Table 2 The discrimination performance of 4MP, clinical characteristics, CT features of nodule, and composite model in the training and validation cohort

Construction of a new composite model

We then investigated the diagnostic performance of 4MP combined with clinical characteristics and CT features of nodules. LASSO regression was applied to the clinical data and CT characteristics of the patients in the training cohort and identified 7 factors with non-zero coefficients: age, gender, BMI, family history of cancer, nodule size, nodule margin, and nodule density (Fig. 4A). Lasso regression was then used to analyze variable importance. In descending order of importance, the variables were nodule density, nodule margin, gender, family history of cancer, nodule size, BMI, and age (Fig. 4B, Table 3). Based on the 4MP and these 7 factors, we constructed a new composite model (composite model = Pro-SFTPB + CA125 + CEA + Cyfra21-1 + age + gender + BMI + family history of cancer + nodule size + nodule margin + nodule density). In the training cohort, its AUC was 0.808, its sensitivity was 75.14%, and its specificity was 74.55% (Fig. 4C, Table 2). The equation for calculating the probability of early-stage lung cancer was derived from logistic regression: logit(P) = 4.064 + 0.01*Pro-SFTPB − 0.103*CA125 − 0.053*CEA − 0.189*Cyfra21-1 − 0.025*age − 1.220*gender − 0.084*BMI + 0.713*family history of cancer + 0.121*nodule size + 1.269*nodule margin − 2.145*nodule density. In the validation cohort, the AUC of the composite model was 0.714, the specificity was 44.44%, and the sensitivity was 87.07% (Fig. 4D, Table 2). These results indicate that the composite model has good predictive performance in distinguishing early-stage lung cancer from benign lung nodules.
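
The logistic regression equation can be turned into a predicted probability with P = 1 / (1 + exp(−logit(P))). The sketch below does this with the published coefficients. The text does not give the units of the biomarkers or the numeric coding of gender, family history, nodule margin, and nodule density, so the dictionary keys and the example patient are hypothetical and the printed value is illustrative only.

```python
import math

INTERCEPT = 4.064
# Coefficients as reported in the text; key names are illustrative.
COEFFICIENTS = {
    "pro_sftpb": 0.01,
    "ca125": -0.103,
    "cea": -0.053,
    "cyfra21_1": -0.189,
    "age": -0.025,
    "gender": -1.220,
    "bmi": -0.084,
    "family_history": 0.713,
    "nodule_size": 0.121,
    "nodule_margin": 1.269,
    "nodule_density": -2.145,
}


def malignancy_probability(patient):
    """Apply the inverse logit to the linear predictor of the composite model."""
    logit = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient; the categorical codings (0/1) are assumptions.
example_patient = {
    "pro_sftpb": 20.0, "ca125": 10.0, "cea": 2.0, "cyfra21_1": 2.0,
    "age": 60.0, "gender": 1.0, "bmi": 23.0, "family_history": 0.0,
    "nodule_size": 8.0, "nodule_margin": 1.0, "nodule_density": 1.0,
}
print(f"{malignancy_probability(example_patient):.3f}")
```

Because the coefficients depend on how each variable was coded and scaled in the original analysis, the equation should only be applied with the authors' variable definitions.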

Table 3 Results of the importance analysis of the 7 factors

Diagnosis performance of the new composite model

To further evaluate the diagnostic efficacy of the composite model in distinguishing benign nodules from early-stage lung cancer, we analyzed its performance across nodules of different sizes (Fig. 5). The AUC of the composite model was 0.734 for 42 controls and 178 cases with nodule sizes > 10 mm. For 49 controls and 153 cases with nodule sizes ≤ 10 mm, the AUC was 0.756. Among 27 controls and 107 cases with nodule sizes ≤ 8 mm, the AUC increased to 0.820. For 9 controls and 27 cases with nodule sizes ≤ 6 mm, the model achieved an AUC of 0.835, with a sensitivity of 70.37% and a specificity of 88.89% (Table 4). These findings suggest that the composite model performs better in smaller nodules.
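
The AUC in each size subgroup can be read as the probability that a randomly chosen cancer case receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way for subgroups defined by a maximum nodule size. The records are hypothetical toy values, not study data, and the function names are illustrative.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


def subgroup_auc(records, max_size_mm):
    """AUC restricted to nodules no larger than max_size_mm."""
    cases = [score for size, score, is_cancer in records
             if size <= max_size_mm and is_cancer]
    controls = [score for size, score, is_cancer in records
                if size <= max_size_mm and not is_cancer]
    return auc(cases, controls)


# Hypothetical records: (nodule size in mm, model score, is_cancer)
records = [
    (5.0, 0.9, True), (5.5, 0.2, False), (6.0, 0.7, True), (6.0, 0.4, False),
    (9.0, 0.6, True), (9.0, 0.8, False), (12.0, 0.5, True), (14.0, 0.3, False),
]

auc_le_6 = subgroup_auc(records, 6.0)
auc_le_10 = subgroup_auc(records, 10.0)
print(auc_le_6, auc_le_10)
```

Subgroups with few controls, such as the 9 controls with nodules ≤ 6 mm, rest on a small number of case-control pairs, so their AUC estimates carry wide uncertainty.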

Table 4 The discrimination performance of composite model in lung nodules of different sizes

In addition, we selected 154 patients (34 controls and 120 cases) from the study subjects to compare the performance of the composite model with that of the LCBP nodule risk model. In this population, the AUC of the composite model was 0.680 and that of the LCBP nodule risk model was 0.599 (Fig. 6). The specificity of the composite model and the LCBP nodule risk model was 70.59% and 61.76%, respectively, and their sensitivity was 67.50% and 60.83%, respectively (Table 5). These findings suggest that the composite model outperformed the LCBP nodule risk model.

Table 5 The discrimination performance comparison between the composite model and LCBP nodule risk model

Discussion

Although patients’ responses to treatment have improved dramatically in recent years with the advent of precision therapies for lung cancer, such as immunotherapy and targeted therapy, the 5-year survival rate is still only 21% [21]. Early screening, diagnosis, and timely treatment are the keys to a good prognosis for lung cancer patients [22]. Clinically, small nodules seen in a patient’s lungs on CT imaging are critical for identifying early lung cancer [23]. Nonetheless, not all lung nodules are malignant, and a large-scale study of CT screening for lung cancer found that about 49% of screen-detected cancers may be overdiagnosed [24, 25]. In recent years, the prevalence of pulmonary nodules identified on chest CT scans performed during routine medical care has risen significantly. Consequently, the effective management of both incidental and screen-detected nodules has emerged as a critical public health concern [26]. Improving the ability to distinguish benign from malignant lung nodules is therefore vital to treating lung cancer.

At present, the primary challenge in managing lung nodules lies in accurately identifying high-risk nodules with malignant potential and stratifying patients into distinct risk categories to inform subsequent management [26]. In the NELSON trial, participants with nodules < 5 mm in diameter had a low probability of developing lung cancer, those with nodules of 5 to 10 mm had a moderate probability, and those with nodules ≥ 10 mm had a significantly increased probability [27, 28]. According to the Fleischner guidelines, solid and subsolid pulmonary nodules require distinct management strategies; specifically, tissue sampling is advised for solid pulmonary nodules > 8 mm [29]. In Japan, the protocol for LDCT lung cancer screening advises follow-up evaluations at 3, 6, 12, 24, 36, 48, and 60 months for nodules with an overall mean diameter of < 15 mm and a solid component measuring < 8 mm in diameter [30]. The Chinese Expert Consensus on the Diagnosis and Treatment of Pulmonary Nodules (2024) sets out 18 consensus points, underscoring the critical importance of early diagnosis and intervention. It recommends specific screening ages for high-risk populations, clarifies the definition of lung nodules and the methods for their assessment, and advocates the integration of artificial intelligence to enhance diagnostic accuracy [31]. Ye et al. were the first to propose adjusting the criterion for a positive result in chest CT screening for pure ground-glass nodules in a Chinese population [32]. They suggested raising the threshold from 6 to 8 mm, so that only pure ground-glass nodules with a diameter of 8 mm or more would require management [32].

Liquid biopsy, as a non-invasive approach, has received widespread attention because it is easy to repeat and can monitor tumor recurrence, metastasis, and response to treatment in real time [33]. With the rapid development of molecular techniques, circulating tumor cells, circulating tumor DNA, circulating cell-free RNA, circulating cell-free DNA, and extracellular vesicles have shown potential clinical value in the diagnosis, treatment, and prognosis of lung cancer, but their low concentration in the blood limits the sensitivity of liquid biopsies [33]. Recently, Chen et al. developed an epigenetic biomarker model based on circulating ribosomes that is particularly effective in identifying high-risk lung nodules [34]. MD Anderson researchers were the first to find that the 4MP was helpful for lung cancer risk prediction, with a higher AUC for the 4MP + smoking model than for the smoking-based risk prediction model (0.83 vs. 0.73) [19]. Subsequently, researchers analyzed the performance of 4MP in distinguishing lung cancer from benign lung nodules [35]. They found that the 4MP + nodule size model had a higher AUC (0.895) than the model based on nodule size alone (AUC = 0.860) or 4MP alone (AUC = 0.757), and in the independent validation cohort, the AUC of 4MP was 0.87 [35]. In the past two years, to further determine the role of 4MP in lung cancer, researchers explored the lung cancer risk prediction performance of the 4MP + PLCOm2012 model and found that the 4MP can be used for lung cancer risk assessment, with AUC values of 0.80 for 4MP alone and 0.85 for the combined 4MP + PLCOm2012 model for sera from cases collected within 1 year preceding diagnosis [36]. Moreover, Vykoukal et al. analyzed the predictive performance of 4MP in distinguishing lung cancer patients from controls and found an AUC of 0.80 for 4MP, whereas the AUC for 4MP + miR-210-3p + miR-320a-3p + miR-21-5p was 0.81 [37]. In 2024, MD Anderson Cancer Center researchers analyzed repeated measurements of 4MP in pre-diagnostic serum from 2483 ever-smoker participants and improved the performance of 4MP in the early detection of lung cancer using a parametric empirical Bayes algorithm [38]. However, most of these studies were conducted in Western populations. Lung cancer types, environmental factors, and genetic susceptibility differ between Western and Asian populations [39]. Yao et al. found that 4MP combined with SCC, neuron-specific enolase (NSE), and pro-gastrin-releasing peptide (Pro-GRP) better distinguished benign from malignant lung diseases, as well as the pathological types of lung cancer, in Chinese patients [2]. In our previous study, we found that the 4MP significantly distinguished Chinese lung cancer patients from normal individuals [20]. The nodule risk model constructed by combining 4MP with nodule size (4MP + nodule size) has good potential in the differential diagnosis of benign and malignant lung nodules [20]. Because our previous study was a single-center study, we collected a new study cohort to validate the performance of 4MP in the differential diagnosis of benign nodules and early-stage lung cancer. In this study, the AUC of 4MP in distinguishing early-stage lung cancer from benign lung nodules in Chinese patients was 0.612 in the training cohort and 0.686 in the validation cohort. We therefore aimed to improve the diagnostic performance of 4MP by combining it with other factors.

Clinical studies have demonstrated that whether a lung nodule is benign or malignant correlates with the patient’s clinical characteristics and the CT features of the nodule. The Mayo model, which incorporates patient age, history of cancer, cigarette-smoking status, spiculation, nodule diameter, and upper lobe location as predictors, had an AUC of 0.833 [15]. However, researchers and clinicians have found that the Mayo model may not apply to Asians [40]. The Brock model uses age, gender, emphysema, family history of cancer, nodule size, total nodule number, solid nodule, spiculation parameters, and upper lobe involvement, and achieved AUCs of at least 0.94 in an external validation cohort [41]. Previous research found that differences in disease prevalence and environmental factors may limit the applicability of the Brock model in Asian populations; in a Chinese cohort, its AUC ranged from 0.58 to 0.71 [32]. In recent years, researchers have made several advances in the differential diagnosis and treatment of benign and malignant nodules. Miao et al. proposed a deep learning model combining CT images of lung nodules and intrathoracic fat images to differentiate between benign and malignant lung nodules, which significantly outperformed the model using CT images of lung nodules alone, with AUCs of 0.910, 0.922, and 0.899 in the internal and external test cohorts, respectively [42]. Zhao et al. proposed the MAEMC-NET model based on self-supervised learning, which can effectively distinguish between benign and malignant isolated lung nodules by analyzing CT images of patients, with an AUC of 0.962 [43]. Meng et al. constructed a new risk stratification model, cLung-RADS® v2022, based on Lung-RADS® v2022 and CT features for predicting invasive pure ground-glass nodules in China, which had AUC values of 0.718 and 0.693 in the training and validation sets, respectively [44]. In this study, we analyzed the diagnostic performance of clinical characteristics (age, gender, BMI, drinking history, smoking history, family and personal history of cancer) and CT features of nodules (size, number, margin, density, and shape) in Chinese patients with benign nodules and patients with early-stage lung cancer. The AUC values for clinical characteristics and CT features of nodules were 0.628 and 0.726, respectively.

Related studies have shown that combining blood biomarkers with clinical characteristics can significantly improve the predictive performance of risk models for early malignant lung nodules. Xu et al. constructed a network diagnostic model consisting of seven autoantibodies (CAGE, PGP9.5, GAGE7, MAGEA1, SOX2, GBU4-5, and P53), clinical characteristics (age, cancer history, smoking history), and imaging features (nodule size, total nodule number, nodule property, spiculation, lobulated sign, vessel sign, bubble-like sign, and pleural indentation) for the diagnosis of lung nodules, which had an AUC of 0.96 [45]. In addition, Yang et al. developed the LCBP nodule risk model; its AUC was 0.9151 in the training cohort but only 0.5836 in the validation cohort [17]. Hou et al. developed a predictive model based on CEA, CYFRA21-1, and CT features to differentiate between benign and malignant lung nodules, which achieved AUCs of 0.85 and 0.76 in the training and validation groups, respectively [18]. In this study, we selected 7 factors from the patients’ clinical information and the CT features of their nodules and developed a novel composite model that integrates 4MP, clinical characteristics (age, gender, BMI, family history of cancer), and CT features of the nodule (nodule size, nodule margin, and nodule density). The composite model had good predictive performance, with an AUC of 0.808 in the training cohort and 0.714 in the validation cohort.

In lung cancer screening, based primarily on retrospective analyses of data from the International Early Lung Cancer Action Program and the NLST, 6.0 mm is generally accepted as the threshold for a positive result on the baseline scan [46]. Importantly, this does not mean that cancers smaller than 6.0 mm cannot be detected on a baseline scan; it means only that such nodules have a low incidence of malignancy [46]. The probability of malignancy is 1–2% for nodules 6–8 mm and less than 1% for all nodules smaller than 6 mm [12]. In the NLST, the lung cancer probability was 0.3% when the nodule diameter was 4–6 mm [47]. Texas MD Anderson Cancer Center studies have shown that in patients with nodule size ≤ 6 mm, the combination of nodule size and 4MP performed exceptionally well, with an AUC of 0.95 [35]. Notably, we found that the AUC values of the composite model were 0.820 and 0.835 in patients with pulmonary nodules ≤ 8 mm and ≤ 6 mm, respectively. In addition, the composite model (AUC = 0.680) performed better than the LCBP nodule risk model (AUC = 0.599). These observations suggest that the composite model performs well in distinguishing benign lung nodules from early-stage lung cancer in Chinese patients and may perform better in smaller nodules. The composite model is suitable for the adjunctive diagnosis of early-stage lung cancer: when LDCT is used to screen people at high risk of lung cancer, patients with difficult-to-classify lung nodules can be further assessed by combining their clinical characteristics, biomarker levels, and CT features.

This study has several limitations. First, although we analyzed the performance of the 4MP with patient data from different hospitals, further multicenter studies are needed to comprehensively assess the applicability of the 4MP. Second, the composite model relies on CT image features, which may reduce its applicability in resource-limited areas where patients may not have access to CT scans. Finally, the sample size needs to be expanded, especially for patients with lung nodules ≤ 6 mm, which will be the focus of our future studies.

Conclusions

In summary, we constructed a new composite model for the differential diagnosis of benign nodules and early-stage lung cancer in Chinese patients. The model can effectively identify malignant pulmonary nodules, particularly small ones, and may aid in stratifying patients by lung cancer risk.

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

References

De Zuani M, Xue H, Park JS, Dentro SC, Seferbekova Z, Tessier J, et al. Single-cell and spatial transcriptomics analysis of non-small cell lung cancer. Nat Commun. 2024;15(1):4388.
Yao L, Li Y, Wang Q, Chen T, Li J, Wang Y, et al. Multi-biomarkers panel in identifying benign and malignant lung diseases and pathological types of lung cancer. J Cancer. 2023;14(10):1904–12.
Wang HM, Zhang CY, Peng KC, Chen ZX, Su JW, Li YF, et al. Using patient-derived organoids to predict locally advanced or metastatic lung cancer tumor response: a real-world study. Cell Rep Med. 2023;4(2):100911.
Wang S, Meng F, Li M, Bao H, Chen X, Zhu M, et al. Multidimensional cell-free DNA fragmentomic assay for detection of early-stage lung cancer. Am J Respir Crit Care Med. 2023;207(9):1203–13.
Zarinshenas R, Amini A, Mambetsariev I, Abuali T, Fricke J, Ladbury C, Salgia R. Assessment of barriers and challenges to screening, diagnosis, and biomarker testing in early-stage lung cancer. Cancers. 2023;15(5):1595.
Reck M, Dettmer S, Kauczor HU, Kaaks R, Reinmuth N, Vogel-Claussen J. Lung cancer screening with low-dose computed tomography. Dtsch Arztebl Int. 2023;120(23):387–92.
Liu J, Qi L, Wang Y, Li F, Chen J, Cui S, et al. Development of a combined radiomics and CT feature-based model for differentiating malignant from benign subcentimeter solid pulmonary nodules. Eur Radiol Exp. 2024;8(1):8.
National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med. 2011;365(5):395–409.
Dawson Q. NELSON trial: reduced lung-cancer mortality with volume CT screening. Lancet Respir Med. 2020;8(3):236.
Li N, Tan F, Chen W, Dai M, Wang F, Shen S, et al. One-off low-dose CT for lung cancer screening in China: a multicentre, population-based, prospective cohort study. Lancet Respir Med. 2022;10(4):378–91.
Liang W, Tao J, Cheng C, Sun H, Ye Z, Wu S, et al. A clinically effective model based on cell-free DNA methylation and low-dose CT for risk stratification of pulmonary nodules. Cell Rep Med. 2024;5(10):101750.
Mazzone PJ, Lam L. Evaluating the patient with a pulmonary nodule: a review. JAMA. 2022;327(3):264–73.
He YT, Zhang YC, Shi GF, Wang Q, Xu Q, Liang D, et al. Risk factors for pulmonary nodules in north China: a prospective cohort study. Lung Cancer. 2018;120:122–9.
Sun JX, Zhou XX, Yu YJ, Wei YM, Shi YB, Xu QS, et al. CT radiomics based model for differentiating malignant and benign small (≤ 20 mm) solid pulmonary nodules. Front Oncol. 2025;15:1502932.
Swensen SJ, Silverstein MD, Ilstrup DM, Schleck CD, Edell ES. The probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules. Arch Intern Med. 1997;157(8):849–55.
Adams SJ, Stone E, Baldwin DR, Vliegenthart R, Lee P, Fintelmann FJ. Lung cancer screening. Lancet. 2023;401(10374):390–408.
Yang D, Zhang X, Powell CA, Ni J, Wang B, Zhang J, et al. Probability of cancer in high-risk patients predicted by the protein-based lung cancer biomarker panel in China: LCBP study. Cancer. 2018;124(2):262–70.
Hou X, Wu M, Chen J, Zhang R, Wang Y, Zhang S, et al. Establishment and verification of a prediction model based on clinical characteristics and computed tomography radiomics parameters for distinguishing benign and malignant pulmonary nodules. J Thorac Dis. 2024;16(3):1984–95.
Integrative Analysis of Lung Cancer Etiology and Risk (INTEGRAL) Consortium for Early Detection of Lung Cancer, Guida F, Sun N, Bantis LE, Muller DC, et al. Assessment of lung cancer risk on the basis of a biomarker panel of circulating proteins. JAMA Oncol. 2018;4(10):e182078.
Lu Q, Jia Z, Gao J, Zheng M, Gao J, Tong M, et al. Auxiliary diagnosis of lung cancer on the basis of a serum protein biomarker panel. J Cancer. 2021;12(10):2835–43.
Yang CY, Lin YT, Lin LJ, Chang YH, Chen HY, Wang YP, et al. Stage shift improves lung cancer survival: real-world evidence. J Thorac Oncol. 2023;18(1):47–56.
Mimae T, Okada M. Asian perspective on lung cancer screening. Thorac Surg Clin. 2023;33(4):385–400.
Gong J, Liu J, Hao W, Nie S, Zheng B, Wang S, et al. A deep residual learning network for predicting lung adenocarcinoma manifesting as ground-glass nodule on CT images. Eur Radiol. 2020;30(4):1847–55.
Brodersen J, Voss T, Martiny F, Siersma V, Barratt A, Heleno B. Overdiagnosis of lung cancer with low-dose computed tomography screening: meta-analysis of the randomised clinical trials. Breathe (Sheff). 2020;16(1):200013.
Warkentin MT, Al-Sawaihey H, Lam S, Liu G, Diergaarde B, Yuan JM, et al. Radiomics analysis to predict pulmonary nodule malignancy using machine learning approaches. Thorax. 2024;79(4):307–15.
Jacobs C. Challenges and outlook in the management of pulmonary nodules detected on CT. Eur Radiol. 2024;34(1):247–9.
Autier P. Lung-cancer screening and the NELSON trial. N Engl J Med. 2020;382(22):2165.
Zhong D, Sidorenkov G, Jacobs C, de Jong PA, Gietema HA, Stadhouders R, et al. Lung nodule management in low-dose CT screening for lung cancer: lessons from the NELSON trial. Radiology. 2024;313(1):e240535.
MacMahon H, Naidich DP, Goo JM, Lee KS, Leung ANC, Mayo JR, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner society 2017. Radiology. 2017;284(1):228–43.
Ashizawa K, Maruyama Y, Kobayashi T, Kondo T, Nakagawa T, Hatakeyama M, et al. Guidelines for the management of pulmonary nodules detected by low-dose CT lung cancer screening 6th edition: compiled by the Japanese Society of CT Screening. Jpn J Radiol. 2025;43(3):333–46.
Chinese Thoracic Society, Chinese Medical Association; Chinese Alliance Against Lung Cancer Expert Group. [Chinese expert consensus on diagnosis and treatment of pulmonary nodules (2024)]. Zhonghua Jie He He Hu Xi Za Zhi. 2024;47(8):716–29.
Ye W, Fu W, Li C, Li J, Xiong S, Cheng B, et al. Diameter thresholds for pure ground-glass pulmonary nodules at low-dose CT screening: Chinese experience. Thorax. 2025;80(2):76–85.
Li L, Jiang H, Zeng B, Wang X, Bao Y, Chen C, et al. Liquid biopsy in lung cancer. Clin Chim Acta. 2024;554:117757.
Chen PH, Tsai TM, Lu TP, Lu HH, Pamart D, Kotronoulas A, Herzog M, Micallef JV, Hsu HH, Chen JS. Accurate diagnosis of high-risk pulmonary nodules using a non-invasive epigenetic biomarker test. Cancers. 2025;17(6):916.
Ostrin EJ, Bantis LE, Wilson DO, Patel N, Wang R, Kundnani D, et al. Contribution of a blood-based protein biomarker panel to the classification of indeterminate pulmonary nodules. J Thorac Oncol. 2021;16(2):228–36.
Fahrmann JF, Marsh T, Irajizad E, Patel N, Murage E, Vykoukal J, et al. Blood-based biomarker panel for personalized lung cancer risk assessment. J Clin Oncol. 2022;40(8):876–83.
Vykoukal J, Fahrmann JF, Patel N, Shimizu M, Ostrin EJ, Dennison JB, Ivan C, Goodman GE, Thornquist MD, Barnett MJ, Feng Z. Contributions of Circulating microRNAs for early detection of lung cancer. Cancers. 2022;14(17):4221.
Irajizad E, Fahrmann JF, Toumazis I, Vykoukal J, Dennison JB, Shen Y, et al. Biomarker trajectory for earlier detection of lung cancer. EBioMedicine. 2024;108:105377.
Cui X, Han D, Heuvelmans MA, Du Y, Zhao Y, Zhang L, et al. Clinical characteristics and work-up of small to intermediate-sized pulmonary nodules in a Chinese dedicated cancer hospital. Cancer Biol Med. 2020;17(1):199–207.
Papalampidou A, Papoutsi E, Katsaounou PA. Pulmonary nodule malignancy probability: a diagnostic accuracy meta-analysis of the Mayo model. Clin Radiol. 2022;77(6):443–50.
McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med. 2013;369(10):910–9.
Miao S, Dong Q, Liu L, Xuan Q, An Y, Qi H, et al. Dual biomarkers CT-based deep learning model incorporating intrathoracic fat for discriminating benign and malignant pulmonary nodules in multi-center cohorts. Phys Med. 2025;129:104877.
Zhao T, Yue Y, Sun H, Li J, Wen Y, Yao Y, et al. MAEMC-NET: a hybrid self-supervised learning method for predicting the malignancy of solitary pulmonary nodules from CT images. Front Med (Lausanne). 2025;12:1507258.
Meng Q, Liu T, Peng H, Gao P, Chen W, Fang M, et al. Construction and validation of a risk stratification model based on Lung-RADS® v2022 and CT features for predicting the invasive pure ground-glass pulmonary nodules in China. Insights Imaging. 2025;16(1):68.
Xu L, Chang N, Yang T, Lang Y, Zhang Y, Che Y, et al. Development of diagnosis model for early lung nodules based on a seven autoantibodies panel and imaging features. Front Oncol. 2022;12:883543.
Zhu Y, Yankelevitz DF, Henschke CI. How I do it: management of pleural-attached pulmonary nodules in low-dose CT screening for lung cancer. Radiology. 2025;314(1):e240091.
Aberle DR, DeMello S, Berg CD, Black WC, Brewer B, Church TR, et al. Results of the two incidence screenings in the National Lung Screening Trial. N Engl J Med. 2013;369(10):920–31.

Funding

This research was supported by the Zhejiang Provincial Medical and Health Science and Technology Plan Project (No. 2022PY035).

Author information

Authors and Affiliations

Department of Thoracic Surgery, Zhejiang Hospital, Hangzhou, 310013, China
Lei Zhang, Yanhui Xu, Yan Ma, Lilong Xia, Kaixiang Zhao & Xinhai Zhu
Hangzhou Cosmos Wisdom Mass Spectrometry Center of Zhejiang University Medical School, Hangzhou, 311200, China
Qinqin Lou, Fangfang Chen, Fang Li, Kun Chai, Junshun Gao, Mingjie Tong & Junli Gao

Contributions

XH Zhu and JL Gao conceived the study. L Zhang, YH Xu, QQ Lou, K Chai, JS Gao and Y Ma designed and conducted the experiments. QQ Lou, F Li, LL Xia and KX Zhao analyzed the data. L Zhang, FF Chen, MJ Tong and LL Xia wrote the manuscript. All authors contributed to manuscript revisions.

Corresponding authors

Correspondence to Junli Gao or Xinhai Zhu.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Ethical approval

This work was approved by the Medical Ethics Committee of Zhejiang Hospital (approval No. 2021-141 K).

Consent to participate

All participants provided written informed consent to participate in this study.

Consent to publish

All participants agreed to have their data published in a journal article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, L., Xu, Y., Lou, Q. et al. A novel composite model for distinguishing benign and malignant pulmonary nodules. Clin Exp Med 25, 159 (2025). https://doi.org/10.1007/s10238-025-01672-5

Received: 12 March 2025
Accepted: 05 April 2025
Published: 14 May 2025
DOI: https://doi.org/10.1007/s10238-025-01672-5

Introduction