Revealing cancer driver genes through integrative transcriptomic and epigenomic analyses with Moonlight

  • Mona Nourbakhsh,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark, Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark

  • Yuanning Zheng,

    Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Biomedical Data Science, Stanford Center for Biomedical Informatics Research, Palo Alto, California, United States of America

  • Humaira Noor,

    Roles Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Biomedical Data Science, Stanford Center for Biomedical Informatics Research, Palo Alto, California, United States of America

  • Hongjin Chen,

    Roles Data curation, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark

  • Subhayan Akhuli,

    Roles Data curation, Formal analysis, Investigation, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark

  • Matteo Tiberti,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Supervision, Validation, Writing – review & editing

    Affiliation Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark

  • Olivier Gevaert,

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Biomedical Data Science, Stanford Center for Biomedical Informatics Research, Palo Alto, California, United States of America

  • Elena Papaleo

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing

    elpap@dtu.dk, elenap@cancer.dk

    Affiliations Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark, Cancer Structural Biology, Danish Cancer Institute, Copenhagen, Denmark

Abstract

Cancer involves dynamic changes caused by (epi)genetic alterations, such as mutations or abnormal DNA methylation patterns, which occur in cancer driver genes. These driver genes are divided into oncogenes and tumor suppressors depending on their function and mechanism of action. Discovering driver genes in different cancer (sub)types is important not only for increasing current understanding of carcinogenesis but also from prognostic and therapeutic perspectives. We have previously developed a framework called Moonlight, which uses a systems biology multi-omics approach to predict driver genes. Here, we present an important development in Moonlight2: a DNA methylation layer that provides epigenetic evidence for the deregulated expression profiles of driver genes. To this end, we present a novel functionality called Gene Methylation Analysis (GMA), which investigates abnormal DNA methylation patterns to predict driver genes. This is achieved by integrating EpiMix, a tool designed to detect aberrant DNA methylation patterns in a cohort of patients and to couple these patterns with gene expression changes. To showcase GMA, we applied it to three cancer (sub)types (basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma), where we discovered 33, 190, and 263 epigenetically driven genes, respectively. A subset of these driver genes had prognostic effects, with expression levels significantly associated with patient survival. Moreover, a subset of the driver genes demonstrated therapeutic potential as drug targets. This study provides a framework for exploring the driving forces behind cancer and, by integrating gene expression and methylation data, offers novel insights into the landscape of three cancer (sub)types.

Author summary

Cancer is a complex disease and a main cause of mortality worldwide. This heterogeneous disease arises from the accumulation of alterations in driver genes, which drive cancer progression when altered. These driver genes are commonly divided into oncogenes, which promote cancer, and tumor suppressors, which prevent it. A major goal of cancer research is to identify these driver genes, which is crucial for increasing our current understanding of cancer biology and for developing novel treatment approaches. A large number of cancer driver genes have already been identified. However, the mechanisms underlying the alterations in these genes are challenging to predict, given their context-dependent behavior and the complexity of cancer. Explaining these mechanisms is the focus of this study, which aims to provide evidence of why certain genes do not function normally in cancer. Within this context, we present a new functionality for our previously developed cancer driver prediction framework, Moonlight. This new functionality integrates multiple data types to predict oncogenes and tumor suppressors in a systems-biology-oriented manner and is freely available to the community as an R package.

Introduction

Cancer is a complex and heterogeneous disease and a leading cause of death globally [1]. This widespread disease is categorized into multiple (sub)types and is characterized by the stepwise accumulation of (epi)genetic alterations in cancer driver genes [2]. Driver genes are classified according to their function, i.e., oncogenes (OCGs) activated by gain-of-function mechanisms and tumor suppressor genes (TSGs) inactivated by loss-of-function mechanisms [3]. Recently, dual-role genes have also emerged, which show context-dependent behavior and can act as both OCGs and TSGs in different biological contexts [4,5]. Driver genes participate in several cellular pathways conceptualized in the Hallmarks of Cancer, a collection of functional capabilities that cells acquire during their transition from normal to tumor cells [6–8]. Distinct driver genes can initiate cancer development in different cancer types and even within subtypes of cancers originating from the same tissue. Thus, context-specific discovery of driver genes in light of the cancer hallmarks is essential. Numerous tools based on various computational methods have been developed to predict driver genes, as we recently reviewed [9]. Predicting driver genes is essential for increasing current knowledge of cancer development and for analyzing and interpreting the vast amount of data related to cancer phenotypes. This, in turn, is important for reversing these phenotypes, discovering novel drug targets, facilitating new treatment strategies, and designing precision medicine strategies [10–13]. We have contributed to this field with Moonlight, which uses a multi-omics systems biology approach to predict driver genes [14,15].

The accumulated (epi)genetic alterations in driver genes include mutations, copy number variations, aberrant methylation levels, and histone modifications [3,16]. While abnormal methylation patterns are recognized as cancer-causing mechanisms, they have been characterized to a lesser extent than mutations [9]. Hypomethylation and hypermethylation, which represent loss and gain of methylation compared to normal conditions, have been described as activating and inactivating mechanisms of OCGs and TSGs, respectively [17–19]. For instance, Søes et al. found promoter hypomethylation and increased expression of the putative OCG ELMO3 to be associated with the development of non-small cell lung cancer [20].

Here, we present a novel functionality of Moonlight2, expanding upon features presented in our previous work [15]. Specifically, we add methylation evidence to the driver genes predicted by Moonlight2 as an epigenetic explanation for the deregulated expression of these genes. Information about methylation state is provided by EpiMix, an integrative tool for detecting aberrant DNA methylation patterns connected with expression changes in patient cohorts [21]. To showcase this new feature, we apply it to data from The Cancer Genome Atlas (TCGA) [22,23] for three cancer (sub)types (basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma). We discover driver genes in the context of cell proliferation and apoptosis, two well-established cancer hallmarks, and explore the prognostic and therapeutic potential of the predicted driver genes.

Design and implementation

Design and implementation of a new functionality in Moonlight

Here, we present a new functionality for Moonlight, our framework for driver gene prediction [14,15]. Moonlight requires a set of differentially expressed genes (DEGs) as input and consists of two layers (Fig 1A). In this context, a “layer” is a set of data analysis steps with a precise purpose. The primary layer uses gene expression differences between tumor and normal tissue, as well as information about cancer-related biological processes, to discover putative driver genes termed oncogenic mediators. This is done in four steps. The first step is a functional enrichment analysis (FEA), which assesses the enrichment of cancer-related biological processes among the DEGs. This allows the user to understand the biological context in which the putative drivers will be predicted. Second, a gene regulatory network (GRN) analysis is carried out, in which interactions between the DEGs are modelled by means of mutual information. The resulting networks of DEGs are then used to estimate the effect of the DEGs on the given biological processes through an upstream regulator analysis (URA). Finally, a pattern recognition analysis (PRA) divides the DEGs into putative OCGs and TSGs (termed oncogenic mediators) based on their effects on cancer-growing and cancer-blocking processes. For instance, if a DEG has a positive effect on cell proliferation and a negative effect on apoptosis, it is categorized as a putative OCG, and vice versa for putative TSGs [14,15]. Following this primary layer, a secondary layer couples mechanistic evidence to the oncogenic mediators by investigating (epi)genetic alterations in them (termed mechanistic indicators). The rationale is that gene expression changes alone are insufficient to explain the (in)activation of the drivers; hence, a second layer of evidence based on (epi)genetic alterations is necessary. From this secondary layer, the critical driver genes are predicted among the oncogenic mediators (Fig 1A). We recently presented Moonlight2 with the overall goal of implementing new functionalities that provide standardized and automated solutions for the analysis of the mechanistic indicators. First, we developed a secondary layer for mechanistic indicators based on mutational data in a functionality called Driver Mutation Analysis (DMA) [15]. DMA first classifies mutations in the cancer cohort into driver and passenger mutations and then retains those oncogenic mediators with at least one driver mutation as the final set of driver genes [15].
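To make the PRA example above concrete, the following Python sketch applies the sign rule it describes: a positive effect on proliferation combined with a negative effect on apoptosis yields a putative OCG, and the reverse yields a putative TSG. It is a simplified illustration under stated assumptions, not Moonlight's implementation (Moonlight2 is an R package); the function name, the process keys, and the effect scores are hypothetical stand-ins for URA output.

```python
from typing import Dict, Optional

# Hypothetical process keys; real URA output covers the biological
# processes selected in the FEA step.
PROLIFERATION = "proliferation of cells"
APOPTOSIS = "apoptosis"


def classify_mediator(effects: Dict[str, float]) -> Optional[str]:
    """Label a DEG as a putative OCG or TSG from signed effect scores.

    A positive score means the gene is predicted to increase the process.
    Returns None when the two effects follow neither pattern.
    """
    prolif = effects.get(PROLIFERATION)
    apop = effects.get(APOPTOSIS)
    if prolif is None or apop is None:
        return None  # missing evidence for one of the two processes
    if prolif > 0 and apop < 0:
        return "OCG"  # cancer-growing: promotes proliferation, blocks apoptosis
    if prolif < 0 and apop > 0:
        return "TSG"  # cancer-blocking: blocks proliferation, promotes apoptosis
    return None  # same-sign or zero effects: no putative role


if __name__ == "__main__":
    ura_scores = {
        "GENE_A": {PROLIFERATION: 1.2, APOPTOSIS: -0.8},
        "GENE_B": {PROLIFERATION: -0.5, APOPTOSIS: 0.9},
        "GENE_C": {PROLIFERATION: 0.4, APOPTOSIS: 0.3},
    }
    for gene, effects in ura_scores.items():
        print(gene, classify_mediator(effects))
```

Running the script labels GENE_A as a putative OCG, GENE_B as a putative TSG, and leaves GENE_C unlabeled because both of its effects point in the same direction.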

Fig 1. Overview of the Moonlight framework with new methylation functionality.

(A) Moonlight consists of a primary layer requiring differentially expressed genes and gene expression data as input. The primary layer predicts oncogenic mediators through a series of functions called functional enrichment analysis (FEA), gene regulatory network analysis (GRN), upstream regulator analysis (URA), and pattern recognition analysis (PRA). Moonlight’s secondary mutation layer, implemented in the driver mutation analysis (DMA) function, requires mutation data as input; similarly, Moonlight’s secondary methylation layer, implemented in the gene methylation analysis (GMA) function, requires methylation data as input. The secondary layer results in the final prediction of driver genes. (B) DNA methylation is a mechanism that regulates gene expression in cells under physiological conditions. In cancer, however, the DNA methylation process is altered. A loss of methylation, called hypomethylation, can lead to increased expression of a gene and thus an increased amount of the resulting protein. In contrast, a gain of methylation, called hypermethylation, can silence gene expression and lead to decreased protein expression. Both mechanisms can ultimately lead to cancer. Hypo- and hypermethylation can activate oncogenes and inactivate tumor suppressors, respectively, which is the biological principle that GMA is built on. (C) The outputs of EpiMix and Moonlight are integrated to predict driver genes. EpiMix outputs a table of CpG-gene pairs containing differentially methylated CpG sites whose DNA methylation state is associated with gene expression. Moonlight outputs a list of oncogenic mediators and their putative driver role as tumor suppressors or oncogenes. (D) Driver genes are defined in GMA by comparing EpiMix’s predictions of methylation state and Moonlight’s predictions of driver role in “evidence” categories. Oncogenic mediators labeled with “agreement” evidence are retained as the final set of predicted driver genes.

https://doi.org/10.1371/journal.pcbi.1012999.g001

In this contribution, we added another secondary layer to Moonlight2 to cover mechanistic indicators related to methylation changes. This new functionality is termed Gene Methylation Analysis (GMA) and should be applied after the Pattern Recognition Analysis (PRA) function, which predicts the oncogenic mediators in the primary layer (Fig 1A). To take full advantage of the Moonlight framework, the user must apply a secondary layer after the primary layer, that is, either DMA or GMA depending on the research question and the available -omics data. The user can also apply both DMA and GMA, which would be expected to provide complementary evidence since they consider different sources of alterations that can affect gene expression. The biological foundation for GMA lies in the observed roles of DNA methylation in both physiological and cancer states. Under healthy conditions, DNA methylation plays an essential regulatory role in cells by regulating gene expression [24]. In cancer, however, DNA methylation processes are altered: hypo- and hypermethylation can activate OCGs and inactivate TSGs, respectively, leading to overexpression of OCGs and silencing of TSGs [17–19] (Fig 1B).

GMA predicts methylation-driven driver genes by using EpiMix [21]. EpiMix models DNA methylation in patient cohorts, predicts differential methylation associated with gene expression, and additionally allows DNA methylation analysis of non-coding regulatory regions [21], which makes it well suited to complement Moonlight’s primary layer. Here, we use EpiMix’s “regular mode”, which analyzes DNA methylation in promoter regions. In brief, EpiMix first uses a beta mixture model to decompose the DNA methylation profiles of the cohort. Next, differential methylation values are calculated, which represent the mean differences in DNA methylation levels between the patients in each of the identified mixture components in the disease group and the control group. Finally, EpiMix identifies the CpG sites whose methylation is significantly associated with gene expression [21]. EpiMix is available as an R/Bioconductor package, which allows for easy integration with Moonlight. A key result of EpiMix is a table of functional CpG-gene pairs containing differentially methylated CpG sites whose DNA methylation state is associated with the expression of the genes they map to. Moreover, the methylation state (e.g., hypo- or hypermethylated) of each CpG site is reported. This table is integrated with the main output table from Moonlight’s primary layer, specifically the output from PRA, which provides a list of oncogenic mediators and their putative driver role (e.g., putative TSG or OCG) (Fig 1C). In this integration step, the number of associated CpG sites is first summarized for each oncogenic mediator. EpiMix’s predictions of methylation state and Moonlight’s predictions of driver gene role are then compared to assess whether the gene’s methylation status supports the putative role (OCG or TSG) of the oncogenic mediator. These comparisons are subsequently used to define the driver genes, based on the assumption that methylation changes can activate or inactivate driver genes (Fig 1D, Table 1). For example, suppose EpiMix predicts a gene to contain one or more hypermethylated CpG sites, whereas Moonlight’s primary layer predicts this gene to be an OCG. The mechanism of (in)activation suggested by EpiMix then conflicts with the putative driver role predicted by Moonlight, so this gene is labeled with “conflicting” evidence. Conversely, if EpiMix predicts a gene to be associated with hypermethylated CpG site(s) and Moonlight predicts this gene to be a TSG, the gene is labeled with “agreement” evidence to signify correspondence between EpiMix and Moonlight (Fig 1D, Table 1). Oncogenic mediators labeled with “agreement” evidence are retained as the final set of predicted driver genes. Oncogenic mediators labeled with “conflicting” evidence are still included in one of the outputs of GMA, but they are not retained in the set of predicted driver genes.
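The evidence logic described above can be sketched in Python, assuming each oncogenic mediator comes with the methylation states of its associated CpG sites. The sketch encodes only the principle stated in the text (hypomethylation activates OCGs, hypermethylation inactivates TSGs); genes whose CpGs carry mixed or dual states, and genes without differentially methylated CpGs, simply receive no label here instead of following the full rules in Table 1. All names and data are hypothetical, and this is not the GMA source code.

```python
from typing import Dict, List, Optional

# Methylation state expected for each putative role: hypomethylation
# activates OCGs, hypermethylation inactivates TSGs.
EXPECTED_STATE = {"OCG": "hypo", "TSG": "hyper"}


def consensus_state(cpg_states: List[str]) -> Optional[str]:
    """Return the shared state if all CpGs are 'hypo' or all are 'hyper'."""
    states = set(cpg_states)
    if len(states) == 1 and states <= {"hypo", "hyper"}:
        return next(iter(states))
    return None  # no CpGs, mixed states, or 'dual' CpGs


def evidence_label(role: str, cpg_states: List[str]) -> Optional[str]:
    """Compare a putative driver role with the gene's methylation state."""
    state = consensus_state(cpg_states)
    if state is None:
        return None  # case not covered by this sketch
    return "agreement" if state == EXPECTED_STATE[role] else "conflicting"


def select_drivers(mediators: Dict[str, str],
                   cpg_states: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only the mediators whose evidence label is 'agreement'."""
    return {
        gene: role
        for gene, role in mediators.items()
        if evidence_label(role, cpg_states.get(gene, [])) == "agreement"
    }


if __name__ == "__main__":
    mediators = {"GENE_A": "TSG", "GENE_B": "OCG", "GENE_C": "OCG", "GENE_D": "TSG"}
    cpg_states = {
        "GENE_A": ["hyper", "hyper"],  # hypermethylated putative TSG
        "GENE_B": ["hyper"],           # hypermethylated putative OCG
        "GENE_C": ["hypo"],            # hypomethylated putative OCG
    }                                  # GENE_D has no differentially methylated CpGs
    print(select_drivers(mediators, cpg_states))
```

GENE_B reproduces the “conflicting” example from the text and is dropped, as is GENE_D, which has no methylation evidence; GENE_A and GENE_C are kept as drivers.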

Table 1. Comparison between EpiMix’s predictions of methylation state and Moonlight’s primary layer’s predictions of driver gene role in “evidence” categories. Those oncogenic mediators labeled with “agreement” evidence are retained as the final set of predicted driver genes.

https://doi.org/10.1371/journal.pcbi.1012999.t001

As input, GMA requires i) a gene expression matrix with genes in rows and tumor and normal samples in columns, ii) a methylation matrix with CpG sites in rows and tumor and normal samples in columns, which should contain the same samples as the expression data, iii) the output of PRA from Moonlight’s primary layer, i.e., the predicted oncogenic mediators and their putative driver role, iv) the output of a differential expression analysis (DEA), which includes information about the DEGs, and finally, v) a table containing information about the samples, including the sample names and the sample types (tumor or normal). In return, GMA outputs i) a list of predicted driver genes categorized into TSGs and OCGs, ii) a summary of the oncogenic mediators, which includes the number of associated CpG sites and the evidence label, iii) a summary of gene- and methylation-level annotations for all DEGs input to Moonlight, and iv) the raw EpiMix results obtained by applying EpiMix to the input data independently of the GMA function.
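The requirement that the expression and methylation matrices share the same tumor and normal samples can be checked before running GMA. The following Python sketch illustrates such a check on sample names only; the function name and the “tumor”/“normal” labels are assumptions, and the sketch does not reproduce any validation that Moonlight2 performs internally.

```python
from typing import Dict, Iterable, List


def check_sample_consistency(expr_samples: Iterable[str],
                             met_samples: Iterable[str],
                             sample_types: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the inputs agree."""
    expr, met = set(expr_samples), set(met_samples)
    problems = []
    if expr - met:
        problems.append(f"missing from methylation matrix: {sorted(expr - met)}")
    if met - expr:
        problems.append(f"missing from expression matrix: {sorted(met - expr)}")
    unlabeled = sorted((expr | met) - set(sample_types))
    if unlabeled:
        problems.append(f"missing from sample table: {unlabeled}")
    bad = sorted(s for s, t in sample_types.items() if t not in {"tumor", "normal"})
    if bad:
        problems.append(f"unexpected sample type for: {bad}")
    # A tumor-versus-normal comparison needs both groups to be present.
    if not {"tumor", "normal"} <= set(sample_types.values()):
        problems.append("sample table must contain both tumor and normal samples")
    return problems


if __name__ == "__main__":
    types = {"S1": "tumor", "S2": "tumor", "S3": "normal"}
    print(check_sample_consistency(["S1", "S2", "S3"], ["S1", "S2", "S3"], types))
    print(check_sample_consistency(["S1", "S2", "S3"], ["S1", "S3"], types))
```

Returning a list of messages, rather than raising on the first problem, reports every mismatch in one pass, which is convenient when sample names come from different TCGA downloads.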

We have also created three functions for visualizing genes and methylation states: plotGMA, which visualizes the number of differentially methylated CpG sites classified as hypo-, hyper-, or dual-methylated; plotMoonlightMet, which visualizes the effect of genes on biological processes estimated in Moonlight’s primary layer; and plotMetExp, which calls the EpiMix visualization function EpiMix_PlotModel to display the gene expression and methylation levels of a specific gene and CpG site [21].

Application of new functionality to three cancer (sub)types

Following the implementation of GMA in Moonlight2, we conducted a case study applying GMA to basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma data from TCGA to discover methylation-driven driver genes. Moreover, we compared these predicted drivers with mutation-driven drivers by applying our previously developed secondary mutational layer, DMA [15]. We selected these cancer types to compare with and build upon our previous findings, in which we examined mutational drivers with the DMA functionality [15]. Detailed methods for this case study are included in S1 Text.

Results

Case study: Prediction of driver genes with differential methylation in three different cancer types using Moonlight2

To showcase the new functionality in Moonlight2 and predict driver genes driven by methylation changes, we applied Moonlight2 to three cancer (sub)types: basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma. First, we used RNA-seq data to perform DEA between each of these cancer tissues and the corresponding normal samples, as the resulting DEGs are the input to Moonlight’s primary layer (Table 2). Following DEA, Moonlight’s primary layer predicted 159, 1228, and 1598 oncogenic mediators in these three cancer (sub)types, respectively (Table 2). Additionally, EpiMix alone identified 9483, 10018, and 6142 functional gene-CpG pairs in these three cancer (sub)types, respectively. These functional gene-CpG pairs represent differentially methylated CpG sites whose DNA methylation state is associated with the expression of the genes they map to. The numbers of hits discovered individually by EpiMix and by Moonlight’s primary layer indicate a substantial number of significant associations. Consequently, integrating the results from EpiMix with the oncogenic mediators identified in Moonlight’s primary layer, as implemented in GMA, helps narrow the results down to the most relevant findings in a synergistic fashion. From GMA, we found that the oncogenic mediators in basal-like breast cancer that are associated with differentially methylated CpGs include in total 38 hypomethylated CpGs, 165 hypermethylated CpGs, and 22 CpGs with a dual methylation status, the latter meaning that the CpG site was found hypomethylated in cancer tissues from some patients and hypermethylated in others. Similarly, oncogenic mediators in lung adenocarcinoma that are associated with differentially methylated CpGs include in total 218 hypomethylated CpGs, 625 hypermethylated CpGs, and 48 dual-methylated CpGs. Finally, oncogenic mediators in thyroid carcinoma associated with differentially methylated CpGs contain in total 945 hypomethylated CpGs, 305 hypermethylated CpGs, and 230 dual-methylated CpGs (Fig 2A).
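The hypo/hyper/dual classification above can be illustrated with a small Python sketch. It assumes that each CpG site already has a per-patient call (“hypo”, “hyper”, or “none”) for the tumor samples, for instance derived from the mixture components that EpiMix assigns patients to, and that each CpG maps to a single gene; both are simplifications, and the CpG and gene identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def cpg_status(patient_calls: Iterable[str]) -> Optional[str]:
    """Summarize per-patient calls for one CpG site across the cohort."""
    seen = set(patient_calls) & {"hypo", "hyper"}
    if seen == {"hypo", "hyper"}:
        return "dual"  # hypomethylated in some patients, hypermethylated in others
    if seen:
        return next(iter(seen))
    return None  # not differentially methylated in any patient


def count_statuses(cpg_calls: Dict[str, List[str]],
                   cpg_to_gene: Dict[str, str],
                   mediators: Iterable[str]) -> Counter:
    """Count hypo, hyper, and dual CpGs that map to oncogenic mediators."""
    mediator_set = set(mediators)
    counts: Counter = Counter()
    for cpg, calls in cpg_calls.items():
        if cpg_to_gene.get(cpg) not in mediator_set:
            continue  # CpG maps to a gene that is not an oncogenic mediator
        status = cpg_status(calls)
        if status is not None:
            counts[status] += 1
    return counts


if __name__ == "__main__":
    calls = {
        "cpg1": ["hyper", "none", "hyper"],
        "cpg2": ["hypo", "hyper", "none"],
        "cpg3": ["hypo", "none", "none"],
        "cpg4": ["hyper", "hyper", "hyper"],
    }
    cpg_to_gene = {"cpg1": "GENE_A", "cpg2": "GENE_A", "cpg3": "GENE_B", "cpg4": "GENE_X"}
    print(count_statuses(calls, cpg_to_gene, ["GENE_A", "GENE_B"]))
```

In this toy cohort, cpg2 is counted as dual, while cpg4 is ignored because GENE_X is not an oncogenic mediator.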

Table 2. Number of predicted DEGs, oncogenic mediators, and driver genes in three cancer (sub)types: basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma. The oncogenic mediators and driver genes, predicted by Moonlight’s primary layer and secondary methylation layer, respectively, are divided into (putative) TSGs and OCGs.

https://doi.org/10.1371/journal.pcbi.1012999.t002

Fig 2. Integration of Moonlight and EpiMix for prediction of cancer driver genes.

(A) Number of differentially methylated CpGs identified by EpiMix in the oncogenic mediators predicted by Moonlight’s primary layer. The differentially methylated CpGs are categorized by methylation status and stratified by cancer (sub)type. (B) Heatmap showing the number of differentially methylated CpGs and the classification of methylation status in the oncogenic mediators in basal-like breast cancer. The heatmap was generated using the plotGMA function. (C) Venn diagram comparing oncogenic mediators predicted by Moonlight’s primary layer with functional genes predicted by EpiMix in basal-like breast cancer. The functional genes are genes containing differentially methylated CpG sites whose DNA methylation state is associated with expression of the gene. Only functional genes with the same methylation state in all of their associated CpGs were included in this comparison, and genes with a dual methylation state were excluded. (D) Heatmap showing the effect of the predicted driver genes in basal-like breast cancer on apoptosis and proliferation of cells. This heatmap was generated using the function plotMoonlightMet. These effects define the basis upon which the oncogenic mediators are predicted in the PRA step of Moonlight’s primary layer. (E) Comparison of the predicted driver genes with the predicted oncogenic mediators in all three cancer (sub)types, where the driver genes were predicted with the new GMA functionality in Moonlight’s secondary layer and the oncogenic mediators were predicted with Moonlight’s primary layer. The comparisons were quantified in terms of overlaps with genes reported in the COSMIC Cancer Gene Census (CGC) by computing the precision and sensitivity. The precision was calculated as (TP/(TP + FP))*100 and sensitivity as (TP/(TP + FN))*100. The true positives (TP) are the overlap between the gene set (either the driver genes or the oncogenic mediators) and the CGC. The false positives (FP) are genes found in the gene set but not included in the CGC. The false negatives (FN) are genes reported in the CGC but not predicted in our gene set.

https://doi.org/10.1371/journal.pcbi.1012999.g002

Across all three cancer (sub)types, the largest number of differentially methylated CpG sites mapping to a single oncogenic mediator was 28. The classifications of methylation status in the oncogenic mediators in basal-like breast cancer are shown in Fig 2B, generated with the plotGMA function. Next, we compared Moonlight’s oncogenic mediators with EpiMix’s functional genes. For this, we included only functional genes with the same methylation state in all their associated CpGs, and we excluded genes with a dual methylation state, as previously defined. In basal-like breast cancer, this comparison revealed 109 oncogenic mediators not associated with differentially methylated CpGs, 17 oncogenic mediators with a “conflicting” evidence label, and 33 oncogenic mediators with an “agreement” evidence label (Fig 2C). Consequently, these 33 oncogenic mediators were retained as the final set of driver genes, divided into 32 TSGs and 1 OCG (Table 2). Next, using the function plotMoonlightMet, we visualized the effect of these predicted driver genes in basal-like breast cancer on two well-known cancer hallmarks, apoptosis and proliferation of cells. These effects define the basis upon which the oncogenic mediators are predicted in the PRA step of Moonlight’s primary layer, and they show that the predicted OCGs have a positive effect on proliferation of cells and a negative effect on apoptosis, and vice versa for the predicted TSGs (Fig 2D). Similar overviews for lung adenocarcinoma and thyroid carcinoma are shown in S1 Fig; these analyses resulted in a final prediction of 190 driver genes (110 TSGs and 80 OCGs) in lung adenocarcinoma and 263 driver genes (5 TSGs and 258 OCGs) in thyroid carcinoma (Table 2). We did not discover any dual-role genes across the three cancer (sub)types, i.e., genes predicted as OCGs in one cancer (sub)type and as TSGs in another.

We then compared the predicted driver genes with the predicted oncogenic mediators in each cancer (sub)type. We quantified these comparisons in terms of overlaps with genes reported in the COSMIC Cancer Gene Census (CGC) [25]. Specifically, we computed the precision as (TP/(TP + FP))*100 and the sensitivity as (TP/(TP + FN))*100. We defined the true positives (TP) as the overlap between the gene set (either the driver genes or the oncogenic mediators) and the CGC, whereas the false positives (FP) are genes found in the gene set but not included in the CGC. The false negatives (FN) are genes reported in the CGC but not predicted in our gene set. For all three cancer (sub)types, we found that GMA had higher precision and lower sensitivity than Moonlight’s primary layer alone (Fig 2E). A higher precision of GMA is desirable, as it indicates that the predicted driver gene sets contain a higher fraction of genes from the CGC than the oncogenic mediator sets. Conversely, the higher sensitivity of Moonlight’s primary layer alone might be attributed to the larger number of oncogenic mediators. A larger number of oncogenic mediators results in a larger overlap between the CGC and the oncogenic mediators, thereby lowering the number of FNs and increasing the sensitivity. In this case, prioritizing precision over sensitivity is preferable, since our aim is to find the most crucial driver genes among the oncogenic mediators; a higher precision indicates a greater proportion of TPs among the predictions, consistent with this objective. We also evaluated the significance of the association between the gene sets and the CGC using Fisher’s exact test (Table 3). Only the oncogenic mediator and driver gene sets from basal-like breast cancer showed a significant association with genes in the CGC (p-value = 0.000392 for the oncogenic mediators predicted using Moonlight’s primary layer and p-value = 0.00228 for the driver genes predicted using GMA). However, in all three cancer (sub)types, the driver genes had a higher odds ratio than the oncogenic mediators, indicating a stronger association between the driver gene sets and the CGC than for the oncogenic mediators (Table 3).
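The precision and sensitivity definitions above, together with a two-sided Fisher’s exact test, can be written as a short pure-Python sketch. The 2×2 table for the test needs a background set of genes (a “universe”), which is not specified in this section, so the universe argument is an assumption; FN for sensitivity follows the text (all CGC genes absent from the gene set). The odds ratio returned here is the sample odds ratio ad/bc; R’s fisher.test reports a conditional maximum-likelihood estimate instead, so values can differ slightly. Gene names are placeholders.

```python
from math import comb, inf, nan
from typing import Set, Tuple


def precision_sensitivity(gene_set: Set[str], cgc: Set[str]) -> Tuple[float, float]:
    """Precision and sensitivity (in %) of a gene set against the CGC."""
    tp = len(gene_set & cgc)
    fp = len(gene_set - cgc)  # predicted, but not in the CGC
    fn = len(cgc - gene_set)  # in the CGC, but not predicted
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    sensitivity = 100 * tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Sample odds ratio and two-sided Fisher p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Two-sided p: sum over all tables no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = nan if a * d == 0 else inf
    return odds_ratio, min(1.0, p)


def cgc_association(gene_set: Set[str], cgc: Set[str], universe: Set[str]) -> Tuple[float, float]:
    """Fisher's exact test of gene-set membership vs. CGC membership within `universe`."""
    in_set, in_cgc = gene_set & universe, cgc & universe
    a = len(in_set & in_cgc)  # predicted and in the CGC
    b = len(in_set - in_cgc)  # predicted, not in the CGC
    c = len(in_cgc - in_set)  # in the CGC, not predicted
    d = len(universe) - a - b - c  # neither
    return fisher_exact_two_sided(a, b, c, d)


if __name__ == "__main__":
    predicted = {"GENE_A", "GENE_B", "GENE_C"}
    cgc = {"GENE_A", "GENE_D"}
    universe = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E", "GENE_F"}
    print(precision_sensitivity(predicted, cgc))
    print(cgc_association(predicted, cgc, universe))
```

The sketch also makes the trade-off discussed above visible: enlarging the gene set can only keep or reduce FN, which is why the larger oncogenic mediator sets tend to show higher sensitivity.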

Table 3. Significance of association between Moonlight’s gene sets and genes from the Cancer Gene Census (CGC), evaluated using Fisher’s exact test in three cancer (sub)types: basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma. The gene sets were obtained using Moonlight’s primary layer and Moonlight’s secondary layer through the Gene Methylation Analysis (GMA) functionality. p-values and odds ratios from Fisher’s exact test are included.

https://doi.org/10.1371/journal.pcbi.1012999.t003

While these results together demonstrate the added value of GMA, certain limitations are worth highlighting. Notably, the driver genes reported in the CGC are mainly based on mutation evidence. In this study, we have used abnormal DNA methylation levels as evidence for the deregulated expression of the driver genes. Hence, these methylation patterns may not be fully captured in the CGC, which complicates our comparison with the CGC. However, to date, no gold standard of cancer drivers exists, and the CGC stands as the most robust and comprehensive resource available. Thus, it serves as the main reference point that most studies use to evaluate their predicted driver genes and methods [26–37]. To our knowledge, a similar well-curated resource of cancer driver genes driven by methylation changes does not exist. Moreover, cancer type-specific comparisons would be more desirable. While the CGC reports which cancer types the driver genes are associated with, these annotations are limited in scope. Therefore, although ideal, such cancer type-specific comparisons would lack sufficient statistical power. Finally, the quantitative statistical measures do not take into account that some of our predicted driver genes may be novel. Consequently, some FPs may in fact be TPs that are not included in the CGC, and some FNs may not truly be FNs; rather, they may not act as drivers in the specific cancer (sub)type.

To investigate the biological roles of the predicted driver genes, we performed enrichment analyses (Fig 3). The predicted driver genes are involved in various signaling pathways, such as KRAS signaling in basal-like breast cancer and thyroid carcinoma, mTORC1 signaling in lung adenocarcinoma, and TNF-alpha signaling via NF-κB and the p53 pathway in thyroid carcinoma. Previously, TP53 and TNF signaling have been associated with the onset of cancer among epigenetically modified pathways [38]. Furthermore, IL-6/JAK/STAT3 signaling was significantly enriched among the predicted driver genes in basal-like breast cancer (Fig 3). Basal-like breast cancers overexpress interleukin 6 (IL-6), a pro-inflammatory cytokine, and it has been reported that p53 absence triggers an IL-6-dependent epigenetic reprogramming that drives breast cancer cells towards a basal-like/stem cell-like gene expression profile [39]. Additionally, epithelial-mesenchymal transition (EMT) is a recurring enriched term, observed in both lung adenocarcinoma and thyroid carcinoma. Epigenetic regulation of EMT has previously been described, and DNA methylation and demethylation play a key role in this regulation [40–43].

Fig 3. Enrichment analyses of predicted driver genes.

Enrichment analysis of predicted driver genes in (A) basal-like breast cancer, (B) lung adenocarcinoma, and (C) thyroid carcinoma using the “MSigDB Hallmark 2020” database. The top 10 most significantly enriched terms (adjusted p-value < 0.05) are included. The gene ratio on the x axis is the ratio between the number of predicted driver genes that intersect with genes annotated in the given hallmark gene set and the total number of genes annotated in the respective hallmark gene set. The point sizes reflect the number of driver genes playing a role in the respective hallmark gene set.

https://doi.org/10.1371/journal.pcbi.1012999.g003
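The gene ratio in Fig 3 is the overlap between the predicted drivers and a hallmark gene set divided by the size of that gene set. The sketch below computes it alongside an over-representation p-value. The text does not state which test the enrichment tool applies, so the one-sided hypergeometric test with Benjamini-Hochberg adjustment used here is an assumption (a common choice for this kind of analysis), and the gene sets and gene universe are toy placeholders rather than MSigDB content.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the set, n drivers drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, kept monotone and capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(drivers: set[str], gene_sets: dict[str, set[str]], universe: set[str]):
    """Return (term, overlap, gene_ratio, p, p_adj) rows sorted by adjusted p-value."""
    drivers = drivers & universe
    rows = []
    for term, members in gene_sets.items():
        members = members & universe
        overlap = len(drivers & members)
        # Gene ratio as defined in the Fig 3 caption: overlap / size of the gene set.
        gene_ratio = overlap / len(members) if members else 0.0
        p = hypergeom_sf(overlap, len(universe), len(members), len(drivers))
        rows.append((term, overlap, gene_ratio, p))
    p_adj = benjamini_hochberg([row[3] for row in rows])
    return sorted((row + (q,) for row, q in zip(rows, p_adj)), key=lambda r: r[4])


# Toy gene sets and universe (hypothetical, not MSigDB content).
universe = {f"G{i}" for i in range(20)}
gene_sets = {"SET_A": {f"G{i}" for i in range(5)},
             "SET_B": {f"G{i}" for i in range(10, 15)}}
drivers = {"G0", "G1", "G2", "G10"}
for row in enrich(drivers, gene_sets, universe):
    print(row)
```

The point size in Fig 3 corresponds to the `overlap` column, and the significance filter (adjusted p-value < 0.05) would be applied to the last column.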

Association between expression of predicted driver genes and survival of cancer patients

We performed survival analysis to evaluate the prognostic potential of the predicted driver genes. We first used Cox proportional hazards regression and found that the expression levels of 20 of the predicted OCGs in lung adenocarcinoma had a significant effect on survival in a multivariate model accounting for tumor stage, patient age, and patient sex. Similarly, the expression of two of the predicted OCGs in thyroid carcinoma had a significant effect on survival. No driver genes in basal-like breast cancer had a significant effect on survival in the multivariate analysis. Thus, we deemed these 22 OCGs prognostic (Fig 4A). Next, we examined whether high or low expression of these prognostic genes was associated with patient survival. For this, we divided the patients into high and low expression groups and assessed differences in survival through Kaplan-Meier survival analyses and log-rank tests. These analyses revealed a significant difference in survival between patients with high and low expression for 18 of the 20 prognostic OCGs in lung adenocarcinoma; the two exceptions were RPL39L and GINS2. In contrast, we did not observe a significant difference in survival between patients with high and low expression of the two prognostic OCGs in thyroid carcinoma. Together, these results indicate a greater prognostic potential of OCGs than of TSGs, as well as a greater presence of prognostic OCGs in lung adenocarcinoma than in basal-like breast cancer and thyroid carcinoma. It is, however, worth mentioning that fewer driver genes were predicted in basal-like breast cancer, with only one predicted OCG, which limited the search pool for prognostic OCGs. The Cox regression analysis showed that increased expression of the 20 OCGs in lung adenocarcinoma and the two OCGs in thyroid carcinoma is associated with an increased hazard of death. However, while the Kaplan-Meier analyses showed statistically significant differences between the high and low expression groups for 18 of the 20 prognostic OCGs in lung adenocarcinoma, the magnitudes of the survival differences are small. Small absolute survival differences can limit clinical utility, and further studies in larger cohorts are needed to determine the prognostic potential of the driver genes.
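For intuition on how a hazard ratio arises, here is a minimal sketch of Cox proportional hazards regression for a single covariate, fitted by Newton-Raphson on the partial likelihood (Breslow handling of tied event times). It is a deliberate simplification: the analysis above used a multivariate model that also adjusts for tumor stage, age, and sex, which maximizes the same kind of partial likelihood over a vector of coefficients. The hazard ratio is exp(beta), so, for example, a hazard ratio of 1.4 corresponds to beta = ln 1.4 ≈ 0.34, i.e., a 40% higher hazard per one-unit increase in the covariate (the unit depends on how expression was scaled, which is not stated here). The function names and toy data are hypothetical.

```python
import math


def partial_likelihood(beta, times, events, x):
    """Breslow log partial likelihood, score and information for one covariate."""
    loglik = score = info = 0.0
    for ti, di, xi in zip(times, events, x):
        if not di:
            continue  # censored patients only contribute through risk sets
        # Risk set: patients still under observation at this event time.
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean = s1 / s0
        loglik += beta * xi - math.log(s0)
        score += xi - mean
        info += s2 / s0 - mean * mean
    return loglik, score, info


def cox_univariate(times, events, x, max_iter=100, tol=1e-10):
    """Newton-Raphson with step halving; returns beta, hazard ratio, SE and Wald p."""
    beta = 0.0
    loglik, score, info = partial_likelihood(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info
        new = partial_likelihood(beta + step, times, events, x)
        while new[0] < loglik and abs(step) > tol:  # halve steps that lower the likelihood
            step /= 2
            new = partial_likelihood(beta + step, times, events, x)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald test
    return {"beta": beta, "HR": math.exp(beta), "se": se, "p": p}


# Toy data: survival times, death indicator (1 = died), standardized expression.
times = [5, 8, 3, 10, 2, 7, 6, 9]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [1.2, 0.3, 2.0, -0.5, 1.5, 0.8, 0.1, -1.0]
fit = cox_univariate(times, events, expr)
print(f"HR per unit of expression: {fit['HR']:.2f} (p = {fit['p']:.3f})")
```

Step halving guards against Newton steps that overshoot and lower the likelihood. For real analyses with several covariates, an established survival package would be the natural choice.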

Fig 4. Survival analysis of predicted driver genes.

(A) Hazard ratios from multivariate Cox proportional hazards regression of 20 of the predicted OCGs in lung adenocarcinoma and of two of the predicted OCGs in thyroid carcinoma. (B-D) Kaplan-Meier survival plots of three of the predicted OCGs in lung adenocarcinoma that were deemed prognostic by multivariate Cox regression analysis: (B) GNPNAT1, (C) RRM2, and (D) SLC2A1. Patients with expression values above and below the median expression level of the respective gene were divided into a high and a low expression group, respectively. The p-values represent the significance of the difference in survival between the two groups for each gene.

https://doi.org/10.1371/journal.pcbi.1012999.g004
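The median split, Kaplan-Meier curves, and log-rank p-values in Fig 4B-D can be sketched in a few functions. This is an independent, minimal Python illustration, not the study's own code: patients exactly at the median are assigned to the low group (an assumption), the function names are hypothetical, and the toy times, events, and expression values are made up.

```python
import math
from statistics import median


def median_split(expr):
    """Label patients 'high' (above the median) or 'low' (at or below it)."""
    cut = median(expr)
    return ["high" if e > cut else "low" for e in expr]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event time, S(t))]; events: 1 = death, 0 = censored."""
    curve, surv = [], 1.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    assert len(set(groups)) == 2, "expects exactly two groups"
    g1 = sorted(set(groups))[0]  # the statistic is symmetric in the choice of group
    o_minus_e = var = 0.0
    for t in sorted({t for t, d in zip(times, events) if d}):
        n = sum(1 for ti in times if ti >= t)
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == g1)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di)
        d1 = sum(1 for ti, di, g in zip(times, events, groups)
                 if ti == t and di and g == g1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths in group 1
        if n > 1:                             # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)


# Toy cohort (hypothetical values).
times = [2, 4, 5, 7, 8, 11, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 0]
expr = [9.1, 8.4, 7.9, 8.8, 5.2, 4.7, 6.0, 3.9]
groups = median_split(expr)
for g in ("high", "low"):
    idx = [i for i, lab in enumerate(groups) if lab == g]
    print(g, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
print(logrank_test(times, events, groups))
```

The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail equals erfc(sqrt(chi2 / 2)).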

To highlight a few examples, multivariate Cox regression analysis identified GNPNAT1, RRM2, and SLC2A1 as prognostic OCGs in lung adenocarcinoma, with hazard ratios of 1.4, 1.3, and 1.3, respectively. In all three cases, patients with high expression of the OCG had a significantly lower survival probability than patients with low expression (Fig 4B-D; p-values: < 0.0001, 0.0015, and 0.00055 for GNPNAT1, RRM2, and SLC2A1, respectively). This aligns with the expected role of OCGs, which are typically upregulated in cancer, with high expression indicating a worse prognosis.

Predicted driver genes have therapeutic potential as drug targets

The potential of cancer driver genes as drug targets has previously been highlighted [44–46], and targeted therapies have been developed against these genes. We therefore investigated the therapeutic potential of the predicted driver genes by querying the Drug-Gene Interaction Database (DGIdb) [47] for driver gene-drug interactions, using only cancer-specific data sources (see also S2 Text). In basal-like breast cancer, we identified seven TSGs documented to interact with drugs in DGIdb. In lung adenocarcinoma, 12 OCGs and 12 TSGs were reported as drug targets. Finally, in thyroid carcinoma, 23 OCGs were reported to interact with drugs (S2 Fig). Across all three cancer (sub)types, the number of drug interactions per driver gene ranged from one to 55. Roughly half of these driver genes interacted with a single drug, while the other half interacted with two or more drugs (Fig 5A).
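The distribution in Fig 5A amounts to counting distinct drugs per driver gene within each cancer (sub)type. A minimal sketch follows; it assumes the DGIdb results have already been exported to (cancer type, gene, drug) rows, which is a hypothetical input format rather than DGIdb's own output schema, and the gene and drug names are placeholders.

```python
from collections import Counter, defaultdict


def interactions_per_gene(records):
    """Count distinct drugs per (cancer type, driver gene).

    records: iterable of (cancer_type, gene, drug) tuples (hypothetical format);
    duplicate rows, e.g. the same pair reported by several sources, count once.
    """
    return Counter((ct, gene) for ct, gene, _drug in set(records))


def distribution_by_type(counts):
    """{cancer type: {number of drugs: number of genes}}, i.e. the bars of Fig 5A."""
    dist = defaultdict(Counter)
    for (ct, _gene), n_drugs in counts.items():
        dist[ct][n_drugs] += 1
    return {ct: dict(sorted(c.items())) for ct, c in dist.items()}


def share_single_drug(counts):
    """Fraction of genes with exactly one documented drug interaction."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)


records = [
    ("lung adenocarcinoma", "GENE_A", "drug1"),
    ("lung adenocarcinoma", "GENE_A", "drug2"),
    ("lung adenocarcinoma", "GENE_A", "drug1"),  # duplicate row, counted once
    ("lung adenocarcinoma", "GENE_B", "drug3"),
    ("thyroid carcinoma", "GENE_C", "drug1"),
]
counts = interactions_per_gene(records)
print(distribution_by_type(counts))
print(f"{share_single_drug(counts):.0%} of genes interact with a single drug")
```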

Fig 5. Exploration of predicted driver genes as drug targets.

(A) Distribution of driver gene-drug interactions stratified by cancer type with the number of drug interactions on the x axis and number of driver genes on the y axis. (B-D) Heatmaps visualizing driver gene-drug interactions in (B) lung adenocarcinoma, (C) basal-like breast cancer, and (D) thyroid carcinoma. Only those driver gene-drug interactions where the interaction type was known are included in the heatmaps. The type of interaction is shown in different colors. The driver genes are divided into OCGs and TSGs.

https://doi.org/10.1371/journal.pcbi.1012999.g005

Next, we examined the driver gene-drug interactions for which the interaction type was known. In basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma, we found three (all TSGs), six (three OCGs and three TSGs), and five (all OCGs) driver genes, respectively, with a known interaction type (Fig 5B-D). Most of the drugs were classified as inhibitors. The two driver genes with the most interactions were PDGFRB in basal-like breast cancer and MET in thyroid carcinoma. We predicted PDGFRB as a TSG in basal-like breast cancer, and it is annotated to interact with 16 inhibitors and three drugs with antagonist or inhibitor interactions. These drugs act through inhibitory mechanisms, i.e., they target an oncogenic role of PDGFRB. As the gene-drug interactions are not specific to a certain cancer type, these results might suggest a potential dual role of PDGFRB. On the other hand, MET, predicted as an OCG in thyroid carcinoma, interacted with 19 inhibitors, in accordance with its OCG role. Moreover, in lung adenocarcinoma, the predicted OCG RRM2, which we also identified as prognostic above, interacted with one inhibitor, gemcitabine. A previous study investigated the mRNA expression of RRM1 and RRM2 in tumors from patients with lung adenocarcinoma treated with docetaxel/gemcitabine and found that low RRM2 mRNA expression was associated with a higher response rate to treatment than high expression [48]. Similarly, in thyroid carcinoma, we observed interactions between ERBB3, a member of the epidermal growth factor receptor (EGFR) family of receptor tyrosine kinases, and four inhibitors (sapitinib, poziotinib, gefitinib, and dacomitinib). These inhibitors, all classified as tyrosine kinase inhibitors [49–56], align with ERBB3’s predicted role as an OCG. Another example is the interaction between EpCAM and solitomab in lung adenocarcinoma. EpCAM is an epithelial cell adhesion molecule that plays a role in cell proliferation, migration, and signaling and is frequently overexpressed on the cell surface of several human carcinomas [57–59]. For instance, EpCAM was recently found to be upregulated in primary lung cancer compared with normal lung tissue as a result of gene amplification and promoter hypomethylation [60]. Solitomab is a bispecific antibody binding to EpCAM and CD3 [57] that has previously shown preliminary signs of antitumor activity [61].
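The reasoning above, where an inhibitory drug is consistent with a predicted OCG but hints at a possible dual role for a predicted TSG, can be written as a small heuristic. This is a sketch of that interpretation only: the set of interaction-type labels treated as inhibitory is our assumption, interaction types outside that set are routed to manual review, and the gene and drug names are hypothetical.

```python
from collections import Counter

# Assumption: which interaction-type labels count as "inhibitory" is our choice here.
INHIBITORY = {"inhibitor", "antagonist", "blocker"}


def interpret(role: str, interaction_types: set[str]) -> str:
    """Heuristic reading of one driver gene-drug interaction given the predicted role."""
    if not interaction_types:
        return "unknown interaction type"
    inhibitory = bool({t.lower() for t in interaction_types} & INHIBITORY)
    if role == "OCG" and inhibitory:
        return "consistent with oncogenic role"
    if role == "TSG" and inhibitory:
        return "possible dual role"
    return "manual review"


def summarize(records):
    """records: (gene, role, drug, interaction_types); returns per-gene verdict counts."""
    out: dict[str, Counter] = {}
    for gene, role, _drug, types in records:
        out.setdefault(gene, Counter())[interpret(role, set(types))] += 1
    return out


records = [
    ("GENE_X", "TSG", "drug_a", {"inhibitor"}),
    ("GENE_X", "TSG", "drug_b", {"antagonist", "inhibitor"}),
    ("GENE_Y", "OCG", "drug_c", {"inhibitor"}),
    ("GENE_Z", "OCG", "drug_d", set()),  # interaction type not reported
]
for gene, verdicts in summarize(records).items():
    print(gene, dict(verdicts))
```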

Integrating the results from DMA and GMA functions of Moonlight2

Next, we also applied the Moonlight2 DMA functionality [15] to the data used for the case studies above to show the potential of integrating different mechanistic indicators. For basal-like breast cancer, DMA predicted 46 driver genes (10 OCGs and 36 TSGs), while GMA predicted 33 driver genes (1 OCG and 32 TSGs) (Fig 6A-C). For lung adenocarcinoma, DMA predicted 842 driver genes (490 OCGs and 352 TSGs), while GMA predicted 190 (80 OCGs and 110 TSGs) (Fig 6D-F). Both secondary layers predicted more driver genes in lung adenocarcinoma than in basal-like breast cancer (Table 2, Fig 6). This is likely a direct consequence of Moonlight’s primary layer, which identified more oncogenic mediators in lung adenocarcinoma than in basal-like breast cancer. At the same time, DMA predicted more driver genes than GMA for both datasets, with a larger ratio in lung adenocarcinoma than in basal-like breast cancer (~4.4-fold versus ~1.4-fold, respectively). This observation aligns with previous reports of a high mutation burden in lung adenocarcinoma [62,63], suggesting that DMA was able to identify a larger number of driver mutations overall. In most cases, we found an overlap between the driver genes identified by DMA and GMA, which suggests that multiple mechanisms are at play. In basal-like breast cancer, 13 driver genes, all TSGs, were predicted by both DMA and GMA (Fig 6A-C). In lung adenocarcinoma, 141 driver genes (63 OCGs and 78 TSGs) were identified by both methods (Fig 6D-F). Thus, a larger share of the GMA-predicted driver genes was also predicted by DMA in lung adenocarcinoma (141 of 190, ~74%) than in basal-like breast cancer (13 of 33, ~39%).
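The Venn regions in Fig 6 and the ratios quoted in this section follow directly from the set sizes and intersection sizes. The sketch below uses only the counts reported in the text; the function names are illustrative.

```python
def venn_counts(dma: set[str], gma: set[str]) -> dict[str, int]:
    """Region sizes of a two-set Venn diagram computed from the gene sets themselves."""
    return {"DMA only": len(dma - gma), "both": len(dma & gma),
            "GMA only": len(gma - dma)}


def venn_from_totals(n_dma: int, n_gma: int, n_both: int) -> dict[str, int]:
    """Same regions reconstructed from the two set sizes and the intersection size."""
    return {"DMA only": n_dma - n_both, "both": n_both, "GMA only": n_gma - n_both}


# (DMA drivers, GMA drivers, shared drivers) as reported in the text.
reported = {"basal-like breast cancer": (46, 33, 13),
            "lung adenocarcinoma": (842, 190, 141)}
for name, (n_dma, n_gma, n_both) in reported.items():
    print(name, venn_from_totals(n_dma, n_gma, n_both),
          f"DMA/GMA = {n_dma / n_gma:.2f}",
          f"GMA drivers also found by DMA = {n_both / n_gma:.0%}")
```

For lung adenocarcinoma this gives 701 DMA-only, 141 shared, and 49 GMA-only driver genes, a DMA/GMA ratio of about 4.43, and about 74% of GMA drivers also found by DMA; for basal-like breast cancer the corresponding values are 33, 13, and 20, about 1.39, and about 39%.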

Fig 6. Comparison of number of mutation- and methylation-driven driver genes.

Venn diagrams comparing (A, D) the number of driver genes, (B, E) TSGs, and (C, F) OCGs predicted by the Driver Mutation Analysis (DMA) and Gene Methylation Analysis (GMA) functions of Moonlight2 for (A-C) basal-like breast cancer and (D-F) lung adenocarcinoma.

https://doi.org/10.1371/journal.pcbi.1012999.g006

Next, we performed enrichment analysis of the DMA-predicted driver genes in basal-like breast cancer and lung adenocarcinoma to understand whether DMA and GMA identify distinct or overlapping biological mechanisms. In basal-like breast cancer, the significantly enriched terms (adjusted p-value < 0.05) among the DMA-predicted driver genes were angiogenesis, KRAS signaling up, epithelial-mesenchymal transition, and IL-2/STAT5 signaling (Fig 7A), whereas those among the GMA-predicted driver genes were IL-6/JAK/STAT3 signaling, UV response down, KRAS signaling up, and adipogenesis (Fig 3A). Thus, KRAS signaling up was the only term enriched among both the GMA- and DMA-predicted driver genes.

Fig 7. Enrichment analysis of predicted mutation-driven driver genes.

Enrichment analysis of (A) mutation-driven driver genes predicted by Driver Mutation Analysis (DMA) in basal-like breast cancer, (B) mutation-driven driver genes predicted by Driver Mutation Analysis (DMA) in lung adenocarcinoma, and (C) driver genes predicted by both DMA and GMA in lung adenocarcinoma. The “MSigDB Hallmark 2020” database was used for the enrichment analyses. The top 10 most significantly enriched terms (adjusted p-value < 0.05) are included. The gene ratio on the x axis is the ratio between the number of predicted driver genes that intersect with genes annotated in the given hallmark gene set and the total number of genes annotated in the respective hallmark gene set. The point sizes reflect the number of driver genes playing a role in the respective hallmark gene set.

https://doi.org/10.1371/journal.pcbi.1012999.g007

Both GMA and DMA identified NRP1 as a driver gene involved in KRAS signaling. NRP1 has been shown to be highly expressed in different cancer types [64] and, together with FSTL1, is predicted by DMA to be a driver in basal-like breast cancer. These two genes are involved in angiogenesis, one of the hallmarks of cancer [65–67] and also a prognostic indicator of survival in breast cancer [68,69]. Additionally, in lung adenocarcinoma, key enriched terms for both DMA- and GMA-predicted driver genes included G2-M checkpoint, E2F targets, and mTORC1 signaling (Figs 3B and 7B), suggesting that the two mechanistic indicators identify at least partially overlapping biological processes. These processes are all important in cancer progression or metastasis [70–73].

Finally, we also performed enrichment analysis of the driver genes identified by both DMA and GMA. For the 141 overlapping driver genes in lung adenocarcinoma, E2F targets, G2-M checkpoint, and mTORC1 signaling were again the most significantly enriched terms (Fig 7C), together covering the vast majority of the overlapping genes. A similar analysis of the 13 overlapping driver genes in basal-like breast cancer revealed no significantly enriched terms.

Methylation- and mutation-driven predictions of driver genes each have strengths and drawbacks, and the two approaches are complementary, providing a more comprehensive insight into cancer biology when used together. The quantitative nature of methylation alterations allows for a more direct link to expression changes than mutations do. However, methylation changes capture a dynamic and reversible state, unlike mutations, which are fixed; this makes it more challenging to establish them as drivers. Additionally, methylation alterations may not always be causal and may often occur as a secondary consequence of other molecular events; indeed, methylation aberrations can often be traced back to mutations.

By integrating gene expression profiles with methylation levels, we can gain novel insights into the underlying cancer-causing mechanisms. Incorporating methylation changes provides a mechanistic explanation of the observed deregulated expression patterns and can generate new hypotheses of how epigenetics plays a role in oncogenic pathways. For instance, following prediction of methylation-driven OCGs and TSGs across cancer (sub)types, it can be speculated that certain driver genes are relevant in specific cancer (sub)types. Additionally, stage-specific methylation patterns in cancer progression can be explored. Given the reversible nature of methylation, some methylation-driven drivers might be active in specific stages of cancer development. These hypotheses can be investigated with the new GMA function.

Availability and future directions

The data that support the findings of this study are openly available in The Cancer Genome Atlas (https://www.cancer.gov/tcga). The data used for this analysis are available at the Genomic Data Commons (https://portal.gdc.cancer.gov). GitHub and OSF repositories associated with this study are available at https://github.com/ELELAB/Moonlight2R, https://github.com/ELELAB/Moonlight2_GMA_case_studies, and https://osf.io/j4n8q/. Example data and vignette are available in S1 Data.

In the future, we envision incorporating additional secondary -omics layers, such as chromatin accessibility and copy number variation. Moreover, we would like to support proteomics and single-cell RNA sequencing data as additional input data types. Finally, experimental studies will be needed to validate the key predicted driver genes.

Supporting information

S1 Fig. Integration of Moonlight and EpiMix for prediction of cancer driver genes.

(A) Heatmap showing the number of differentially methylated CpGs and classifications of methylation status in the oncogenic mediators in lung adenocarcinoma. The heatmap was generated using the plotGMA function. (B) Venn diagram comparing oncogenic mediators predicted from Moonlight’s primary layer with functional genes predicted from EpiMix in lung adenocarcinoma. The functional genes are genes containing differentially methylated CpG pairs whose DNA methylation state is associated with the expression of the gene. Only those functional genes that had the same methylation state in all of their associated CpGs were included in this comparison, and dual methylation states were excluded. (C) Heatmap showing the effect of the predicted driver genes in lung adenocarcinoma on apoptosis and proliferation of cells. This heatmap was generated using the function plotMoonlightMet. These effects define the basis upon which the oncogenic mediators are predicted from the PRA step in Moonlight’s primary layer. (D) Heatmap showing the number of differentially methylated CpGs and classifications of methylation status in the oncogenic mediators in thyroid carcinoma. The heatmap was generated using the plotGMA function. (E) Venn diagram comparing oncogenic mediators predicted from Moonlight’s primary layer with functional genes predicted from EpiMix in thyroid carcinoma. The functional genes are genes containing differentially methylated CpG pairs whose DNA methylation state is associated with the expression of the gene. Only those functional genes that had the same methylation state in all of their associated CpGs were included in this comparison, and dual methylation states were excluded. (F) Heatmap showing the effect of the predicted driver genes in thyroid carcinoma on apoptosis and proliferation of cells. This heatmap was generated using the function plotMoonlightMet. These effects define the basis upon which the oncogenic mediators are predicted from the PRA step in Moonlight’s primary layer.

https://doi.org/10.1371/journal.pcbi.1012999.s001

(PDF)

S2 Fig. Number of driver gene-drug interactions.

Number of driver gene-drug interactions in (A) basal-like breast cancer, (B) lung adenocarcinoma, and (C) thyroid carcinoma found by querying DGIdb. The driver genes are stratified into OCGs and TSGs. The number of drug interactions is shown on the x axis and the driver genes are shown on the y axis.

https://doi.org/10.1371/journal.pcbi.1012999.s002

(PDF)

S1 Text. Methods of case study: Prediction of driver genes with differential methylation in basal-like breast cancer, lung adenocarcinoma, and thyroid carcinoma using Moonlight.

https://doi.org/10.1371/journal.pcbi.1012999.s003

(PDF)

S2 Text. Results from driver-gene drug target analysis using new version of DGIdb.

https://doi.org/10.1371/journal.pcbi.1012999.s004

(PDF)

S1 Data. Moonlight2R source code, with documentation and examples.

https://doi.org/10.1371/journal.pcbi.1012999.s005

(ZIP)

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49. pmid:33538338
2. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458(7239):719–24. pmid:19360079
3. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. Cancer genome landscapes. Science. 2013;339(6127):1546–58. pmid:23539594
4. Shen L, Shi Q, Wang W. Double agents: genes with both oncogenic and tumor-suppressor functions. Oncogenesis. 2018;7(3):25. pmid:29540752
5. Datta N, Chakraborty S, Basu M, Ghosh MK. Tumor suppressors having oncogenic functions: the double agents. Cells. 2020;10(1):46. pmid:33396222
6. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57–70.
7. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74. pmid:21376230
8. Hanahan D. Hallmarks of cancer: new dimensions. Cancer Discov. 2022;12(1):31–46. pmid:35022204
9. Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform. 2024;25(2):bbad519. pmid:38261338
10. Konda P, Garinet S, Van Allen EM, Viswanathan SR. Genome-guided discovery of cancer therapeutic targets. Cell Rep. 2023;42(8):112978. pmid:37572322
11. Liu Y, Hu X, Han C, Wang L, Zhang X, He X, et al. Targeting tumor suppressor genes for cancer therapy. Bioessays. 2015;37(12):1277–86. pmid:26445307
12. Yu X, Zhao H, Wang R, Chen Y, Ouyang X, Li W, et al. Cancer epigenetics: from laboratory studies and clinical trials to precision medicine. Cell Death Discov. 2024;10(1):28. pmid:38225241
13. Iannuccelli M, Micarelli E, Surdo PL, Palma A, Perfetto L, Rozzo I, et al. CancerGeneNet: linking driver genes to cancer hallmarks. Nucleic Acids Res. 2020;48(D1):D416–21. pmid:31598703
14. Colaprico A, Olsen C, Bailey MH, Odom GJ, Terkelsen T, Silva TC, et al. Interpreting pathways to discover cancer driver genes with Moonlight. Nat Commun. 2020;11(1):69. pmid:31900418
15. Nourbakhsh M, Saksager A, Tom N, Chen XS, Colaprico A, Olsen C, et al. A workflow to study mechanistic indicators for driver gene prediction with Moonlight. Brief Bioinform. 2023;24(5):bbad274. pmid:37551622
16. Ciriello G, Miller ML, Aksoy BA, Senbabaoglu Y, Schultz N, Sander C. Emerging landscape of oncogenic signatures across human cancers. Nat Genet. 2013;45(10):1127–33. pmid:24071851
17. Ehrlich M. DNA hypomethylation in cancer cells. Epigenomics. 2009;1(2):239–59. pmid:20495664
18. Esteller M. Epigenetics in cancer. N Engl J Med. 2008;358(11):1148–59. pmid:18337604
19. Lakshminarasimhan R, Liang G. The role of DNA methylation in cancer. Adv Exp Med Biol. 2016;945:151–72.
20. Søes S, Daugaard IL, Sørensen BS, Carus A, Mattheisen M, Alsner J, et al. Hypomethylation and increased expression of the putative oncogene ELMO3 are associated with lung cancer development and metastases formation. Oncoscience. 2014;1(5):367–74. pmid:25594031
21. Zheng Y, Jun J, Brennan K, Gevaert O. EpiMix is an integrative tool for epigenomic subtyping using DNA methylation. Cell Rep Methods. 2023;3(7):100515. pmid:37533639
22. Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol (Pozn). 2015;19(1A):A68–77. pmid:25691825
23. Hutter C, Zenklusen JC. The Cancer Genome Atlas: creating lasting value beyond its data. Cell. 2018;173(2):283–5. pmid:29625045
24. Moore LD, Le T, Fan G. DNA methylation and its basic function. Neuropsychopharmacol. 2013;38(1):23–38. pmid:22781841
25. Sondka Z, Bamford S, Cole CG, Ward SA, Dunham I, Forbes SA. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat Rev Cancer. 2018;18(11):696–705. pmid:30293088
26. Rahimi M, Teimourpour B, Marashi S-A. Cancer driver gene discovery in transcriptional regulatory networks using influence maximization approach. Comput Biol Med. 2019;114:103362. pmid:31561101
27. Akhavan-Safar M, Teimourpour B. KatzDriver: a network based method to cancer causal genes discovery in gene regulatory network. Biosystems. 2021;201:104326. pmid:33309969
28. Wei P-J, Zhang D, Li H-T, Xia J, Zheng C-H. DriverFinder: a gene length-based network method to identify cancer driver genes. Complexity. 2017;2017:1–10.
29. Pham VVH, Liu L, Bracken CP, Goodall GJ, Long Q, Li J, et al. CBNA: A control theory based method for identifying coding and non-coding cancer drivers. PLoS Comput Biol. 2019;15(12):e1007538. pmid:31790386
30. Dinstag G, Shamir R. PRODIGY: personalized prioritization of driver genes. Bioinformatics. 2020;36(6):1831–9. pmid:31681944
31. Wei P-J, Zhang D, Xia J, Zheng C-H. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network. BMC Bioinform. 2016;17(Suppl 17):467. pmid:28155630
32. Zhang D, Bin Y. DriverSubNet: a novel algorithm for identifying cancer driver genes by subnetwork enrichment analysis. Front Genet. 2021;11:607798. pmid:33679866
33. Li A, Chapuy B, Varelas X, Sebastiani P, Monti S. Identification of candidate cancer drivers by integrative Epi-DNA and Gene Expression (iEDGE) data analysis. Sci Rep. 2019;9(1):16904. pmid:31729402
34. Han Y, Yang J, Qian X, Cheng W-C, Liu S-H, Hua X, et al. DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies. Nucleic Acids Res. 2019;47(8):e45. pmid:30773592
35. Gu H, Xu X, Qin P, Wang J. FI-Net: identification of cancer driver genes by using functional impact prediction neural network. Front Genet. 2020;11:564839. pmid:33244318
36. Hou Y, Gao B, Li G, Su Z. MaxMIF: a new method for identifying cancer driver genes through effective data integration. Adv Sci (Weinh). 2018;5(9):1800640. pmid:30250803
37. Zapata L, Susak H, Drechsel O, Friedländer MR, Estivill X, Ossowski S. Signatures of positive selection reveal a universal role of chromatin modifiers as cancer driver genes. Sci Rep. 2017;7(1):13124. pmid:29030609
38. Terekhanova NV, Karpova A, Liang W-W, Strzalkowski A, Chen S, Li Y, et al. Epigenetic regulation during cancer transitions across 11 tumour types. Nature. 2023;623(7986):432–41.
39. D’Anello L, Sansone P, Storci G, Mitrugno V, D’Uva G, Chieco P, et al. Epigenetic control of the basal-like gene expression profile via Interleukin-6 in breast cancer cells. Mol Cancer. 2010;9:300. pmid:21092249
40. Lin Y-T, Wu K-J. Epigenetic regulation of epithelial-mesenchymal transition: focusing on hypoxia and TGF-β signaling. J Biomed Sci. 2020;27(1):39. pmid:32114978
41. Skrypek N, Goossens S, De Smedt E, Vandamme N, Berx G. Epithelial-to-mesenchymal transition: epigenetic reprogramming driving cellular plasticity. Trends Genet. 2017;33(12):943–59. pmid:28919019
42. Liu Q-L, Luo M, Huang C, Chen H-N, Zhou Z-G. Epigenetic regulation of epithelial to mesenchymal transition in the cancer metastatic cascade: implications for cancer therapy. Front Oncol. 2021;11:657546. pmid:33996581
43. Lu W, Kang Y. Epithelial-mesenchymal plasticity in cancer progression and metastasis. Dev Cell. 2019;49(3):361–74. pmid:31063755
44. Yang H, Gan L, Chen R, Li D, Zhang J, Wang Z. From multi-omics data to the cancer druggable gene discovery: a novel machine learning-based approach. Brief Bioinform. 2023;24(1):bbac528. pmid:36515158
45. Yoshimaru T, Nakamura Y, Katagiri T. Functional genomics for breast cancer drug target discovery. J Hum Genet. 2021;66(9):927–35. pmid:34285339
46. Zsákai L, Sipos A, Dobos J, Erős D, Szántai-Kis C, Bánhegyi P, et al. Targeted drug combination therapy design based on driver genes. Oncotarget. 2019;10(51):5255–66. pmid:31523388
47. Cannon M, Stevenson J, Stahl K, Basu R, Coffman A, Kiwala S, et al. DGIdb 5.0: rebuilding the drug-gene interaction database for precision medicine and drug discovery platforms. Nucleic Acids Res. 2024;52(D1):D1227–35. pmid:37953380
48. Souglakos J, Boukovinas I, Taron M, Mendez P, Mavroudis D, Tripaki M, et al. Ribonucleotide reductase subunits M1 and M2 mRNA expression levels and clinical outcome of lung adenocarcinoma patients treated with docetaxel/gemcitabine. Br J Cancer. 2008;98(10):1710–5. pmid:18414411
49. Atwell B, Chen C-Y, Christofferson M, Montfort WR, Schroeder J. Sorting nexin-dependent therapeutic targeting of oncogenic epidermal growth factor receptor. Cancer Gene Ther. 2023;30(2):267–76. pmid:36253541
50. Robichaux JP, Le X, Vijayan RSK, Hicks JK, Heeke S, Elamin YY, et al. Structure-based classification predicts drug response in EGFR-mutant NSCLC. Nature. 2021;597(7878):732–7. pmid:34526717
51. Robichaux JP, Elamin YY, Tan Z, Carter BW, Zhang S, Liu S, et al. Mechanisms and clinical activity of an EGFR and HER2 exon 20-selective kinase inhibitor in non-small cell lung cancer. Nat Med. 2018;24(5):638–46. pmid:29686424
52. Cohen P, Cross D, Jänne PA. Kinase drug discovery 20 years after imatinib: progress and future directions. Nat Rev Drug Discov. 2021;20(7):551–69. pmid:34002056
53. Poels KE, Schoenfeld AJ, Makhnin A, Tobi Y, Wang Y, Frisco-Cabanos H, et al. Identification of optimal dosing schedules of dacomitinib and osimertinib for a phase I/II trial in advanced EGFR-mutant non-small cell lung cancer. Nat Commun. 2021;12(1):3697. pmid:34140482
54. van Alderwerelt van Rosenburgh IK, Lu DM, Grant MJ, Stayrook SE, Phadke M, Walther Z, et al. Biochemical and structural basis for differential inhibitor sensitivity of EGFR with distinct exon 19 mutations. Nat Commun. 2022;13(1):6791. pmid:36357385
55. Pu X, Zhou Y, Kong Y, Chen B, Yang A, Li J, et al. Efficacy and safety of dacomitinib in treatment-naïve patients with advanced NSCLC harboring uncommon EGFR mutation: an ambispective cohort study. BMC Cancer. 2023;23(1):982. pmid:37840124
56. Momeny M, Zarrinrad G, Moghaddaskho F, Poursheikhani A, Sankanian G, Zaghal A, et al. Dacomitinib, a pan-inhibitor of ErbB receptors, suppresses growth and invasive capacity of chemoresistant ovarian carcinoma cells. Sci Rep. 2017;7(1):4204. pmid:28646172
57. Brischwein K, Schlereth B, Guller B, Steiger C, Wolf A, Lutterbuese R, et al. MT110: a novel bispecific single-chain antibody construct with high efficacy in eradicating established tumors. Mol Immunol. 2006;43(8):1129–43. pmid:16139892
58. Ferrari F, Bellone S, Black J, Schwab CL, Lopez S, Cocco E, et al. Solitomab, an EpCAM/CD3 bispecific antibody construct (BiTE®), is highly active against primary uterine and ovarian carcinosarcoma cell lines in vitro. J Exp Clin Cancer Res. 2015;34:123. pmid:26474755
59. Keller L, Werner S, Pantel K. Biology and clinical relevance of EpCAM. Cell Stress. 2019;3(6):165–80. pmid:31225512
60. Cui Y, Li J, Liu X, Gu L, Lyu M, Zhou J, et al. Dynamic expression of EpCAM in primary and metastatic lung cancer is controlled by both genetic and epigenetic mechanisms. Cancers (Basel). 2022;14(17):4121. pmid:36077658
61. Kebenko M, Goebeler M-E, Wolf M, Hasenburg A, Seggewiss-Bernhardt R, Ritter B, et al. A multicenter phase 1 study of solitomab (MT110, AMG 110), a bispecific EpCAM/CD3 T-cell engager (BiTE®) antibody construct, in patients with refractory solid tumors. Oncoimmunology. 2018;7(8):e1450710. pmid:30221040
62. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446(7132):153–8. pmid:17344846
63. Chalmers ZR, Connelly CF, Fabrizio D, Gay L, Ali SM, Ennis R, et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017;9(1):34. pmid:28420421
64. Chen Z, Gao H, Dong Z, Shen Y, Wang Z, Wei W, et al. NRP1 regulates radiation-induced EMT via TGF-β/Smad signaling in lung adenocarcinoma cells. Int J Radiat Biol. 2020;96(10):1281–95. pmid:32659143
65. Madu CO, Wang S, Madu CO, Lu Y. Angiogenesis in breast cancer progression, diagnosis, and treatment. J Cancer. 2020;11(15):4474–94. pmid:32489466
66. Teleanu RI, Chircov C, Grumezescu AM, Teleanu DM. Tumor angiogenesis and anti-angiogenic strategies for cancer treatment. J Clin Med. 2019;9(1):84. pmid:31905724
67. Badodekar N, Sharma A, Patil V, Telang G, Sharma R, Patil S, et al. Angiogenesis induction in breast cancer: a paracrine paradigm. Cell Biochem Funct. 2021;39(7):860–73. pmid:34505714
68. Vartanian RK, Weidner N. Correlation of intratumoral endothelial cell proliferation with microvessel density (tumor angiogenesis) and tumor cell proliferation in breast carcinoma. Am J Pathol. 1994;144(6):1188–94. pmid:7515558
69. Weidner N, Semple JP, Welch WR, Folkman J. Tumor angiogenesis and metastasis--correlation in invasive breast carcinoma. N Engl J Med. 1991;324(1):1–8. pmid:1701519
70. Gargalionis AN, Papavassiliou KA, Papavassiliou AG. Implication of mTOR signaling in NSCLC: mechanisms and therapeutic perspectives. Cells. 2023;12(15):1–10. pmid:37566093
71. Nevins JR. The Rb/E2F pathway and cancer. Hum Mol Genet. 2001;10(7):699–703. pmid:11257102
72. Kent LN, Leone G. The broken cycle: E2F dysfunction in cancer. Nat Rev Cancer. 2019;19:326–38.
73. Stark GR, Taylor WR. Analyzing the G2/M Checkpoint. Checkpoint controls and cancer. New Jersey: Humana Press; 2004. p. 51–82. https://doi.org/10.1385/1-59259-788-2:051 pmid:15187249