Introduction

Websites and online services are now used in many facets of everyday life, and this growth has been accompanied by a rise in security threats. In addition, the complexity and frequency of cyber-attacks have surged significantly1. These security threats and their adverse consequences, such as unauthorized access by attackers seeking sensitive data, emphasize the urgent need for robust cybersecurity measures. For example, network intrusion, which is unauthorized activity within a network, is one of the most significant kinds of attack. Therefore, researchers developed Network Intrusion Detection Systems (NIDS) to address the problem of network intrusion2,3.

Network connectivity can be exploited in a variety of ways4. A Distributed Denial of Service (DDoS) attack is a particularly harmful type of attack, which may make it impossible for legitimate users to access their networks5,6,7. By flooding the network with a large volume of traffic, a DDoS attack can exhaust the resources of the network or of the targeted servers. With an enormous number of bots in various locations, attackers can launch a DDoS attack. Because the attack is carried out by many distributed bots, it is difficult to detect8,9,10. Furthermore, these attacks deplete network resources in a matter of seconds. Thus, a robust method is needed to detect DDoS attacks.

In recent years, many Machine Learning (ML) methods for detecting DDoS attacks have been presented, most of which depend on supervised or unsupervised learning to discover relevant characteristics11. Support vector machine, decision tree, random forest, K-nearest neighbor (KNN), principal component analysis, Gaussian mixture model, and naïve Bayes are the major ML-based algorithms used in existing approaches.

Existing powerful ML algorithms, on the other hand, are typically trained on a small amount of input data, because applying such an ML approach to a vast dataset is time-consuming12,13. Moreover, the high dimensionality and nonlinear characteristics of large datasets make these algorithms inefficient for multiple classification tasks, leading to reduced performance and increased complexity in handling diverse attack types14.

Deep learning is a subfield of machine learning whose techniques include the deep convolutional generative adversarial network, the convolutional neural network (CNN), the deep long short-term memory recurrent neural network, and long short-term memory15. All of these have been applied to DDoS attack detection to improve classification, particularly on large datasets. Deep learning enhances classification results by expanding deep features across multiple layers, enabling the extraction of more representative subsets of features16,17,18,19. A key advantage of deep learning is its ability to operate without a feature selection process. However, achieving optimal detection performance with deep learning requires meticulous selection of structural parameters to construct an effective model20. As a result, an optimization process is used to select a set of parameters from which an efficient DL framework can be built.

This work is motivated by the increasing need for robust intrusion detection systems (IDS) that can effectively detect Distributed Denial of Service (DDoS) attacks in real time, especially in the presence of imbalanced datasets where traditional methods often struggle. Current state-of-the-art techniques, while effective, frequently face challenges in adapting to diverse network conditions and require complex, computationally intensive models. While many articles have targeted the issue of DDoS detection using deep learning techniques, our work distinguishes itself through several key contributions. This paper presents a hybrid optimization-based deep belief network for DDoS attack detection. The Stacked Sparse Denoising Autoencoder (SSDAE) can learn complex features through its layer-by-layer learning strategy, in which higher-level features are learned from the outputs of the previous layers; hence the higher-level features better capture the structure of the input data.

Further, hybridizing optimization techniques with deep belief networks enhances DDoS attack detection by improving accuracy, speed, and scalability. This hybridization also enables models to handle large, complex datasets, adapt to evolving threats, and operate efficiently in real-time environments. This leads to more effective and reliable DDoS protection, which is crucial in safeguarding networks against these pervasive and potentially devastating attacks. For this reason, we use a hybrid firefly-black widow optimization algorithm. The firefly algorithm (FA) is powerful and relatively efficient, and it can achieve promising results. However, the search used in FA is based on randomness, so it cannot always reach the globally best values. To overcome this disadvantage, we integrate the black widow optimization algorithm, which achieves faster convergence and better predicted values than other approaches and can deliver competitive outcomes. The main contributions of the proposed approach are as follows:

  • Developed an excellent intrusion detection system for DDoS attacks using SSDAE, Firefly, and Black Widow Optimization. This hybrid technique improves the model’s ability to achieve global optimality, which increases detection accuracy.

  • Introduced a CGAN model to address data imbalance issues, significantly boosting the classifier’s performance.

  • To assess the efficiency of the suggested approach, F-score, precision, Area under Curve (AUC), accuracy, and recall are evaluated.

  • The effectiveness of the model was analyzed under imbalanced and balanced data.

  • The model’s robustness is proven by examining various benchmark datasets.

  • Finally, the performance was compared using various state-of-the-art methods.

The remainder of the paper is organized as follows: Section 2 reviews recent studies on intrusion detection systems; Section 3 gives a brief problem statement; Section 4 introduces the proposed methodology; Section 5 presents and discusses the results; and Section 6 concludes the paper.

Related works

For the past few years, various researchers have suggested many different techniques for intrusion detection. In this section, a few of them are discussed.

To address the problem of low accuracy and feature extraction, Su et al.21 developed the BAT method, which combines the attention mechanism and BLSTM (Bidirectional Long Short-term memory). The combined model obtains the key characteristics of traffic data. With these characteristics, the softmax classifier was used to classify the attacks. The performance was analysed using the NSL-KDD data set.

A multi-architectural modular deep neural network model is presented by Atefinia and Ahmadi22 to reduce the number of false positives in intrusion detection systems that use anomaly detection. It includes a stack of restricted Boltzmann machine modules, a feed-forward module, and two recurrent modules, whose output weights are fed into an aggregator module to generate the final answer. Despite these advancements, handling imbalanced datasets remains a significant challenge, often leading to biased classification results. Techniques such as oversampling, undersampling, and synthetic data generation methods are utilized to address this issue, though each has inherent limitations.

Nguyen and Kim23 proposed a network intrusion detection system (NIDS) that combines several advanced techniques, including convolutional neural networks (CNN), fuzzy C-means clustering (FCM), exhaustive search with a genetic algorithm, and a bagging classifier. Their model emphasizes high-quality feature extraction using a three-layered approach. The hybrid system, which integrates CNN with the bagging classifier, was tested on the NSL-KDD dataset and demonstrated a significant improvement in detection accuracy.

To control the intrusion flow in complex networks, Wu et al.24 suggested a novel intrusion detection model named SRDLM, which is based on semantic re-encoding and deep learning. This approach re-encodes the semantics of network traffic and improves both the distinguishability of the traffic and the algorithm’s generalization ability. The SRDLM model was trained using the NSL-KDD dataset and attained a detection rate of 99%.

Elmasry et al.25 developed a particle swarm optimization (PSO)-based approach that selects both the subset of attributes and the evaluation metrics in the same step. The abovementioned approach is used during pre-training to choose the optimum features and model hyperparameters dynamically. To evaluate the performance, they utilized three deep learning models: Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN), Deep Belief Networks (DBN), and Deep Neural Networks (DNN). Furthermore, the integration of sophisticated optimization algorithms, such as genetic algorithms and particle swarm optimization, has demonstrated potential in fine-tuning model parameters, resulting in enhanced detection accuracy and lower false positive rates.

Khan26 developed a deep learning (DL)-based hybrid intrusion detection architecture (HCRNNIDS) for predicting and identifying harmful network attacks. In the HCRNNIDS, the CNN uses convolution to capture local features, while the RNN captures temporal features to enhance system performance and prediction. To evaluate the hybrid convolutional recurrent neural network intrusion detection system, experiments were conducted on the publicly accessible, realistic CSE-CIC-IDS2018 data. The model achieves a detection accuracy of 97.75%.

The demand for sophisticated intrusion detection systems (IDSs) increases when conventional rule-based approaches fail to adequately address dynamic cyber threats. Wang et al.27 introduced IDS-CNN, a method that utilizes Convolutional Neural Networks (CNNs) for real-time and automated intrusion detection. Developed with open-source technologies, the IDS-CNN model was trained and tested using the NSL-KDD dataset and demonstrates acceptable accuracy and precision. Nonetheless, obstacles such as computing requirements and scalability constraints in extensive networks persist. Although promising, additional research is required to verify its efficacy in more intricate, real-world contexts.

To improve the convergence speed and generalization capacity of CNNs during model training, Wang et al.28 suggest the use of a deep multi-scale convolutional neural network (DMCNN) to detect network intrusions. Convolution kernels of different scales are employed to extract features at varying levels from a large amount of high-dimensional data. Additionally, batch normalization is employed to optimize the learning rate of the network structure and obtain the best features. The performance of the proposed model is analyzed and compared to that of existing works using NSL-KDD. The experimental results indicate that the accuracy and the true positive rate (TPR) are improved.

In order to identify the low rate DoS attacks, Tang et al.29 developed a convolution neural network and multi-feature fusion-based technique. It calculates many network attributes and combines them into a feature space that will be used to describe the network’s current condition. Experiments using the NS2 simulation and test-bed platforms validate that the method effectively detects LDoS attacks.

Lu et al.30 developed a DLAMD model to determine the best feature combination in the fifth-generation environment for classifying dangerous and benign software. The proposed system has two phases for rapid and deep detection, including features based on application permissions and opcodes (i.e. combining random forest for feature selection, CNNs for opcode extraction, and LSTM for time-series learning). The experimental results showed that DLAMD’s F1 score for classification may reach 95.69%.

Lu et al.31 introduced a Backpropagation Neural Network (BPNN)-based attack detection model. Here, an adaptive clonal genetic algorithm (ACGA) is used to update the weights of the BPNN. Whenever a weight is optimized in ACGA, the reactive controller changes the likelihood of crossover and mutation, and the clone operator preserves the best individuals in the population.

Zhang et al.32 developed a flow-based intrusion detection model known as SGM-CNN. The proposed SGM-CNN model uses a hybrid of oversampling (SMOTE) and undersampling (GMM-based clustering) to overcome data uncertainty. The model was evaluated on the UNSW-NB15 and CICIDS2017 datasets for binary and multi-class classification, with detection rates of 99.74% and 96.54%. However, this work pays little attention to the feature selection component.

Chang et al.33 developed a methodology to improve the IDS model by distinguishing between benign and attack data. This framework employs a variety of ML models, including Bayes networks, decision trees, random forests, and nearest neighbors. The KDDCUP-1999 dataset was utilized to analyze the performance, and the detection rate exceeded 98%.

To enhance IIoT network protection, multiple novel models were designed by Khan et al.34,35,36. These works address misclassification errors, the identification of the most essential features, and computational costs, using the UNSW-NB15 and gas pipeline-based ICS network data.

Many recent studies on DDoS detection utilizing machine learning or deep learning algorithms have shortcomings, which we intend to address. For example, while numerous techniques have been presented to improve detection accuracy, they frequently rely on outdated or imbalanced datasets, resulting in biased results and lower detection rates for minority classes. Furthermore, standard optimization approaches utilized in these models, such as gradient descent and Adam optimization, may become trapped in local minima, lowering the overall efficacy of the model. In addition, the adaptability of these models to different and changing network traffic conditions is frequently limited, making them ineffective in real-world applications. Our study tackles these problems by adding a hybrid optimization strategy (FA-BWO), which improves the model’s convergence to global optima and increases accuracy and resilience. We also use cGAN to balance the dataset, resulting in more reliable detection across all classes. To test the proposed approach, the latest publicly accessible dataset, CICDDoS2019, is utilized; it contains a wide range of DDoS attacks and fills in the gaps in previous datasets. In addition, we conducted extensive tests on several benchmark datasets, which show that our proposed technique not only overcomes these constraints but also sets a new benchmark for DDoS detection efficiency.

Problem statement

In today’s world, communication technology plays an essential role. With the fast development of the internet and digital systems, massive volumes of data are transferred daily, which requires connected devices to run continuously. Furthermore, connecting devices to computers and internet networks exposes most everyday network operations to cyber-security risks. As a result, operators and specialists are concentrating their efforts on spotting various system threats. Traditional security measures, though essential, are increasingly inadequate against advanced attacks that can circumvent firewalls and encryption methods. Consequently, there is a heightened emphasis on implementing more dynamic and intelligent systems such as IDS. Conventional methods for perimeter network security, such as access control, cryptography, and firewalls, have proven ineffective in protecting against internal threats. As a result, IDS receive a lot of attention because they detect both internal and external intrusions and respond to them quickly. Depending on the IDS’s alerts, the attack management system, a vital element of the IDS, can take appropriate countermeasures to guarantee that the operation of the network is not disrupted. A well-designed IDS not only detects potential threats but also offers actionable insights and automated responses to mitigate the impact of an attack. This proactive approach is crucial for maintaining the integrity and availability of network services. Hence, we propose a NIDS design that utilizes deep learning to reinforce computer network security.

Proposed methodology

This section presents the proposed DDoS attack detection model. The system consists of three main modules: a data preprocessing module, an imbalance processing module, and a classification decision module. Socket feature removal, data cleaning, and data normalization are performed in preprocessing. The imbalance processing module generates the training set so as to reduce the bias in the analytical outcomes induced by the imbalance in the data. We use a cGAN-based approach to produce a fully balanced dataset. Finally, the SSDAE performs classification in the classification decision module: it extracts the deep attributes of the training data and classifies them. Because the weight parameters are randomly initialized, the training time of the SSDAE increases and training can fall into a local optimum. To overcome this, a firefly-black widow optimization-based optimal weight selection process is used. We perform binary classification on the CICDDoS2019 dataset to measure efficiency in a current network environment. The framework of the proposed technique is shown in Fig. 1.

Fig. 1. System architecture of the proposed system.

Algorithmic STEPS of the proposed model

The step-by-step procedure of the proposed model is given in Algorithm 1, which lists the steps followed from dataset preparation to classification.

Algorithm 1. Proposed model steps.

Dataset description

Numerous DDoS attack datasets are available for deep learning experiments; the most recent is CICDDoS2019, which includes two categories of DDoS attacks, exploitation-based and reflection-based. Reflection-based attacks rely on TCP, UDP, or both. The Simple Service Discovery Protocol (SSDP) attack is one of the TCP-based attacks; it delivers massive traffic to the intended target, flooding the target network and taking the online resource offline. Exploitation-based attacks also use TCP and UDP. SYN flood attacks, which exploit the TCP handshake process, are used in TCP-based attacks. Another form of threat is UDP-Lag, which attackers employ to disrupt the connection in online multiplayer games when they want to degrade the performance of other players. We use this traffic data because it contains several more recent attacks than previous datasets. The dataset also contains huge volumes of traffic for each type of attack; due to resource constraints, a small number of samples of each attack, along with benign traffic, is used for experimentation.

Data preprocessing

We prepare the data so that it can be fed directly to the learning algorithm. The CICDDoS2019 dataset is provided in the form of flow records. Before model training and testing, we apply a few processing steps to obtain the required data.

Removing socket features

Every socket feature is removed, such as the source and destination IP addresses, timestamp, flow ID, and source and destination ports. Such attributes differ from network to network; hence the model must be trained and tested on packet properties instead. In some cases, the attacker and a normal user can share an identical IP address, which makes it difficult for the model to identify attacks. Removing the socket attributes leaves 77 attributes as input to the model.

Cleaning the data

To handle missing or corrupted data, we examined the entire dataset. We first checked which instances had missing or invalid values, such as −inf, +inf, and NaN. Because the dataset has a significant amount of traffic data for each attack pattern, we removed every sample that contained missing or corrupted entries.
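The cleaning rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the record layout (one dictionary per flow) and the feature names `flow_duration` and `flow_bytes_per_s` are assumptions made for the example.

```python
import math

def clean_rows(rows):
    """Keep only rows whose values are all finite numbers (no NaN, +inf, -inf)."""
    cleaned = []
    for row in rows:
        if all(isinstance(v, (int, float)) and math.isfinite(v) for v in row.values()):
            cleaned.append(row)
    return cleaned

# Hypothetical flow records; the feature names are illustrative only.
flows = [
    {"flow_duration": 120.0, "flow_bytes_per_s": 3500.0},
    {"flow_duration": 0.0, "flow_bytes_per_s": float("inf")},   # invalid rate
    {"flow_duration": 45.0, "flow_bytes_per_s": float("nan")},  # missing value
    {"flow_duration": 80.0, "flow_bytes_per_s": 910.5},
]

kept = clean_rows(flows)
print(len(kept))  # prints 2
```

Dropping whole rows, rather than imputing values, matches the text: the dataset is large enough that discarding affected samples is affordable.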

One hot encoding

One-hot encoding is the primary approach for converting categorical attributes into numerical form, since it is a simple and effective technique. Each categorical attribute is transformed into a binary vector in which one element has the value 1 and all other elements are 0. The position of the 1 indicates which of the possible values the categorical feature takes.
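As a small illustration of this encoding, the sketch below maps a categorical column to binary vectors. The example values (protocol names) and the alphabetical category order are assumptions chosen for the example; the paper does not state which attributes were encoded.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector containing a single 1."""
    categories = sorted(set(values))                 # fixed, reproducible order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1                            # position encodes the category
        vectors.append(vec)
    return categories, vectors

categories, encoded = one_hot_encode(["TCP", "UDP", "ICMP", "TCP"])
print(categories)  # prints ['ICMP', 'TCP', 'UDP']
print(encoded[0])  # prints [0, 1, 0]
```

Each vector has as many elements as there are distinct categories, so an attribute with many distinct values widens the input considerably.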

Data normalization

During the normalization process, data scaling is used to equalize the wide variety of data attributes, allowing the suggested classification approach to identify the best solution faster. To scale attribute values, we employ the maximum-minimum normalization approach. According to Eq. (1), all attribute values are normalized within a defined range of [0, 1]

$$x^{\prime} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }}$$
(1)

where ‘x’ is the original value and ‘x'’ is the normalized value. The attribute’s smallest and largest values are denoted by x_min and x_max, respectively, and the normalized values range from 0 to 1. The min and max values computed for every column are used to normalize the data in the training set.
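Eq. (1) can be illustrated with the following sketch, which computes the per-column minimum and maximum on the training rows and then rescales. The handling of a constant column (mapping it to 0.0) is an assumption added to avoid division by zero; the paper does not discuss that case.

```python
def fit_min_max(train_rows):
    """Compute per-column (min, max) from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]

def apply_min_max(rows, stats):
    """Apply Eq. (1) column by column using previously fitted statistics."""
    scaled = []
    for row in rows:
        new_row = []
        for x, (lo, hi) in zip(row, stats):
            # Assumption: a constant column (hi == lo) is mapped to 0.0.
            new_row.append(0.0 if hi == lo else (x - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
stats = fit_min_max(train)
scaled_train = apply_min_max(train, stats)
print(scaled_train)  # prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

If the same statistics are later applied to unseen data, values outside the training range fall outside [0, 1]; whether to clip them is a design choice the text leaves open.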

Imbalanced data handling using cGAN

CICDDoS2019 is an imbalanced dataset in which two large classes account for over 50% of the samples. Hence, balancing the dataset is very important before classification. For this purpose, a conditional Generative Adversarial Network (cGAN)-based oversampling technique is used to balance the dataset. In the cGAN architecture, which is an extension of the GAN architecture, a generator G and a discriminator D compete to outperform one another. The discriminator’s goal is to discriminate between samples produced by the generator and examples from the given dataset. The generator G is represented as G: Z → X, where Z is the noise space of arbitrary dimension dZ and X is the data space; its goal is to capture the data distribution. The discriminator is represented as D: X → [0, 1], and it estimates the likelihood that a sample comes from the data distribution rather than from G. The cGAN framework extends the model G to include the extra space Y, which denotes external information from the training data, expressed as:

$$G: \, Z \, \times \, Y \, \to \, X$$
(2)

The discriminator D is adjusted in the same way as G:

$$D: \, X \, \times \, Y \, \to \, \left[ {0, \, 1} \right]$$
(3)

The two models, typically multilayer perceptrons (MLPs), play a two-player min-max game whose value function is expressed as

$$\min_{G} \max_{D} V(D,G) = E_{D} + E_{G}$$
(4)

where

$$E_{D} = E_{x,y\sim p_{data} (x,y)} \left[ \log D(x,y) \right]$$
$$E_{G} = E_{z\sim p_{z} (z),\,y\sim p_{y} (y)} \left[ \log \left( 1 - D(G(z,y),y) \right) \right]$$

The values \((x, y) \in X \times Y\) come from the data distribution \(p_{data}(x, y)\), the values \(z \in Z\) come from the noise distribution \(p_{z}(z)\), and the values \(y \in Y\) come from the conditional vectors in the training data and follow the probability density \(p_{y}(y)\). The training technique for cGANs is the same as that for GAN models. For a mini-batch of m training examples \(\left\{ (x_{i}, y_{i}) \right\}_{i = 1}^{m}\) and m noise samples \(\left\{ z_{i} \right\}_{i = 1}^{m}\), the gradient updates of the discriminator and the generator derived from Eq. (4) use the following logistic cost expressions:

$$J_{D} = - \frac{1}{2m}\left( \sum\limits_{i = 1}^{m} \log D(x_{i}, y_{i}) + \sum\limits_{i = 1}^{m} \log \left( 1 - D\left( G(z_{i}, y_{i}), y_{i} \right) \right) \right)$$
(5)
$$J_{G} = - \frac{1}{m}\sum\limits_{i = 1}^{m} \log D\left( G(z_{i}, y_{i}), y_{i} \right)$$
(6)

To prevent the gradient from saturating when the discriminator confidently rejects generated samples, Eq. (6) uses a modified version of the generator’s term in Eq. (4). Equations (5) and (6) define the gradient updates with which the cGAN is trained. The training data is the imbalanced dataset \(\left\{ (x_{i}, y_{i}) \right\}_{i = 1}^{n}\), where \((x_{i}, y_{i}) \in X \times \{0, 1\}\) and y = 1 corresponds to the minority class. The parameters of D are updated k times, followed by a single update of the parameters of G. The class variable y represents the cGAN’s external information. As previously stated, Z is a noise space with dimensionality dZ, whereas dX and dY are the dimensionalities of X and Y. The generator takes input vectors from the Z × Y space and produces output vectors in the data space X. The discriminator, on the other hand, takes input vectors in the X × Y space and classifies them as real data or as synthetic data produced by G. Once cGAN training is complete, input vectors of the form (z, y = 1) ∈ Z × Y, where z is a sample from the noise space Z, are fed to G to produce synthetic data for the minority class.
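To make the two cost expressions concrete, the sketch below evaluates Eq. (5) and Eq. (6) directly from discriminator outputs. It assumes the outputs are already available as probabilities strictly between 0 and 1 and that the real and generated batches have the same size m; the function names are illustrative, and no network is trained here.

```python
import math

def discriminator_cost(d_real, d_fake):
    """J_D of Eq. (5): d_real[i] = D(x_i, y_i), d_fake[i] = D(G(z_i, y_i), y_i)."""
    m = len(d_real)
    total = sum(math.log(p) for p in d_real) + sum(math.log(1.0 - q) for q in d_fake)
    return -total / (2 * m)

def generator_cost(d_fake):
    """J_G of Eq. (6): the generator is rewarded when D scores its samples as real."""
    m = len(d_fake)
    return -sum(math.log(q) for q in d_fake) / m

# A discriminator that cannot tell real from generated outputs 0.5 everywhere.
undecided_d = discriminator_cost([0.5, 0.5], [0.5, 0.5])
undecided_g = generator_cost([0.5, 0.5])

# A confident discriminator: low J_D, high J_G.
confident_d = discriminator_cost([0.9, 0.8], [0.1, 0.2])
confident_g = generator_cost([0.1, 0.2])
print(round(undecided_d, 4), round(confident_d, 4), round(confident_g, 4))
```

With an undecided discriminator both costs equal log 2. As the discriminator becomes more confident, J_D falls and J_G rises, which is the signal that drives the generator update.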

The hyperparameters of this procedure are the noise space dimension dZ and the hyperparameters relating to the design and training of the G and D networks. According to Eqs. (2) and (3), the dimensions of the spaces X and Y, together with the fact that D is a binary classifier, constrain the architectures of the two networks. Specifically, G's input and output layers have dZ + dY and dX units, respectively. The input layer of D has dX + dY units, and its output layer has a single unit. In a binary classification task, the dimension dY of the class variable y is one. The unconstrained hyperparameters of the cGAN are therefore the size dZ of the noise space and the number and size of the hidden layers of G and D. The cGAN method effectively addresses the issue of imbalanced datasets in DDoS attack detection. By generating synthetic data for underrepresented classes, cGAN produces a more balanced training set. This results in enhanced classification performance and increased robustness of the detection model.
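The layer-size constraints implied by Eqs. (2) and (3) can be checked with a short calculation. In the sketch below, dX = 77 is taken from the preprocessing step and dY = 1 from the binary setting, while dZ = 32 is a hypothetical choice, since the paper does not report the noise dimension. The single-unit discriminator output follows from D mapping into [0, 1].

```python
def cgan_layer_sizes(d_x, d_y, d_z):
    """Input/output widths fixed by G: Z x Y -> X and D: X x Y -> [0, 1]."""
    return {
        "generator_input": d_z + d_y,       # noise vector plus class label
        "generator_output": d_x,            # one synthetic feature vector
        "discriminator_input": d_x + d_y,   # feature vector plus class label
        "discriminator_output": 1,          # probability that the sample is real
    }

def generator_input(z, y):
    """Concatenate a noise vector z with the class label y."""
    return list(z) + [y]

sizes = cgan_layer_sizes(d_x=77, d_y=1, d_z=32)   # d_z = 32 is an assumption
minority_input = generator_input([0.0] * 32, 1)   # y = 1 requests a minority sample
print(sizes["generator_input"], sizes["discriminator_input"])  # prints 33 78
```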

Classification using stacked sparse denoising auto encoder

A Stacked Sparse Denoising Autoencoder (SSDAE) network identifies whether a DDoS attack occurs. It builds on the Autoencoder (AE) and the sparse denoising autoencoder. In an AE, the primary objective is to minimize the difference between the input and the reconstructed output. If X and \(\hat{X}\) are the input and the reconstructed data, the following equation gives the average reconstruction error between \(\hat{X}\) and X:

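$$J(W,b) = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{1}{2}\left\| {\hat{X}_{i} - X_{i} } \right\|^{2} }$$
(7)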

Here, ‘b’ and ‘W’ denote bias and weight, the raw input is represented by Xi, the new output reconstructed from the input is characterized by \(\hat{X}_{{\text{i}}}\), and ‘n’ indicates the number of training samples.

An AE reproduces its input through the encoding and decoding phases, during which a large amount of redundant information obstructs the extraction of essential features. The sparse autoencoder (SAE) addresses this problem by introducing a constraint on the encoding process. In most cases, the sparsity constraint inhibits most of the hidden layer neurons, and a sigmoid function is used as the activation function. The SAE modifies the loss function by adding a sparse penalty term, illustrated in Eq. (8):

$$\sum\limits_{l = 1}^{m} KL\left( \rho \| \hat{\rho}_{l} \right) = \sum\limits_{l = 1}^{m} \left[ \rho \log \frac{\rho}{\hat{\rho}_{l}} + \left( 1 - \rho \right)\log \frac{1 - \rho}{1 - \hat{\rho}_{l}} \right]$$
(8)

Here, the sparsity parameter is ‘ρ’, the index \(l\) runs over the hidden layer neurons, and the total number of neurons in the hidden layer is denoted by ‘m’. Moreover, the average activation of hidden neuron \(l\) is represented by \(\hat{\rho}_{l}\), which is expressed in the following equation.

$$\hat{\rho}_{l} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left[ a_{l}^{2} (x^{i}) \right]$$
(9)

Here, the activation of hidden neuron \(l\) is denoted by \(a_{l}^{2}\), and its activation for the input \(x^{i}\) is denoted by \(a_{l}^{2}(x^{i})\). Moreover, \(KL\left( \rho \| \hat{\rho}_{l} \right) = 0\) when \(\hat{\rho}_{l} = \rho\), and it increases monotonically as the difference between \(\hat{\rho}_{l}\) and \(\rho\) rises. The function \(KL\left( \rho \| \hat{\rho}_{l} \right)\) is added to the loss function, and the loss function is then minimized, so that \(\hat{\rho}_{l}\) and \(\rho\) are as close as possible. The loss function of the SAE is shown in Eq. (10):

$$J_{sparse} \left( W,b \right) = J(W,b) + \beta \sum\limits_{l = 1}^{m} KL\left( \rho \| \hat{\rho}_{l} \right)$$
(10)

Here, the weight of the sparse penalty term is denoted by 'β'. Most neurons in the hidden layer are inhibited while the SAE extracts features, and its operation resembles that of a visual system. The SAE has the advantage of learning more representative and sparse features from the input rather than merely reproducing it. As a result, the SAE’s features aid in improving the accuracy of intrusion detection.
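The sparsity penalty of Eqs. (8) to (10) can be computed as in the following sketch. The activation values, the target sparsity ρ = 0.05, and the penalty weight β = 3.0 are hypothetical values chosen for illustration; the paper does not report its settings here. Activations are assumed to lie strictly between 0 and 1, as they do for a sigmoid hidden layer.

```python
import math

def kl_divergence(rho, rho_hat):
    """KL(rho || rho_hat) between two Bernoulli distributions, one summand of Eq. (8)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def average_activations(activations):
    """Eq. (9): activations[i][l] is the activation of hidden neuron l on sample i."""
    n = len(activations)
    return [sum(col) / n for col in zip(*activations)]

def sparse_loss(reconstruction_loss, activations, rho, beta):
    """Eq. (10): reconstruction loss plus the weighted sparsity penalty."""
    rho_hats = average_activations(activations)
    penalty = sum(kl_divergence(rho, r) for r in rho_hats)
    return reconstruction_loss + beta * penalty

# Two samples, two hidden neurons: neuron 0 is sparse, neuron 1 is mostly active.
acts = [[0.05, 0.9], [0.05, 0.7]]
rho_hats = average_activations(acts)
penalties = [kl_divergence(0.05, r) for r in rho_hats]
total = sparse_loss(reconstruction_loss=0.2, activations=acts, rho=0.05, beta=3.0)
print([round(p, 4) for p in penalties], round(total, 4))
```

The neuron whose average activation already equals ρ contributes nothing, while the frequently active neuron is penalized, which is the behavior the text describes.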

A denoising autoencoder (DAE) adds noise to the input before training the network with it. The network is trained so that its output is close to the raw, uncorrupted input, which improves feature robustness. The new feature 'h' and the reconstructed data are expressed in Eqs. (11) and (12). Finally, the loss function is described in Eq. (13).

$$h = i(\tilde{x}) = s_{i} (w\tilde{x} + b)$$
(11)
$$\hat{x} = d(h) = s_{d} (\tilde{w}h + p)$$
(12)
(13)

Here, \(\tilde{x}\) is the noise-corrupted input, p and b denote the decoding and coding biases, respectively, \(w\) and \(\tilde{w}\) represent the coding and decoding weight matrices, and sd and si denote the decoding and coding activation functions, respectively. Moreover, \(\theta = \{ w,b,p\}\) and N denotes the number of training samples.
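A single forward pass through Eqs. (11) and (12) can be sketched as follows. Several details are assumptions made for the example and are not stated in the text: masking noise as the corruption type, sigmoid for both activation functions, the layer sizes, the random weight values, and a squared-error loss measured against the clean input.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(weights, vector, bias):
    """Compute s(W v + b) with W given as a list of rows."""
    return [sigmoid(sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]

def corrupt(x, drop_prob, rng):
    """Masking noise (assumed): zero each component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def dae_forward(x, w, b, w_tilde, p, drop_prob, rng):
    x_tilde = corrupt(x, drop_prob, rng)   # noisy input
    h = layer(w, x_tilde, b)               # Eq. (11): encode the noisy input
    x_hat = layer(w_tilde, h, p)           # Eq. (12): decode back to input space
    return x_tilde, h, x_hat

rng = random.Random(0)
n_in, n_hidden = 4, 2                      # hypothetical layer sizes
w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
w_tilde = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
p = [0.0] * n_in

x = [0.2, 0.9, 0.4, 0.7]                   # one normalized sample
x_tilde, h, x_hat = dae_forward(x, w, b, w_tilde, p, drop_prob=0.3, rng=rng)

# Assumed loss: squared error against the clean input x, not the noisy x_tilde.
loss = 0.5 * sum((a - c) ** 2 for a, c in zip(x, x_hat))
print(len(h), len(x_hat))  # prints 2 4
```

Comparing the reconstruction with the clean input rather than the corrupted one is what forces the hidden features to be robust to the injected noise.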

Previous findings indicate that a stacked training technique is easier to implement and that network training converges more quickly. Therefore, in this research, a hybrid approach called SSDAE is implemented. This technique helps extract the best features for classifying intrusions, obtains robust and sparse features, and handles high-dimensional feature vectors successfully.

In the SSDAE training process, the training sample is represented by i, the decoding weight from the hidden layer to the output layer is denoted by \(W_{d}^{i}\), the coding weight from the input layer to the hidden layer is denoted by \(W_{e}^{i}\), and the feature of the hidden layer is denoted by Hi. Initially, the trained hidden layer features H1 are given as input to the DAE. The SAE and DAE are unsupervised techniques. The proposed hybrid technique is then built from the stacked training parameters of these two models. The coding weights of the first and second layers are denoted by \(W_{e}^{1}\) and \(W_{e}^{2}\), and the decoding weights of the third and fourth layers are represented by \(W_{d}^{1}\) and \(W_{d}^{2}\). Finally, a softmax classifier is added to the network to determine whether the traffic is a DDoS attack or not.

The random initialization of the weight parameters increases the training time of the SSDAE and can cause training to fall into a local optimum. Therefore, in this work, a nature-inspired optimization approach known as firefly-black widow is employed to update the weights of the autoencoder. This hybrid optimization technique utilizes the social behaviors and mating strategies of fireflies and black widow spiders to maintain a balance between exploration and exploitation during the weight update process. This approach enhances both the convergence rate and the accuracy of the autoencoder, thereby significantly improving the overall performance of DDoS attack detection.

Through the careful design and integration of advanced methodologies, the proposed system ensures robust and efficient detection capabilities, identifying DDoS attacks with high precision and recall. This makes it a crucial tool in enhancing network security.

Firefly-Black widow optimization-based weight selection:

The Firefly-black widow (FA-BW) optimization algorithm is used to choose suitable weights for the SSDAE classifier. In the firefly algorithm (FA), every firefly in the search space is a candidate solution. The FA is based on the mating and light-flashing patterns of fireflies, which act as their means of exchanging information. The algorithm has been used to solve many real-world problems. However, it has shortcomings, such as an inability to maintain a proper balance between local and global search. This research therefore aims to improve the original FA’s global search capability by using the BW algorithm’s mutation operator.

In the original FA, the best firefly does not move, while all the other fireflies move toward it. The FA’s efficiency suffers if the algorithm fails to find an improved position after multiple iterations; the BW mutation operator helps by moving the best firefly in the population at random toward a new position. In other words, the BW mutation operator randomly updates the best solution found so far, which increases its ability to reach better positions and thus improves the efficiency of the algorithm. The black widow optimization algorithm mimics the evolutionary life cycle of the black widow spider. Typically, the female black widow spins her web at night and releases pheromones to attract males. Males attracted by the pheromone become entangled in the web. Before or after mating, the female feeds on the male. After mating, the female lays her eggs in the web. Spiders emerge from the eggs after around 11 days and engage in sibling cannibalism. For a short time, the hatched spiders live in the maternal web, and during this time the mother may even eat some of the juveniles. The spiders that survive in the web are regarded as the fittest, and the black widow optimization technique is built on this principle.

Embedding BW within the FA involves only a few practical steps. The standard FA operates on fireflies, while BW operates on spiders. The first step in embedding the spider operators in the FA is to represent the FA’s fireflies as BW spiders: every FA firefly becomes a spider in BW, and together the fireflies form the spider population. To apply the crossover operator, the newly produced spiders are exchanged according to the ratio specified in the experimental procedure.

After the operators have been applied to the spiders, the fitness of the resulting solutions is evaluated. The algorithm terminates if the fitness value reaches the required value; otherwise, the spiders are converted back to fireflies for the next generation, and the cycle continues until the number of iterations reaches the termination limit.
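The two stopping conditions can be sketched as a small driver loop. This is only an outline under assumptions: fitness is treated as a value to be minimized, "reaching the required value" is interpreted as falling to or below a target, and `step` is a hypothetical callback standing in for one full FA-BW generation.

```python
def run_search(step, initial_best, target_fitness, max_iterations):
    """Call `step` until the best fitness reaches the target or the budget is spent.

    `step` receives the current best fitness and returns the best fitness
    found in the next generation (lower is better in this sketch).
    """
    best = initial_best
    iterations = 0
    while best > target_fitness and iterations < max_iterations:
        best = min(best, step(best))  # never accept a worse best-so-far
        iterations += 1
    return best, iterations

# Toy generation that halves the fitness each time, stopped by the target.
best, used = run_search(lambda f: f * 0.5, 1.0, 0.01, 100)

# Toy generation that never improves, stopped by the iteration budget.
stalled_best, stalled_used = run_search(lambda f: f, 1.0, 0.01, 5)
```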

In the FA, fireflies are attracted primarily by the brightness of each other’s light. When a firefly can move toward either of two fireflies, it is drawn to the brighter one and travels toward its location. The brightness of the light reflects the fitness value, and Eq. (14) is used to update the light intensity:

$$I(r) = I_{0} e^{{ - \gamma r^{2} }}$$
(14)

where \(I_{0}\) is the intensity of the light source, \(\gamma\) is the light absorption coefficient, and r is the distance between the fireflies. Because fireflies are attracted by the light intensity they observe, their attractiveness can be determined as follows:

$$\beta = \beta_{0} e^{{ - \gamma r^{2} }}$$
(15)

Here \(\beta_{0}\) is the attractiveness at distance r = 0. The distance r between two fireflies i and j is determined as follows:

$$r_{ij} = \sqrt {\sum\nolimits_{k = 1}^{n} {(s_{ik} - s_{jk} )^{2} } }$$
(16)

where n is the dimension of the problem. The movement of each firefly is updated with the following equation:

$$s_{i} (t + 1) = s_{i} (t) + \beta_{0} e^{{ - \gamma r_{ij}^{2} }} (s_{j} (t) - s_{i} (t)) + \alpha \varepsilon_{i}$$
(17)

As shown in Eq. (17), the updated position of a firefly is determined by three terms: the firefly’s present position, its attraction toward another firefly’s position, and a random step given by the random number \(\varepsilon_{i}\) scaled by the randomization parameter \(\alpha\).
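Equations (14) to (17) translate almost directly into code. The sketch below is illustrative only; the function names are hypothetical, and drawing \(\varepsilon_{i}\) uniformly from [-0.5, 0.5] is an assumption (a common choice in firefly implementations), since the text does not specify its distribution.

```python
import math
import random

def distance(s_i, s_j):
    """Eq. (16): Euclidean distance between two firefly positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s_i, s_j)))

def intensity(i0, gamma, r):
    """Eq. (14): light intensity observed at distance r."""
    return i0 * math.exp(-gamma * r ** 2)

def attractiveness(beta0, gamma, r):
    """Eq. (15): attractiveness at distance r; equals beta0 when r = 0."""
    return beta0 * math.exp(-gamma * r ** 2)

def move(s_i, s_j, beta0, gamma, alpha, rng):
    """Eq. (17): move firefly i toward firefly j plus a random step."""
    beta = attractiveness(beta0, gamma, distance(s_i, s_j))
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)  # assumed uniform epsilon
        for a, b in zip(s_i, s_j)
    ]

start, brighter = [0.0, 0.0], [1.0, 0.0]
# With alpha = 0 the random term vanishes and the move is purely attractive.
moved = move(start, brighter, beta0=1.0, gamma=1.0, alpha=0.0, rng=random.Random(0))
```

With \(\gamma = 0\) the attractiveness no longer decays with distance, so a firefly with \(\beta_{0} = 1\) and \(\alpha = 0\) jumps straight onto the brighter one; larger \(\gamma\) values shorten the step.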

As previously stated, the best firefly keeps its position in the original FA, which slows the search and increases the probability of becoming trapped in local optima. The proposed FA-BW variant instead uses the BW mutation operator to update the position of the best firefly through a random exchange of randomly selected features and variables. To execute this operator, spiders replace the fireflies.

Assume that the spider population in BW optimization is S = {s1, s2, …, si} and that the value of the objective function is denoted by f(obj). The population of fireflies is F = {f1, f2, f3, …, fj}, the position of each firefly is P = {p1, p2, p3, …, pn}, and each firefly is associated with its light intensity.

The first logical operation in the suggested scheme is the assignment of FA fireflies to BW spiders: every FA firefly is represented by a BW spider, and the value of the BW objective function is expressed by the light intensity in the FA. This mapping is given below.

$${\text{Ch}}_{{1}} = {\text{ f}}_{{1}} ,{\text{ f }}\left( {{\text{Obj}}_{{1}} } \right) \, = {\text{ I }}\left( {1} \right)$$
$${\text{Ch}}_{{2}} = {\text{ f}}_{{2}} ,{\text{ f }}\left( {{\text{Obj}}_{{2}} } \right) \, = {\text{ I}}\left( {2} \right)$$
$${\text{Ch}}_{{3}} = {\text{ f}}_{{3}} ,{\text{ f }}\left( {{\text{Obj}}_{{3}} } \right) \, = {\text{ I}}\left( {3} \right)$$
$${\text{Ch}}_{{\text{i}}} = {\text{ f}}_{{\text{i}}} \Rightarrow {\text{ f }}\left( {{\text{Obj}}_{{\text{i}}} } \right) \, = {\text{ I}}\left( {\text{i}} \right)$$

In this case, the objective function in BW corresponds to the light intensity in the FA. The resulting population is then subjected to the mutation operator, which is represented logically as follows:

$${\text{Ch}}_{{1}} > < {\text{ Ch}}_{{2}} \left( {{\text{mutation at position 1}}} \right)$$
$${\text{Ch}}_{{1}} = {\text{ f}}_{{1}} = \, \left[ {{\text{p}}_{{{12}}} ,{\text{ p}}_{{{13}}} ,{\text{ p}}_{{{14}}} ,{\text{ p}}_{{{\text{1j}}}} } \right] \, = {\text{f }}\left( {{\text{Obj}}_{{1}} } \right)$$
$${\text{Ch}}_{{2}} > < {\text{ Ch}}_{{4}} \left( {{\text{mutation at position 3}}} \right)$$
$${\text{Ch}}_{{2}} = {\text{ f}}_{{2}} = \, \left[ {{\text{p}}_{{{21}}} ,{\text{ p}}_{{{22}}} ,{\text{ p}}_{{{24}}} ,{\text{ p}}_{{{\text{2j}}}} } \right] \, = {\text{ f }}\left( {{\text{Obj}}_{{2}} } \right)$$
$${\text{Ch}}_{{\text{i}}} > < {\text{ Ch}}_{{2}} \left( {{\text{mutation at position i}}} \right)$$
$${\text{Ch}}_{{\text{i}}} = {\text{ f}}_{{\text{i}}} = \, \left[ {{\text{p}}_{{{\text{i1}}}} ,{\text{ p}}_{{{\text{i2}}}} ,{\text{ p}}_{{{\text{i3}}}} ,{\text{ p}}_{{{\text{i4}}}} } \right] \, = {\text{ f }}\left( {{\text{Obj}}_{{\text{i}}} } \right)$$

The mutation operator improves the firefly search process by reducing the chance that the algorithm becomes trapped in a local optimum. The best solution found is assigned as the weights of the SSDAE.
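One way to picture the mutation of the best firefly is a swap of two randomly chosen entries of its position vector. The sketch below assumes that form of mutation and adds a greedy acceptance rule (the mutant replaces the best firefly only if its fitness is lower); both choices, the toy fitness function, and the function names are hypothetical and are not taken from the paper.

```python
import random

def swap_mutation(position, rng):
    """Return a copy of `position` with two randomly chosen entries exchanged."""
    mutated = list(position)
    a, b = rng.sample(range(len(mutated)), 2)  # two distinct indices
    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated

def mutate_best(population, fitness, rng):
    """Mutate the fittest individual in place; keep the mutant only if it is better."""
    best_index = min(range(len(population)), key=lambda k: fitness(population[k]))
    candidate = swap_mutation(population[best_index], rng)
    if fitness(candidate) < fitness(population[best_index]):
        population[best_index] = candidate
    return best_index

def fitness(weights):
    """Toy objective (lower is better); a stand-in for the SSDAE training loss."""
    return sum((k + 1) * w for k, w in enumerate(weights))

rng = random.Random(1)
population = [[0.4, 0.1, 0.3], [0.9, 0.8, 0.7]]
before = fitness(population[0])
best_index = mutate_best(population, fitness, rng)
after = fitness(population[best_index])
```

Because the mutant is accepted only when it improves the fitness, the best-so-far solution never degrades, while the swap still gives it a chance to leave its current position.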

Results and discussion

The efficiency of the suggested intrusion detection system is analyzed in this section and compared with existing state-of-the-art techniques, namely Encoder with gradient descent optimization, Encoder with Adam optimization, SSDAE with Firefly optimization, and SSDAE with black widow optimization. The experiments were run on a machine with 16 GB of RAM, a 10th-generation Intel i7 2.60 GHz processor, and a 4 GB NVIDIA graphics card, running Windows 10. The studies were conducted in the Anaconda3 environment using Python and Keras with TensorFlow as the backend. For experimentation, we considered traffic data of 220,000 records covering all varieties of attacks and benign samples, in which attacks form the majority class and benign requests the minority class. After data balancing, the total traffic comprises 439,000 records, with equal proportions of attack and benign samples. Of these samples, 70% were used for training and 30% for testing.
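For the balanced set of 439,000 records, a 70/30 split corresponds to 307,300 training and 131,700 test records. The sketch below shows one way such a split could be drawn; splitting each class separately (stratification), the fixed seed, and the function name are assumptions, since the text states only the 70/30 proportions.

```python
import random

def stratified_split(labels, train_fraction, rng):
    """Split sample indices per class so both parts keep the class proportions."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Toy balanced data set: 50 attack and 50 benign records.
labels = ["attack"] * 50 + ["benign"] * 50
train_idx, test_idx = stratified_split(labels, 0.7, random.Random(42))
```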

The proposed model is evaluated on binary classification using standard metrics, namely precision, F-score, accuracy, recall, and the Receiver Operating Characteristic (ROC) curve, to analyze the performance of DDoS detection. Two types of experiments were performed: the first with imbalanced data and the second with balanced data.
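For reference, the threshold-based metrics follow from the four confusion-matrix counts using their standard definitions. The counts in the example are made up for illustration and are not results from the paper; the attack class is assumed to be the positive class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted attacks that are real attacks
    recall = tp / (tp + fn)      # share of real attacks that are detected
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

# Illustrative counts only.
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=70)
```

Since the F-score is the harmonic mean of precision and recall, it always lies between the two.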

Imbalanced data results

The suggested technique achieved the best results in the first experiment, as shown in Table 1, with 99.89% accuracy, 99.24% precision, 99.02% recall, and an F1-score of 99.39%. Meanwhile, the SSDAE + Firefly combination achieves comparable outcomes on all performance criteria.

Table 1 Performance comparison of different techniques on imbalanced data.

The comparison graph of the proposed technique is presented in Fig. 2. As the figure shows, the SSDAE + black widow combination produces high precision and AUC values; however, it attains lower values than SSDAE + Firefly.

Fig. 2 Comparison of proposed technique (without data augmentation).

Furthermore, ROC curves were created by performing binary classification on the Normal and Attack classes, yielding an area under the curve (AUC) of 98%, as shown in Fig. 3, which is the highest value among the compared techniques.

Fig. 3 ROC curve values for CICDDoS dataset (without augmentation).
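The area under a ROC curve can be read as the probability that a randomly chosen attack sample receives a higher score than a randomly chosen benign sample. The sketch below computes it by direct pair counting, which is quadratic in the number of samples and meant only as an illustration; the labels and scores are made up, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: label 1 = attack, label 0 = benign; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Three of the four attack/benign pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```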

Balanced data results

The cGAN approach is used to increase performance and tackle the class imbalance issue. In the second experiment (Table 2), the models are fed balanced data with equal numbers of attack and normal instances. The outcomes are better than in the first experiment, as seen in Table 2. The proposed model achieved 99.99% accuracy, 99.81% precision, 99.26% recall, and a 99.63% F-score in this trial. The SSDAE + black widow combination produces comparable results, but not as good as those of the proposed model.

Table 2 Performance comparison of different techniques on balanced data.

The comparison graph of the proposed technique with balanced data is presented in Fig. 4. The chart shows that the proposed method attains the best performance on all metrics. This is attributed to the use of simple deep learning models with larger batch sizes and fewer layers, together with cGAN-based balancing of the dataset, which reduces processing complexity. In addition, incorporating the FA-BWO algorithm helped reach the global optimum without becoming trapped in a local optimum.

Fig. 4 Comparison of proposed approach for binary class classification.

Figure 5 shows the comparison of the ROC curves. The figure shows that the suggested system has the highest AUC value when recognizing attack samples in the dataset, demonstrating its ability to accurately detect intrusions in network traffic. The confusion matrix for the model is shown in Fig. 6. Based on the above discussion, the suggested technique effectively distinguishes between legitimate and malicious network data.

Fig. 5 ROC curve values for CICDDoS dataset with augmentation.

Fig. 6 Confusion matrix for CICDDoS dataset.

Comparison with state-of-the-art models

The best results obtained with the proposed model were compared with existing approaches evaluated on the CICDDoS2019 dataset. The comparative results are shown in Table 3, which compares them with the proposed method in terms of accuracy, recall, precision, F1-score, and AUC. Our model outperformed the other approaches.

Table 3 Comparison of the proposed approach with state-of-the-art approaches.

Further, Table 4 provides a comparative performance analysis of the proposed method across various benchmark datasets, including CICDDoS2019, NSL-KDD, UNSW-NB15, and KDD Cup 99. The proposed method demonstrates exceptional accuracy and robustness across all datasets. These results highlight the model’s adaptability and effectiveness in diverse network intrusion detection scenarios.

Table 4 Performance on different datasets with proposed algorithm.

The proposed model extracts the essential attributes and detects DDoS attacks with high accuracy. Its success is due to the following factors. The recent CICDDoS2019 dataset is used because it contains real-world attack patterns. The data pass through a sequence of steps, including preprocessing, data balancing, and classification:

  • In the preprocessing step, the input traffic was handled properly with the approaches designed in the model. Due to this, the model learns the traffic patterns effectively.

  • The data imbalance problem is handled by integrating Conditional Generative Adversarial Networks (cGAN), which eliminates the bias concerning the majority samples and boosts the training efficacy.

  • The input traffic is normalized, which aids the performance of the model.

  • Socket-related attributes are discarded from the traffic to ensure that the model is not biased towards a particular network and remains effective across various networks.

  • The proposed deep learning-based IDS model incorporates a hybrid optimization approach using the Firefly and Black Widow Optimization (FA-BWO) algorithms. This hybrid enhances the model’s ability to reach the global optimum, thereby improving detection accuracy and helping the model generalize well to new data.
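Two of the preprocessing factors listed above, discarding socket-related attributes and normalizing the traffic, can be sketched as follows. The column names, the choice of min-max scaling, and the toy records are assumptions made for illustration; the list above states only that socket attributes are dropped and that the input is normalized.

```python
# Hypothetical names for socket-related attributes; the actual column set may differ.
SOCKET_COLUMNS = {
    "Flow ID", "Source IP", "Source Port",
    "Destination IP", "Destination Port", "Timestamp",
}

def drop_socket_columns(rows):
    """Remove network-specific identifiers so the model cannot memorize them."""
    return [{k: v for k, v in row.items() if k not in SOCKET_COLUMNS} for row in rows]

def min_max_normalize(rows):
    """Scale every numeric column to [0, 1]; constant columns map to 0.0."""
    columns = list(rows[0].keys())
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}
    return [
        {
            c: (r[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
            for c in columns
        }
        for r in rows
    ]

# Toy flow records, not taken from the dataset.
raw = [
    {"Source IP": "10.0.0.1", "Source Port": 443, "Flow Duration": 100, "Total Fwd Packets": 2},
    {"Source IP": "10.0.0.2", "Source Port": 80, "Flow Duration": 300, "Total Fwd Packets": 2},
    {"Source IP": "10.0.0.3", "Source Port": 22, "Flow Duration": 200, "Total Fwd Packets": 2},
]
cleaned = min_max_normalize(drop_socket_columns(raw))
```

In practice the minimum and maximum would be computed on the training portion only and reused for the test portion, so that no information leaks from the test set.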

Conclusion

This paper demonstrates the effectiveness of a deep learning-based intrusion detection system in identifying DDoS attacks, which are among the most severe threats in the real world. By implementing a three-phase framework that includes data pre-processing, data balancing with cGAN, and classification using a stacked sparse denoising autoencoder (SSDAE) optimized with a firefly-black widow (FA-BW) hybrid algorithm, the proposed approach improves detection accuracy. Validation on the CICDDoS2019 dataset indicated that the proposed deep learning strategy achieves good outcomes on the essential performance indicators, including precision, recall, AUC, F-score, and accuracy. In addition, the IDS based on the proposed method recorded an accuracy of 99.89% for imbalanced data and 99.99% for balanced data. The comparative results with various approaches are depicted in Table 4. Future work could extend this approach to classify multiple types of attacks across various network environments and incorporate explainability through XAI techniques. These findings emphasize the crucial role of advanced deep learning and hybrid optimization techniques in strengthening cybersecurity and mitigating the impact of DDoS attacks.