Introduction

ASDs are developmental disabilities defined by challenges in communication and social interaction, as well as repetitive, restricted, and stereotypical behaviour patterns1. Symptoms are usually apparent before age 3 years. The nature of these difficulties, together with the absence of biological markers for analysis and variations in clinical descriptions over time, makes monitoring ASD occurrence challenging. Accurate data reporting is crucial for recognizing ASD occurrence in the population and can assist primary research2. ASD is a critical neurodevelopmental disorder with extremely high medical care costs. It is a lifelong condition marked by deficits in social interaction and by stereotyped behaviour patterns, frequently accompanied by extensive deterioration of communication skills, and is connected with neurological and genetic elements3. ASD-related behaviours manifest in social interaction, the capacity for imagining and thinking, repetitive activities, and communication disorders with other individuals. The earlier ASD is identified, the earlier it can be treated; primary ASD detection therefore plays a significant role in medical analysis4. This analysis can then inform personalized treatment regimens to enhance the living standards of children with ASD and their families. Unfortunately, the diagnostic process for ASD can be time-consuming and relies on expensive testing approaches5. In recent times, the rise in ASD cases around the globe has motivated scientists and doctors to discover more efficient screening approaches. ASD symptoms closely resemble those of other mental disorders, like depression, which require distinctly different treatment approaches, leading to potential confusion6.

To simplify the analytical process and reduce the waiting period, a computerized approach for recognizing ASD, one that requires little or no expert supervision, would be significant progress over common practice. ASD can usually be identified in its initial phase, but the major bottleneck lies in the tedious and subjective nature of present analysis procedures7. Consequently, there is a waiting period of at least 13 months from first suspicion to actual diagnosis. Treating and detecting ASD in the early phase is highly important: it aids in alleviating or decreasing the symptoms to some extent, thereby enhancing the individual's overall living standards. Nevertheless, a valuable period is lost in the gap between first concern and diagnosis while the disease remains undetected8. Various investigations have identified significant autism features through multiple modalities, like voice recognition, eye tracking, feature extraction, medical image analysis, and facial recognition. Information technology that utilizes Artificial Intelligence (AI) methods has assisted in analyzing ASD earlier through facial pattern detection9. Recently, DL and ML have attained several accomplishments, gaining further significance when employed in sciences like biomedicine and medicine. ML approaches are suggested or utilized in medical diagnostics and decision-making to aid data interpretation. Thus, disease screening approaches with the help of ML are widely examined10. ML approaches would not only aid in assessing ASD risk precisely and rapidly but are also essential to streamlining the entire analysis process and assisting families in accessing much-needed therapies quickly.

This study presents a Two-Tier Metaheuristic-Driven Ensemble Deep Learning for Effective Autism Spectrum Disorder Diagnosis in Disabled Persons (T2MEDL-EASDDP) model. The main aim of the presented T2MEDL-EASDDP model is to analyze and diagnose the different stages of ASD in disabled individuals. To accomplish this, the T2MEDL-EASDDP model utilizes min-max normalization for data pre-processing to ensure that the input data is scaled to a uniform range. Furthermore, improved butterfly optimization algorithm (IBOA)-based feature selection (FS) is utilized to identify the most relevant features and reduce dimensionality efficiently. Additionally, an ensemble of three DL approaches, namely the autoencoder (AE), long short-term memory (LSTM), and deep belief network (DBN), is employed for analyzing and detecting ASD. Finally, the presented T2MEDL-EASDDP model employs the Brownian motion (BM) and directional mutation scheme-based coati optimizer algorithm (BDCOA) to fine-tune the hyperparameters involved in the three ensemble methods. A wide range of simulation analyses of the T2MEDL-EASDDP technique is accomplished on the ASD-Toddler and ASD-Adult datasets. The key contributions of the T2MEDL-EASDDP technique are listed below.

  • The T2MEDL-EASDDP technique utilizes min-max normalization to scale features within a specified range, ensuring effectual data pre-processing. This step assists in improving the performance of the model by standardizing the input data. It plays a significant role in optimizing the analysis and detection of ASD by improving model stability and convergence.

  • The T2MEDL-EASDDP model employs the IBOA technique for feature selection, ensuring that only the most relevant features are chosen. This improves model performance by mitigating complexity and improving accuracy. By concentrating on key features, the model becomes more effectual and capable of better detecting ASD.

  • The T2MEDL-EASDDP approach utilizes an ensemble of DL methods, comprising AE, LSTM, and DBN, to analyze and detect ASD using every technique’s merits. This integration enhances the model’s capability to capture intrinsic patterns and discrepancies in the data. By incorporating these models, the approach improves detection accuracy and robustness.

  • The T2MEDL-EASDDP method implements the BDCOA model to fine-tune the model parameters, improving its accuracy and robustness. This optimization technique allows for more precise adjustments, improving the model’s performance. Effectually tuning the parameters assists the model in achieving improved generalization and stability in ASD detection.

  • Integrating IBOA for feature selection and BDCOA for model tuning, along with a hybrid DL ensemble approach, creates an optimized solution for ASD detection. This integration improves the model’s capability to identify relevant features and fine-tune parameters for enhanced performance. The novelty is the seamless fusion of advanced optimization algorithms with a multi-model DL framework, enhancing detection accuracy and efficiency for ASD diagnosis.

Literature survey

In11, an interpretable ASD method that depends on DL for autism recognition in children and toddlers is proposed. The presented method is separated into two modules: the first applies a DL method for autism recognition, and the second utilizes an XAI method called SHAP to emphasize the main features and explain the results. In12, employing approaches from ML, a method is created that attains higher performance in identifying the early indicators of ASD in children. This work utilizes ML approaches, like RF, SVM, linear discriminant analysis, and k-nearest neighbour, to separate populations into those who have ASD and those who do not. Al-Muhanna et al.13 present an ultra-light, computer-aided, Attention-based Hybrid Optimized Residual Memory Network (AHRML) for efficient and precise ASD recognition. Here, an innovative hybridized Arithmetic HHO is applied to reduce the feature dimensions and streamline the identification process. Lakhan et al.14 propose an efficient ASD platform with complete services to improve ASD treatment outcomes. The presented method is a federated learning-enabled convolutional neural network with LSTM (FCNN-LSTM) utilizing multimodal databases. A security method that depends on AES is applied inside the FL framework to guarantee patient data security. Nogay and Adeli15 advanced an optimal CNN method utilizing the Grid Search Optimization (GSO) method. Three CNN models were designed: (1) a quadruple classification model formed by considering gender, (2) a quadruple classification model generated by considering age, and (3) an eightfold classification model generated by considering both age and gender. Pokhrel et al.16 present a novel method for ASD classification, leveraging the ability of ANNs with facial landmark recognition. This method exploits the subtleties embedded in these complicated particulars. The research examines binary classification, an ML task that dichotomizes data into one of two categories. Furthermore, PCA, a dimensionality reduction model, is essential in transforming data into smaller representations while retaining crucial variance. Prasad et al.17 focus on ASD classification utilizing MRI images by Artificial Gannet Optimization enabled DCNN (AGO-DCNN). The bilateral filter is applied to remove the noise from the input MRI image, and after this image pre-processing phase, ROI extraction is organized. The AGO is utilized to extract the essential region. AGO is a recently designed method that integrates dual optimizations, the Gannet Optimizer Algorithm (GOA) and Artificial Ecosystem-based Optimization (AEO).

Ullah et al.18 propose a body-worn multi-sensor-based IoT framework utilizing ML to identify the complicated sign language of speech-impaired children. Optimal sensor placement is crucial for feature extraction, as positioning variations affect the precision of detection. Hryntsiv et al.19 propose a methodology for evaluating efficiency based on criteria like speech development, communication, and emotional regulation. Nadimi-Shahraki, Zamani, and Mirjalili20 propose an enhanced whale optimization algorithm (E-WOA) method by using a pooling mechanism and three effectual search strategies named migrating, preferential selecting, and enriched encircling prey. Prakash et al.21 propose a methodology to analyze behavioural videos for diagnosing ASD, utilizing a pipeline for child detection, action localization, and classification. Umrani and Harshavardhanan22 propose an automatic anxiety detection module for ASD utilizing a Deep CNN optimized by intelligent search techniques to tune classifier parameters. Polavarapu et al.23 present the involutional neural networks and ML techniques (INN-ASDNet) method, a novel approach that utilizes involutional neural networks and ML to predict autism. It uses location-specific involution kernels and a random forest (RF) classifier, requiring fewer parameters, reducing memory usage, and improving computation speed. Simon et al.24 introduce an approach for diagnosing ASD by utilizing eye-tracking data and ML. Gaze data was analyzed with Principal Component Analysis (PCA) and integrated into CNNs for feature extraction and classification, with transfer learning (TL) applied to enhance performance. Kavitha and Siva25 introduce the 3T Dilated Inception Network (3T-DINet) framework for ASD diagnosis utilizing rs-fMRI images. The model integrates a 3T dilated inception module with three dilation rates for extracting multi-scale brain features and employs Residual networks for better feature extraction. To improve performance, the model is optimized with a Crossover-based Black Widow Optimization (CBWO) approach. Vengidusamy et al.26 explore the complexities of ASD, accentuating early diagnosis, neuroimaging, genetic markers, and behavioural assessments for precise diagnosis. Lu et al.27 propose a DL technique for quantifying hand motion using the Sequential Bag of Convolutional Features (SBoCF) framework, incorporating Bag of Words (BoW) with a skeleton-based CNN for gesture classification, enabling automatic hand motor assessment. Sravani and Kuppusamy28 developed an automatic model for detecting ASD and Typically Developing (TD) individuals. It comprises pre-processing with edge detection and data augmentation, followed by model optimization using DCNNs fine-tuned with Dipper-Throated Particle Swarm Optimization (DTPSO). Grad-CAM is utilized for model interpretation.

Despite the improvements in ASD detection methods, there are still limitations in attaining highly accurate and generalized models across various populations. Many existing techniques depend on large, complex datasets that need substantial computational resources and are prone to overfitting. Furthermore, most approaches concentrate on a single modality (e.g., MRI, eye-tracking, or behavioural data), which limits the capability to capture the full complexity of ASD. There is also a lack of standardized, universally accepted evaluation metrics for evaluating the model’s performance. Moreover, few methods address real-time, on-the-ground applications, particularly in resource-constrained settings. Further research is required to improve model robustness, mitigate computational complexity, and incorporate multimodal data for a more comprehensive understanding of ASD.

The proposed model

In this paper, a novel T2MEDL-EASDDP model is developed. The main aim of the presented T2MEDL-EASDDP model is to analyze and diagnose the different stages of ASD in disabled individuals. To accomplish this, the T2MEDL-EASDDP model comprises data normalization, dimensionality reduction using IBOA, three ensemble classification models, and a BDCOA-based parameter selection process. Figure 1 represents the entire procedure of the T2MEDL-EASDDP method.

Fig. 1
figure 1

Overall process of T2MEDL-EASDDP model.

Data normalization

Initially, the T2MEDL-EASDDP model undergoes min-max normalization for data pre-processing to ensure that the input data is scaled to a uniform range29. This model is chosen as it effectually scales data within a specified range, usually between 0 and 1, ensuring that all features contribute equally to the model. This is crucial when the dataset comprises features with varying units or magnitudes, as it prevents dominant features from skewing the outputs. Compared to other techniques, such as Z-score normalization, min-max normalization is more appropriate when the model requires a fixed input range, such as neural networks that often perform better with normalized data. This method enhances convergence speed and model stability, making it highly effective for DL tasks. Moreover, it assists in avoiding issues of numerical instability that can arise when features have significantly diverse scales, giving a more effective and reliable training process.

Min-max normalization is a straightforward technique that rescales data to a stated range \(\:\left[\text{0,1}\right].\) The procedure involves subtracting the minimum value from every data point and dividing by the range. Equation (1) formalizes this normalization.

$$\:{X}_{normalized}=\frac{x-{x}_{\:\text{m}\text{i}\text{n}\:}}{{x}_{\text{m}\text{a}\text{x}}-{x}_{\text{m}\text{i}\text{n}}}$$
(1)

Here, \(\:x\) denotes the original value of a variable, and \(\:{x}_{\text{m}\text{i}\text{n}}\) and \(\:{x}_{\text{m}\text{a}\text{x}}\) represent the minimum and maximum values, respectively. This model is simple to understand and apply, making it appropriate for cases where feature scale is vital, such as techniques that are sensitive to fluctuating input magnitudes. Conversely, its susceptibility to outliers can distort the scaling, potentially affecting the result of the method, particularly in datasets with extreme values.
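As an illustration, the following minimal Python sketch (the numpy-based setup and names are ours, not part of the original method) applies Eq. (1) column-wise to a feature matrix:

```python
import numpy as np

def min_max_normalize(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale every feature (column) of X into [0, 1] via Eq. (1)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # eps guards against division by zero for constant-valued features
    return (X - x_min) / (x_max - x_min + eps)

# Example: three samples, two features with very different magnitudes
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
print(min_max_normalize(X))  # every column now lies in [0, 1]
```

Scaling column-wise keeps each attribute on a comparable footing before it reaches the ensemble classifiers.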

Dimensionality reduction: IBOA

Next, the IBOA-based FS efficiently identifies the most relevant features and reduces dimensionality30. This model is chosen because it can effectually detect and retain the most relevant features from a high-dimensional dataset. Unlike conventional methods, namely Recursive Feature Elimination (RFE), IBOA utilizes an optimization technique inspired by butterfly behaviour, which improves its capacity to explore the feature space and avoid local optima. This results in better feature selection accuracy and enhanced model performance. IBOA is specifically efficient for complex datasets, as it can handle large feature sets and mitigate computational costs by eliminating irrelevant or redundant features. Compared to other optimization techniques, IBOA balances exploration and exploitation, presenting robust feature selection with minimal computational overhead. Its adaptability to various datasets makes it a highly effectual choice for improving the performance of ML models. Figure 2 illustrates the steps involved in the IBOA model.

Fig. 2
figure 2

Steps involved in the IBOA method.

Chaotic mapping is mainly characterized by features like non-periodicity, unpredictability, non-repeatability, and good ergodicity, which are leveraged to enhance the diversity of the population and improve the model's performance. In the original BOA, the diversity of the butterfly population suffers owing to random initialization. So, this paper presents Tent chaotic mapping to evenly distribute the population of butterflies and extend its search range. Its mathematical formulation is given below:

$$\:{x}_{n+1}=f\left({x}_{n}\right)=\left\{\begin{array}{l}\frac{{x}_{n}}{a},{x}_{n}\in\:\left[0,\:a\right)\\\:\frac{1-{x}_{n}}{\left(1-a\right)},{x}_{n}\in\:\left[a,\:1\right)\end{array}\right.$$
(2)

Where \(\:a\in\:\left(\text{0,1}\right)\).
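A minimal sketch of Tent-map population initialization, assuming a population of N butterflies in an m-dimensional search space with bounds lb and ub (all names and the parameter a = 0.7 are illustrative):

```python
import numpy as np

def tent_map_init(N, m, lb, ub, a=0.7, seed=0):
    """Initialize an N x m butterfly population with the Tent map of Eq. (2)."""
    rng = np.random.default_rng(seed)
    x = rng.random(m)                 # chaotic seed sequence in (0, 1)
    pop = np.empty((N, m))
    for i in range(N):
        # one Tent-map iteration per individual keeps the sequence ergodic
        x = np.where(x < a, x / a, (1.0 - x) / (1.0 - a))
        pop[i] = lb + x * (ub - lb)   # map chaotic values into the bounds
    return pop

pop = tent_map_init(N=30, m=14, lb=0.0, ub=1.0)
```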

The original BOA does not constrain the search-step length of an individual butterfly. Throughout the process, the step length is unrestricted owing to the high degree of freedom of individuals. To avoid the limitation of an unbounded individual search step, this paper develops a weight coefficient, which dynamically changes according to an individual's fitness value. Its mathematical formulation is expressed below:

$$\:{\omega\:}_{i}=\frac{{F}_{b}-{F}_{w}}{{F}_{i}-{F}_{w}}$$
(3)

Here, \(\:{F}_{i}\) denotes the current individual fitness value, \(\:{F}_{b}\) represents the current global best fitness, and \(\:{F}_{w}\) represents the worst fitness value.

When the fitness value of the current individual is close to the global worst, a larger weight coefficient is allocated to that individual, and a larger step is taken in its move, intended to avoid settling in a local optimum. If the current fitness value differs from the global worst, i.e., it is closer to the global optimum, the weight coefficient is lower, and the reduced step size ensures higher-precision exploitation in the later stage, preventing the individual from bypassing the global optimum, which would otherwise decrease the model's performance.
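A one-function sketch of Eq. (3) for a minimization problem (the epsilon guard is our addition to avoid division by zero):

```python
import numpy as np

def adaptive_weights(F: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Eq. (3): omega_i = (F_b - F_w) / (F_i - F_w), for minimization.

    Individuals near the global worst get a large weight (bigger steps to
    escape local optima); near-best individuals get a weight close to 1,
    enabling fine-grained exploitation."""
    F_b, F_w = F.min(), F.max()            # best (lowest) and worst fitness
    return (F_b - F_w) / (F - F_w - eps)   # -eps avoids division by zero
```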

Besides, following the No Free Lunch theorem, a single technique cannot be entirely suitable for every problem, so this paper introduces a sine-cosine model to enhance the search stage of BOA. When united with the adaptive weight coefficient, the formulations of the global and local search stages of BOA are updated as follows:

$$\:{x}_{i}^{t+1}=\left\{\begin{array}{ll}{\omega\:}_{i}\times\:\left[{x}_{i}^{t}+{r}_{1}\times\:\text{sin}\left({r}_{2}\right)\times\:\left|{r}_{3}\times\:{g}^{*}-{x}_{i}^{t}\right|\right],&{r}_{4}<0.5\\ {\omega\:}_{i}\times\:\left[{x}_{i}^{t}+{r}_{1}\times\:\text{cos}\left({r}_{2}\right)\times\:\left|{r}_{3}\times\:{g}^{*}-{x}_{i}^{t}\right|\right],&{r}_{4}>0.5\end{array}\right.$$
(4)
$$\:{x}_{i}^{t+1}=\left\{\begin{array}{ll}{\omega\:}_{i}\times\:\left[{x}_{i}^{t}+{r}_{1}\times\:\text{sin}\left({r}_{2}\right)\times\:\left|{r}_{3}\times\:{x}_{j}^{t}-{x}_{k}^{t}\right|\right],&{r}_{4}<0.5\\ {\omega\:}_{i}\times\:\left[{x}_{i}^{t}+{r}_{1}\times\:\text{cos}\left({r}_{2}\right)\times\:\left|{r}_{3}\times\:{x}_{j}^{t}-{x}_{k}^{t}\right|\right],&{r}_{4}>0.5\end{array}\right.$$
(5)

Here, \(\:{r}_{1}=a\times\:(1-t/{t}_{\text{m}\text{a}\text{x}}),\) where \(\:a\) denotes a constant fixed to 2; \(\:t\) represents the current iteration count; \(\:{t}_{\text{m}\text{a}\text{x}}\) denotes the maximum iteration count; \(\:{r}_{2}\) is a random number between \(\:0\) and \(2\pi\:\); \(\:{r}_{3}\) is a random value between \(\:0\) and 2; and \(\:{r}_{4}\) is a random number generated between \(\:0\) and 1.
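A condensed sketch of the upgraded global search of Eq. (4); the local search of Eq. (5) is identical with two random individuals x_j^t, x_k^t in place of the global best g* (all names are illustrative):

```python
import numpy as np

def iboa_global_step(x, g_best, omega, t, t_max, a=2.0, rng=None):
    """One sine-cosine global-search update of Eq. (4) for one butterfly."""
    rng = rng or np.random.default_rng()
    r1 = a * (1.0 - t / t_max)        # decays linearly from a to 0
    r2 = rng.uniform(0.0, 2.0 * np.pi)
    r3 = rng.uniform(0.0, 2.0)
    r4 = rng.random()
    trig = np.sin(r2) if r4 < 0.5 else np.cos(r2)
    return omega * (x + r1 * trig * np.abs(r3 * g_best - x))
```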

In the IBOA model, the objectives are combined into a single objective formulation such that a preset weight reflects the importance of each objective. Here, a fitness function (FF) that unites both objectives of FS is utilized. Its mathematical formulation is expressed in Eq. (6).

$$\:Fitness\left(X\right)=\alpha\:\cdot\:E\left(X\right)+\beta\:\cdot\:\left(1-\frac{\left|R\right|}{\left|N\right|}\right)$$
(6)

Here, \(\:Fitness\left(X\right)\) signifies the fitness value of a subset \(\:X,\) \(\:E\left(X\right)\) denotes the classifier error rate using the selected features in the subset \(\:X\), \(\:\left|R\right|\) and \(\:\left|N\right|\) denote the number of chosen features and of original features in the database, respectively; \(\:\beta\:\) and \(\:\alpha\:\) are the weights of the reduction ratio and classifier error, with \(\:\beta\:=(1-\alpha\:)\) and \(\:\alpha\:\in\:\left[\text{0,1}\right].\)
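A hedged sketch of Eq. (6), assuming a binary feature mask and a k-NN wrapper classifier for estimating the error rate E(X) (the paper does not name the wrapper; the scikit-learn setup below is illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fs_fitness(mask, X, y, alpha=0.9):
    """Return the Eq. (6) value for a binary feature mask."""
    if mask.sum() == 0:               # an empty subset is invalid
        return np.inf
    X_sub = X[:, mask.astype(bool)]
    acc = cross_val_score(KNeighborsClassifier(), X_sub, y, cv=5).mean()
    error = 1.0 - acc                 # E(X): cross-validated error rate
    reduction = 1.0 - mask.sum() / mask.size   # 1 - |R| / |N|
    return alpha * error + (1.0 - alpha) * reduction
```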

ASD detection process: three ensemble models

In addition, an ensemble of three DL approaches, AE, LSTM, and DBN, is employed for analyzing and detecting ASD. This ensemble is chosen due to the complementary merits of each model. The AE excels in unsupervised feature learning, capturing complex data representations, while the LSTM effectually handles sequential data, which is significant for time series or behaviour patterns in ASD. The DBN, in turn, captures hierarchical feature representations, making it highly efficient for structured data like brain scans or behavioural features. Integrating these three techniques allows the model to utilize spatial and temporal patterns, improving its ability to detect ASD. This hybrid approach provides more comprehensive learning and enhanced accuracy relative to individual models, as each model compensates for the limitations of the others, giving a robust solution for ASD detection; a sketch of a simple fusion rule follows.
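The paper does not specify the exact fusion rule, so the sketch below assumes simple soft voting with equal weights over the class probabilities of the three base models:

```python
import numpy as np

def soft_vote(prob_ae, prob_lstm, prob_dbn, weights=(1/3, 1/3, 1/3)):
    """Average the class-probability outputs of the three base models."""
    probs = np.average([prob_ae, prob_lstm, prob_dbn], axis=0, weights=weights)
    return probs.argmax(axis=1)       # final class label per sample
```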

AE classifier

An AE neural network is a feedforward neural network (NN) whose output layer size equals that of the input layer31. These NNs are tailored to function unsupervised: they are trained on input vectors to reconstruct them as output vectors. AEs comprise two major modules: encoding and decoding.

The AE works such that the input vector \(\:X\) is converted by the encoder into a hidden representation \(\:H\), as given in Eq. (7).

$$\:H=\partial\:\left({W}_{xh}X+{\beta\:}_{xh}\right)$$
(7)

\(\:W\) denotes a weight matrix, \(\:\partial\:\) denotes an activation function such as the rectified linear unit (ReLU) or sigmoid function, and \(\:\beta\:\) represents a bias vector.

The original input space is reconstructed by transforming the hidden representation \(\:H\) through the decoder, as shown in Eq. (8).

$$\:\widehat{X}=\partial\:\left({W}_{\widehat{x}h}h+{\beta\:}_{\widehat{x}h}\right)$$
(8)

The reconstruction error \(\:\epsilon\:\) is then computed as the difference between the reconstructed vector \(\:\widehat{X}\) and the original input vector, as shown in Eq. (9).

$$\:\epsilon\:=\left\|X-\widehat{X}\right\|$$
(9)

This method is trained in an unsupervised manner to reduce the reconstruction error \(\:\epsilon\:\). It achieves this by learning the relations among input characteristics. After training, inputs that resemble the training data are reconstructed with low error, whereas an input generating a high reconstruction error is flagged as anomalous data.
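A compact PyTorch sketch of the AE of Eqs. (7)–(9); the layer sizes and sigmoid activations are illustrative choices:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder (Eq. 7) and decoder (Eq. 8) with sigmoid activations."""
    def __init__(self, n_features: int, n_hidden: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_features), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder(n_features=14)
x = torch.rand(32, 14)                               # batch of normalized samples
recon_error = torch.linalg.vector_norm(x - model(x), dim=1)  # Eq. (9), per sample
loss = recon_error.pow(2).mean()                     # training objective
loss.backward()
```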

LSTM classifier

The vanishing gradient problem of recurrent neural networks (RNNs), which limits their capability to capture long-range dependencies, is solved by LSTM32. Furthermore, the LSTM's activation functions, such as tanh and sigmoid, have a distinct super-saturation region, which prevents the input data flow from altering dramatically within the supersaturated value range. This paper adopts an LSTM with a transformation gating mechanism to further enhance the LSTM network's ability to learn long-term dependencies. Every LSTM block has five major modules: the forget gate \(\:\left({f}_{t}\right)\), input gate \(\:\left({i}_{t}\right)\), cell state \(\:\left({c}_{t}\right)\), hidden layer (HL) \(\:\left({h}_{t}\right)\), and output gate \(\:\left({\text{o}}_{t}\right)\). The main feature of the LSTM is the memory carried from one cell to the next through a cell state, which the plain RNN lacks. The cell state is constantly updated; if some information is not needed, it can be discarded, and new information can be inserted via the three gates. The LSTM process is formulated mathematically below.

$$\:\:{i}_{t}=\sigma\:({x}_{t}+{h}_{t-1}+{b}_{i})$$
(10)
$$\:{f}_{t}=\sigma\:\left({x}_{t}+{h}_{t-1}+{b}_{f}\right)$$
(11)
$$\:{o}_{t}=\sigma\:\left({x}_{t}+{h}_{t-1}+{b}_{o}\right)$$
(12)
$$\:{c}_{t}={f}_{t}\odot\:{c}_{t-1}+{i}_{t}\odot\:\text{t}\text{a}\text{n}\text{h}\left({x}_{t}+{h}_{t-1}+{b}_{c}\right)$$
(13)
$$\:{h}_{t}=tanh\left({c}_{t}\right)\odot\:{\text{o}}_{t}$$
(14)

In the formulations above, \(\:{x}_{t}\) denotes the input at the present time-step, \(\:{h}_{t-1}\) refers to the HL from the preceding time-step; \(\:b\) denotes the bias vector linked to the corresponding gate and HL; \(\:{c}_{t-1}\) is the memory cell from the prior time-step; \(\:{c}_{t}\) indicates the memory cell state at the present time-step; and \(\:\sigma\:\) and \(\:tanh\) denote activation functions.
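The following PyTorch sketch mirrors Eqs. (10)–(14). Note that the condensed notation above folds the learned weight matrices into the x_t + h_{t−1} terms; the code makes these projections explicit (sizes illustrative):

```python
import torch
import torch.nn as nn

class LSTMCellManual(nn.Module):
    """One LSTM step following Eqs. (10)-(14), with explicit weights."""
    def __init__(self, n_in: int, n_hid: int):
        super().__init__()
        self.W = nn.Linear(n_in, 4 * n_hid)   # input projections for i, f, o, g
        self.U = nn.Linear(n_hid, 4 * n_hid, bias=False)  # recurrent projections

    def forward(self, x_t, h_prev, c_prev):
        gates = self.W(x_t) + self.U(h_prev)
        i, f, o, g = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # Eqs. (10)-(12)
        c_t = f * c_prev + i * torch.tanh(g)                            # Eq. (13)
        h_t = torch.tanh(c_t) * o                                       # Eq. (14)
        return h_t, c_t
```

In practice the ensemble can rely on torch.nn.LSTM directly; the manual cell only makes the gate equations explicit.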

DBN classifier

An intelligent model using the DBN is presented to tackle the problem33. The DBN is a dynamic Bayesian network; thus, it exhibits valuable fault-tolerance performance in the decision-making procedure. A DBN comprises several restricted Boltzmann machines (RBMs) and a top-layer classifier. Here, \(\:h\) and \(\:v\) represent the hidden and visible units of the system, with \(\:{v}_{i}\in\:\left\{\text{0,1}\right\},{h}_{j}\in\:\left\{\text{0,1}\right\}\), and \(\:a,\) \(\:b\) represent the bias values of the system. Figure 3 portrays the structure of the DBN.

Fig. 3
figure 3

DBN architecture.

The RBM assumes that the joint distribution of hidden and visible units follows the canonical distribution, which is defined as follows:

$$\:P\left(v,h\right)=\frac{1}{Z}{e}^{-E\left(v,h\right)}$$
(15)

Here, \(\:Z\) signifies the partition function, which normalizes the probability to sum to 1, and \(\:E\) represents the network energy. Considering that the system exhibits the characteristics of the stochastic process defined by the Ising model, the network energy is stated as follows:

$$\:E\left(v,h\right)=-\sum\:_{i}{a}_{i}{v}_{i}-{\sum\:}_{j}{b}_{j}{h}_{j}-{\sum\:}_{i}{\sum\:}_{j}{h}_{j}{w}_{i,j}{v}_{i}$$
(16)

An elementary general principle for such a system is that lower energy corresponds to greater stability. Therefore, the RBM undergoes unsupervised training by reducing the energy \(\:E\). Based on Eq. (15), this is equivalent to increasing the probability \(\:P\). The optimization aim of the RBM is designated as follows:

$$\:{\mathcal{L}}_{\theta\:,S}=\prod\:_{n=1}^{{n}_{s}}P\left({v}^{n}\right)$$
(17)

Here, \(\:\theta\:=\{w,\:a,\:b\}\) signifies the weights and bias values of the RBM model; \(\:S=\{{v}^{1},{v}^{2},\:\cdots\:,{v}^{{n}_{s}}\}\) represents the input database; and \(\:{n}_{s}\) designates the number of input samples. For updating the biases and weights, a fast algorithm such as contrastive divergence (CD), grounded in Markov chain Monte Carlo (MCMC) theory, is applied. The DBN's topmost-layer classifier is a standard BP network in this study.
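A minimal numpy sketch of one CD-1 update for a single RBM layer (the learning rate and shapes are illustrative; the stacked DBN repeats this layer-wise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, a, b, lr=0.01, rng=None):
    """One contrastive-divergence (CD-1) update of RBM weights and biases."""
    rng = rng or np.random.default_rng()
    p_h0 = sigmoid(v0 @ W + b)                    # P(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + a)                  # one Gibbs step back to v
    p_h1 = sigmoid(p_v1 @ W + b)
    # data statistics minus reconstruction statistics approximate the gradient
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    a += lr * (v0 - p_v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)
    return W, a, b
```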

Parameter selection: BDCOA

Finally, the presented T2MEDL-EASDDP model utilizes BDCOA to fine-tune the hyperparameters involved in the three ensemble methods34. This model is chosen because it can effectively balance exploration and exploitation during optimization. Unlike conventional techniques such as grid or random search, BDCOA integrates advanced strategies like Brownian motion and directional mutation to avert getting stuck in local optima, enhancing the quality of the solution. Its robustness allows it to fine-tune complex model parameters with higher precision, resulting in improved model performance and faster convergence. BDCOA is appropriate for high-dimensional and computationally intensive tasks, where conventional methods mostly face difficulty finding optimal configurations. BDCOA optimizes model accuracy and stability by optimizing parameters dynamically while mitigating computational cost, making it an ideal choice over conventional parameter selection techniques. Figure 4 demonstrates the steps involved in the BDCOA model.

Fig. 4
figure 4

Steps involved in the BDCOA technique.

The BM and Directional Mutation Scheme (DMS) are combined into the BDCOA to avoid sub-optimal solutions and improve search-space efficacy when hunting for optimal performance. The standard COA is a metaheuristic technique that imitates the natural behaviour of coatis, utilizing the principle of cooperation between agents. The BM and DMS are incorporated to evade sub-optimal solutions and improve searchability, aiming to attain an effectual balance between exploitation and exploration throughout the search procedure. This results in improved global search efficacy in discovering optimal solutions. The search procedure for the optimal solution is described below:

Initialization procedure

The matrix \(\:X\) contains randomly generated initial population positions, as given in Eq. (18); the number of coatis is denoted by \(\:N\), and the problem dimension is represented by \(\:m.\) Each \(\:ith\) coati position is stated in Eq. (19); the objective of each member of the population is calculated, and the vector \(\:F\) of size \(\:N\times\:1\) is formed as given in Eq. (20). The initial solutions are randomly generated values within the search bounds.

$$\:X=\left[\begin{array}{c}{X}_{1}\\ \vdots\\ {X}_{i}\\ \vdots\\ {X}_{N}\end{array}\right]_{N\times\:m}=\left[\begin{array}{ccccc}{x}_{1,1}&\cdots&{x}_{1,j}&\cdots&{x}_{1,m}\\ \vdots&\ddots&\vdots&&\vdots\\ {x}_{i,1}&\cdots&{x}_{i,j}&\cdots&{x}_{i,m}\\ \vdots&&\vdots&\ddots&\vdots\\ {x}_{N,1}&\cdots&{x}_{N,j}&\cdots&{x}_{N,m}\end{array}\right]$$
(18)
$$\:{X}_{j}:{x}_{i,j}=\left\{{v}_{1},\:{v}_{2},\:\dots\:,{v}_{32}\right\},\:i=\text{1,2},\:\dots\:,\:N,\:j=\text{1,2},\:\dots\:,m$$
(19)
$$\:F=\left[\begin{array}{c}{F}_{1}\\ \vdots\\ {F}_{i}\\ \vdots\\ {F}_{N}\end{array}\right]_{N\times\:1}=\left[\begin{array}{c}F\left({X}_{1}\right)\\ \vdots\\ F\left({X}_{i}\right)\\ \vdots\\ F\left({X}_{N}\right)\end{array}\right]_{N\times\:1}$$
(20)

Here, the location of the \(\:ith\) coati is signified as \(\:{X}_{i}\), and the value of the \(\:jth\) decision variable is represented as \(\:{x}_{i,j}.\)

Directional mutation

Here, the direction of the coatis is altered to improve the efficacy of searching the space. The DMS in Eq. (21) uses the direction value \(\:d\) to enhance search-space efficacy.

$$\:X={X}_{sbest}+d\times\:\rho\:\times\:\left({X}_{r2}-{X}_{r3}\right)$$
(21)

Here, a solution selected from the top \(\:S\times\:N\) solutions is stated as \(\:{X}_{sbest}\); two dissimilar solutions selected at random from the population are indicated as \(\:{X}_{r2}\) and \(\:{X}_{r3}\); \(\:\rho\:\) symbolizes a real-valued constant factor selected in \(\:\left[\text{0,1}\right]\); and \(\:d\) refers to the direction value, as stated in Eq. (22).

$$\:d=\left\{\begin{array}{l}1\:if\:F\left({X}_{r2}\right)<F\left({X}_{r3}\right)\\\:-1\:Otherwise\end{array}\right.$$
(22)
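A minimal numpy sketch of the DMS of Eqs. (21)–(22); the fraction S of top solutions and the constant ρ are illustrative:

```python
import numpy as np

def directional_mutation(pop, fitness, s_frac=0.1, rho=0.5, rng=None):
    """Eq. (21): X = X_sbest + d * rho * (X_r2 - X_r3), with d from Eq. (22)."""
    rng = rng or np.random.default_rng()
    N = len(pop)
    top = np.argsort(fitness)[: max(1, int(s_frac * N))]
    x_sbest = pop[rng.choice(top)]        # one of the top S*N solutions
    r2, r3 = rng.choice(N, size=2, replace=False)
    d = 1.0 if fitness[r2] < fitness[r3] else -1.0   # Eq. (22)
    return x_sbest + d * rho * (pop[r2] - pop[r3])
```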

Exploration stage (hunt and attack)

Here, the coatis are divided into two dissimilar groups. The first group climbs the tree to scare the prey, whereas the second group stays below the tree to catch the scared target when it drops. The coati's prey is denoted as \(\:Iguana\). Equation (23) states the update for the first group of coatis that climbs the tree. When the Iguana falls, its position is altered randomly according to Eq. (24). Equation (25) specifies the update for the second group of coatis that stays underneath the tree. The BM is integrated into this stage so that the exploration escapes from sub-optimal solutions and hunts in other areas with a greater ability to find the global optimum. This dynamic and adaptive procedure improves the BDCOA's ability to exploit and explore the search space effectively. Every step of BM is governed by the likelihood function of a standard Gaussian distribution with unit variance \(\:\left({\sigma\:}^{2}=1\right)\) and zero mean \(\:(\delta\:=0)\). The likelihood density function of BM, \(\:BM\left(y,\delta\:,\sigma\:\right)\), at a given point \(\:y\) is defined in Eq. (26).

$$\:{X}_{i}^{P1}:{x}_{i,j}^{P1}={x}_{i,j}+rand\cdot\:\left(Iguan{a}_{j}-I\cdot\:{x}_{i,j}\right),\:for\:i=\text{1,2},\dots\:,\left\lfloor\frac{N}{2}\right\rfloor\:and\:j=\text{1,2},\dots\:,m$$
(23)
$$\:Iguan{a}^{G}:Iguan{a}_{j}^{G}=l{b}_{j}+rand\cdot\:\left(u{b}_{j}-l{b}_{j}\right)$$
(24)
$$\:{X}_{i}^{P1}:{x}_{i,j}^{P1}=\left\{\begin{array}{ll}{x}_{i,j}+rand\cdot\:\left(Iguan{a}_{j}^{G}-I\cdot\:{x}_{i,j}\right),&{F}_{Iguana}<{F}_{i}\\ {x}_{i,j}+BM\cdot\:\left({x}_{i,j}-Iguan{a}_{j}^{G}\right),&else\end{array}\right.$$
(25)
$$\:BM\left(y,\delta\:,\sigma\:\right)=\frac{1}{\sqrt{2\pi\:{\sigma\:}^{2}}}\:\text{e}\text{x}\text{p}\left(-\frac{(y-\delta\:{)}^{2}}{2{\sigma\:}^{2}}\right)=\frac{1}{\sqrt{2\pi\:}}\text{e}\text{x}\text{p}\left(-\frac{{y}^{2}}{2}\right)$$
(26)

Here, the new position calculated for the \(\:ith\) coati is denoted as \(\:{X}_{i}^{P1}\), with \(\:{x}_{i,j}^{P1}\) its \(\:jth\) dimension, and the new position's fitness value is signified as \(\:{F}_{i}^{P1}\); \(\:rand\) is a random number in \(\:\left[\text{0,1}\right]\); the lower and upper bounds of the \(\:jth\) variable are signified as \(\:l{b}_{j}\) and \(\:u{b}_{j},\:\)respectively; the prey's position is represented as \(\:Iguana\), with \(\:Iguan{a}_{j}\) its \(\:jth\) dimension; \(\:I\) denotes an integer value chosen randomly as 1 or 2; the position of the fallen prey is indicated as \(\:Iguan{a}^{G}\), with its \(\:jth\) dimension expressed as \(\:Iguan{a}_{j}^{G}\) and its fitness value represented as \(\:{F}_{Iguana}\). When the new position \(\:{X}_{i}^{P1}\) of the \(\:ith\) coati has better fitness than the old position \(\:{X}_{i}\), it is adopted; otherwise, the old position is retained, as shown in Eq. (27).

$$\:{X}_{i}=\left\{\begin{array}{l}{X}_{i}^{P1},\:{F}_{i}^{P1}<{F}_{i}\\\:{X}_{i},\:else\end{array}\right.$$
(27)
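A condensed sketch of the BM-augmented update of Eq. (25) with the greedy acceptance of Eq. (27), for one coati in the second group (the objective f and all names are illustrative):

```python
import numpy as np

def explore_step(x_i, F_i, iguana, F_iguana, f, rng=None):
    """Eqs. (25) and (27): move toward the prey or take a Brownian step."""
    rng = rng or np.random.default_rng()
    I = rng.integers(1, 3)                # I is randomly 1 or 2
    if F_iguana < F_i:
        x_new = x_i + rng.random(x_i.shape) * (iguana - I * x_i)
    else:
        bm = rng.normal(0.0, 1.0, size=x_i.shape)   # Eq. (26): N(0, 1) steps
        x_new = x_i + bm * (x_i - iguana)
    F_new = f(x_new)
    return (x_new, F_new) if F_new < F_i else (x_i, F_i)   # Eq. (27)
```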

The fitness of every solution (coati) is estimated from the RMS value, calculated using Eqs. (28) and (29).

$$\:{F}_{i}=\text{m}\text{a}\text{x}\left\{\frac{1}{RMS}\right\}$$
(28)
$$\:RMS={\int\:}_{s}\int\:(R\left(r\right)-\overline{R\left(r\right)})r\cdot\:dr=\underset{s}{\iint\:}{\left[\left(A\left(r\right)-C\left(r\right)\right)-\overline{\left(A\left(r\right)-C\left(r\right)\right)}\right]}^{2}r\cdot\:dr$$
(29)

The residual term is defined as \(\:R\left(r\right)=A\left(r\right)-C\left(r\right)\), and the normalized integration surface is denoted as \(\:S.\)

Exploitation stage (escaping from the predators)

When a predator attacks a coati, the coati moves to a randomly generated position close to its current site, according to Eqs. (30) and (31). The new site is accepted only where the fitness is improved, as per Eq. (32).

$$\:l{b}_{j}^{local}=\frac{l{b}_{j}}{t},\:u{b}_{j}^{local}=\frac{u{b}_{j}}{t},\:t=\text{1,2},\:\dots\:,\:T$$
(30)
$$\begin{gathered} \:X_{i}^{{P2}} \::x_{{i,j}}^{{P2}} = x_{{i,j}} + \left( {1 - 2rand} \right) \cdot \:\left( {lb_{j}^{{local}} + rand \cdot \:\left( {ub_{j}^{{local}} - lb_{j}^{{local}} } \right)} \right) \hfill \\ \:i = {\text{1,2}},\: \ldots \:,N,\:j = {\text{1,2}},\: \ldots \:,m \hfill \\ \end{gathered}$$
(31)
$$\:{X}_{i}=\left\{\begin{array}{l}{X}_{i}^{P2},\:{F}_{i}^{P2}<{F}_{i}\\\:{X}_{i},\:else\end{array}\right.$$
(32)

In these equations, the superscript \(\:P2\) denotes the new location and fitness of the coati in this phase, and the local lower and upper limits of the \(\:jth\) decision variable are represented as \(\:l{b}_{j}^{local}\) and \(\:u{b}_{j}^{local}\), respectively.
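A minimal numpy sketch of the shrinking local search of Eqs. (30)–(32) (all names are illustrative):

```python
import numpy as np

def exploit_step(x_i, F_i, lb, ub, t, f, rng=None):
    """Eqs. (30)-(32): sample near the coati within bounds shrinking with t."""
    rng = rng or np.random.default_rng()
    lb_loc, ub_loc = lb / t, ub / t                           # Eq. (30)
    step = lb_loc + rng.random(x_i.shape) * (ub_loc - lb_loc)
    x_new = x_i + (1.0 - 2.0 * rng.random(x_i.shape)) * step  # Eq. (31)
    F_new = f(x_new)
    return (x_new, F_new) if F_new < F_i else (x_i, F_i)      # Eq. (32)
```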

The FF is the major factor governing the performance of BDCOA. The hyperparameter selection procedure includes a solution-encoding technique to assess the effectiveness of a candidate solution. Here, the BDCOA considers precision as the foremost criterion for formulating the FF. Its formulation is mathematically expressed below.

$$\:Fitness\:=\:\text{m}\text{a}\text{x}\:\left(P\right)$$
(33)
$$\:P=\frac{TP}{TP+FP}$$
(34)

Here, \(\:TP\) signifies the true positive count, and \(\:FP\) represents the false positive count.
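For completeness, the criterion of Eqs. (33)–(34) in Python (counts are illustrative):

```python
def precision_fitness(tp: int, fp: int) -> float:
    """Eq. (34): P = TP / (TP + FP); the optimizer maximizes it per Eq. (33)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0
```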

Result analysis and discussion

In this section, the experimental validation of the T2MEDL-EASDDP technique is examined on two datasets, namely the ASD-Toddler dataset35 and the ASD-Adult dataset36. The ASD-Toddler dataset consists of 1000 samples under two classes, as shown in Table 1. The total number of attributes is 17, but only 14 features have been selected.

Table 1 Details of the ASD-Toddler dataset.

Figure 5 exhibits the classifier results of the T2MEDL-EASDDP approach on the Toddler dataset. Figure 5a and b depicts the confusion matrices, showing precise identification and classification of all class labels with a 70% training phase (70%TRPH) and a 30% testing phase (30%TSPH). Figure 5c shows the PR examination, which indicates higher performance across all class labels. Finally, Fig. 5d demonstrates the ROC examination, revealing proficient outcomes with high ROC values for the different classes.

Fig. 5
figure 5

Toddler Dataset (a-b) Confusion matrix, (c) curve of PR, and (d) curve of ROC.

Table 2 and Fig. 6 depict the ASD disorder detection of the T2MEDL-EASDDP technique on the Toddler dataset. The performances indicate that the T2MEDL-EASDDP technique appropriately recognized the samples. On 70%TRPH, the T2MEDL-EASDDP model provides average \(\:acc{u}_{y},\:\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F}_{measure}\), and \(\:AU{C}_{score}\:\)of 94.89%, 94.89%, 94.89%, 96.03%, and 94.89%, respectively. Also, on 30%TSPH, the T2MEDL-EASDDP model provides average \(\:acc{u}_{y},\:\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F}_{measure}\), and \(\:AU{C}_{score}\:\)of 97.79%, 97.79%, 97.79%, 97.96%, and 97.79%, respectively.

Table 2 ASD disorder detection of T2MEDL-EASDDP technique on toddler dataset.
Fig. 6
figure 6

Average of T2MEDL-EASDDP technique on Toddler dataset.

In Fig. 7, the training (TRA) \(\:acc{u}_{y}\) and validation (VAL) \(\:acc{u}_{y}\) performances of the T2MEDL-EASDDP approach on the Toddler dataset are depicted. The \(\:acc{u}_{y}\:\)values are computed over a range of 0–50 epochs. The figure indicates that the TRA and VAL \(\:acc{u}_{y}\) values present an increasing trend, indicating the capability of the T2MEDL-EASDDP approach to achieve enhanced performance across numerous iterations. In addition, the TRA and VAL \(\:acc{u}_{y}\) values remain close throughout the epochs, indicating lower overfitting and showing the enhanced outcomes of the T2MEDL-EASDDP method, assuring steady prediction on unseen samples.

Fig. 7
figure 7

\(\:Acc{u}_{y}\) curve of T2MEDL-EASDDP model on Toddler dataset

The TRA loss (TRALOS) and VAL loss (VALLOS) graph of the T2MEDL-EASDDP technique on the Toddler dataset is shown in Fig. 8. The loss values are computed over an interval of 0–50 epochs. The TRALOS and VALLOS values reveal a diminishing trend, which indicates the proficiency of the T2MEDL-EASDDP approach in balancing the tradeoff between generalization and data fitting. The constant decrease in loss values also confirms the improved performance of the T2MEDL-EASDDP approach, with the prediction results refined over successive epochs.

Fig. 8
figure 8

Loss curve of T2MEDL-EASDDP model on Toddler dataset.

The ASD-Adult dataset consists of 680 samples under two classes, as shown in Table 3. The total number of attributes is 20, but only 15 have been selected.

Table 3 Details of ASD-Adult dataset.

Figure 9 presents the classifier results of the T2MEDL-EASDDP methodology on the Adult dataset. Figure 9a and b illustrates the confusion matrices, showing specific classification and identification of all classes under 70%TRPH and 30%TSPH. Figure 9c shows the PR study, indicating enhanced performance in all classes. Finally, Fig. 9d demonstrates the ROC study, which signifies proficient outcomes with high ROC values for the different class labels.

Fig. 9
figure 9

Adult Dataset (a-b) Confusion matrix, (c) curve of PR, and (d) curve of ROC.

Table 4 and Fig. 10 show the ASD disorder detection of the T2MEDL-EASDDP approach on the Adult dataset. The performances indicate that the T2MEDL-EASDDP approach correctly recognized the samples. On 70%TRPH, the T2MEDL-EASDDP model provides average \(\:acc{u}_{y},\:\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F}_{measure}\), and \(\:AU{C}_{score}\:\)of 94.52%, 94.52%, 94.52%, 95.33%, and 94.52%, respectively. Similarly, on 30%TSPH, the T2MEDL-EASDDP model delivers average \(\:acc{u}_{y},\:\:sen{s}_{y}\), \(\:spe{c}_{y}\), \(\:{F}_{measure}\), and \(\:AU{C}_{score}\:\)of 93.19%, 93.19%, 93.19%, 94.77%, and 93.19%, respectively.

Table 4 ASD disorder detection of T2MEDL-EASDDP technique on the adult dataset.
Fig. 10
figure 10

Average of T2MEDL-EASDDP technique on the Adult dataset.

In Fig. 11, the TRA \(\:acc{u}_{y}\) and VAL \(\:acc{u}_{y}\) performances of the T2MEDL-EASDDP technique on the Adult dataset are exhibited. The \(\:acc{u}_{y}\:\)values are computed over an interval of 0–50 epochs. The figure implies that the TRA and VAL \(\:acc{u}_{y}\) values display an increasing trend, indicating the competency of the T2MEDL-EASDDP method to deliver higher performance across multiple iterations. Simultaneously, the TRA and VAL \(\:acc{u}_{y}\) values remain close across the epochs, indicating lower overfitting and showing the superior performance of the T2MEDL-EASDDP approach, guaranteeing reliable prediction on unseen samples.

In Fig. 12, the TRALOS and VALLOS graph of the T2MEDL-EASDDP approach on the Adult dataset is shown. The loss values are computed over a range of 0–50 epochs. The TRALOS and VALLOS values show a diminishing tendency, which indicates the proficiency of the T2MEDL-EASDDP approach in balancing the tradeoff between generalization and data fitting. The constant decrease in loss values further confirms the strong performance of the T2MEDL-EASDDP technique, with the prediction results refined over time.

Fig. 11
figure 11

\(\:Acc{u}_{y}\) curve of T2MEDL-EASDDP model on the Adult dataset

Fig. 12
figure 12

Loss curve of T2MEDL-EASDDP model on the Adult dataset.

Table 5 represents the comparative analysis of the T2MEDL-EASDDP model with existing methods37,38,39.

Table 5 Comparative analysis of the T2MEDL-EASDDP model with existing methods37,38,39.

Figure 13 illustrates the \(\:sen{s}_{y}\) and \(\:spe{c}_{y}\) outcomes of the T2MEDL-EASDDP model compared with existing techniques. The simulation results imply that the T2MEDL-EASDDP model outperformed the existing methods. With respect to \(\:sen{s}_{y}\), the T2MEDL-EASDDP technique attains a maximal \(\:sen{s}_{y}\) of 97.79%, whereas the 3D Grad-CAM, CNN, ASD-DiagNet, MVS-GCN, HOFC, CONCAT-SDAE, and MMSDAE models attain lower \(\:sen{s}_{y}\) values of 84.14%, 86.87%, 70.77%, 65.54%, 71.23%, 94.27%, and 92.66%, respectively. Also, with respect to \(\:spe{c}_{y}\), the T2MEDL-EASDDP technique attains a higher \(\:spe{c}_{y}\) of 97.79%, while the 3D Grad-CAM, CNN, ASD-DiagNet, MVS-GCN, HOFC, CONCAT-SDAE, and MMSDAE models yield lower \(\:spe{c}_{y}\) values of 83.36%, 84.09%, 72.50%, 71.86%, 63.40%, 95.52%, and 97.17%, respectively.

Fig. 13
figure 13

\(\:Sen{s}_{y}\) and \(\:Spe{c}_{y}\) analysis of the T2MEDL-EASDDP approach with existing techniques

The comparison study of the T2MEDL-EASDDP method with existing techniques in terms of \(\:acc{u}_{y}\) and \(\:{F}_{measure}\) is illustrated in Fig. 14. The simulation result indicates that the T2MEDL-EASDDP technique outperformed the existing methods. With respect to \(\:acc{u}_{y}\), the T2MEDL-EASDDP model attains an enhanced \(\:acc{u}_{y}\) of 97.79%, whereas the 3D Grad-CAM, CNN, ASD-DiagNet, MVS-GCN, HOFC, CONCAT-SDAE, and MMSDAE techniques yield lower \(\:acc{u}_{y}\) values of 83.72%, 85.40%, 71.73%, 71.66%, 73.36%, 94.85%, and 95.73%, respectively. With respect to \(\:{F}_{measure}\), the T2MEDL-EASDDP technique attains a maximum \(\:{F}_{measure}\) of 97.96%, while the 3D Grad-CAM, CNN, ASD-DiagNet, MVS-GCN, HOFC, CONCAT-SDAE, and MMSDAE techniques yield lower \(\:{F}_{measure}\) values of 86.46%, 85.77%, 84.68%, 92.13%, 90.96%, 93.74%, and 95.34%, respectively.

Fig. 14
figure 14

\(\:Acc{u}_{y}\) and \(\:{F}_{measure}\) analysis of T2MEDL-EASDDP approach with existing models

Conclusion

In this paper, a novel T2MEDL-EASDDP model is developed. The main aim of the presented T2MEDL-EASDDP model is to analyze and diagnose the different stages of ASD in disabled individuals. To accomplish this, the T2MEDL-EASDDP model comprises data normalization, dimensionality reduction using IBOA, three ensemble classification models, and a BDCOA-based parameter selection process. Initially, the T2MEDL-EASDDP model applies min-max normalization for data pre-processing to ensure that the input data is scaled to a uniform range. Next, the IBOA-based FS efficiently identifies the most relevant features and reduces dimensionality. In addition, an ensemble of three DL approaches (AE, LSTM, and DBN) is employed for analyzing and detecting ASD. Finally, the presented T2MEDL-EASDDP model applies BDCOA to fine-tune the hyperparameters involved in the three ensemble methods. A wide range of simulation analyses of the T2MEDL-EASDDP technique is accomplished on the ASD-Toddler and ASD-Adult datasets. The performance validation of the T2MEDL-EASDDP method portrayed a superior accuracy value of 97.79% over existing techniques. The limitations of the T2MEDL-EASDDP method comprise the reliance on a single dataset per cohort, which may affect the model's generalizability to diverse populations with varying characteristics. Moreover, the computational complexity of the proposed approach may demand significant resources, limiting its accessibility in resource-constrained environments. The study also does not account for real-time, on-the-ground application, which is substantial for practical implementation. Furthermore, the models used may not fully capture the complexities and heterogeneity within ASD, as they primarily concentrate on a limited set of features. Future work should explore integrating multimodal data, such as genetic, neuroimaging, and behavioural information, to improve model robustness. Additionally, investigating methods for mitigating computational overhead while maintaining performance would make the approach more scalable. Lastly, more attention could be given to validating the model with real-world data to assess its practical utility in clinical settings.