Introduction

Indoor positioning is a rapidly expanding technology that is changing the way people interact with and navigate indoor spaces1,2,3. Unlike GPS, which works primarily outdoors, indoor positioning systems provide precise location information within structures such as malls, airports, hospitals, and offices4. Indoor positioning draws on numerous methods, including ultra-wideband, Wi-Fi, Bluetooth Low Energy, RFID, inertial navigation systems, computer vision, and magnetic positioning, to locate people or objects within a space, often to within a few meters or even centimetres5,6. Used together, these technologies can typically locate people or objects to within one or two meters7. The applications of indoor positioning continue to grow. It can improve the shopping experience through tailored advertisements and in-store guidance, and it can aid logistics and warehouse management by speeding up the movement of goods inside warehouses8,9,10. The list of symbols utilized in this research article is shown.

The adoption of indoor location tracking in manufacturing settings has significantly changed how industrial operations are managed and improved11. The efficiency of indoor positioning systems is of paramount importance for enhancing worker safety and simplifying warehouse management in indoor manufacturing plants and outlying production areas12,13.

Manufacturing plants are often large and complex, calling for meticulous organization and monitoring to guarantee smooth operation14,15. These environments can benefit from real-time monitoring of goods, equipment, and people through technologies such as RFID tags, ultra-wideband beacons, and computer vision systems16,17. With this increased visibility, operators can identify bottlenecks, improve workflows, and use resources prudently, which leads to increased productivity and cost savings7.

In the manufacturing environment, indoor positioning systems help improve worker safety. By tracking the movement of employees and machinery in real time, an IPS can warn of likely risks early, help ensure that safety measures are maintained, and thereby reduce the likelihood of accidents and injuries18. For instance, proximity sensors placed on portable devices or machines can alert workers when they are close to dangerous equipment.

In addition, employee safety in many manufacturing scenarios cannot be ensured without precise indoor localization19. An indoor positioning system traces the movement of people and machines, identifying possible hazards before they cause harm and verifying compliance with safety regulations, which ultimately reduces the number of accidents and injuries19. Integration with existing industrial automation systems also needs special attention because it must be carried out by experts6. The advanced indoor positioning capabilities of the proposed deep spatial–temporal attention network (Deep-STAN) can be applied in various fields, including retail and logistics. In retail, this can enhance customer engagement and increase sales. In logistics, Deep-STAN can be used for inventory management by accurately tracking the location of goods within large warehouses, helping to optimize item retrieval, streamline warehouse operations, and reduce the time spent searching for products.

The major contributions of this study are as follows:

  • Robust signal processing and noise reduction techniques, such as moving average filtering and outlier removal, are applied to clean and normalize the data.

  • The DBSCAN clustering method is used to group fingerprints into clusters that correspond to distinct indoor locations.

  • Signal-based, spatial–temporal, motion, environmental, and statistical features are extracted. These features support a better understanding of intricate indoor environmental dynamics.

  • The model presented here combines long short-term memory (LSTM), visual transformers, attention mechanisms, and Convolutional Neural Networks (CNNs) to effectively capture spatial, temporal, and global dependencies in indoor fingerprint data.

  • The model’s performance is optimized using hybrid optimization techniques, including hyperparameter tuning and methods to overcome local minima.

The paper is organized as follows. Sect. “Introduction” introduces the study, and Sect. “Literature review” surveys recent related literature. The proposed approach is discussed in Sect. “Proposed methodology”, and the results of the proposed model are given in Sect. “Experimental results”. Finally, Sect. “Conclusions” concludes the paper.

Literature review

Table 1 summarizes the reviewed literature. Liu et al.20 designed a machine learning-based visible light indoor positioning system that uses one LED and a rotatable photodetector. The system works in two major steps, area classification and precise positioning. Nabati and Ghorashi21 introduced an indoor positioning system that relies on environmental fingerprinting, deep learning, and historical data. Mazlan et al.22 developed the KD-CNN algorithm to localize objects within indoor spaces faster by transferring knowledge from large convolutional neural network (CNN) models to less expensive models that perform the task more quickly and at higher accuracy. Zhang et al.23 developed an indoor localization technique that uses an attention-augmented Residual CNN (RCNN) and Channel State Information (CSI) fingerprints to track objects inside buildings. Liu et al.24 recommended a Clustering-based Noise Elimination Scheme (CNES) suited to RSSI-based datasets. This technique employs density-based spatial clustering of applications with noise (DBSCAN) to cluster RSSIs by region and eliminate noisy samples from the dataset. Laska and Blankenbach25 proposed a method for estimating position in large indoor spaces, presenting a unified approach that trains just one neural network. Wang et al.26 studied indoor 3D positioning algorithms based on WiFi fingerprinting. They extracted spatiotemporal features with deep learning techniques, including a Temporal Convolutional Network (TCN) built from dilated convolutions, causal convolutions, and residual connections.

Table 1 Summary of the reviewed literature.

Sammy and Vigila27 proposed a distributed blockchain-based Ciphertext-Policy Attribute-Based Encryption (CP-ABE) approach to secure patient health records (PHRs) in cloud computing. Umran et al.28 proposed a blockchain-based private network for securing the circuit breaker system in the Al-Kufa/Iraq power plant. The system utilizes multi-chain proof of rapid authentication (McPoRA) as a consensus mechanism to enhance computational performance and reduce latency. Shaikh and Iliev29 developed a blockchain-based transaction processing system (TPS) to enhance security in e-commerce transactions. The system incorporates zero-knowledge proof (ZKP) and modified ECC to ensure privacy, authentication, integrity, and non-repudiation.

Proposed methodology

Figure 1 outlines the architecture of a proposed methodology for indoor positioning, utilizing a combination of data collection, pre-processing, data augmentation, and machine learning techniques. The process begins with data collection using mobile devices in an indoor environment, with the collected data stored in a database. During the pre-processing phase, the data undergoes moving average filtering, outlier removal, and Min–Max normalization to prepare it for further analysis. Data augmentation methods including rotation, translation, and synthetic noise addition are then applied. In indoor positioning systems (IPS), transformations such as rotation, translation, and noise addition significantly impact the model’s learning process and results. Rotation affects signal orientation, ensuring the model can accurately interpret data from various angles by exposing it to multiple orientations during training, which enhances robustness and generalization. Translation mimics user movement through different areas, allowing the model to associate specific signal patterns with varying locations, thereby improving localization accuracy as it learns to recognize similar patterns across spatial configurations.

Fig. 1 Architecture of the proposed methodology.

Noise addition simulates real-world conditions where signals are distorted by environmental interference, helping the model become resilient to variations and enabling it to identify underlying patterns despite noise. The online phase involves feature extraction, where signal-based features, spatial and temporal features, motion features, environmental features, and statistical features are derived from the data. These features are input into the Deep-STAN positioning model, which predicts the location based on the processed and augmented data. Finally, S-box cryptography, blockchain integration, and QR code-based security are added to secure the indoor positioning system. Integrating security components such as Galois Field-based Elliptic Curve Cryptography (ECC), blockchain technology, and S-box cryptographic transformations into the Deep-STAN model involves several key steps. First, ECC is applied to encrypt and decrypt signal data collected during the positioning process. This secures the data transmitted between devices and servers against unauthorized access. The low computational complexity of ECC enables real-time encryption with little effect on the system’s latency, preserving performance.

Phase 1: offline phase

Data acquisition

The WiFi RSS Fingerprint Localization Dataset is commonly used for indoor positioning systems, leveraging Received Signal Strength (RSS) values from multiple WiFi access points to estimate a device’s location. Such datasets are typically collected in controlled indoor environments such as university buildings, shopping malls, or office spaces, where signal strength varies due to walls, furniture, and human movement. Data collection usually spans from a few days to several weeks to capture variations in the signal caused by environmental dynamics. The amount of data collected depends on factors such as the number of reference points, the number of access points, and the time interval between measurements, but many datasets contain thousands to hundreds of thousands of RSS readings across different locations. The size of Wi-Fi RSS fingerprint localization datasets therefore varies significantly with the scope and methodology of data collection. For instance, the WiSig dataset comprises approximately 10 million packets captured from 174 WiFi transmitters over a month-long period. In contrast, a dataset from Tampere University includes 446 reference points and 489 access points, resulting in a more modest dataset. Moreover, the Wi-Fi RSSI Dataset for Fingerprint-based Localization contains data from 250 locations with 27 detected Wi-Fi access points. Dataset sizes thus range from hundreds of data points in smaller-scale studies to millions in extensive collections.

Pre-processing

Signal processing and noise reduction are essential steps to ensure that the collected data is precise and dependable. Several techniques are used to clean the data and standardize it for consistency.

Moving average filtering

The moving average filter reduces short-term noise in the data by smoothing out unrelated short-term fluctuations. This is very useful for improving the accuracy of measurements taken on wireless devices. It is given in Eq. (1),

$$\overline{RSSI}_{i} = \frac{1}{N}\sum\limits_{j = 0}^{N - 1} {RSSI_{i - j} }$$
(1)

where \(\overline{RSSI}_{i}\) represents the smoothed RSSI value at position \(i\), \(RSSI_{i - j}\) signifies the raw RSSI values within the window, and \(N\) indicates the number of points in the moving window.
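As an illustration of Eq. (1), the following minimal Python sketch applies a trailing moving average to a list of RSSI readings. The function name `moving_average` and the handling of the first samples (averaging over however many readings are available) are assumptions made here for illustration; the paper does not specify the boundary treatment.

```python
def moving_average(rssi, window):
    """Trailing moving average in the spirit of Eq. (1).

    Assumption: for the first window-1 samples, the average is taken
    over the readings available so far (shorter window).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(rssi)):
        start = max(0, i - window + 1)   # oldest sample inside the window
        chunk = rssi[start:i + 1]        # RSSI_{i-j} for j = 0..N-1
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical readings in dBm; the -90 spike is damped by the filter.
readings = [-60, -62, -90, -61, -59]
print(moving_average(readings, window=3))
```

A larger window smooths more strongly but reacts more slowly to genuine changes in signal strength, so the window length is a tuning choice.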

Outlier removal

Outliers in RSSI data can significantly affect the accuracy of indoor positioning. These outliers can be identified and removed using the Z-score method, given in Eq. (2),

$$Z_{i} = \frac{{RSSI_{i} - \mu }}{\sigma }$$
(2)

where \(Z_{i}\) denotes the Z-score of \(RSSI_{i}\), \(\mu\) signifies the mean of the RSSI values, and \(\sigma\) denotes standard deviation. An RSSI value is considered an outlier if \(\left| {Z_{i} } \right| > k\), where \(k\) is typically set to 2 or 3.
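The Z-score rule of Eq. (2) can be sketched as follows. This is a minimal illustration, assuming the population standard deviation is used for \(\sigma\) (the paper does not state whether the population or sample form is meant); the function name `remove_outliers` is hypothetical.

```python
from statistics import mean, pstdev


def remove_outliers(rssi, k=3.0):
    """Drop readings whose |Z-score| exceeds k, following Eq. (2)."""
    mu = mean(rssi)
    sigma = pstdev(rssi)          # assumption: population standard deviation
    if sigma == 0:                # all readings identical, nothing to remove
        return list(rssi)
    return [v for v in rssi if abs((v - mu) / sigma) <= k]


# Hypothetical readings in dBm with one implausible value (-95).
readings = [-60, -61, -59, -60, -62, -60, -61, -59, -60, -95]
print(remove_outliers(readings, k=2.0))
```

With short windows a single extreme value inflates \(\sigma\) itself, so a smaller threshold such as \(k = 2\) may be needed to flag it.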

Min–max normalization

Normalization scales the RSSI values to a common range, mitigating device-specific variations and ensuring consistency across different devices and environments. It is given in Eq. (3),

$$RSSI^{\prime} = \frac{{RSSI - RSSI_{\min } }}{{RSSI_{\max } - RSSI_{\min } }}$$
(3)

where \(RSSI_{\min }\) and \(RSSI_{\max }\) are the min and max RSSI values in the dataset, respectively. This normalization scales the RSSI values to the range [0, 1].
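A minimal sketch of min–max normalization is given below, using the standard denominator \(RSSI_{\max } - RSSI_{\min }\). The function name and the choice to return zeros when all readings are equal are assumptions for illustration.

```python
def min_max_normalize(rssi):
    """Scale readings to [0, 1] using the dataset minimum and maximum."""
    lo, hi = min(rssi), max(rssi)
    if hi == lo:                       # assumption: constant input maps to 0.0
        return [0.0 for _ in rssi]
    return [(v - lo) / (hi - lo) for v in rssi]


print(min_max_normalize([-90, -70, -50]))  # [0.0, 0.5, 1.0]
```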

Clustering and fingerprinting

Fingerprint classification involves organizing the preprocessed fingerprint data into distinct groups representing different indoor locations.

DBSCAN is a robust clustering algorithm well-suited for data with noise and clusters of varying shapes and sizes. The algorithm utilizes two key parameters: \(\min Pts\) and epsilon (\(\varepsilon\)). Epsilon is the maximum distance between two points at which they are still considered neighbors, whereas \(\min Pts\) is the minimum number of neighboring points required for a region to be regarded as dense and form part of a cluster.

DBSCAN iterates through the dataset to form clusters of density-reachable points and identify noise points that do not belong to any cluster.

Mathematically, let \(D\) be the dataset of RSSI fingerprints. For each point \(p\) in \(D\), the \(\varepsilon\)-neighborhood is given in Eq. (4),

$$N_{\varepsilon } \left( p \right) = \left\{ {q \in D \mid \mathrm{distance}\left( {p,q} \right) \le \varepsilon } \right\}$$
(4)

A point \(p\) is a core point if \(\left| {N_{\varepsilon } \left( p \right)} \right| \ge \min Pts\). A point \(p\) is directly density-reachable from \(q\) if \(p \in N_{\varepsilon } \left( q \right)\) and \(q\) is a core point. The distance metric used is often Euclidean distance.
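The neighborhood and core-point definitions can be sketched in a few lines of Python. This is only the building block of DBSCAN, not the full clustering procedure; Euclidean distance is assumed, and the function names and sample points are hypothetical.

```python
import math


def region_query(points, p_idx, eps):
    """Indices of all q with distance(p, q) <= eps (Eq. (4)); includes p itself."""
    p = points[p_idx]
    return [i for i, q in enumerate(points) if math.dist(p, q) <= eps]


def core_points(points, eps, min_pts):
    """Indices of points whose eps-neighborhood holds at least min_pts points."""
    return [i for i in range(len(points))
            if len(region_query(points, i, eps)) >= min_pts]


# Three nearby fingerprints and one isolated point (hypothetical 2-D data).
pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
print(core_points(pts, eps=1.5, min_pts=3))  # [0, 1, 2]
```

The isolated point has only itself in its neighborhood, so it is not a core point and would be treated as noise unless it were reachable from a core point.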

Labeling clusters with location coordinates

Once the clusters are identified using DBSCAN, the next step is to label each cluster with corresponding location coordinates. This involves determining a representative point, usually the centroid, for each cluster. The centroid can be calculated by averaging the coordinates of all points in the cluster.

For a cluster \(C_{i}\) containing points \(p_{1} ,p_{2} , \ldots ,p_{n}\), where each point \(p_{j}\) has coordinates \(\left( {x_{j} ,y_{j} } \right)\), the centroid is computed in Eq. (5),

$$\left( {Centroid_{x} ,Centroid_{y} } \right) = \left( {\frac{1}{n}\sum\limits_{j = 1}^{n} {x_{j} } ,\frac{1}{n}\sum\limits_{j = 1}^{n} {y_{j} } } \right)$$
(5)

Every point in cluster \(C_{i}\) is then labeled with the coordinates of this centroid, as given in Eq. (6),

$$label\,of\,p_{j} \in C_{i} = \left( {Centroid_{x} ,Centroid_{y} } \right)$$
(6)

Through this labeling step, each data point is assigned the location coordinates of the cluster it belongs to. The labeled dataset provides a strong basis for training deep learning models for indoor positioning.
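The centroid labeling of Eqs. (5) and (6) can be sketched as follows. The sketch assumes that cluster identifiers are integers and that noise points carry the identifier -1 (a common convention, not stated in the paper); noise points receive no label. All names are hypothetical.

```python
from collections import defaultdict


def label_with_centroids(coords, cluster_ids, noise_id=-1):
    """Label each point with the centroid of its cluster (Eqs. (5)-(6)).

    Assumption: points marked noise_id are noise and get the label None.
    """
    groups = defaultdict(list)
    for point, cid in zip(coords, cluster_ids):
        if cid != noise_id:
            groups[cid].append(point)
    centroids = {
        cid: (sum(x for x, _ in pts) / len(pts),
              sum(y for _, y in pts) / len(pts))
        for cid, pts in groups.items()
    }
    return [centroids.get(cid) for cid in cluster_ids]


coords = [(0, 0), (2, 0), (1, 3), (10, 10)]
print(label_with_centroids(coords, [0, 0, 0, -1]))
```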

Phase 2: online phase

Feature extraction

Building an indoor positioning system requires extracting useful information from raw data. This is done by deriving features that machine learning models can use to predict the position of a device in a building. Positioning models use these derived features to estimate the location of a device accurately from observed data.

(i) Signal-Based Features: Received Signal Strength Indicator (RSSI), Signal-to-Noise Ratio (SNR), Channel Information and Signal Stability.

(ii) Spatial and Temporal Features: Location Coordinates and Time-Based Features.

(iii) Motion features: Motion State.

(iv) Environmental Features: Room and Floor Identification.

(v) Statistical Features: Histogram of RSSI Values.

Hybrid optimization for deep CNN

A hybrid optimization algorithm is employed for hyperparameter tuning (weight optimization) and to escape local minima, so that the model achieves good performance. The exploration phase is taken from the reptile search algorithm and the exploitation phase from the tuna optimization algorithm.

Reptile search algorithm

The reptile search algorithm is inspired by the hunting behavior of crocodiles. It has two primary processes, encircling and hunting. The algorithm switches between these processes by dividing the total number of iterations into four parts.

Initialization

The reptile search algorithm starts by randomly generating an initial set of candidate solutions, as shown in Eq. (7),

$$z_{jl} = rand \times \left( {UB - LB} \right) + LB,l = 1,2, \ldots ,n$$
(7)

The initial population matrix is denoted \(z_{jl}\), where \(j = 1,2, \ldots ,P\). Here \(P\) is the population size (rows of the matrix), while \(n\) is the number of dimensions (columns of the matrix) of the optimization problem. \(LB\) and \(UB\) are the lower and upper bounds, and \(rand\) is a randomly generated value.
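A minimal sketch of the initialization in Eq. (7) is shown below. It assumes that \(rand\) is drawn uniformly from [0, 1) and that the same scalar bounds apply to every dimension; the function name and the `seed` parameter are hypothetical additions for reproducibility.

```python
import random


def init_population(pop_size, dim, lb, ub, seed=None):
    """Random initial population z[j][l] = rand * (UB - LB) + LB (Eq. (7))."""
    rng = random.Random(seed)  # assumption: rand ~ U[0, 1)
    return [[rng.random() * (ub - lb) + lb for _ in range(dim)]
            for _ in range(pop_size)]


population = init_population(pop_size=5, dim=3, lb=-1.0, ub=1.0, seed=0)
print(len(population), len(population[0]))  # 5 3
```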

The fitness is computed as

$$Fitness = \min \left( {Error} \right)$$

Encircling (exploration)

The encircling phase explores a high-density area of the search space. It imitates the walking and belly-walking movements of crocodiles, which are not meant to catch prey but to cover long distances. It is given in Eq. (8),

$$z_{jl} \left( {\chi + 1} \right) = \left\{ {\begin{array}{*{20}l} {Best_{l} \left( \chi \right) \times \left( { - \eta_{jl} \left( \chi \right)} \right) \times \alpha - \left( {G_{jl} \left( \chi \right) \times rand} \right),} & {\chi \le \frac{{\chi_{\max } }}{4}} \\ {Best_{l} \left( \chi \right) \times z_{{\left( {s_{1} ,l} \right)}} \times EV\left( \chi \right) \times rand,} & {\frac{{\chi_{\max } }}{4} < \chi \le 2\frac{{\chi_{\max } }}{4}} \\ \end{array} } \right.$$
(8)

Here \(Best_{l} \left( \chi \right)\) is the \(l^{th}\) position of the best solution found so far, \(\chi\) is the current iteration, \(rand\) is a random number, and \(\chi_{\max }\) is the maximum number of iterations. The hunting operator for solution \(j\) at position \(l\) is denoted \(\eta_{jl}\) and is obtained from Eq. (9),

$$\eta_{{\left( {j,l} \right)}} = Best_{l} \left( \chi \right) \times Q_{{\left( {j,l} \right)}}$$
(9)

The sensitivity parameter \(\alpha\) controls the accuracy of the exploration, while \(G_{{\left( {j,l} \right)}}\) is a reduction function that narrows the exploration area, given in Eq. (10),

$$G_{{\left( {j,l} \right)}} = \frac{{Best_{l} \left( \chi \right) \times Q_{{\left( {s_{2} ,l} \right)}} }}{{Best_{l} \left( \chi \right) + \tau }}$$
(10)

Here \(s_{1}\) is a random integer between 1 and \(N\), where \(N\) is the total number of candidate solutions, and \(z_{{\left( {s_{1} ,l} \right)}}\) is the \(l^{th}\) position of that randomly chosen solution. Similarly, \(s_{2}\) is a random integer between 1 and \(N\), and \(\tau\) is a small positive value. The Evolutionary Sense \(EV\left( \chi \right)\) is given in Eq. (11).

$$EV\left( \chi \right) = 2 \times s_{3} \times \left( {1 - \frac{1}{{\chi_{\max } }}} \right)$$
(11)

where \(s_{3}\) is any random number. \(Q_{{\left( {j,l} \right)}}\) can be calculated using Eq. (12),

$$Q_{{\left( {j,l} \right)}} = \beta + \frac{{z_{{\left( {j,l} \right)}} - AP\left( {z_{j} } \right)}}{{Best_{l} \left( \chi \right) \times \left( {UB_{\left( l \right)} - LB_{\left( l \right)} } \right) + \varphi }}$$
(12)

where \(\beta\) is the sensitivity limit that determines exploration accuracy. \(AP\left( {z_{j} } \right)\) represents the average position of the \(j^{th}\) solution and can be determined using Eq. (13),

$$AP\left( {z_{j} } \right) = \frac{1}{n}\sum\nolimits_{l = 1}^{n} {z_{{\left( {j,l} \right)}} }$$
(13)

Hunting (exploitation)

Hunting is divided into two stages: hunting coordination, which applies when \(2\frac{{\chi_{\max } }}{4} < \chi \le 3\frac{{\chi_{\max } }}{4}\), and hunting cooperation, which applies when \(3\frac{{\chi_{\max } }}{4} < \chi \le \chi_{\max }\). Stochastic coefficients are used to search the local search space in order to generate near-optimal solutions. The exploitative operations are given in Eqs. (14) and (15):

$$z_{{\left( {j,l} \right)}} \left( {\chi + 1} \right) = Best_{l} \left( \chi \right) \times Q_{{\left( {j,l} \right)}} \times rand,\quad 2\frac{{\chi_{\max } }}{4} < \chi \le 3\frac{{\chi_{\max } }}{4}$$
(14)
$$z_{{\left( {j,l} \right)}} \left( {\chi + 1} \right) = Best_{l} \left( \chi \right) - \eta_{{\left( {j,l} \right)}} \left( \chi \right) \times \varphi - G_{{\left( {j,l} \right)}} \left( \chi \right) \times rand,\quad 3\frac{{\chi_{\max } }}{4} < \chi \le \chi_{\max }$$
(15)

\(Best_{l} \left( \chi \right)\) denotes the \(l^{th}\) position of the best solution at this iteration, whereas \(\eta_{{\left( {j,l} \right)}}\) signifies the hunting operator.
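The four-part iteration schedule that selects among Eqs. (8), (14), and (15) can be sketched as a small helper. The sketch assumes a 1-based iteration counter \(\chi\); the function name and the stage labels are hypothetical and only mirror the conditions stated in the text.

```python
def rsa_phase(iteration, max_iter):
    """Return the reptile-search stage active at iteration chi (1-based)."""
    if not 1 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [1, max_iter]")
    # Integer comparisons avoid floating-point issues at the quarter boundaries.
    if 4 * iteration <= max_iter:
        return "encircling, first stage"      # chi <= chi_max / 4
    if 4 * iteration <= 2 * max_iter:
        return "encircling, second stage"     # chi_max/4 < chi <= 2 chi_max/4
    if 4 * iteration <= 3 * max_iter:
        return "hunting coordination"         # 2 chi_max/4 < chi <= 3 chi_max/4
    return "hunting cooperation"              # 3 chi_max/4 < chi <= chi_max


for chi in (25, 26, 75, 76):
    print(chi, rsa_phase(chi, max_iter=100))
```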

Tuna optimization algorithm

Parabolic foraging

Tuna prey mainly on herring and eels, which evade predators by abruptly changing direction. When attacking, the tuna use the motion of the prey as a reference and surround it along a parabolic path, as given in Eqs. (16) and (17),

$$Z_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {Z_{best}^{t} + rand \cdot \left( {Z_{best}^{t} - Z_{i}^{t} } \right) + RV \cdot q^{2} \cdot \left( {Z_{best}^{t} - Z_{i}^{t} } \right),} & {if\;rand < 0.5} \\ {RV \cdot q^{2} \cdot Z_{i}^{t} ,} & {if\;rand \ge 0.5} \\ \end{array} } \right.$$
(16)
$$q = \left( {1 - \frac{\chi }{{\chi_{\max } }}} \right)^{{\chi /\chi_{\max } }}$$
(17)

where \(\chi\) represents the current iteration, \(\chi_{\max }\) represents the predefined maximum number of iterations, and \(RV\) is a random value equal to − 1 or 1.
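The parabolic foraging update of Eqs. (16) and (17) can be sketched as follows. The sketch assumes that the \(rand\) used in the branch condition and the \(rand\) used as a coefficient are independent uniform draws, which the equations do not make explicit; the function names and the `rng` parameter are hypothetical.

```python
import random


def parabolic_weight(iteration, max_iter):
    """q of Eq. (17): (1 - chi/chi_max) ** (chi/chi_max)."""
    ratio = iteration / max_iter
    return (1 - ratio) ** ratio


def parabolic_step(z_i, z_best, iteration, max_iter, rng=random):
    """One parabolic-foraging update of a single solution vector (Eq. (16))."""
    q = parabolic_weight(iteration, max_iter)
    rv = rng.choice((-1, 1))                   # RV is -1 or 1
    if rng.random() < 0.5:                     # move relative to the best solution
        r = rng.random()                       # assumption: independent draw
        return [b + r * (b - z) + rv * q ** 2 * (b - z)
                for z, b in zip(z_i, z_best)]
    return [rv * q ** 2 * z for z in z_i]      # search around the current position


print(parabolic_weight(50, 100))               # about 0.7071
print(parabolic_step([0.2, 0.4], [0.5, 0.5], 10, 100, random.Random(1)))
```

Because \(q\) decays from 1 to 0 over the run, the second and third terms shrink and the update concentrates around the best solution in late iterations.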

Spiral foraging

Apart from the parabolic foraging strategy, there is an alternate effective cooperative approach known as the spiral foraging strategy. This approach is described mathematically in Eq. (18),

$$Z_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {\beta_{1} \cdot \left( {Z_{rand}^{t} + \rho \cdot \left| {Z_{rand}^{t} - Z_{i}^{t} } \right| + \beta_{2} \cdot Z_{i}^{t} } \right),} & {i = 1,} & {if\;rand < \frac{t}{{t_{\max } }}} \\ {\beta_{1} \cdot \left( {Z_{rand}^{t} + \rho \cdot \left| {Z_{rand}^{t} - Z_{i}^{t} } \right| + \beta_{2} \cdot Z_{i - 1}^{t} } \right),} & {i = 2,3, \ldots ,P,} & {if\;rand < \frac{t}{{t_{\max } }}} \\ {\beta_{1} \cdot \left( {Z_{best}^{t} + \rho \cdot \left| {Z_{rand}^{t} - Z_{i}^{t} } \right| + \beta_{2} \cdot Z_{i}^{t} } \right),} & {i = 1,} & {if\;rand \ge \frac{t}{{t_{\max } }}} \\ {\beta_{1} \cdot \left( {Z_{rand}^{t} + \rho \cdot \left| {Z_{best}^{t} - Z_{i}^{t} } \right| + \beta_{2} \cdot Z_{i - 1}^{t} } \right),} & {i = 2,3, \ldots ,P,} & {if\;rand \ge \frac{t}{{t_{\max } }}} \\ \end{array} } \right.$$
(18)

where \(Z_{i}^{t + 1}\) is the \(i^{th}\) tuna in iteration \(t + 1\), \(Z_{best}^{t}\) denotes the current best solution, and \(Z_{rand}^{t}\) is a randomly chosen reference individual from the shoal. The coefficients \(\beta_{1}\) and \(\beta_{2}\) are weights that control the tendency of each individual to move toward the reference individual and toward the preceding individual, respectively. The parameter \(\rho\) scales the distance between an individual and the best or randomly selected reference point. These coefficients are given in Eqs. (19), (20), (21), and (22),

$$\beta_{1} = b + \left( {1 - b} \right) \cdot \frac{t}{{t_{\max } }}$$
(19)
$$\beta_{2} = \left( {1 - b} \right) - \left( {1 - b} \right) \cdot \frac{t}{{t_{\max } }}$$
(20)
$$\rho = e^{cu} \cdot \cos \left( {2\pi c} \right)$$
(21)
$$u = e^{{3\cos \left( {\left( {\left( {t_{\max } + \frac{1}{t}} \right) - 1} \right)\pi } \right)}}$$
(22)

where \(b\) is a constant that determines how strongly the tuna are attracted, while \(c\) is a random number uniformly distributed between 0 and 1.
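The coefficients of Eqs. (19) to (22) can be computed directly, as in the sketch below. Here `b_const` stands for the constant \(b\) of Eqs. (19) and (20) and `c_rand` for the random number \(c\) of Eq. (21); these parameter names and the example value 0.7 are hypothetical. The iteration counter \(t\) is assumed to start at 1, since Eq. (22) contains \(1/t\).

```python
import math


def spiral_coefficients(t, t_max, b_const, c_rand):
    """Return (beta1, beta2, rho) following Eqs. (19)-(22); requires t >= 1."""
    beta1 = b_const + (1 - b_const) * t / t_max                      # Eq. (19)
    beta2 = (1 - b_const) - (1 - b_const) * t / t_max                # Eq. (20)
    u = math.exp(3 * math.cos(((t_max + 1 / t) - 1) * math.pi))      # Eq. (22)
    rho = math.exp(c_rand * u) * math.cos(2 * math.pi * c_rand)      # Eq. (21)
    return beta1, beta2, rho


beta1, beta2, rho = spiral_coefficients(t=10, t_max=100, b_const=0.7, c_rand=0.25)
print(beta1 + beta2)  # the two weights sum to 1 (up to rounding)
```

Adding Eqs. (19) and (20) shows that \(\beta_{1} + \beta_{2} = 1\) at every iteration, with the weight shifting from \(\beta_{2}\) to \(\beta_{1}\) as \(t\) approaches \(t_{\max }\).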

Algorithm 1 Hybrid reptile tuna optimization algorithm.

Positioning

The Deep Spatial–Temporal Attention Network proposed is a hybrid classification model that blends CNN, Visual Transformers, LSTM, and attention mechanisms.

CNN

A CNN is a type of deep learning network that is designed for grid-like data, such as images. One reason for their popularity is that CNNs can detect spatial patterns as well as relationships between various parts of the data by using special convolutional and pooling layers in addition to conventional fully connected layers.

LSTM

In indoor positioning, the LSTM illustrated in Fig. 2 is used to extract temporal features from the collected signal data, such as RSSI sequences, device motion patterns, and other time-dependent contextual information. By capturing these temporal dynamics, LSTMs contribute to more accurate and reliable location estimation.

Fig. 2 Architecture of LSTM.

The features extracted by CNNs capture spatial dependencies, while the features extracted by LSTMs capture temporal dependencies. The two feature sets are concatenated as given in Eq. (23), and the final output from the ViT is used for precise location prediction.

$$f_{concat} = \left[ {f_{CNN} ,f_{LSTM} } \right]$$
(23)

Vision transformer (ViT)

The concatenated feature vector is input into the Vision Transformer as illustrated in Fig. 3.

Fig. 3 Architecture of ViT.

Input embedding

The input embedding is computed as shown in Eq. (24).

$${z}_{0}=Linear\left({f}_{concat}\right)+{E}_{pos}$$
(24)

where \(f_{concat}\) denotes the concatenated feature vector. \(Linear\left( . \right)\) represents a linear transformation (fully connected layer). \(E_{pos}\) represents positional embeddings that encode the position information of the input sequence.

Self-attention mechanism

ViT applies self-attention to the input embeddings to capture relationships between different portions of the sequence, as given in Eqs. (25), (26), (27), and (28),

$$Q = W_{Q} X$$
(25)
$$K = W_{K} X$$
(26)
$$V = W_{V} X$$
(27)
$$A = \mathrm{softmax}\left( {\frac{{QK^{T} }}{{\sqrt {d_{k} } }}} \right)V$$
(28)

where \(X\) is the input sequence, and \(Q\), \(K\), and \(V\) are the query, key, and value matrices, respectively. \(W_{Q}\), \(W_{K}\), and \(W_{V}\) are learnable weight matrices. \(d_{k}\) signifies the dimensionality of the key vectors.
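Scaled dot-product self-attention can be sketched in plain Python as follows. The sketch assumes a row-vector convention, in which each row of \(X\) is one token and the projections are computed as \(XW_{Q}\), \(XW_{K}\), and \(XW_{V}\); this is the transpose of the column form written in Eqs. (25) to (27). The helper names and the toy weights are hypothetical, and a practical model would use a tensor library with multiple heads.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax of one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]                 # K^T
    scores = matmul(q, k_t)                              # pairwise q_i . k_j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]                        # two toy tokens
output, attn = self_attention(tokens, identity, identity, identity)
print(attn)
```

Each row of the attention weights sums to 1, so every output token is a convex combination of the value vectors.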

Phase 3: security enhancement with Galois field-based cryptographic primitives

A key innovation of Deep-STAN is its use of Galois Field-based Elliptic Curve Cryptography (ECC) for securing signal data at various stages of the IPS. This cryptographic method operates within finite fields, offering:

Elliptic Curve Diffie-Hellman (ECDH): Secure exchange of keys between devices collecting and analyzing signal data.

Elliptic Curve Diffie-Hellman (ECDH) is a cryptographic protocol that enables a secure key exchange between two entities, which makes it appropriate for applications that transmit sensitive data. The protocol rests on the mathematical properties of elliptic curves, which may be defined by Eq. (29),

$$E:{y}^{2}={x}^{3}+ax+b$$
(29)

Here, \(a\) and \(b\) are the curve coefficients, chosen so that the curve is non-singular. ECDH operates over finite fields, typically \({F}_{p}\).

Process for key exchange

  1. Parameter selection: Select an elliptic curve \(E\) and a base point \(G\).

  2. Key generation:

    • Sensor 1 selects a private key \(a\) and computes the public key:

      $${P}_{A}=a\cdot G$$

    • Sensor 2 selects a private key \(b\) and computes the public key:

      $${P}_{B}=b\cdot G$$

  3. Public key exchange: Sensor 1 and Sensor 2 exchange their public keys \({P}_{A}\) and \({P}_{B}\).

  4. Shared secret computation:

    • Sensor 1:

      $${S}_{A}=a\cdot {P}_{B}$$

    • Sensor 2:

      $${S}_{B}=b\cdot {P}_{A}$$

They both arrive at the same shared secret:

$${S}_{A}={S}_{B}=ab\cdot G$$
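The key exchange can be illustrated with a toy example. The sketch below assumes the small textbook curve \(y^{2} = x^{3} + 2x + 2\) over \(F_{17}\) with base point \(G = (5, 1)\) of order 19; these parameters, the private keys 3 and 7, and all function names are chosen here purely for illustration and offer no security. A deployed system would rely on a vetted cryptographic library and a standardized curve.

```python
P_MOD, A_COEF = 17, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17 (insecure)
G = (5, 1)                 # base point of order 19 on that curve


def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # p2 is the inverse of p1
    if p1 == p2:                                      # tangent slope (doubling)
        lam = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:                                             # chord slope (addition)
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


priv_a, priv_b = 3, 7                                 # hypothetical private keys
pub_a, pub_b = scalar_mult(priv_a, G), scalar_mult(priv_b, G)
shared_a = scalar_mult(priv_a, pub_b)                 # Sensor 1: a * P_B
shared_b = scalar_mult(priv_b, pub_a)                 # Sensor 2: b * P_A
print(shared_a == shared_b)                           # True
```

Both sensors reach the same point because scalar multiplication commutes: \(a\cdot \left(b\cdot G\right) = b\cdot \left(a\cdot G\right)\).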

ECDH is a key exchange mechanism that relies on elliptic curves to safeguard the information exchanged during communication, which makes it a necessary tool in applications where sensitive information is transferred. Its efficiency, coupled with strong security, explains its popularity in modern cryptographic systems.

Elliptic curve digital signature algorithm (ECDSA)30

The Elliptic Curve Digital Signature Algorithm (ECDSA) is a cryptographic protocol and an elliptic-curve variant of the Digital Signature Algorithm. It employs elliptic curve scalar multiplication instead of modular exponentiation. An elliptic curve \(E\) over a prime field \({F}_{p}\) is defined as \({E}_{p}\left(a,b\right):{y}^{2}={x}^{3}+ax+b \bmod p,\) where \(p>3\), \(a,b\in {F}_{p}\), and the condition \(4{a}^{3}+27{b}^{2} \bmod p\ne 0\) is satisfied. The elliptic curve group \(E({F}_{p})\) contains all points \((x,y)\) that satisfy the elliptic curve \({E}_{p}\left(a,b\right)\), together with the point at infinity \({O}_{\infty }.\)

Galois field arithmetic (GF(2^m))31

A Galois field \(GF({2}^{m})\) is a finite field of size \({2}^{m},\) where \(m\) is the number of bits per element. Each element \(a\in GF({2}^{m})\) is represented as a polynomial as in Eq. (30), and addition and multiplication in the Galois field are defined in Eqs. (31) and (32). The Galois field variations and their parameters are given in Table 2.

$$a={a}_{m-1}{x}^{m-1}+{a}_{m-2}{x}^{m-2}+\cdots +{a}_{1}x+{a}_{0}$$
(30)
$$a+b=\left({a}_{m-1}\oplus {b}_{m-1}\right){x}^{m-1}+\cdots +({a}_{0}\oplus {b}_{0})$$
(31)
$$a\times b=\left(a\left(x\right)\cdot b\left(x\right)\right)mod P(x)$$
(32)

where \({a}_{i}\in GF\left(2\right)=\left\{0,1\right\}\), \(\oplus\) denotes XOR, and \(P(x)\) is an irreducible polynomial of degree \(m\). In the encryption process, the plaintext \(P\) is divided into blocks \({P}_{1},{P}_{2},\cdots ,{P}_{N}\) of length \(L\) (in bits), where \({P}_{i}\) is of size \({m}_{i}\), such that \({P}_{i}\in GF\left({2}^{{m}_{i}}\right).\) The addition and multiplication operations are then applied to each block to encrypt the data. After block \({P}_{i}\) has been encrypted, the procedure is repeated with the corresponding field size for the next block \({P}_{i+1}\).
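The field operations of Eqs. (31) and (32) can be sketched with integers whose bits hold the polynomial coefficients. As an assumption for illustration, the sketch uses \(GF({2}^{8})\) with the reduction polynomial \(x^{8}+x^{4}+x^{3}+x+1\) (0x11B, the polynomial used by AES); the paper's own field sizes and polynomials are those listed in Table 2. The function names are hypothetical.

```python
def gf_add(a, b):
    """Addition in GF(2^m): coefficient-wise XOR (Eq. (31))."""
    return a ^ b


def gf_mul(a, b, m=8, poly=0x11B):
    """Multiplication in GF(2^m) modulo the reduction polynomial (Eq. (32)).

    Assumption: m = 8 and poly = x^8 + x^4 + x^3 + x + 1 by default.
    """
    result = 0
    while b:
        if b & 1:            # current coefficient of b is 1: add shifted a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a >> m:           # degree reached m: reduce modulo poly
            a ^= poly
    return result


print(hex(gf_add(0x57, 0x83)))  # 0xd4
print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

The two printed values match the worked examples for this field in the AES specification (FIPS 197).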

Table 2 Galois field (GF) variations and their parameters.

Phase 4: system security with QR codes and blockchain integration

Blockchain integration

The system uses blockchain to maintain decentralized ledgers of location fingerprints and QR code data. The proposed blockchain-integrated architecture is shown in Fig. 4.

Fig. 4 Proposed blockchain integration architecture.

The ledger records every scan and update made through the respective QR codes, so that there is an immutable history of location tags. Blockchain technology is integrated into the proposed indoor positioning system (IPS) to enhance the security, integrity, and immutability of the positioning data. It ensures that location data, once recorded, cannot be tampered with or altered, providing a secure and transparent history of the user’s movement. When users’ positions are tracked in indoor environments using Wi-Fi, Bluetooth, and magnetometers, the system stores the positioning data as encrypted transactions. These transactions are then logged into a blockchain, where each new entry is linked to the previous one, creating a secure and unchangeable record of the user’s movement.

The novelty of blockchain in this IPS lies in its combination with cryptographic methods, namely Elliptic Curve Cryptography (ECC) and a substitution box (S-box). By incorporating ECC, here built on Galois field arithmetic, the system ensures that the signal data, transmitted over potentially insecure networks, remains protected against interference or attacks. ECC provides low-latency encryption and decryption, which is essential for real-time positioning applications, so that security measures do not impede system performance. Moreover, the S-box further strengthens the system’s resilience by obfuscating the data, adding a layer of protection against unauthorized access or manipulation.
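The linking of each new entry to the previous one can be sketched as a simple hash chain. This is a minimal single-node illustration under stated assumptions, not the architecture of Fig. 4: it omits consensus, networking, and signatures, and it assumes SHA-256 as the hash function. The block fields and function names are hypothetical, and the payload strings stand in for already-encrypted positioning transactions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def compute_hash(index, prev_hash, payload):
    """Hash the block contents together with the previous block's hash."""
    record = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append an (already encrypted) positioning record to the ledger."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": compute_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Recompute every hash; any edited block breaks the chain."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        expected = compute_hash(index, prev_hash, block["payload"])
        if (
            block["index"] != index
            or block["prev_hash"] != prev_hash
            or block["hash"] != expected
        ):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous hash, changing the payload of any earlier block makes `verify_chain` return `False` unless every later block is recomputed as well.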

Phase 5: blockchain-based storage and security

In a modern IPS, the security, integrity, and authenticity of location data are core concerns when storing and processing that data. Blockchain-based storage addresses them through the following properties.

Immutability: Once a signal fingerprint has been recorded, it becomes part of a permanent ledger. This means prior location data can always be referenced or verified without the possibility of tampering.

Verifiability: Since every block is cryptographically linked to the previous block, any attempt to alter the information is detected immediately.

Cryptographic security: Metadata is cryptographically signed before being written to the blockchain. Strong digital signatures ensure that a given piece of information really comes from where it claims to originate, and that the information has not been tampered with after the fact. Should anyone try to tamper with or alter the data, the signature would be invalidated and the tampering easily detected by the recipient.

The S-box is essential for ensuring that patterns in the plain text are obscured in the ciphertext. Substitution boxes (S-boxes) are the main source of nonlinearity in symmetric-key algorithms. They are vectorial Boolean functions that map a fixed number of input bits to a fixed number of output bits. An \(n\times m\) S-box is formally defined in Eq. (33),

$$S:F_{2}^{n} \to F_{2}^{m}$$
(33)

where \(F_{2}^{n}\) and \(F_{2}^{m}\) represent vector spaces over the Galois field \(GF(2)\) with dimensions \(n\) and \(m\). The cryptographic strength of an S-box is characterized by several critical properties.

Non-linearity: The distance between the S-box and the set of all affine functions. For an \(n\times n\) S-box, the nonlinearity is given in Eq. (34),

$$NL\left( S \right) = 2^{n - 1} - \frac{1}{2}\max_{a \in F_{2}^{n} ,\,b \in F_{2}^{n} \backslash \{ 0\} } \left| {\sum\nolimits_{{x \in F_{2}^{n} }} {\left( { - 1} \right)^{b \cdot S\left( x \right) \oplus a \cdot x} } } \right|$$
(34)

where \(\cdot\) represents the dot product and \(\oplus\) denotes bitwise XOR.
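A brute-force evaluation of Eq. (34) can be sketched as follows, taking the maximum over all input masks \(a\) and all non-zero output masks \(b\). The 4-bit S-box of the PRESENT block cipher is used only as a well-known example; it is an assumption for illustration and not the S-box of the proposed system. The function names are hypothetical.

```python
# Example S-box (assumption): the 4-bit S-box of the PRESENT cipher
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def dot(u, v):
    """Dot product over GF(2): parity of the bitwise AND."""
    return bin(u & v).count("1") & 1


def nonlinearity(sbox, n):
    """Evaluate Eq. (34) for an n x n S-box by exhaustive search."""
    size = 1 << n
    max_abs = 0
    for b in range(1, size):          # non-zero output masks
        for a in range(size):         # all input masks
            walsh = sum(
                1 - 2 * (dot(b, sbox[x]) ^ dot(a, x)) for x in range(size)
            )
            max_abs = max(max_abs, abs(walsh))
    return (1 << (n - 1)) - max_abs // 2
```

For the example S-box, `nonlinearity(PRESENT_SBOX, 4)` evaluates to 4, whereas a linear map such as the identity gives 0. The exhaustive search costs on the order of \(2^{3n}\) operations, so it is only practical for small \(n\).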

Differential uniformity: A measure of how uniformly output differences are distributed when the input is changed by a fixed difference. The differential uniformity is given in Eq. (35),

$$\delta = \max_{a \ne 0,\,b} \left| {\left\{ {x \in F_{2}^{n} :S\left( x \right) \oplus S\left( {x \oplus a} \right) = b} \right\}} \right|$$
(35)
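Eq. (35) can be evaluated directly by counting, for every non-zero input difference \(a\), how often each output difference \(b\) occurs. The sketch below again assumes the 4-bit PRESENT S-box purely as an example; the function name is hypothetical.

```python
from collections import Counter

# Example S-box (assumption): the 4-bit S-box of the PRESENT cipher
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def differential_uniformity(sbox, n):
    """Evaluate Eq. (35): the largest count over a != 0 and all b."""
    size = 1 << n
    delta = 0
    for a in range(1, size):
        # Tally the output differences S(x) xor S(x xor a) over all inputs x
        counts = Counter(sbox[x] ^ sbox[x ^ a] for x in range(size))
        delta = max(delta, max(counts.values()))
    return delta
```

For the example S-box the result is 4. A linear map such as the identity yields the worst possible value \(2^{n}\), because every input difference maps to a single output difference.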

Algebraic degree: The highest degree among the component Boolean functions of \(S\). The algebraic degree of an \(n\times m\) S-box is defined in Eq. (36),

$$\deg \left( S \right) = \max_{v \in F_{2}^{m} \backslash \{ 0\} } \deg \left( {v \cdot S} \right)$$
(36)

Balancedness: An S-box is balanced if each output occurs with equal probability when the input is uniformly distributed.
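The algebraic degree of Eq. (36) and the balancedness criterion can both be checked exhaustively for small S-boxes. The sketch below computes the algebraic normal form of each component function \(v\cdot S\) with a binary Möbius transform and counts output frequencies for balancedness. The PRESENT S-box is assumed as an example only, and all function names are hypothetical.

```python
from collections import Counter

# Example S-box (assumption): the 4-bit S-box of the PRESENT cipher
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def boolean_degree(truth_table):
    """Degree of a Boolean function via its algebraic normal form."""
    coeffs = list(truth_table)
    size = len(coeffs)
    step = 1
    while step < size:                 # binary Moebius transform
        for x in range(size):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
        step <<= 1
    # The degree of monomial u is the number of variables it contains
    return max((bin(u).count("1") for u, c in enumerate(coeffs) if c),
               default=0)


def algebraic_degree(sbox, m):
    """Eq. (36): maximum degree over all non-zero component functions."""
    best = 0
    for v in range(1, 1 << m):
        component = [bin(v & y).count("1") & 1 for y in sbox]
        best = max(best, boolean_degree(component))
    return best


def is_balanced(sbox, n, m):
    """Each of the 2^m outputs must occur exactly 2^(n-m) times."""
    if m > n:
        return False
    counts = Counter(sbox)
    return len(counts) == (1 << m) and all(
        c == (1 << (n - m)) for c in counts.values()
    )
```

The example S-box has algebraic degree 3 and is balanced, as is any bijective \(n\times n\) S-box.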

Algebraic immunity: A measure of resistance against algebraic attacks. For an S-box \(S:F_{2}^{n} \to F_{2}^{m}\), the algebraic immunity is determined in Eq. (37),

$$AI\left(S\right)=\min \left\{\deg \left(P\right) : P\in I(S)\right\}$$
(37)

where \(I\left(S\right)\) is the ideal generated by the polynomials representing the S-box, as given in Eq. (38),

$$I\left(S\right)=\left({y}_{1}-{f}_{1}\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right), {y}_{2}-{f}_{2}\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right), \cdots , {y}_{m}-{f}_{m}({x}_{1},{x}_{2},\cdots ,{x}_{n})\right)$$
(38)

Algebraic immunity is essential to the defense of S-boxes against cryptanalytic attacks. It is computed by building the smallest reduced Gröbner basis of the ideal and identifying its lowest-degree polynomial. This idea of measuring cipher resistance was first presented by Faugère and Perret. Consider the Boolean function \(f_{s} :F_{2}^{n + m} \to F_{2}\) defined in Eq. (39),

$${f}_{s}\left({x}_{1},{x}_{2},\cdots ,{x}_{n},{y}_{1},{y}_{2},\cdots ,{y}_{m}\right)=\begin{cases}1, & \text{if } \forall i:{f}_{i}\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)={y}_{i},\\ 0, & \text{if } \exists i:{f}_{i}\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)\ne {y}_{i}.\end{cases}$$
(39)

The algebraic immunity of the S-box \(S\) is equivalent to the minimum degree of the non-zero polynomials in the annihilator of \({f}_{s}\), as given in Eq. (40),

$$AI\left(S\right)=\min \left\{\deg \left(g\right) \mid g\in Ann({f}_{s}),\ g\ne 0\right\}$$
(40)

In indoor positioning systems (IPS), the algebraic properties of cryptographic algorithms—nonlinearity, differential uniformity, and algebraic immunity—are essential for enhancing security against various attack vectors.

Tamper-resistance: Because a blockchain is a decentralized ledger whose consensus mechanism can only be subverted by gaining control over the majority of the network, any attempt to alter the underlying data is highly impractical in most blockchain systems.

Cryptographic verification

Data integrity: When a QR code is generated for a location, the data is encrypted using Galois Field-based ECC. This means that the location data, once written to the blockchain, cannot be read or modified without the appropriate decryption keys, ensuring that only authorized users can access or alter the data.

Verification process: When the QR code is scanned, the system retrieves the associated data from the blockchain. To verify its integrity, the system checks the cryptographic hash of the retrieved data against the hash stored in the blockchain. If the hashes match, it confirms that the data has not been tampered with since it was recorded.

Access control: Users can be granted specific permissions to read or write data to the blockchain. When a QR code is scanned, the system verifies the user’s credentials through cryptographic signatures.
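The verification and access control steps above can be sketched in Python. This is a hedged illustration only: SHA-256 is assumed as the hash function, and a keyed HMAC is used as a simple symmetric stand-in for the digital signatures described in the text, since the Python standard library provides no asymmetric signature scheme. All function names, keys, and payloads are hypothetical.

```python
import hashlib
import hmac


def payload_hash(payload: bytes) -> str:
    """Hash of the QR payload, as it would be stored on the ledger."""
    return hashlib.sha256(payload).hexdigest()


def verify_scan(retrieved_payload: bytes, stored_hash: str) -> bool:
    """Integrity check: recompute the hash and compare with the stored one."""
    return hmac.compare_digest(payload_hash(retrieved_payload), stored_hash)


def credential_tag(user_key: bytes, payload: bytes) -> str:
    """Stand-in for a signature: HMAC-SHA256 over the payload (assumption)."""
    return hmac.new(user_key, payload, hashlib.sha256).hexdigest()


def is_authorized(user_key: bytes, payload: bytes, tag: str) -> bool:
    """Access check: accept the request only if the tag matches."""
    return hmac.compare_digest(credential_tag(user_key, payload), tag)
```

A scan is accepted only when both checks pass: the payload hash matches the ledger entry, and the credential tag verifies under the key registered for that user. Constant-time comparison via `hmac.compare_digest` avoids leaking information through timing.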

Experimental results

The proposed approach was implemented in Python. The proposed Deep-STAN method is evaluated alongside existing techniques, namely RF20, DNN21, CNN22, and TCN26, on the WiFi RSS Fingerprint Localization Dataset (https://www.kaggle.com/datasets/tareqalhmiedat/wifi-rss-fingerprint-dataset?select=RSSISensors_Large.csv). Two splits are considered: 70% or 80% of the data is used for training the model, and the remaining data is used for assessing performance. The analysis is based on the following metrics: precision, NPV, FNR, sensitivity, accuracy, MCC, FPR, specificity, and F-measure.

Metrics analysis

The metrics utilized for validating the proposed model are shown in Table 3.

Table 3 Metrics evaluation.
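For reference, the listed metrics can be computed from the four entries of a binary confusion matrix using their standard definitions. The sketch below assumes a binary (or one-vs-rest) setting with non-degenerate counts; how the multi-class positioning labels are reduced to such counts in this work is not stated here. The function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumes no zero denominators)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": tn / (tn + fn),              # negative predictive value
        "fpr": fp / (fp + tn),              # false positive rate
        "fnr": fn / (fn + tp),              # false negative rate
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Illustrative counts only, not results from this study
example = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Note that FPR equals one minus specificity and FNR equals one minus sensitivity, so these pairs always move together.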

Analysis of the suggested model for 70% training data

The proposed approach is compared with the existing models RF20, DNN21, CNN22, and TCN26 with 70% of the dataset used for training. To compare the results of each approach, accuracy, sensitivity, MCC, and FNR were used. The findings (shown in Table 4 and Fig. 5) show that the proposed system outperforms the existing techniques.

Table 4 Performance analysis with 70% training data.

Fig. 5 Graphic representation of (a) accuracy, (b) precision, (c) sensitivity, (d) specificity, (e) F-measure, (f) MCC, (g) NPV, (h) FPR, (i) FNR for the proposed and existing models.

As for accuracy, the proposed system yields 99.37%, which is significantly higher than RF20 with 84.29% and all the other models, as is evident from Table 4 and Fig. 5. This high accuracy is attributed to the incorporation of the DBSCAN clustering method. For sensitivity, the proposed model yields 98.98%, which is significantly higher than the competitors, indicating the model’s capability of identifying correct instances. Furthermore, the proposed system yields an MCC of 98.74%, which is higher than RF20 at 70.82%, DNN21 at 78.38%, CNN22 at 80.20%, and TCN26 at 70.99%. The incorporation of Galois Field cryptography is also credited with enhancing the sensitivity and the MCC of the system.

Table 5 Performance analysis with 80% training data.

Fig. 6 Graphic representation of (a) accuracy, (b) precision, (c) sensitivity, (d) specificity, (e) F-measure, (f) MCC, (g) NPV, (h) FPR, (i) FNR for the proposed and existing models.

Analysis of the suggested model for 80% training data

The proposed approach was compared with related methods, namely RF20, DNN21, CNN22, and TCN26, and the results are shown in Table 5 and Fig. 6. The comparison is based on evaluation metrics such as accuracy, precision, Negative Predictive Value (NPV), and False Positive Rate (FPR). The findings demonstrate that the proposed method performs better than the existing models on all the evaluated measures.

Concerning accuracy, the proposed approach obtains 98.04%, which is higher than RF20 with 88.02%, CNN22 with 91.12%, and all other models. In precision, the proposed method achieves 97.22%, which is higher than that of the compared methods. This higher precision, which is essential for the stability and security of the indoor positioning system (IPS), is made possible by the use of QR codes and blockchain technology. For NPV, the proposed approach achieves 98.77%, which is higher than the NPV of TCN26 at 93.62% and of the other related models. The addition of Deep-STAN also helps improve NPV and the other measures. Finally, the proposed approach has the lowest FPR of 0.0244, while the DNN21 model has the highest FPR of 0.1875.

These findings substantiate that the proposed method enhances the accuracy and reliability of the indoor positioning system relative to the existing methods.

Rank-based analysis on cryptographic techniques

Table 6 defines the performance metrics used in the rank-based analysis of cryptographic techniques. Key Size is the key length needed for 128-bit security. Security Level reflects encryption strength against attacks; higher values imply stronger security. Efficiency is measured by computational speed or resource usage; the higher the value, the faster the processing. Latency is the time spent on encryption or decryption. Complexity of Implementation refers to the difficulty of implementing the encryption algorithm in hardware or software. Hardware Support is the degree of optimization achieved when running on specialized hardware such as FPGAs or ASICs. Resistance to Side-Channel Attacks denotes the ability of the algorithm to resist hardware-level attacks such as power consumption or timing analysis. Finally, Scalability means that an algorithm adapts well to very large applications or growing data loads, so that it can still operate across different environments.

Table 6 Performance metrics evaluation for rank-based analysis on cryptographic techniques.

In Table 7, the proposed cryptographic technique, Galois Field-Based ECC, is evaluated against the established cryptographic techniques RSA and AES. They are compared on performance metrics such as security level, efficiency, latency, implementation complexity, hardware support, resistance to side-channel attacks, scalability, and suitability for IPS applications.

Table 7 Rank based comparative analysis on cryptographic techniques.

Figure 7 compares Galois Field-Based ECC, RSA, and AES across multiple metrics and highlights ECC’s superiority in security, efficiency, and scalability, making it highly suitable for Indoor Positioning Systems (IPS). ECC achieves a higher security level (9) than RSA (5) and AES (7) due to its reliance on elliptic curve mathematics, which provides robust encryption with smaller key sizes. It also scores higher on efficiency (9) and latency (9) than RSA (5), making it well suited to real-time applications like IPS. While AES matches ECC in efficiency and latency, ECC outperforms RSA in scalability (9 vs. 3), enabling seamless expansion in large-scale IPS environments. Its implementation complexity (6) is slightly higher than that of RSA (5) and AES (5), and its resistance to side-channel attacks (7) is higher than that of RSA (6) but lower than that of AES (9). ECC’s hardware support (5) is also lower than that of AES (9), indicating that specialized hardware may be required for optimized performance. Given its suitability score of 10 for IPS, ECC stands out as the most secure, efficient, and scalable encryption method for safeguarding positioning data in smart manufacturing, logistics, and other indoor applications.

Fig. 7 Comparative analysis on cryptographic techniques.

SHapley additive exPlanations (SHAP) analysis

Fig. 8(a–l) presents the SHAP analysis comparing the proposed Galois Field-Based ECC with RSA and AES.

Fig. 8 (a–l) SHAP analysis between the proposed Galois field-based ECC with the RSA and AES.

The integration of cryptography into indoor positioning systems (IPS) significantly enhances security, which in turn influences various performance metrics such as sensitivity, Matthews Correlation Coefficient (MCC), and others. By utilizing Galois Field-based ECC, the system encrypts signal data, ensuring data integrity and preventing unauthorized modifications, which enhances sensitivity by providing accurate and reliable input for location classification.

Figures 9 and 10 indicate that the performance of various methods in indoor positioning systems varies significantly, as reflected in their Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics.
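As a reminder of how these two error metrics are formed, the following sketch computes MAE and RMSE over per-sample positioning errors. It assumes that the error of each sample is the Euclidean distance between the true and the predicted position, which is a common convention but is not confirmed for this study. The function names and coordinates are hypothetical.

```python
import math


def positioning_errors(true_positions, predicted_positions):
    """Per-sample Euclidean distance between true and predicted positions."""
    return [
        math.dist(true_point, predicted_point)
        for true_point, predicted_point in zip(true_positions, predicted_positions)
    ]


def mean_absolute_error(errors):
    """MAE: average magnitude of the errors."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_squared_error(errors):
    """RMSE: square root of the average squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative coordinates only, not data from this study
truth = [(0.0, 0.0), (2.0, 2.0)]
estimate = [(3.0, 4.0), (2.0, 2.0)]
errors = positioning_errors(truth, estimate)
```

Because RMSE squares the errors before averaging, it penalizes occasional large misplacements more heavily than MAE, so RMSE is never smaller than MAE on the same data.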

Fig. 9 Comparison of MAE and RMSE.

Fig. 10 Comparative analysis (a) MAE (b) RMSE (c) accuracy (d) F1-score (e) execution time.

Table 8 compares Laska and Blankenbach’s method with the Random Forest (RF), Deep Neural Network (DNN), Convolutional Neural Network (CNN), and Temporal Convolutional Network (TCN) models. Laska and Blankenbach’s method has the lowest MAE of 1.10; all other models show higher values, with TCN performing best among them at 1.45. The same pattern holds for RMSE: Laska and Blankenbach’s method achieves 1.70, while TCN has the lowest RMSE (2.35) among the remaining alternatives.

Table 8 Comparative analysis based on Laska and Blankenbach’s method.

Laska and Blankenbach’s method demonstrates superior performance in indoor positioning with a Mean Absolute Error (MAE) of 1.10 and a Root Mean Squared Error (RMSE) of 1.70, significantly lower than traditional methods. It achieves an impressive accuracy of 98.5% and an F1-score of 0.95, indicating high reliability and precision.

The comparison shows that the proposed model, based on ECC with Galois Field arithmetic, has the lowest latency (5.2 ms) and the highest computational efficiency (92%), and is hence appropriate for real-time indoor positioning. KD-CNN with AES-256 performs moderately, with 12.5 ms latency and 85% efficiency, whereas CNN with RSA-2048 exhibits much greater latency (18.3 ms) and reduced efficiency (78%) and is hence less suitable for real-time purposes. RCNN with standard ECC achieves a good trade-off, with 9.7 ms latency and 88% efficiency, combining strong security with acceptable performance. TCN with DES-3 has the highest latency (22.1 ms) and the lowest efficiency (72%) and is not suitable for real-time implementation. The proposed approach surpasses the others by combining low latency, strong security, and computational efficiency. This comparison of cryptographic enhancements in different models for real-time indoor positioning is summarized in Table 9.

Table 9 Comparison of cryptographic enhancements in different models for real-time indoor positioning.

Conclusions

The proposed deep spatial–temporal attention network is a new hybrid model that combines CNNs, vision transformers, attention mechanisms, and LSTM networks to capture spatial–temporal patterns for better location classification. The application of hybrid optimization also improves performance in multiple indoor environments. One of the major contributions of the work is the incorporation of Galois Field-based Elliptic Curve Cryptography (ECC) with an S-box for data security during positioning. However, the outcomes also demonstrated that system performance depends on the similarity between the training data and the test data, which means that more attention should be paid to data collection in the future. Experimental results on the WiFi RSS Fingerprint Localization Dataset show robust performance. With 70% of the data used for training, the model had an accuracy of 0.9937, precision of 0.987, sensitivity of 0.9898, and specificity of 0.9878, while with 80% of the data used for training it had an accuracy of 0.9804, precision of 0.9722, sensitivity of 0.9859, and specificity of 0.9756. These outcomes indicate that the proposed system is stable and flexible enough to be used in indoor positioning applications.