Background & Summary

Assessing an individual’s overall health and well-being can be based in part on precise breath measurements and continuous respiratory rate monitoring1. The accurate evaluation of respiratory parameters is important in diagnosing and managing various medical conditions, facilitating early detection of respiratory distress or failure1. In particular, sudden changes in breathing patterns may serve as indicators of conditions such as pneumonia, asthma, or respiratory infections2,3. Respiratory rate can also be an important factor in describing the general condition of healthy individuals, and our study primarily targets this second group. In our research, we aim to develop algorithms that enhance human well-being by promoting proper breathing techniques. Automatic breath analysis methods can play an important role in this effort by providing real-time feedback to individuals about their breathing patterns during daily activities. This feedback allows individuals to make long-term improvements in their breathing habits, ultimately contributing to better health and overall well-being.

The respiratory rate is the number of breaths a person takes per minute. Typically, it is within the range of 12 to 20 breaths per minute for a healthy adult at rest4. Deviations from this range can signify underlying health issues. This vital sign is commonly monitored using different methods5. The practical challenge in monitoring respiration lies in reconciling measurement precision with unobtrusiveness for the end user. This is particularly important during long-term monitoring and in situations where the individual moves, which can generate noise.

The primary objective of our research is to prepare a dataset for building AI-powered methods dedicated to monitoring the respiratory system through detailed analysis of respiratory rate. This analysis aims to offer comprehensive insights into human breathing patterns with machine-learning-based support.

To ensure the effective training of numerical models of the respiratory rate built using, e.g., neural networks, a large volume of data is required. Our contribution is the construction of datasets that streamline the process for fellow researchers interested in building solutions for respiratory monitoring. This research is primarily focused on monitoring respiratory rate by classifying particular breathing phases, with a focus on diverse breathing patterns and their variability under different conditions.

It should be noted that publicly available datasets for respiratory rhythm analysis contain information only on the inhalation and exhalation phases; see, e.g., the datasets of Peter Charlton, Youngjun’s research group, and CapnoBase6,7,8. The proposed dataset goes beyond this convention by including four distinct classes describing phases: inhalation, exhalation, post-inhalation retention, and post-exhalation retention. Firstly, the inclusion of post-inhalation and post-exhalation retention phases allows for a more comprehensive understanding of respiratory patterns. Secondly, the additional classes enhance the accuracy of the respiratory rate calculation: by distinguishing between different phases of the breathing cycle, the proposed datasets enable more precise measurement of respiratory frequency. Additionally, our datasets include measurements taken under different respiratory conditions that may be used for the classification of breathing types, cough detection, or apnea detection, which are the subject of intensive research9,10,11,12,13. Moreover, the multi-modal data acquisition14, in our case from tensometers and accelerometers, allows for the development of methods for transfer labeling across data from different sources. With transfer labeling, annotations made on data from one device can be transferred to data from another, facilitating the integration of diverse datasets. This approach fosters comprehensive analyses and a deeper understanding of respiratory abnormalities. As the signal from the sensors can contain artifacts, we also add a special class that marks noisy regions of the signal and may be used for building filters.

Our dataset fills the gap left by respiratory research whose results have been published without the underlying data, e.g.15. We designed the data to encompass a broad spectrum of scenarios related to breathing types and breathing rates. This approach facilitates the effective training of mathematical models, enabling them to analyze and interpret complex breathing patterns with high accuracy and reliability.

The recordings of the breath series consist of four types of datasets: Labeled Tensometer Values, Labeled Accelerometer Values, Raw Tensometer Values, Raw Accelerometer Values.

It should be stressed that to maintain the integrity of the training and validation process, the training and testing datasets should be kept distinct, with no sharing of data between the two subsets. The stringent protocol ensures unbiased evaluation and robust validation of the mathematical model’s capabilities in analyzing and interpreting breathing patterns across various contexts.

The objective of our dataset series is to furnish researchers with an extensive resource that may streamline studies on respiratory analysis. These datasets aim to support the development and evaluation of robust models by presenting a varied collection of scenarios that encapsulate different breathing patterns. The goal is to form a basis for building models capable of accurately discerning breathing characteristics under diverse conditions. Our dataset establishes a foundation for researchers looking to refine the precision and efficacy of respiratory monitoring applications. Additionally, the datasets contribute to a broader understanding of respiratory health by providing a comprehensive resource for advancing studies in this field.

Our data opens the way to research on precise, non-invasive respiratory rate classification by employing two modalities, and leaves room for adding data from new devices (e.g., microphones) that do not significantly disturb the daily activities of the monitored individual. This approach leverages the strengths of multiple data sources to overcome the limitations of a single measurement modality prone to noise. By integrating data from different types of sensors, it becomes possible both to achieve high precision of respiratory measurements and to maintain the comfort and unrestricted mobility of the monitored individual.

Methods

The datasets are based on data recorded using the following devices: the HX711 tensometer chip, the iNode Nav accelerometer, and the WitMotion accelerometer. First, the data were recorded using an application that connects to the sensors and stores the data in text files. Then, depending on the type of sensor, the data were processed. The samples were taken from three subjects: a 23-year-old healthy and physically active male, and two 55-year-old healthy but non-active individuals (one male and one female). All participants signed the Consent Form to take part in the experiment and granted permission for the publication and processing of recordings of their breath patterns. The Ethics Commission of Gdańsk University of Technology has approved the experiment, granting permission under RN 2/2025.

The dataset contains additional noise that was intentionally introduced in the form of coughing, hyperventilation, and similar disturbances (details are given in Table 1). Despite this, the results achieved by the test models remained within satisfactory limits, indicating their ability to overcome noise in the data. Furthermore, due to the normalization process applied to the data, these factors did not impact the overall quality, ensuring reliable outcomes for the study’s objectives.

Table 1 Ranges of recorded test dataset for tensometer and accelerometer.

Data collected from one participant has been divided into multiple files, each corresponding to a particular type of breathing pattern. For the other two participants, each has a 10-minute recording stored in separate files for the tensometer and accelerometer data, respectively. This approach allows for a structured organization of the dataset while maintaining its integrity for analysis. During the course of the study, a Human-in-the-Loop approach was used that involved iterative data acquisition. Similarly, data labeling required multiple reviews and corrections of misclassified data points.

Tensometer

The HX711 chip, an amplifier for a strain gauge beam, was connected to an ESP32 microcontroller, which allows samples to be sent at a frequency of 10 Hz. It was attached to the ESP32 with rubber, which was then mounted around the diaphragm. Data were sent to the dedicated mobile application. The values recorded for a particular breath cycle depend on the size of the tester’s diaphragm. For an average-sized tester, 185 centimeters in height and 85 kilograms in weight, they were within the range of 500,000 – 900,000.

After the raw data were gathered, they were normalized to the range (−1, 1) within a window of 150 samples. A window of 150 was used because the target application used a neural network to classify and display breath phases in a chart spanning 150 samples on the x-axis.
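The windowed normalization can be sketched as follows. The text does not state the exact scaling rule, so this example assumes min-max scaling applied to consecutive, non-overlapping windows; the function names `normalize_window` and `normalize_signal` are illustrative and not taken from the published code.

```python
def normalize_window(window):
    """Min-max scale one window of raw readings to [-1, 1] (assumed rule)."""
    lo, hi = min(window), max(window)
    if hi == lo:
        # A flat window carries no breathing information; map it to zeros.
        return [0.0] * len(window)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in window]


def normalize_signal(samples, window_size=150):
    """Normalize consecutive, non-overlapping windows of `window_size` samples."""
    out = []
    for start in range(0, len(samples), window_size):
        out.extend(normalize_window(samples[start:start + window_size]))
    return out


# Toy example with raw values in the 500,000 to 900,000 range quoted above.
raw = [500_000, 700_000, 900_000, 700_000]
print(normalize_signal(raw, window_size=4))  # [-1.0, 0.0, 1.0, 0.0]
```

Scaling each window independently removes the dependence on the absolute strain values, which vary with the tester's body size.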

The initial classification of respiratory phases was based on the sign of the derivative of the signal. A derivative value within ±0.0079 indicates breath retention. Values above 0.0079 indicate a breath in, and values below −0.0079 indicate a breath out. After classification, filtering was run to remove strings (runs of identical labels) shorter than 5 samples. The data were saved to a temporary text file.
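A minimal sketch of this pre-labeling rule is given below. It assumes that the derivative is approximated by the first difference between consecutive normalized samples and that short runs are absorbed into the preceding run; neither detail is specified in the text. The label names (`"in"`, `"out"`, `"hold"`) and function names are hypothetical.

```python
THRESHOLD = 0.0079  # derivative dead band quoted in the text
MIN_RUN = 5         # runs shorter than this are treated as spurious


def prelabel(signal, threshold=THRESHOLD):
    """Label each step by the sign of the first difference of the signal."""
    labels = []
    for prev, cur in zip(signal, signal[1:]):
        d = cur - prev
        if d > threshold:
            labels.append("in")
        elif d < -threshold:
            labels.append("out")
        else:
            labels.append("hold")
    return labels


def drop_short_runs(labels, min_run=MIN_RUN):
    """Relabel runs shorter than `min_run` with the label of the preceding run.

    A short run at the very start has no predecessor and is left unchanged.
    """
    out = list(labels)
    i = 0
    while i < len(out):
        j = i
        while j < len(out) and out[j] == out[i]:
            j += 1
        if j - i < min_run and i > 0:
            out[i:j] = [out[i - 1]] * (j - i)
        i = j
    return out


labels = ["in"] * 6 + ["hold"] * 2 + ["out"] * 6
print(drop_short_runs(labels))  # the two-sample "hold" run is absorbed into "in"
```

Labels produced this way are only a starting point; in the workflow described here they are subsequently corrected by hand.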

After the normalization and initial classification, a Gated Recurrent Unit16 model was trained using the datasets from the previous step.

The phase of the breath was divided into four classes: the breath-out state was denoted by the value 0.0, retention after breath-out was assigned the value 1.0, the breath-in state was assigned the value 2.0, and retention after breath-in was marked with the value 3.0. Each row in the datasets consisted of 5 values covering 0.5 second of measurement (a 10 Hz device provides 5 samples in that time).

The last step was to evaluate classification manually using our dedicated tool, shown in Fig. 9.

Accelerometer

To enrich our study we divided the accelerometer dataset. One part was recorded using the WitMotion WT901BLECL accelerometer, and the other with the smaller iNode Nav accelerometer. By comparing these two devices, we aim to better understand how sensor characteristics affect the utility of the recorded data.

WitMotion WT901BLECL

The WitMotion WT901BLECL sensor is a Bluetooth 5.0 device equipped with a range of sensors that enable the measurement of acceleration, angular velocity, angle, and magnetic field. It utilizes Bluetooth 5.0 for data transmission, sending up to 20 bytes of data. Thanks to its durable casing and compact size, it is perfectly suited for industrial applications such as machine condition monitoring and predictive maintenance. The device can be configured for various applications, allowing data interpretation using algorithms and Kalman filtering.

In the experiment, the data was based on readings obtained from the tilt angles (Pitch, Roll, and Yaw), allowing for precise determination of the device’s orientation using the 10 Hz frequency. The 10 Hz frequency is the default setting, as it provides a balanced compromise between data accuracy and transmission speed. For more accurate measurements, users can select from a range of frequencies, including 0.1 Hz, 0.5 Hz, 1 Hz, 2 Hz, 5 Hz, 10 Hz (default), 20 Hz, 50 Hz, 100 Hz, and 200 Hz.

iNode Nav

The iNode Nav accelerometer provides a portion of the data used in our study, relying on the CSR 101x chip for data acquisition. This chip allows for gathering 14-bit, 3-axis motion sensor data sampled at one of the following frequencies: 1.5625 Hz, 3.25 Hz, 6.25 Hz, 25 Hz, or 50 Hz.

The 25 Hz frequency was chosen because, in testing, it offered the best trade-off between accuracy and sample transmission speed. During the tests, a 50 Hz frequency exhibited excessive noise, while lower frequencies proved to be too slow, as the accelerometer should ideally match the speed of the tensometer. Additionally, using a frequency of 6.25 Hz or lower would have resulted in capturing only three or fewer samples per prediction, which is insufficient information for the model to predict accurately. The device was attached to the body using tape, with the x-axis pointing toward the ceiling.

Data gathered from the accelerometer consisted of 3 axis values, each coded as a 14-bit signed value in the range (−1, 1). The process of normalization, initial classification, Gated Recurrent Unit model training, and manual classification was the same as for the tensometer and can be summarised as follows:

  1. Raw data from the recorded files were normalized to the range (−1, 1) in a sliding window of 375 samples for the iNode Nav accelerometer (equivalent to 15 seconds of measurement at 25 Hz) and 150 samples for the WitMotion WT901BLECL sensor (equivalent to 15 seconds of measurement at 10 Hz).

  2. The derivative of the signal was used to pre-label the data (see Section Tensometer).

  3. Erroneous labels were manually reviewed and corrected to obtain the final labels, using the dedicated application shown in Fig. 9, which displays the time series colored according to class. The application allows the end user to change assignments by selecting a color for a particular time range. Our procedure involved one person labeling the data, while a second person validated it to ensure accuracy.

  4. A Human-in-the-Loop approach was used, in which additional data are iteratively acquired, reviewed, and labeled with human expertise. This method focuses on identifying misclassified segments of the signal and providing extra data for those areas. By concentrating data acquisition on difficult-to-process segments, the dataset is expanded with more descriptive examples, enhancing the model’s ability to classify challenging data. As a result, the model can achieve higher precision.

Data Records

The datasets described in this paper are available for download from Gdansk Tech repository Most Wiedzy as a single file17. The archive file Breath_dataset_and_apps.zip contains two subfolders: code and data. The data directory contains a subfolder for each sensor, each including two subdirectories for labeled and raw data. Each dataset is divided into TXT files, each containing a specific type of respiratory pattern such as hyperventilation, slow breathing, breath-holding, and others (described further). These recordings can be used to train machine learning models to classify respiratory rate during different breathing patterns. The folders also contain the files sensor_second_subject and sensor_third_subject with data recorded from two additional persons. The prefix sensor in the file names denotes the method of recording: (acc) accelerometer or (tens) tensometer.

In each raw data TXT file, each line holds one value standardized to the range −1 to 1 and the elapsed time in seconds from the beginning of the measurement, separated by a comma. Standardization makes the values consistent and straightforward to compare across datasets, and ensures that the data are ready to be used in various analytical tools without additional preprocessing.

In each labeled tensometer dataset, each row contains a standardized value, an indication of breath type, and the elapsed time in seconds from the beginning of the measurement, separated by commas. The accelerometer dataset uses the same format. Notably, in both datasets a value of 1.0 denotes retention after breath out, 0.0 signifies inhalation, 2.0 represents exhalation, and 3.0 represents retention after breath in. As the sensors are not 100% accurate, the label 999.0 has been added. It marks portions of the data contaminated by too much noise, which should not be used to train models. However, this label could be used for building a model for noise filtering.
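As an illustration of how such rows might be consumed, the sketch below parses comma-separated lines and separates out the rows carrying the noise label. It assumes the column order value, label, elapsed time, as listed in the description; the helper names and the sample lines are made up for the example and are not taken from the archive.

```python
NOISE_LABEL = 999.0  # label marking noise-contaminated portions of the signal


def parse_labeled_lines(lines):
    """Parse 'value,label,elapsed_seconds' rows into tuples of floats."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, label, elapsed = (float(field) for field in line.split(","))
        rows.append((value, label, elapsed))
    return rows


def split_noise(rows):
    """Return (clean_rows, noise_rows) based on the noise label."""
    clean = [r for r in rows if r[1] != NOISE_LABEL]
    noise = [r for r in rows if r[1] == NOISE_LABEL]
    return clean, noise


# Made-up sample rows in the assumed format.
sample = ["0.12,2.0,0.0", "0.15,2.0,0.1", "-0.90,999.0,0.2", ""]
clean, noise = split_noise(parse_labeled_lines(sample))
print(len(clean), len(noise))  # 2 1
```

Keeping the noise rows in a separate list, rather than discarding them, leaves them available for training a noise filter.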

The tensometer dataset represents 70 minutes of recorded data, while the iNode accelerometer dataset covers 45 minutes and the WitMotion accelerometer dataset spans 25 minutes. Data collection for the tensometer and WitMotion accelerometer was conducted at 10Hz, whereas for the iNode accelerometer, it was at 25Hz.

Datasets comprise various breath types, each indicative of distinct respiratory patterns as seen in Table 1 (test dataset) and Table 2 (training dataset):

  1. Normal Breath: Characterized by the subject’s typical, unaltered breathing pattern.

  2. Shallow Breath: The subject engages in breathing characterized by abbreviated inhalations and exhalations.

  3. Inhale and Stop: The subject inhales, halts at approximately 50% lung capacity for a brief period, resumes inhalation to maximum lung capacity, and subsequently exhales naturally.

  4. Exhale and Stop: Following inhalation, the subject exhales to around 50% lung capacity, pauses briefly, and then continues with a natural exhalation.

  5. Slow Breath: The subject executes inhalations and exhalations at an extended pace, each lasting more than 5 seconds.

  6. Hyperventilation: The subject exhibits rapid, shallow breathing characterized by swift inhalations and exhalations.

  7. Cough: The subject coughs in a manner consistent with natural flu-induced coughing.

  8. Inhale and Pause: After inhaling to maximum lung capacity, the subject momentarily halts.

  9. Exhale and Pause: Following exhalation until minimal air remains in the lungs, the subject pauses momentarily.

  10. Yellow: In the additional dataset recorded by the tensometer sensor, we provide 10 minutes of retention after breathing. This dataset was included because this class initially had an insufficient number of samples, which we observed during technical validation.

Table 2 Different breath patterns in training dataset and their location in the dataset.

Raw Tensometer Values

The Raw Tensometer Values dataset contains raw tensometer data, providing a comprehensive collection of unprocessed measurements. These raw values represent the intrinsic output from the tensometer without any specific categorization or classification. This raw data is valuable for various analytical purposes, allowing researchers and practitioners to explore and interpret the fundamental characteristics of tensometer measurements without any labels.

Raw Accelerometer Values

The Raw Accelerometer Values dataset contains raw accelerometer data, providing a collection of unprocessed measurements. These raw values represent the intrinsic output from the accelerometer without any specific categorization or classification. This raw data is valuable for various analytical purposes, allowing researchers and practitioners to explore and interpret the fundamental characteristics of accelerometer measurements without any labels.

Labeled Tensometer Values

The Labeled Tensometer Values dataset includes values that have been associated with specific breath types. Each value is labeled according to the breath-related characteristic it represents. These labeled values serve as information for training machine learning models, especially neural networks, given the size of the data. The presence of labeled data allows for supervised learning approaches, enabling algorithms to learn and detect patterns associated with different breath types. This dataset consists of 39,963 lines, each containing the standardized value, an indication of breath type, and the time elapsed from the beginning.

Labeled Accelerometer Values

The Labeled Accelerometer Values dataset includes values that have been associated with specific breath types. Each value is labeled according to the breath-related characteristic it represents. These labeled values serve as a crucial resource for training neural networks and other machine learning models. The presence of labeled data allows for supervised learning approaches, enabling algorithms to learn and recognize patterns associated with different breath types. This dataset consists of 79,689 lines, each containing the standardized value, an indication of breath type, and the time elapsed from the beginning.

Technical Validation

All labeled samples in the datasets have been manually checked, and wrong labels have been corrected. Special attention to validation accuracy was applied to the test set, which includes 60 seconds of normal deep breathing, 30 seconds of shallow breathing, 30 seconds of inhale and stop, 30 seconds of exhale and stop, 30 seconds of very slow breathing, 30 seconds of hyperventilation, 30 seconds of coughing, 30 seconds of inhale and pause (which means that the subject was breathing in, then paused in the middle of the breath, then continued), and 30 seconds of exhale and pause (which means that the subject was breathing out, then paused in the middle of the breath, then continued). The number of samples for each respiratory type in the test set is shown in Table 1. The respiratory types in the training sets are indicated by the file names, as shown in Table 2.

All datasets have been divided into appropriate sequences (5 for the tensometer and WitMotion accelerometer, 11 for the iNode accelerometer). This means that, for example, for the tensometer, 5 samples were taken, and a label was assigned to the first sample of such a sequence. Then, a Gated Recurrent Unit18 model was trained with 200 epochs and a batch size of 500. The model was tested on a prepared test set it had never seen before.

After each model training session, the values predicted by the model for the test set were manually reviewed. This was done using an interactive chart, where the top part of the screen displayed a line chart of the predicted set and the bottom part showed a chart of the test set, as seen in Figs. 1, 2, 3, 4, 5, 6. Points were colored for easier analysis. Using these charts, it is easy to identify the moments at which the model has trouble correctly classifying a particular respiratory state. For an example trained model, significant deficiencies are evident in the case of short apneas.

Fig. 1 Tensometer sensor. The upper graph shows the data predicted by the runtime model (89.49% accuracy); the lower graph shows the actual data from the labeled test dataset.

Fig. 2 Tensometer sensor. The upper graph shows the data predicted by the offline model (90.19% accuracy); the lower graph shows the actual data from the labeled test dataset.

Fig. 3 iNode sensor. The upper graph shows the data predicted by the runtime model (77.53% accuracy); the lower graph shows the actual data from the labeled test dataset.

Fig. 4 iNode sensor. The upper graph shows the data predicted by the offline model (85.34% accuracy); the lower graph shows the actual data from the labeled test dataset.

Fig. 5 WitMotion sensor. The upper graph shows the data predicted by the runtime model (79.90% accuracy); the lower graph shows the actual data from the labeled test dataset.

Fig. 6 WitMotion sensor. The upper graph shows the data predicted by the offline model (80.85% accuracy); the lower graph shows the actual data from the labeled test dataset.

In this way, the moments when the model made errors were identified, and additional samples with such states were recorded. These samples were labeled using the same methodology as described above.

The classifier used for data validation distinguished four labels. The higher the number of labels in the predicted data that matched the actual values from the manually tagged test set, the higher the score obtained by the model. The results achieved were 89.49% accuracy for the tensometer model, 77.53% for the iNode accelerometer model, and 79.90% for the WitMotion accelerometer model. These percentages reflect the performance of a runtime model operating in real time with a very limited context. When an offline model with an expanded context is used, the accuracy improves to 90.19% for the tensometer, 85.34% for the iNode accelerometer, and 80.85% for the WitMotion accelerometer. It should be noted that offline processing is a significant simplification compared to runtime processing; however, it is sufficient for testing the dataset, as it shows that machine learning models can be trained on this data.
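The score described here is plain classification accuracy, i.e., the fraction of predicted labels that match the manually tagged labels. A minimal sketch follows; the function name and the toy label sequences are illustrative only.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty sequence")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Toy example: three of four labels agree.
print(accuracy([0.0, 2.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0]))  # 0.75
```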

The runtime model, as its name suggests, must operate in real time. Consequently, the model is constrained to a specific time window and can see only past samples. In contrast, the offline model does not have these limitations. It is designed to use a much broader time window, covering not only past samples but also future ones. This model operates on a context three times larger than that of the runtime model (i.e., 15 samples for the tensometer and 33 samples for the accelerometer). The label for each such sequence is determined by the central sample within the sequence. This approach allows the model to learn from data where the current state is preceded by past samples and followed by future samples.
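The difference between the two contexts can be sketched as a windowing step. The example below is one possible reading of the description, not the authors' code: runtime windows take their label from the first sample of a 5-sample sequence, while offline windows of 15 samples take the label of the central sample. The stride values and the helper name `make_windows` are assumptions.

```python
def make_windows(values, labels, size, label_index, stride=None):
    """Cut a signal into windows and attach one label per window.

    `label_index` selects which sample inside the window supplies the label.
    `stride` defaults to `size`, which yields non-overlapping windows.
    """
    if stride is None:
        stride = size
    windows = []
    for start in range(0, len(values) - size + 1, stride):
        window = values[start:start + size]
        windows.append((window, labels[start + label_index]))
    return windows


values = list(range(30))   # stand-in for 3 seconds of 10 Hz tensometer samples
labels = list(range(30))   # stand-in labels; equal to the sample index for clarity

runtime = make_windows(values, labels, size=5, label_index=0)
offline = make_windows(values, labels, size=15, label_index=7, stride=5)

print(len(runtime), runtime[1][1])  # 6 windows; second window labeled by sample 5
print(len(offline), offline[0][1])  # 4 windows; first window labeled by sample 7
```

Because the offline window needs samples that follow the labeled one, it cannot be evaluated until those samples have arrived, which is why it is unsuitable for real-time use.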

For the data from the tensometer, the accuracy of the model was tested in successive epochs. Analyzing these data, it can be concluded that the GRU model after only 5 epochs achieves high efficiency (Fig. 7). In this chart, “accuracy” denotes the effectiveness of the model within the training data that was divided into the validation set. In turn, “val_accuracy” denotes the effectiveness of the model on the actual test set.

Fig. 7 Validation accuracy over 200 epochs for the GRU tensometer model.

During further tests on the training set from the tensometer, the effectiveness of the GRU model was examined as a function of the number of training epochs (Fig. 8). It can be seen that increasing the number of training epochs does not significantly affect the model’s quality, and in each case the model achieves an accuracy of 89% on average.

Fig. 8 Validation accuracy of the GRU tensometer model when trained with different numbers of epochs.

The project aims to develop an application that, based on raw data, will return tagged results depending on the breathing state. If sensors are not available, one can visualize the data with a graph and assess how the model performs manually.

Fig. 9 Interface of a dedicated application for labeling data.

Fig. 10 Interface of the mobile application.

When both sensors are used simultaneously, the resulting recordings are parallel. That is, for a specified time range, the measurement results from the tensometer and accelerometer should be close to each other. The tensometer yields high-quality labeled results quickly, whereas labeling accelerometer data requires considerable effort. To expedite this process, it is worthwhile to employ transfer labeling, that is, to use the labels from the tensometer for the accelerometer data, which is possible due to the parallel nature of the data.
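One simple way to realize transfer labeling is nearest-timestamp matching, sketched below. This is an assumption about how labels could be carried over, not a description of the authors' procedure; it presumes both recordings share a common time base, and the function name, timestamps, and label names are made up for the example.

```python
from bisect import bisect_left


def transfer_labels(src_times, src_labels, dst_times):
    """Give each destination timestamp the label of the nearest source timestamp.

    `src_times` must be sorted in ascending order and non-empty.
    """
    transferred = []
    for t in dst_times:
        i = bisect_left(src_times, t)
        if i == 0:
            nearest = 0
        elif i == len(src_times):
            nearest = len(src_times) - 1
        else:
            # Pick whichever neighbour is closer in time (ties go to the earlier one).
            nearest = i if src_times[i] - t < t - src_times[i - 1] else i - 1
        transferred.append(src_labels[nearest])
    return transferred


# Tensometer labels at 10 Hz transferred to accelerometer samples at 25 Hz.
tens_times = [0.0, 0.1, 0.2, 0.3]
tens_labels = ["in", "in", "hold", "hold"]
acc_times = [0.0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.24, 0.28]
print(transfer_labels(tens_times, tens_labels, acc_times))
```

Labels transferred this way would still benefit from the manual review step, since the two sensors may respond to the same breath with a small time offset.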

In the future, it would be beneficial to add further modalities from other sensors, such as a microphone or spirometer, which can extend the dataset and consequently lead to more precise respiratory rhythm classification in noisy environments.

Usage Notes

Special attention should be given to the fact that each sensor operating in real-time mode imposes limitations on the number of samples that can be delivered to the model. This is most often due to the speed of data transmission from the sensor itself or the noise it introduces. The proposed solution assumes that the end user will experience only a 0.5-second delay. For example, if the device operates at 10 Hz, a tagged data row of size 5 should be provided, which is equivalent to a half-second measurement.

Another issue is the size of the data samples being examined. In such cases, it is best to apply normalization to raw data, which will create a universal solution.

It is also worth noting the presence of signal noise generated by highly sensitive sensors, such as accelerometers. In such cases, various types of filters should be applied, such as the Savitzky-Golay filter or the median filter. Based on the labels marked as 999, custom filters can also be created to improve the quality of data directly from the sensors.
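As an illustration of the simplest of these options, the sketch below implements a sliding median filter using only the Python standard library. The kernel size and the shrinking-window treatment of the signal edges are assumptions chosen for the example; a library routine could be used instead.

```python
from statistics import median


def median_filter(signal, kernel=5):
    """Replace each sample with the median of its neighbourhood.

    `kernel` must be a positive odd number. Near the edges the window
    shrinks to the samples that are available.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


# A single spike (5.0) in an otherwise smooth toy signal is suppressed.
noisy = [0.0, 0.1, 5.0, 0.2, 0.3]
print(median_filter(noisy, kernel=3))
```

A median filter removes isolated spikes while preserving the slower rise and fall of the breathing waveform, which matters because the pre-labeling step relies on the sign of the derivative.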