Deep learning network for NMR spectra reconstruction in time-frequency domain and quality assessment

Luo, Yao; Chen, Wenhan; Su, Zhenhua; Shi, Xiaoqi; Luo, Jie; Qu, Xiaobo; Chen, Zhong; Lin, Yanqin

doi:10.1038/s41467-025-57721-w

Article
Open access
Published: 08 March 2025

Yao Luo ORCID: orcid.org/0009-0005-5719-3813^1,2,3,
Wenhan Chen^1,2,3,
Zhenhua Su^1,2,3,
Xiaoqi Shi^1,2,3,
Jie Luo^1,2,3,
Xiaobo Qu ORCID: orcid.org/0000-0002-8675-5820^1,2,3,
Zhong Chen ORCID: orcid.org/0000-0002-1473-2224^1,2,3 &
…
Yanqin Lin ORCID: orcid.org/0000-0003-0596-0740^1,2,3

Nature Communications volume 16, Article number: 2342 (2025)

Abstract

High-quality nuclear magnetic resonance (NMR) spectra can be rapidly acquired by combining non-uniform sampling (NUS) techniques with reconstruction algorithms. However, current deep learning (DL) based reconstruction methods focus only on single-domain reconstruction (time or frequency domain), leading to drawbacks like peak loss and artifact peaks and ultimately failing to achieve optimal performance. Moreover, the lack of fully sampled spectra makes it difficult, even impossible, to determine the quality of reconstructed spectra, presenting challenges in the practical applications of NUS. In this study, a joint time-frequency domain deep learning network, referred to as JTF-Net, is proposed. It effectively combines time domain and frequency domain features, exhibiting better reconstruction performance on protein spectra across various dimensions compared to traditional algorithms and single-domain DL methods. In addition, a reference-free quality assessment metric, denoted as REconstruction QUalIty assuRancE Ratio (REQUIRER), is proposed based on an established quality space in the field of NMR spectral reconstruction. The metric is capable of evaluating the quality of reconstructed NMR spectra without the fully sampled spectra, making it more suitable for practical applications.

Introduction

Exploring the structure, function, and interactions of proteins is crucial for understanding the molecular basis of diseases, thereby facilitating the development of precise and effective therapeutic interventions. NMR, X-ray crystallography, and cryo-electron microscopy (cryo-EM) are powerful techniques commonly applied in protein research. Compared to X-ray crystallography and cryo-EM, the multidimensional (nD) NMR technique offers the unique advantage of providing structural, dynamic, and interaction information on proteins in solution at physiological conditions, without disrupting or modifying endogenous conformations^1,2. Unfortunately, obtaining nD NMR spectra is usually a time-consuming process: the sampling time grows exponentially with the number of indirect dimensions. Non-uniform sampling (NUS) technology greatly shortens the sampling time by undersampling the indirect dimensions of the NMR spectrum and is widely applied in the rapid acquisition of NMR spectra. The acquired NUS data requires reconstruction algorithms to obtain high-quality NMR spectra. Therefore, developing robust reconstruction algorithms becomes imperative. For example, Orekhov et al.^3,4,5 demonstrated multi-dimensional decomposition (MDD) and compressed sensing (CS) as effective tools for reconstructing the NUS data. Other traditional reconstruction algorithms, such as SMILE⁶ and hmsIST⁷, have also been proposed. However, these traditional algorithms require manual adjustments of key parameters, resulting in suboptimal performance. In addition, some traditional algorithms (like MDD) have long computation times. Deep learning (DL) has been applied successfully in various fields like image processing^8,9 and speech recognition^10,11. Similarly, DL has also achieved great success in the field of NMR¹². For instance, Li et al.¹³ introduced DEEP Picker, which accomplishes peak picking in complicated NMR spectra.
Schmid et al.¹⁴ also introduced a DL-based approach to accomplish peak picking in 1D NMR spectra. Klukowski et al.¹⁵ proposed ARTINA which enables the completely automated analysis of protein NMR data within hours after completing the measurements. Wu et al.¹⁶ proposed DeNoising Unet (DN-Unet) to suppress noise in NMR spectra. Karunanithy et al.¹⁷ demonstrated that by processing NMR spectra of protonated, uniformly ¹³C-labelled samples with FID-Net, the resulting spectra are comparable to typical deuterated methyl-TROSY spectra. In addition, DL has also made progress in the NUS reconstruction of NMR spectra. For example, frequency domain reconstruction networks based on DenseNet¹⁸ and HRN¹⁹ have been proposed by Qu et al.²⁰ and Luo et al.²¹, respectively. In addition, Hansen et al.^22,23 introduced an LSTM-based reconstruction network²⁴ and WaveNet-based FID-Net²⁵ for the time domain reconstruction of NMR spectra. However, these DL-based reconstruction methods focus only on single domains (either time domain or frequency domain) with each having its advantages and challenges. Specifically, the advantage of the frequency-domain reconstruction method is that it can use the sparsity of the NMR spectra to achieve high-quality reconstruction even with few sampling points. The shortcoming is that the artifacts generated from high-amplitude peaks may even appear stronger than the weak peaks, which leads to the occurrence of peak missing and artifacts in the reconstructed spectra. Time-domain reconstruction directly processes sampled time-domain data, avoiding the interference of under-sampled artifacts caused by strong peaks on the reconstruction of weak peaks, and thus benefiting effective reconstruction. However, time domain reconstruction predicts the unsampled data based on the sampled points, requiring a sufficient number of sampled points to ensure reconstruction accuracy. 
When there are few sampled points, the effectiveness of time domain reconstruction is seldom as high as frequency domain reconstruction. Therefore, the combination of information from both domains has the potential to produce better results.

In addition, a long-standing and important issue has remained unresolved: users are unable to assess the quality of the reconstructed spectra without fully sampled spectra, which are not acquired in practical applications. Commonly employed indicators for quality assessment in studies of algorithm development include the Pearson correlation coefficient (PCC), root mean square deviation (RMSD), relative L2 norm error (RLNE), and so on. These indicators all require the fully sampled spectrum as a reference. However, when the NUS technique is needed to shorten the experimental time, this means that fully sampled spectra are difficult and often impossible to obtain. Therefore, full-reference assessment is impractical in real-world applications. Without quality metrics, the algorithm user worries: Can I trust this reconstructed result?
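The full-reference indicators named above are simple to compute once a fully sampled spectrum is available, which is exactly what is missing in practice. The following is a minimal sketch, assuming the common definition RLNE = ‖x_ref − x_rec‖₂ / ‖x_ref‖₂ and real-valued spectra; the function names `rlne` and `pcc` are hypothetical, and the exact normalization used in this work is not stated in this section.

```python
import numpy as np

def rlne(reference, reconstruction):
    """Relative L2 norm error, assuming RLNE = ||ref - rec||_2 / ||ref||_2."""
    ref = np.asarray(reference, dtype=float).ravel()
    rec = np.asarray(reconstruction, dtype=float).ravel()
    return float(np.linalg.norm(ref - rec) / np.linalg.norm(ref))

def pcc(reference, reconstruction):
    """Pearson correlation coefficient between two real-valued spectra."""
    ref = np.asarray(reference, dtype=float).ravel()
    rec = np.asarray(reconstruction, dtype=float).ravel()
    return float(np.corrcoef(ref, rec)[0, 1])

# Toy 2 x 2 "spectrum": a reconstruction scaled by 0.5 differs from the
# reference by half of its norm.
reference = np.array([[3.0, 0.0], [0.0, 4.0]])
print(rlne(reference, 0.5 * reference))
```

Both functions take the reference as an argument, which makes the dependence on a fully sampled spectrum explicit.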

In this paper, a joint time-frequency domain deep learning network called JTF-Net is proposed. Overcoming the limitations of single-domain DL reconstruction, JTF-Net shows excellent performance in the reconstruction of nD protein NMR spectra. More importantly, a reference-free quality assessment metric for reconstructed NMR spectra is proposed in this paper, denoted as the reconstruction quality assurance ratio (REQUIRER). Using simulated spectra, a quality space that reflects the correlation between uncertainties and the RLNE of the reconstructed spectra is constructed. The quality threshold, a specific RLNE value used to flag potential peak loss or artifacts in the reconstructed spectra, was determined through testing on simulated spectra. The quality thresholds were set to 0.35 for 2D spectra and 0.55 for 3D spectra. Based on this quality space and these quality thresholds, the REQUIRER metric can be obtained after completing the reconstruction of real experimental spectra, providing users of JTF-Net with a quality assessment of the reconstructed spectra without the fully sampled spectra.

Results and discussion

Comparison with current reconstruction algorithms

In this section, JTF-Net performance is demonstrated by reconstructing the 2D ¹⁵N-¹H HSQC (Heteronuclear Single Quantum Coherence) spectra of the GB1 (601 × 170) and T4L L99A²² (335 × 256) proteins, with system sizes of 56 and 172, respectively. The reconstructed results were compared with the traditional algorithms SMILE⁶ and hmsIST⁷, and the DL reconstruction algorithms FID-Net²³ (time domain) and EDHRN²¹ (frequency domain). All spectra were undersampled with 12.5% Poisson-Gap sampling schemes. Figure 1 shows the results for the 2D ¹⁵N-¹H HSQC spectrum of protein GB1. The reconstructed spectra from all methods show no peak loss or artifact peaks, and the spectrum reconstructed by JTF-Net has the lowest RLNE, which means that it is closest to the real fully sampled spectrum. The reconstruction of protein T4L L99A is challenging because of the large number of spectral peaks and the presence of some peaks with low intensity. Figure 2 shows the results for the 2D ¹⁵N-¹H HSQC spectra of protein T4L L99A, where the red and black boxes mark artifact peaks and peak loss, respectively. There are numerous peak losses when employing single-domain DL algorithms and SMILE, while hmsIST does better, with only a few peak losses. In terms of artifact peaks, both the traditional and single-domain DL algorithms show a few. Notably, JTF-Net achieves the lowest RLNE value without any peak loss or artifact peaks. Although the spectra of the two samples have different sizes and sampling schemes, JTF-Net does not require re-training and still achieves high-quality reconstruction with one trained model, reflecting its universality.

**Fig. 1: 2D ¹⁵N-¹H HSQC (Heteronuclear Single Quantum Coherence) spectra of protein GB1.**

**Fig. 2: 2D ¹⁵N-¹H HSQC (Heteronuclear Single Quantum Coherence) spectra of protein T4L L99A.**

To demonstrate the robustness of JTF-Net on different sampling rates, JTF-Net was used to reconstruct the HSQC spectra of T4L L99A protein, which were undersampled by Poisson-Gap sampling schemes with sampling rates of 10%, 12.5%, 15%, 17.5%, and 20%. Each sampling rate includes 10 different sampling schemes. Figure 3 shows that JTF-Net can consistently achieve the lowest RLNEs across different sampling rates. The hmsIST method also performs well but there is still a significant gap between hmsIST and JTF-Net at sampling rates of 10% and 12.5%. However, this gap decreases as the sampling rates increase. In addition, compared with EDHRN, SMILE, and FID-Net, JTF-Net always has a greater advantage in the RLNE indicator. Thus, JTF-Net consistently provides stable and superior performance across different sampling rates, showcasing further its universality.

**Fig. 3: Comparison of RLNE (relative L2 norm error) among different methods at different sampling rates.**
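For readers unfamiliar with Poisson-Gap sampling, the sketch below generates a 1D schedule with one widely used sine-weighted recipe: gaps between sampled points are drawn from a Poisson distribution whose mean grows along the indirect dimension, and the gap scale is adjusted until the requested number of points is obtained. This is an illustration under stated assumptions, not necessarily the generator used for the schedules in this work; the function name, the seed handling, and the 2% adjustment step are all hypothetical choices.

```python
import numpy as np

def poisson_gap_schedule(n_total, n_keep, seed=0, max_tries=10000):
    """Return n_keep increasing indices in [0, n_total) with sine-weighted Poisson gaps."""
    rng = np.random.default_rng(seed)
    scale = 2.0 * (n_total / n_keep - 1.0)  # initial guess for the gap scale
    for _ in range(max_tries):
        picks = []
        i = 0
        while i < n_total:
            picks.append(i)
            i += 1
            # Mean gap grows from ~0 at the start towards `scale` at the end.
            lam = scale * np.sin((i + 0.5) / (n_total + 1) * np.pi / 2)
            i += int(rng.poisson(lam))
        if len(picks) == n_keep:
            return picks
        # Too many points: widen the gaps; too few: narrow them.
        scale *= 1.02 if len(picks) > n_keep else 1.0 / 1.02
    raise RuntimeError("no schedule with the requested size was found")

# A 12.5% schedule for an indirect dimension of 256 points keeps 32 points.
schedule = poisson_gap_schedule(256, 32, seed=1)
print(len(schedule), schedule[:8])
```

Changing `seed` yields a different schedule at the same sampling rate, which mirrors the use of 10 different sampling schemes per rate in the comparison above.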

In addition, JTF-Net can also be applied to reconstructing 3D spectra. The HNCO spectrum of protein Azurin with the system size of 128, which was undersampled using an 8% Poisson-Gap sampling scheme, was used to demonstrate the performance of JTF-Net. As shown in Fig. 4, the reconstructed spectra from all methods (FID-Net cannot reconstruct 3D spectra) show no peak loss and artifact peaks, but JTF-Net achieves the best RLNE indicator.

**Fig. 4: 3D HNCO spectra of protein Azurin and the CN plane projections of HNCO.**

In the above 2D and 3D spectral reconstruction, JTF-Net achieves the optimal RLNE indicator. Especially in comparison to single-domain reconstruction algorithms like FID-Net and EDHRN, JTF-Net demonstrates a clear advantage, which arises from its combination of time and frequency domain information during the reconstruction process. Data consistency ensures the efficient utilization of the sampled information, preventing excessive alterations to the sampled data during the reconstruction process. In addition, compared with traditional methods SMILE and hmsIST, JTF-Net is fully automatic and does not need to manually set key parameters. In terms of reconstruction time, JTF-Net achieves the shortest single reconstruction times for both the 2D ¹⁵N-¹H HSQC spectra of the GB1 and T4L L99A proteins and the 3D HNCO spectra of the Azurin protein among these methods, with details available in Supplementary Part 9. Data preprocessing, such as apodization functions, baseline correction, zero-filling, and phase correction, can affect the reconstruction results, as provided in Supplementary Part 10. Based on the testing results in SI, it is recommended to perform phase correction, apodization function, and baseline correction on the data before reconstruction. For the apodization functions, the EM window provided the best results on the HSQC spectrum of the GB1 protein. However, this does not mean that the EM window is the best for all data. It is advised to first try the EM window for reconstruction. Additionally, if the number of points in the indirect dimension is not divisible by 8, zero-filling to the close integer that is a multiple of 8 is recommended. The number of residues also affects reconstruction quality. Test results show that as the number of residues increases, the reconstruction quality of JTF-Net decreases, but REQUIRER can still provide good assessments of the quality of the reconstructed spectra. 
In addition, the reconstruction of spectra of intrinsically disordered proteins and spectra with strong peak overlap are relatively difficult for JTF-Net. However, the test results show that JTF-Net remains effective in these two cases. Further details are available in Supplementary Part 11.
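The zero-filling recommendation above (indirect dimension padded to a multiple of 8) can be sketched as follows. This is a minimal illustration that assumes padding up to the next multiple, since zero-filling can only add points; the function name `zero_fill_to_multiple` and the choice of axis are hypothetical.

```python
import numpy as np

def zero_fill_to_multiple(fid, multiple=8, axis=0):
    """Append zeros along `axis` until its length is a multiple of `multiple`."""
    fid = np.asarray(fid)
    n = fid.shape[axis]
    target = -(-n // multiple) * multiple  # ceiling to the next multiple
    pad = [(0, 0)] * fid.ndim
    pad[axis] = (0, target - n)
    return np.pad(fid, pad, mode="constant")

# An indirect dimension of 170 points is padded to 176 points.
fid = np.ones((170, 4), dtype=complex)
print(zero_fill_to_multiple(fid, 8, axis=0).shape)
```

Lengths that are already a multiple of 8 (for example 256) are returned unchanged.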

Reference-free quality assessment of reconstruction spectra by REQUIRER

To validate the feasibility of the REQUIRER metric, it was tested on many 2D and 3D protein spectra. It is expected that a spectrum with high SNR can be reconstructed with high quality and low RLNE, and a spectrum with low SNR with low quality and high RLNE. Here, Gaussian noise at different levels was artificially added to the fully sampled spectra, producing a series of spectra with different SNRs. For 2D data, the ¹⁵N-¹H HSQC spectra of the GB1 and T4L L99A proteins were used for testing. The evaluation result using the REQUIRER metric for the GB1 protein at different SNR levels is shown in Fig. 5. The REQUIRER values of (a–d) are consistently high (above 79%), indicating a high likelihood that the RLNE indicator will be below 0.35 (2D quality threshold). Indeed, the actual RLNE indicators for these spectra are all below 0.35. There is one artifact peak in Fig. 5e; its REQUIRER value is lower than those of (a–d), and its actual RLNE is larger than those of (a–d). The reconstruction result (f) shows several artifact peaks, and the REQUIRER of (f) is lower than 50%, with the actual RLNE indicator of (f) above 0.35. The T4L L99A protein was also used for validation of REQUIRER under different SNR levels. The results are shown in Figure S10 in Supplementary Part 12, where REQUIRER consistently provided good assessments of the quality of the reconstructed spectra.

**Fig. 5: The reconstructed 2D ¹⁵N-¹H HSQC (Heteronuclear Single Quantum Coherence) spectra of protein GB1 with different signal-to-noise ratios (SNR) by JTF-Net.**

In addition, it is expected that a higher sampling rate generally leads to higher reconstruction quality and lower RLNEs, and a lower sampling rate generally leads to lower reconstruction quality and higher RLNEs. The fully sampled ¹⁵N-¹H HSQC spectra of the GB1 and T4L L99A proteins were undersampled using different sampling schemes with different sampling rates (3%–20%). As shown in Fig. 6, the reconstructed spectra at sampling rates of 20%, 15%, and 12.5% exhibit no peak loss or artifact peaks. The REQUIRER values for these reconstructed spectra of protein T4L L99A are remarkably high (REQUIRER > 91%), and the actual RLNE values are significantly below the quality threshold. When the sampling rate is reduced to 8%, the reconstructed spectrum exhibits a few lost peaks and artifact peaks, the REQUIRER metric is lower than those of (a–c), and the actual RLNE value is larger than the quality threshold. As the sampling rate further decreases, the reconstruction results in Fig. 6e–f exhibit noticeable peak loss and artifact peaks compared to the fully sampled spectrum. The REQUIRER of (e–f) also decreases significantly compared to (d). Indeed, the actual RLNE indicators of (e–f) are all above 0.35 (2D quality threshold). In Supplementary Part 13, REQUIRER also provides a good quality assessment of the reconstruction results for the GB1 protein at different sampling rates.

**Fig. 6: The reconstructed 2D ¹⁵N-¹H HSQC (Heteronuclear Single Quantum Coherence) spectra of protein T4L L99A with different sampling rates (SR) by JTF-Net.**

Additionally, the 3D HNCO spectrum (732 × 60 × 60) of the Azurin protein was also used for testing. Artificial noise was added to gradually reduce the SNR of the fully sampled spectra, and an 8% Poisson-Gap sampling scheme was used. The evaluation results are shown in Fig. 7. The REQUIRER values of (a–c) are consistently high (above 82%), indicating a high likelihood that the RLNE indicator will be below 0.55 (3D quality threshold), and the actual RLNE values are lower than the 3D quality threshold. The REQUIRER metrics of (d,e) are lower than those of (a–c), and the actual RLNEs of (d,e) are larger than those of (a–c). When the reconstruction quality is poor (Fig. 7f), the REQUIRER value is low.

**Fig. 7: The projections on ¹³C-¹⁵N planes of the 3D HNCO spectra of protein Azurin reconstructed by JTF-Net with different signal-to-noise ratios (SNR).**

To further validate the universality of REQUIRER, four 2D spectra and nine 3D spectra from the Biological Magnetic Resonance Bank (BMRB) were also tested. Different reconstruction methods were applied to these different types of spectra, and the results are displayed in Table 1. The test results show that JTF-Net achieves the optimal RLNE indicator for all spectra, showing a clear advantage over other methods. In addition, SNR has a great effect on the reconstruction results. For the HNCO spectra of At1g24000.1, A3DK08, and O64736 with high SNR, high reconstruction qualities were achieved. In contrast, the HNCO spectrum of the probable 30S ribosomal protein showed a poor reconstruction result due to its low SNR. Similarly, the HNCACB spectrum of thiamine triphosphatase showed better reconstruction quality than that of BT_p548217 due to its higher SNR.

Table 1 Comparison of reconstruction indicators of different reconstruction methods

In terms of REQUIRER, the reconstruction result of the HMQC spectrum of protein Yqal has a REQUIRER value that is not very high, with its actual RLNE indicator slightly higher than the quality threshold. The reconstructed HSQC spectrum of BH09830 has a low REQUIRER value, with the actual RLNE indicator significantly exceeding the quality threshold. For 3D spectra, the reconstruction results of the HNCO spectra of proteins At1g24000.1, A3DK08, and O64736 have high REQUIRER values, and their actual RLNE indicators are all smaller than the quality threshold, while the reconstruction result of the HNCO spectrum of the probable 30S ribosomal protein has a low REQUIRER value, with its actual RLNE value higher than the quality threshold. The reconstruction results of the CBCA(CO)NH spectrum of protein YorP, the HNCACB spectrum of thiamine triphosphatase, and the HBHA(CO)NH spectrum of protein Ykvr have high REQUIRER values, and their actual RLNE indicators are smaller than the quality threshold, while the reconstruction result of the HNCACB spectrum of the BT_p548217 protein has a low REQUIRER value, with its actual RLNE value higher than the quality threshold. The reconstructed spectra are shown in Supplementary Part 14. Although JTF-Net achieved the best performance on the data presented here, not all NMR data could be tested. Therefore, we cannot guarantee that it will achieve the best results on every dataset. In addition, based on testing results (Supplementary Part 15), it is suggested to use JTF-Net on 2D data with a sampling rate of 10%–50% and 48–2048 points in the indirect dimension, and on 3D data with a sampling rate of 5%–70% and 24 × 24 to 256 × 256 points in the indirect dimensions, with Poisson-Gap sampling schemes for both 2D and 3D spectra.

A higher REQUIRER value indicates a greater likelihood that the reconstructed spectra will have an actual RLNE lower than the quality threshold, suggesting good quality of reconstruction. Conversely, a lower REQUIRER value indicates a higher probability that the actual RLNE will exceed the quality threshold, suggesting poor quality in the reconstructed spectra. Although REQUIRER has proven to be feasible, it is not without its flaws. A 99% REQUIRER still leaves a 1% chance of poor quality, and an 18% REQUIRER means that while 82% might exceed the quality threshold, 18% could still be good. For example, although the actual RLNE for the HNCOCA spectrum of the E. coli YfgJ protein in Table 1 is below the quality threshold, its REQUIRER is not very high. Additionally, for NOESY-related spectra, the loss of small peaks does not lead to significant changes in the RLNE metric, and thus REQUIRER is unsuitable in this case. REQUIRER must therefore be used judiciously. Even with these flaws, the overall effectiveness and reliability of REQUIRER are evident. When fully sampled spectra are not available, REQUIRER serves as the only metric capable of evaluating the quality of reconstructed spectra.

Methods

Uncertainty

Compared to non-Bayesian convolutional neural networks, which can only perform regression and classification tasks, Bayesian neural networks (BNNs) also provide uncertainty, comprising epistemic uncertainty and aleatoric uncertainty, which helps to evaluate the reliability of predictive outcomes. Epistemic uncertainty mainly arises from a lack of knowledge or incomplete information. For the NUS reconstruction of NMR spectra, if the NUS data deviates from the training set, the model may not reconstruct the NUS spectra accurately, leading to higher epistemic uncertainty. Meanwhile, aleatoric uncertainty refers to uncertainty that arises from the data itself. In the case of NUS NMR spectra, aleatoric uncertainty primarily arises from unsampled points. One common method to implement BNNs is dropout, which can be regarded as a Bayesian approximation^26,27. Therefore, dropout is incorporated into JTF-Net to obtain uncertainty. During the training process, a certain proportion of neurons is randomly dropped out at each layer. Dropout is kept active during multiple test processes for the same input data. Each test process randomly drops different neurons, generating a series of reconstructed results. The variance of these reconstructed results is the epistemic uncertainty, which measures the uncertainty of the trained model with respect to the input data. Aleatoric uncertainty, in contrast, is caused by factors related to the input data. To accurately predict the aleatoric uncertainty of input data, a branch is added to JTF-Net, as shown in Fig. 8a, and trained with a loss function including the aleatoric uncertainty σ, expressed as²⁶:

$$L_{\mathrm{ale}} = \frac{1}{2\sigma(x)^{2}} \Vert y - f^{\mathrm{W}}(x) \Vert^{2} + \frac{1}{2} \log \sigma(x)^{2},$$

(1)

where σ(x) denotes the aleatoric uncertainty for input x, f^W denotes the trained model with the parameters W, and y represents the ground truth of the reconstructed spectrum.
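Equation (1) can be evaluated numerically as in the sketch below. It assumes a per-point σ that is parameterized through the log-variance s = log σ² (a common choice for numerical stability) and averages over points; neither detail is stated here, and the function name `aleatoric_loss` is hypothetical.

```python
import numpy as np

def aleatoric_loss(y_true, y_pred, log_var):
    """Eq. (1) per point, with log_var = log(sigma^2), averaged over all points.

    0.5 * exp(-log_var) * (y - f)^2   is the error term weighted by 1 / (2 sigma^2);
    0.5 * log_var                     penalizes predicting a large variance everywhere.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    return float(np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var))

# With sigma^2 = 1 (log_var = 0) the loss reduces to half the mean squared error.
print(aleatoric_loss([1.0, 2.0], [0.0, 0.0], [0.0, 0.0]))
```

The two terms pull in opposite directions: a larger predicted variance down-weights the squared error but is charged through the logarithm, so the branch is pushed to report large σ only where the reconstruction error is actually large.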

**Fig. 8: JTF-Net and REQUIRER workflow.**

Network Structure

JTF-Net (Fig. 8a) consists of time domain reconstruction modules (t-modules) and frequency domain reconstruction modules (f-modules). Both the t-modules and f-modules in JTF-Net employ the Encoding and Decoding (ED) structure. The key difference is that the t-modules use dilated convolutions to capture long-term dependencies in time domain reconstruction, while the f-modules use standard convolutions. The details of the ED structure and the convolution kernel parameters are provided in Supplementary Part 1. The optimal numbers of t-modules and f-modules are 8 for the 1D JTF-Net model (used for reconstructing 2D spectra) and 14 for the 2D JTF-Net model (used for reconstructing 3D spectra), with the results shown in Supplementary Part 5. The undersampled FID and its frequency-domain spectrum (after Fourier transformation) are input into the t-modules and f-modules, respectively. After completing the reconstruction in a t-module/f-module, feature fusion is performed before passing the data to the next t-module/f-module. Time-domain and frequency-domain feature fusion (TFF/FFF) can be represented as:

$$\begin{aligned} \mathrm{FFF}_{\mathrm{n}} &= \frac{1}{2}\left(f_{\mathrm{n}}(x_{\mathrm{n}}^{\mathrm{f}}) + F(t_{\mathrm{n}}(x_{\mathrm{n}}^{\mathrm{t}}))\right) \\ \mathrm{TFF}_{\mathrm{n}} &= \frac{1}{2}\left(t_{\mathrm{n}}(x_{\mathrm{n}}^{\mathrm{t}}) + F^{-1}(f_{\mathrm{n}}(x_{\mathrm{n}}^{\mathrm{f}}))\right), \end{aligned}$$

(2)

where f_n and t_n denote the n-th f-module/t-module, x_n^f and x_n^t refer to the frequency-domain and time-domain data fed into the n-th f-module/t-module. F and F⁻¹ denote the Fourier and inverse Fourier transforms, respectively. The t-modules predict unsampled points based on the sampled points. At this stage, time-domain reconstruction is free from the interference of undersampling artifacts caused by strong peaks on the reconstruction of weak peaks, preserving weak peaks better. Thus, after FFF, these peaks that were lost during the f-module may be recovered. In addition, after TFF, the time-domain reconstruction module can access more information, enabling it to achieve better reconstruction results at low sampling rates. Additionally, to fully utilize the available sampled data information, a data consistency (DC) module is employed to constrain the reconstruction results after the last transposed layer (see Supplementary Part 2). After multiple rounds of reconstruction by the t-modules and f-modules, the features from the last t-module and f-module in JTF-Net were fused by FFF. These fused features are then processed by an ED module and multiple convolutional layers for final adjustments to output the reconstructed spectrum. Moreover, the aleatoric uncertainty can be obtained from the aleatoric uncertainty branch (green rectangle in Fig. 8a), which consists of three transposed convolution layers. The loss function of JTF-Net consists of three parts: time-domain reconstruction module loss L_t, frequency-domain reconstruction module loss L_f (see Supplementary Part 3 for the specific formula of L_t and L_f), and aleatoric uncertainty loss L_ale, expressed as

$$L = L_{\mathrm{t}} + L_{\mathrm{f}} + L_{\mathrm{ale}}.$$

(3)
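The feature fusion of Eq. (2) amounts to averaging each module's output with the Fourier-transformed output of the other domain. The sketch below illustrates this on 1D arrays, with the module outputs f_n(x_n^f) and t_n(x_n^t) passed in as plain arrays. The data consistency step is shown as a hard replacement of the sampled points by their measured values, which is an assumption (the actual DC module is described in Supplementary Part 2); all function names are hypothetical.

```python
import numpy as np

def fuse_features(freq_out, time_out):
    """Eq. (2): FFF and TFF from the outputs of the n-th f-module and t-module."""
    fff = 0.5 * (freq_out + np.fft.fft(time_out))   # frequency-domain fusion
    tff = 0.5 * (time_out + np.fft.ifft(freq_out))  # time-domain fusion
    return fff, tff

def data_consistency(time_estimate, measured, sampled_idx):
    """Assumed DC step: keep the measured values at the sampled time points."""
    out = np.array(time_estimate, dtype=complex)
    out[sampled_idx] = measured
    return out

# If both modules agree (freq_out is exactly the FFT of time_out),
# fusion leaves both outputs unchanged.
t = np.exp(2j * np.pi * 0.1 * np.arange(16))
f = np.fft.fft(t)
fff, tff = fuse_features(f, t)
print(np.allclose(fff, f), np.allclose(tff, t))
```

When the two modules disagree, each fused output moves halfway towards the other domain's estimate, which is how a peak preserved by the t-module can re-enter the frequency-domain branch.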

Dataset and training

DL requires a large amount of training data. However, the quantity of data acquired through NMR experiments is usually limited. Starting from NMR physics, the theoretical expressions provided in Supplementary Part 4 are used to generate a substantial amount of simulated NMR data that closely approximates real experimental data. Additionally, JTF-Net employs a dimensionality reduction approach for reconstructing NMR spectra. Specifically, it utilizes a 1D model for reconstructing 2D spectra and a 2D model for reconstructing 3D spectra. The numbers of simulated spectra in the training set and validation set are 40,000 and 4000, respectively. The JTF-Net model was trained on a server with 8 NVIDIA Tesla GPUs. The following fixed, shared hyperparameters were used: an initial learning rate of 0.001 and a batch size of 4. Adam optimization with the default settings was adopted. The model was trained with early stopping, and the learning rate was halved when the validation loss did not decrease.
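The learning-rate rule and early stopping described above can be traced with a few lines of framework-independent code. This is only a sketch: the patience value is an assumption (it is not given here), the helper name `lr_trace` is hypothetical, and the optimizer itself is omitted.

```python
import math

def lr_trace(val_losses, initial_lr=1e-3, patience=5):
    """Return the learning rate in effect after each epoch.

    The rate is halved whenever the validation loss fails to improve on the
    best value so far; training stops after `patience` such epochs in a row
    (assumed early-stopping criterion).
    """
    best = math.inf
    stale = 0
    lr = initial_lr
    trace = []
    for loss in val_losses:
        if loss < best:
            best = loss
            stale = 0
        else:
            lr *= 0.5
            stale += 1
        trace.append(lr)
        if stale >= patience:
            break
    return trace

# The third epoch does not improve on 0.9, so the rate drops from 1e-3 to 5e-4.
print(lr_trace([1.0, 0.9, 0.95, 0.8], patience=3))
```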

Reconstruction Quality Assurance Ratio (REQUIRER)

When applying JTF-Net to a real NUS spectrum, multiple reconstructions are performed, and their outputs differ because dropout is active. As shown in Fig. 8b, the final reconstructed spectrum is the mean of these reconstructed spectra, while the epistemic uncertainty is their variance. The final aleatoric uncertainty is the mean of the aleatoric uncertainties. Although full-reference quality assessment indicators like RLNE cannot be obtained due to the lack of fully sampled spectra, aleatoric and epistemic uncertainties can be obtained when the reconstruction process is complete. If the relationship between these uncertainties and the RLNE indicator is known, the RLNE indicator of the reconstructed spectra can be inferred from the uncertainties. Here, the trained JTF-Net was used to reconstruct a large number of simulated spectra, yielding the reconstructed spectra, aleatoric uncertainties, and epistemic uncertainties. Since the spectra are simulated, the fully sampled spectra are available, and the RLNE indicators for all reconstructed spectra can be calculated. To correlate RLNE with aleatoric and epistemic uncertainties, the quality space was constructed, as shown in Fig. 8c and Supplementary Part 6. In this space, every point, denoted as a quality point, is formed by three indicators: epistemic uncertainty, aleatoric uncertainty, and RLNE, which are derived from the reconstruction of an individual simulated spectrum. As the uncertainty increases, RLNE exhibits a growing trend. This serves as the foundation for our reference-free quality assessment method.
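The averaging over repeated dropout-active reconstructions described above can be sketched as follows. The network is replaced by a toy stand-in (`make_toy_model`, hypothetical) that returns a randomly masked copy of its input together with a constant aleatoric map; the number of passes is also an assumption. How the resulting per-point maps are reduced to the scalar uncertainties used in the quality space is not specified here and is left out.

```python
import numpy as np

def mc_dropout_summary(stochastic_model, x, n_passes=20):
    """Run the model n_passes times with dropout active and summarize the outputs.

    Returns (mean reconstruction, epistemic variance, mean aleatoric uncertainty).
    """
    recs, ales = [], []
    for _ in range(n_passes):
        rec, ale = stochastic_model(x)  # each call drops different units
        recs.append(rec)
        ales.append(ale)
    recs = np.stack(recs)
    ales = np.stack(ales)
    return recs.mean(axis=0), recs.var(axis=0), ales.mean(axis=0)

def make_toy_model(drop_rate=0.1, seed=0):
    """Toy stand-in for a dropout network (not JTF-Net)."""
    rng = np.random.default_rng(seed)
    def model(x):
        x = np.asarray(x, dtype=float)
        keep = rng.random(x.shape) >= drop_rate
        rec = x * keep / (1.0 - drop_rate)  # inverted-dropout scaling
        ale = np.full(x.shape, 0.05)        # constant aleatoric output
        return rec, ale
    return model

spectrum = np.array([0.0, 1.0, 5.0, 1.0, 0.0])
mean_rec, epistemic, aleatoric = mc_dropout_summary(make_toy_model(), spectrum)
print(mean_rec.shape, epistemic.shape, aleatoric.shape)
```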

If a reconstructed spectrum has peak loss or artifacts, it is not considered a high-quality spectrum in our evaluation. However, determining whether a reconstructed experimental spectrum has peak loss or artifacts through RLNE is challenging. To address this issue, 2D and 3D NUS reconstructions were conducted on a large number of simulated spectra with varying sampling rates, peak numbers, and signal-to-noise ratios (SNR) to determine the threshold of RLNE. The results show that when artifact peaks or peak loss (A/L) are observed, the minimum RLNEs are 0.3535 for the 2D spectra and 0.5562 for the 3D spectra. To be on the safe side, the quality thresholds for 2D and 3D reconstructed spectra are set to 0.3500 and 0.5500, respectively. The details of determining quality thresholds can be found in Supplementary Part 7. In real-world applications, JTF-Net reconstructs an experimental NUS spectrum and provides its aleatoric uncertainty (A0) and epistemic uncertainty (E0). The point (E0, A0) can be extended into a line along the RLNE axis in the quality space, as shown in Fig. 8c. The real RLNE value of the reconstructed spectrum may be one of the values on this line. Within a narrow cuboid centered on this line, there are quality points whose uncertainties are close to those of this reconstructed spectrum. This cuboid is called the quality cuboid, and its establishment is described in Supplementary Part 8. Then the REQUIRER is calculated with

$$\mathrm{REQUIRER} = \frac{M}{N} \times 100\%,$$

(4)

where N is the total number of quality points in the quality cuboid (N is about 100 in practical use), and M represents the number of quality points whose RLNE falls below the quality threshold in the quality cuboid. As M is not larger than N, the REQUIRER ranges from 0% to 100%.
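Equation (4) is a counting ratio over the quality points that fall inside the quality cuboid. The sketch below assumes a cuboid with fixed half-widths around (E0, A0); the actual construction is given in Supplementary Part 8 and may differ (for instance, it may be sized so that N is about 100). The function name and the toy quality points are hypothetical.

```python
import numpy as np

def requirer(quality_points, e0, a0, half_width_e, half_width_a, threshold):
    """Eq. (4): percentage of quality points in the cuboid with RLNE below threshold.

    quality_points: array of shape (n, 3) with columns
                    (epistemic uncertainty, aleatoric uncertainty, RLNE).
    """
    pts = np.asarray(quality_points, dtype=float)
    inside = (np.abs(pts[:, 0] - e0) <= half_width_e) & (
        np.abs(pts[:, 1] - a0) <= half_width_a
    )
    n = int(inside.sum())  # N: quality points in the cuboid
    if n == 0:
        raise ValueError("no quality points inside the quality cuboid")
    m = int((pts[inside, 2] < threshold).sum())  # M: points below the threshold
    return 100.0 * m / n

# Four of the five toy points lie near (E0, A0) = (0.10, 0.10); three of those
# four have RLNE below the 2D quality threshold of 0.35.
points = [
    [0.10, 0.10, 0.20],
    [0.10, 0.10, 0.40],
    [0.11, 0.09, 0.30],
    [0.10, 0.11, 0.34],
    [0.50, 0.50, 0.10],
]
print(requirer(points, 0.10, 0.10, 0.02, 0.02, threshold=0.35))
```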

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The GB1 data used in this study are available in the NMRPipe Demonstration Data at https://www.ibbr.umd.edu/nmrpipe/demo.html; the azurin data are available from MddNMR at http://mddnmr.spektrino.com/ComparisonExample; the T4L L99A data are available on the website of the Hansen lab at https://www.ucl.ac.uk/hansen-lab/deep-nus.html; the BMRB data are available in the BMRB database under accession codes 7192, 15610, 17008, 18180, 6585, 15100, 15822, 7175, 15063, 15217, 16101, 16691, 15999, 15317, 15338, 15139, 15258, 16806, 15430, 18145, 19329, 18156, 15139. The data generated in this study have been deposited in Zenodo (https://doi.org/10.5281/zenodo.14793153)²⁸ and are provided in the Supplementary Information/Source Data file. Source data are provided with this paper.

Code availability

The code for generating the training set, the training code, and the testing code can be found at https://github.com/LinYanqin/JTFNet-Requirer or https://doi.org/10.5281/zenodo.14789032²⁹. Versions supporting TensorFlow v1.14 and TensorFlow v2.9 are both provided under the Non-Profit Open Software License version 3.0.

References

Sekhar, A. & Kay, L. E. An NMR view of protein dynamics in health and disease. Annu. Rev. Biophys. 48, 297–319 (2019).
Inomata, K. et al. High-resolution multi-dimensional NMR spectroscopy of proteins in human cells. Nature 458, 106–109 (2009).
Kazimierczuk, K. & Orekhov, V. Y. Accelerated NMR spectroscopy by using compressed sensing. Angew. Chem. Int. Ed. 50, 5556–5559 (2011).
Orekhov, V. Y. & Jaravine, V. A. Analysis of non-uniformly sampled spectra with multi-dimensional decomposition. Prog. Nucl. Magn. Reson. Spectrosc. 59, 271–292 (2011).
Jaravine, V., Ibraghimov, I. & Yu Orekhov, V. Removal of a time barrier for high-resolution multidimensional NMR spectroscopy. Nat. Methods 3, 605–607 (2006).
Ying, J., Delaglio, F., Torchia, D. A. & Bax, A. Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data. J. Biomol. NMR 68, 101–118 (2017).
Hyberts, S. G., Milbradt, A. G., Wagner, A. B., Arthanari, H. & Wagner, G. Application of iterative soft thresholding for fast reconstruction of NMR data non-uniformly sampled with multidimensional Poisson Gap scheduling. J. Biomol. NMR 52, 315–327 (2012).
Minaee, S. et al. Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 44, 3523–3542 (2021).
Zhao, Z., Zheng, P., Xu, S. & Wu, X. Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30, 3212–3232 (2019).
Minaee, S. et al. Deep learning-based text classification: a comprehensive review. ACM Comput. Surv. 54, 1–40 (2021).
Yadav, A. & Vishwakarma, D. K. Sentiment analysis using deep learning architectures: a review. Artif. Intell. Rev. 53, 4335–4385 (2020).
Luo, Y. et al. Deep learning and its applications in nuclear magnetic resonance spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc. 146-147, 101556 (2025).
Li, D. W., Hansen, A. L., Yuan, C., Lei, B. L. & Brüschweiler, R. Deep picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).
Schmid, N. et al. Deconvolution of 1D NMR spectra: a deep learning-based approach. J. Magn. Reson. 347, 107357 (2023).
Klukowski, P., Riek, R. & Güntert, P. Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat. Commun. 13, (2022).
Wu, K. et al. Improvement in signal-to-noise ratio of liquid-state NMR spectroscopy via a deep neural network DN-Unet. Anal. Chem. 93, 1377–1382 (2021).
Karunanithy, G., Shukla, V. K. & Hansen, D. F. Solution-state methyl NMR spectroscopy of large non-deuterated proteins enabled by deep neural networks. Nat. Commun. 15, 5073 (2024).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2261–2269 (IEEE, 2017).
Sun, K., Xiao, B., Liu, D. & Wang, J. Deep high-resolution representation learning for human pose estimation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5693–5703 (IEEE, 2019).
Qu, X. et al. Accelerated nuclear magnetic resonance spectroscopy with deep learning. Angew. Chem. Int. Ed. 132, 10383–10386 (2020).
Luo, J., Zeng, Q., Wu, K. & Lin, Y. Fast reconstruction of non-uniform sampling multidimensional NMR spectroscopy via a deep neural network. J. Magn. Reson. 317, 106772 (2020).
Hansen, D. F. Using deep neural networks to reconstruct non-uniformly sampled NMR spectra. J. Biomol. NMR 73, 577–585 (2019).
Karunanithy, G. & Hansen, D. F. FID-Net: a versatile deep neural network architecture for NMR spectral reconstruction and virtual decoupling. J. Biomol. NMR 75, 179–191 (2021).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Oord, A. V. D. et al. Wavenet: a generative model for raw audio. Preprint at https://arxiv.org/abs/1609.03499 (2016).
Kendall, A. & Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? In Proc. Advances in Neural Information Processing Systems (Curran Associates, Inc., 2017).
Gal, Y. & Ghahramani, Z. Dropout as a bayesian approximation: representing model uncertainty in deep learning. In Proc. International Conference on Machine Learning 1050–1059 (PMLR, 2016).
Lin, Y. Reconstructed data of JTF-Net. Zenodo, https://doi.org/10.5281/zenodo.14793153 (2025).
Lin, Y. Deep learning network for NMR spectra reconstruction in time-frequency domain and quality assessment. Zenodo. https://doi.org/10.5281/zenodo.14789032.
Ramelot, T. & Kennedy, M. NMR structure of YqaI from Bacillus subtilis. Biol. Magn. Resonance Bank (2006).
Ding, K. et al. Solution NMR structure of BH09830 from Bartonella henselae modeled with one Zn+2 bound. Biol. Magn. Resonance Bank (2008).
Ertekin, A. et al. Solution NMR structure of peptide methionine sulfoxide reductase msrB from Bacillus subtilis. Biol. Magn. Resonance Bank (2010).
Rossi, P. et al. Solution NMR structure of the uncharacterized protein from gene locus rrnAC0354 of Haloarcula marismortui. Northeast structural genomics consortium target HmR11. Biol. Magn. Resonance Bank (2012).
Song, J., Zhao, Q., Lee, M. S. & Markley, J. L. ¹H, ¹⁵N and ¹³C resonance assignments of the putative Bet v1 family protein At1g24000.1 from Arabidopsis thaliana. J. Biomol. NMR 32, 335–335 (2005).
Ramelot, T. et al. NMR structure of E. coli YfgJ protein modelled with two Zn+2 bound. Northeast structural genomics consortium target ER317. Biol. Magn. Resonance Bank (2007).
Swapna, G. et al. NMR solution structure of A3DK08 protein from Clostridium thermocellum: Northeast structural genomics consortium target CmR9. Biol. Magn. Resonance Bank (2008).
Ramelot, T. & Kennedy, M. NMR structure of YorP from Bacillus subtilis. Biol. Magn. Resonance Bank (2006).
Song, J., Bettendorff, L., Tonelli, M. & Markley, J. L. Structural basis for the catalytic mechanism of mammalian 25-kDa thiamine triphosphatase. J. Biol. Chem. 283, 10939–10948 (2008).
Swapna, G. et al. NMR solution structure of Ykvr protein from Bacillus subtilis: Northeast structural genomics consortium target SR358. Biol. Magn. Resonance Bank (2006).
G. V. T., S, et al. NMR solution Structure of O64736 protein from Arabidopsis thaliana: Northeast structural genomics consortium MEGA target AR3445A. Biol. Magn. Resonance Bank (2009).
Liu, G. et al. Solution NMR structure of probable 30S ribosomal protein PSRP-3 (Ycf65-like protein) from Synechocystis sp. (strain PCC 6803), Northeast structural genomics consortium target SgR46. Biol. Magn. Resonance Bank (2010).
Ramelot, T. et al. Solution NMR structure of HTH_XRE family transcriptional regulator BT_p548217 from Bacteroides thetaiotaomicron. Northeast structural genomics consortium target BtR244. Biol. Magn. Resonance Bank (2008).


Acknowledgements

The 2D ¹⁵N−¹H HSQC spectrum of protein GB1 was downloaded from https://www.ibbr.umd.edu/nmrpipe/demo.html; the 3D HNCO spectrum of azurin was downloaded from http://mddnmr.spektrino.com; the 2D and 3D spectra were downloaded from BMRB. We thank Dennis Flemming Hansen for providing the 2D HSQC spectrum of T4L L99A. Y.Q.L. is supported by the National Natural Science Foundation of China (Grants 22374124 and 22174118).

Author information

Authors and Affiliations

Department of Electronic Science, Xiamen University, Xiamen, China
Yao Luo, Wenhan Chen, Zhenhua Su, Xiaoqi Shi, Jie Luo, Xiaobo Qu, Zhong Chen & Yanqin Lin
Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Xiamen University, Xiamen, China
Yao Luo, Wenhan Chen, Zhenhua Su, Xiaoqi Shi, Jie Luo, Xiaobo Qu, Zhong Chen & Yanqin Lin
State Key Laboratory for Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen, China
Yao Luo, Wenhan Chen, Zhenhua Su, Xiaoqi Shi, Jie Luo, Xiaobo Qu, Zhong Chen & Yanqin Lin

Authors

Yao Luo
Wenhan Chen
Zhenhua Su
Xiaoqi Shi
Jie Luo
Xiaobo Qu
Zhong Chen
Yanqin Lin

Contributions

Y.L. and Y.Q.L. conceived and designed the project, and wrote, reviewed, and edited the manuscript. Y.L. wrote the code, built and trained JTF-Net, created the quality space, and tested the comparison methods. W.H.C. tested the quality thresholds and comparison methods. Z.H.S., X.Q.S., and J.L. contributed to data collection for JTF-Net. X.B.Q. and Y.Q.L. provided resources. Z.C. and Y.Q.L. were responsible for funding acquisition.

Corresponding author

Correspondence to Yanqin Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Luo, Y., Chen, W., Su, Z. et al. Deep learning network for NMR spectra reconstruction in time-frequency domain and quality assessment. Nat Commun 16, 2342 (2025). https://doi.org/10.1038/s41467-025-57721-w


Received: 06 August 2024
Accepted: 28 February 2025
Published: 08 March 2025
DOI: https://doi.org/10.1038/s41467-025-57721-w