Abstract
Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing manner of the brain. Incorporating Transformers with SNNs has shown promise for accuracy. However, they struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) spiking wavelet learner for spatial-frequency domain learning, 2) convolution-based learner for spatial feature extraction, and 3) spiking pointwise convolution for cross-channel information aggregation - with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer’s effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count, and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
Y. Fang and Z. Wang—Equal Contribution.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Auge, D., Mueller, E.: Resonate-and-fire neurons as frequency selective input encoders for spiking neural networks (2020)
Basu, A., Deng, L., Frenkel, C., Zhang, X.: Spiking neural network integrated circuits: a review of trends and future directions. In: 2022 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–8. IEEE (2022)
Bellec, G., Salaj, D., Subramoney, A., Legenstein, R., Maass, W.: Long short-term memory and learning-to-learn in networks of spiking neurons. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Bi, Y., Chadha, A., Abbas, A., Bourtsoulatze, E., Andreopoulos, Y.: Graph-based object classification for neuromorphic vision sensing. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 491–501 (2019)
Bochner, S., Chandrasekharan, K.: Fourier transforms, No. 19. Princeton University Press (1949)
Bu, T., Fang, W., Ding, J., Dai, P., Yu, Z., Huang, T.: Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. In: International Conference on Learning Representations (2021)
Burkitt, A.N.: A review of the integrate-and-fire neuron model: I. homogeneous synaptic input. Biol. Cybern. 95, 1–19 (2006)
Cao, Y., Chen, Y., Khosla, D.: Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vision 113(1), 54–66 (2015)
Chen, S., Ye, T., Bai, J., Chen, E., Shi, J., Zhu, L.: Sparse sampling transformer with uncertainty-driven ranking for unified removal of raindrops and rain streaks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13106–13117 (2023)
Chen, S., et al.: MSP-former: Multi-scale projection transformer for single image desnowing. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)
Dao, T., et al.: Monarch: expressive structured matrices for efficient and accurate training. In: International Conference on Machine Learning, pp. 4690–4721. PMLR (2022)
Davies, M., et al.: Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38(1), 82–99 (2018)
Davies, M., et al.: Advancing neuromorphic computing with loihi: A survey of results and outlook. Proc. IEEE 109(5), 911–934 (2021)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Deng, S., Li, Y., Zhang, S., Gu, S.: Temporal efficient training of spiking neural network via gradient re-weighting. arXiv preprint arXiv:2202.11946 (2022)
Ding, J., Yu, Z., Tian, Y., Huang, T.: Optimal ANN-SNN conversion for fast and accurate inference in deep spiking neural networks. arXiv preprint arXiv:2105.11654 (2021)
Duan, C., Ding, J., Chen, S., Yu, Z., Huang, T.: Temporal effective batch normalization in spiking neural networks. In: Advances in Neural Information Processing Systems, vol. 35, pp. 34377–34390 (2022)
Fang, W., Yu, Z., Chen, Y., Huang, T., Masquelier, T., Tian, Y.: Deep residual learning in spiking neural networks. In: Advances in Neural Information Processing Systems, vol. 34, pp. 21056–21069 (2021)
Frady, E.P., et al.: Efficient neuromorphic signal processing with resonator neurons. J. Signal Process. Syst. 94(10), 917–927 (2022)
Gaudart, L., Crebassa, J., Petrakian, J.P.: Wavelet transform in human visual channels. Appl. Opt. 32(22), 4119–4127 (1993)
Gu, P., Xiao, R., Pan, G., Tang, H.: STCA: spatio-temporal credit assignment with delayed feedback in deep spiking neural networks. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. pp. 1366–1372. International Joint Conferences on Artificial Intelligence Organization, Macao, China (2019). https://doi.org/10.24963/ijcai.2019/189
Guo, Y., Zhang, L., Chen, Y., Tong, X., Liu, X., Wang, Y., Huang, X., Ma, Z.: Real spike: Learning real-valued spikes for spiking neural networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 52–68. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19775-8_4
He, C., et al.: Camouflaged object detection with feature decomposition and edge reconstruction. In: CVPR, pp. 22046–22055 (2023)
He, C., Li, K., Zhang, Y., Xu, G., Tang, L.: Weakly-supervised concealed object segmentation with SAM-based pseudo labeling and multi-scale feature grouping. In: NeurIPS (2024)
He, C., et al.: Diffusion models in low-level vision: a survey. arXiv preprint arXiv:2406.11138 (2024)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Hopkins, M., Pineda-Garcia, G., Bogdan, P.A., Furber, S.B.: Spiking neural networks for computer vision. Interface Focus 8(4), 20180007 (2018)
Hu, Y., Tang, H., Pan, G.: Spiking deep residual networks. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 5200–5205 (2021)
Hu, Y., Deng, L., Wu, Y., Yao, M., Li, G.: Advancing spiking neural networks towards deep residual learning. arXiv preprint arXiv:2112.08954 (2021)
Hu, Y., Deng, L., Wu, Y., Yao, M., Li, G.: Advancing spiking neural networks toward deep residual learning. IEEE Trans. Neural Netw. Learn. Syst. (2024)
Ji, M., Wang, Z., Yan, R., Liu, Q., Xu, S., Tang, H.: SCTN: event-based object tracking with energy-efficient deep convolutional spiking neural networks. Front. Neurosci. 17, 1123698 (2023)
Jiménez-Fernández, A., et al.: A binaural neuromorphic auditory sensor for FPGA: a spike signal processing approach. IEEE Trans. Neural Netw. Learn. Syst. 28(4), 804–818 (2016)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
Lee, D., Yin, R., Kim, Y., Moitra, A., Li, Y., Panda, P.: TT-SNN: tensor train decomposition for efficient spiking neural network training. arXiv preprint arXiv:2401.08001 (2024)
Lee, I., Kim, J., Kim, Y., Kim, S., Park, G., Park, K.T.: Wavelet transform image coding using human visual system. In: Proceedings of APCCAS’94-1994 Asia Pacific Conference on Circuits and Systems, pp. 619–623. IEEE (1994)
Li, H., Liu, H., Ji, X., Li, G., Shi, L.: CIFAR10-DVS: an event-stream dataset for object classification. Front. Neurosci. 11 (2017)
Li, Q., Shen, L., Guo, S., Lai, Z.: Wavelet integrated CNNs for noise-robust image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7245–7254 (2020)
Li, Y., Kim, Y., Park, H., Geller, T., Panda, P.: Neuromorphic data augmentation for training spiking neural networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 631–649. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20071-7_37
Li, Y., Kim, Y., Park, H., Geller, T., Panda, P.: Neuromorphic data augmentation for training spiking neural networks. arXiv preprint arXiv:2203.06145 (2022)
Liu, Q., Xing, D., Tang, H., Ma, D., Pan, G.: Event-based action recognition using motion information and spiking neural networks. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, pp. 1743–1749. International Joint Conferences on Artificial Intelligence Organization, Montreal, Canada (2021). https://doi.org/10.24963/ijcai.2021/240
López-Randulfe, J., Duswald, T., Bing, Z., Knoll, A.: Spiking neural network for Fourier transform and object detection for automotive radar. Front. Neurorobot. 15, 688344 (2021)
López-Randulfe, J., et al.: Time-coded spiking Fourier transform in neuromorphic hardware. IEEE Trans. Comput. 71(11), 2792–2802 (2022)
Maass, W.: Networks of spiking neurons: the third generation of neural network models. Neural Netw. 10(9), 1659–1671 (1997)
Maro, J.M., Ieng, S.H., Benosman, R.: Event-based gesture recognition with dynamic background suppression using smartphone computational capabilities. Front. Neurosci. 14, 275 (2020)
Meng, Q., Xiao, M., Yan, S., Wang, Y., Lin, Z., Luo, Z.Q.: Training high-performance low-latency spiking neural networks by differentiation on spike representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12444–12453 (2022)
Meng, Q., Yan, S., Xiao, M., Wang, Y., Lin, Z., Luo, Z.Q.: Training much deeper spiking neural networks with a small number of time-steps. Neural Netw. 153, 254–268 (2022)
Merolla, P.A., et al.: A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), 668–673 (2014)
Miao, S., et al.: Neuromorphic vision datasets for pedestrian detection, action recognition, and fall detection. Front. Neurorobot. 13, 38 (2019)
Orchard, G., Jayawant, A., Cohen, G.K., Thakor, N.: Converting static image datasets to spiking neuromorphic datasets using saccades. Front. Neuroscience 9 (2015)
Park, N., Kim, S.: How do vision transformers work? arXiv preprint arXiv:2202.06709 (2022)
Pei, J., et al.: Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572(7767), 106–111 (2019)
Rao, A., Plank, P., Wild, A., Maass, W.: A long short-term memory for AI applications in spike-based neuromorphic hardware. Nature Mach. Intell. 4(5), 467–479 (2022)
Rathi, N., et al.: Exploring neuromorphic computing based on spiking neural networks: algorithms to hardware. ACM Comput. Surv. 55(12), 1–49 (2023)
Rathi, N., Srinivasan, G., Panda, P., Roy, K.: Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. arXiv preprint arXiv:2005.01807 (2020)
Roy, K., Jaiswal, A., Panda, P.: Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784), 607–617 (2019)
Schuman, C.D., Kulkarni, S.R., Parsa, M., Mitchell, J.P., Kay, B., et al.: Opportunities for neuromorphic computing algorithms and applications. Nature Comput. Sci. 2(1), 10–19 (2022)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Shen, S., Zhao, D., Shen, G., Zeng, Y.: TIM: an efficient temporal interaction module for spiking transformer. arXiv preprint arXiv:2401.11687 (2024)
Si, C., Yu, W., Zhou, P., Zhou, Y., Wang, X., Yan, S.: Inception transformer. In: Advances in Neural Information Processing Systems, vol. 35, pp. 23495–23509 (2022)
Sironi, A., Brambilla, M., Bourdis, N., Lagorce, X., Benosman, R.: HATS: histograms of averaged time surfaces for robust event-based object classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1731–1740 (2018)
Stewart, K.M., Neftci, E.O.: Meta-learning spiking neural networks with surrogate gradient descent. Neuromorphic Comput. Eng. 2(4), 044002 (2022)
Su, Q., et al.: Deep directly-trained spiking neural networks for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6555–6565 (2023)
Tripura, T., Chakraborty, S.: Wavelet neural operator for solving parametric partial differential equations in computational mechanics problems. Comput. Methods Appl. Mech. Eng. 404, 115783 (2023)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Viale, A., Marchisio, A., Martina, M., Masera, G., Shafique, M.: CarSNN: an efficient spiking neural network for event-based autonomous cars on the Loihi neuromorphic research processor. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–10. IEEE (2021)
Wang, Z., Fang, Y., Cao, J., Xu, R.: Bursting spikes: efficient and high-performance SNNs for event-based vision. arXiv preprint arXiv:2311.14265 (2023)
Wang, Z., Fang, Y., Cao, J., Zhang, Q., Wang, Z., Xu, R.: Masked spiking transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1761–1771 (2023)
Wu, H., Yang, Y., Chen, H., Ren, J., Zhu, L.: Mask-guided progressive network for joint raindrop and rain streak removal in videos. In: Proceedings of the 31st ACM International Conference on Multimedia, pp. 7216–7225 (2023)
Yang, Y., Wu, H., Aviles-Rivero, A.I., Zhang, Y., Qin, J., Zhu, L.: Genuine knowledge from practice: diffusion test-time adaptation for video adverse weather removal. arXiv preprint arXiv:2403.07684 (2024)
Yang, Z., et al.: DashNet: a hybrid artificial and spiking neural network for high-speed object tracking. arXiv preprint arXiv:1909.12942 (2019)
Yao, M., Gao, H., Zhao, G., Wang, D., Lin, Y., Yang, Z., Li, G.: Temporal-wise attention spiking neural networks for event streams classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10221–10230 (2021)
Yao, M., Hu, J., Zhou, Z., Yuan, L., Tian, Y., Xu, B., Li, G.: Spike-driven transformer. arXiv preprint arXiv:2307.01694 (2023)
Yao, M., et al.: Attention spiking neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 9393–9410 (2023)
Ye, C., Kornijcuk, V., Yoo, D., Kim, J., Jeong, D.S.: LaCERA: layer-centric event-routing architecture. Neurocomputing 520, 46–59 (2023)
Ye, T., Zhang, Y., Jiang, M., Chen, L., Liu, Y., Chen, S., Chen, E.: Perceiving and modeling density for image dehazing. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, pp. 130–145. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19800-7_8
Zhang, J., et al.: Spiking transformers for event-based single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8801–8810 (2022)
Zheng, H., Wu, Y., Deng, L., Hu, Y., Li, G.: Going deeper with directly-trained larger spiking neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11062–11070 (2021)
Zhou, C., et al.: Spikingformer: spike-driven residual learning for transformer-based spiking neural network. arXiv preprint arXiv:2304.11954 (2023)
Zhou, Z., et al.: Spikformer: when spiking neural network meets transformer. arXiv preprint arXiv:2209.15425 (2022)
Zhu, R.J., Wang, Z., Gilpin, L., Eshraghian, J.K.: Autonomous driving with spiking neural networks. arXiv preprint arXiv:2405.19687 (2024)
Acknowledgements
This work is supported by the Guangzhou-HKUST(GZ) Joint Funding Program (Grant No. 2023A03J0682) and partially supported by a collaborative project with Brain Mind Innovation, inc. Special thanks to Mr. Yijian He.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fang, Y., Wang, Z., Zhang, L., Cao, J., Chen, H., Xu, R. (2025). Spiking Wavelet Transformer. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15134. Springer, Cham. https://doi.org/10.1007/978-3-031-73116-7_2
Download citation
DOI: https://doi.org/10.1007/978-3-031-73116-7_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73115-0
Online ISBN: 978-3-031-73116-7
eBook Packages: Computer ScienceComputer Science (R0)