Spiking Wavelet Transformer

Fang, Yuetong; Wang, Ziqing; Zhang, Lingfeng; Cao, Jiahang; Chen, Honglei; Xu, Renjing

doi:10.1007/978-3-031-73116-7_2

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15134))

Included in the following conference series:

European Conference on Computer Vision

Abstract

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing of the brain. Incorporating Transformers into SNNs has shown promise for accuracy. However, such Spiking Transformers struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation, with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer’s effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
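To give a rough sense of why a wavelet transform can expose high-frequency structure such as edges without multiplications, the sketch below applies one level of an unnormalized 2D Haar transform to a small binary spike map. This is an illustration only and is not taken from the SWformer code: the choice of the Haar wavelet, the function name `haar2d_level1`, and the sub-band names are assumptions made here for the example.

```python
# Illustrative sketch only: an unnormalized one-level 2D Haar transform.
# The Haar wavelet and all names below are assumptions for this example,
# not the authors' implementation.

def haar2d_level1(x):
    """Split a 2D grid into four half-resolution sub-bands.

    Each 2x2 block [[a, b], [c, d]] is combined using additions and
    subtractions only (no multiplications).
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_approx, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_approx.append(a + b + c + d)  # local sum: low-frequency content
            r_col.append(a - b + c - d)     # left minus right: vertical edges
            r_row.append(a + b - c - d)     # top minus bottom: horizontal edges
            r_diag.append(a - b - c + d)    # diagonal detail
        approx.append(r_approx)
        col_diff.append(r_col)
        row_diff.append(r_row)
        diag.append(r_diag)
    return approx, col_diff, row_diff, diag


# A 4x4 binary spike map with a vertical edge between columns 0 and 1.
spikes = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

approx, col_diff, row_diff, diag = haar2d_level1(spikes)
print(approx)    # [[2, 4], [2, 4]]
print(col_diff)  # [[-2, 0], [-2, 0]]
print(row_diff)  # [[0, 0], [0, 0]]
print(diag)      # [[0, 0], [0, 0]]
```

In this toy case the edge shows up only in the left-minus-right sub-band, and its coefficients are negative, so a purely binary {0, 1} code could not carry them as they are. Whether this is the motivation for the negative spike dynamics mentioned above is a reading of the abstract, not something the source states.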

Y. Fang and Z. Wang—Equal Contribution.

Acknowledgements

This work is supported by the Guangzhou-HKUST(GZ) Joint Funding Program (Grant No. 2023A03J0682) and partially supported by a collaborative project with Brain Mind Innovation, inc. Special thanks to Mr. Yijian He.

Author information

Authors and Affiliations

The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen & Renjing Xu
Northwestern University, Evanston, IL, USA
Ziqing Wang

Corresponding author

Correspondence to Renjing Xu.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1268 KB)

Cite this paper

Fang, Y., Wang, Z., Zhang, L., Cao, J., Chen, H., Xu, R. (2025). Spiking Wavelet Transformer. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15134. Springer, Cham. https://doi.org/10.1007/978-3-031-73116-7_2

DOI: https://doi.org/10.1007/978-3-031-73116-7_2
Published: 31 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73115-0
Online ISBN: 978-3-031-73116-7
eBook Packages: Computer Science, Computer Science (R0)
