Wavenet based low rate speech coding [PDF]
Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit ...
Kleijn, W. Bastiaan +6 more
core +2 more sources
Scaling Transformers for Low-Bitrate High-Quality Speech Coding [PDF]
The tokenization of speech with neural audio codec models is a vital part of modern AI pipelines for the generation or understanding of speech, alone or in a multimodal context.
Julian D. Parker +6 more
openalex +2 more sources
End-to-End Neural Speech Coding for Real-Time Communications [PDF]
Deep-learning based methods have shown their advantages in audio coding over traditional ones, but limited attention has been paid to real-time communications (RTC). This paper proposes the TFNet, an end-to-end neural speech codec with low latency for RTC.
Xue Jiang +5 more
semanticscholar +1 more source
Generative Speech Coding with Predictive Variance Regularization [PDF]
The recent emergence of machine-learning based generative models for speech suggests a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present ...
W. Kleijn +7 more
semanticscholar +1 more source
NESC: Robust Neural End-2-End Speech Coding with GANs [PDF]
Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge.
N. Pia +4 more
semanticscholar +1 more source
Latent-Domain Predictive Neural Speech Coding [PDF]
Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a ...
Xue Jiang +4 more
semanticscholar +1 more source
Disentangled Feature Learning for Real-Time Neural Speech Coding [PDF]
Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs. This is mostly achieved by following the VQ-VAE paradigm where blind features are learned, vector-quantized and ...
Xue Jiang +3 more
semanticscholar +1 more source
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate [PDF]
Recently, GAN vocoders have seen rapid progress in speech synthesis, starting to outperform autoregressive models in perceptual quality with much higher generation speed.
Ahmed Mustafa +5 more
semanticscholar +1 more source
Scalable and Efficient Neural Speech Coding: A Hybrid Design [PDF]
We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec ...
Kai Zhen +4 more
semanticscholar +1 more source
Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders [PDF]
Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output.
Jonah Casebeer +5 more
semanticscholar +1 more source

