Exploiting Neural Audio Codecs for Edge-to-Gateway Speech Processing

Ciapponi, Stefano; Farella, Elisabetta
2025-01-01

Abstract

Neural Audio Codecs (NACs) have become powerful tools for audio processing, offering learnable compression methods that balance high compression ratios with perceptual quality. This paper introduces a signal processing system that uses the latent space of NACs for signal reconstruction and feature extraction in edge computing environments. We design a lightweight NAC encoder, inspired by SoundStream and optimized for resource-constrained devices. Our evaluation on speech recognition and classification tasks highlights the system's adaptability to Internet of Things applications. The proposed design achieves 40× audio waveform compression with only a 3% increase in word error rate on transcription tasks and 94.6% accuracy on end-to-end intent classification, demonstrating its practicality for real-world deployment. Additionally, the encoder operates at a real-time factor of 1.77 on an ARM Cortex-A53 using a single thread for both intra- and inter-operation parallelism, enabling efficient real-time compression with 12-8 times less energy consumption than the original model's encoder.
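To put the headline figures in concrete terms, the following is a minimal arithmetic sketch, not the authors' evaluation code. It assumes 16 kHz, 16-bit mono PCM input (the abstract does not state the input format) and assumes that the real-time factor is defined as audio duration divided by processing time, so that values above 1 mean faster than real time; this reading is inferred from the abstract presenting 1.77 as real-time capable. The function names are hypothetical.

```python
def compressed_bitrate_bps(sample_rate_hz: int, bits_per_sample: int,
                           compression_ratio: float) -> float:
    """Bitrate after compression, given the raw PCM format and a ratio."""
    raw_bitrate_bps = sample_rate_hz * bits_per_sample  # mono PCM assumed
    return raw_bitrate_bps / compression_ratio


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Assumed convention: audio duration / processing time (>1 is real time)."""
    return audio_seconds / processing_seconds


def processing_time_s(audio_seconds: float, rtf: float) -> float:
    """Time needed to encode a clip under the same assumed RTF convention."""
    return audio_seconds / rtf


if __name__ == "__main__":
    # Assumed input format: 16 kHz, 16-bit mono (256 kbit/s raw).
    bitrate = compressed_bitrate_bps(16_000, 16, 40)
    print(f"Bitrate at 40x compression: {bitrate / 1000:.1f} kbit/s")

    # Time to encode a 10-second clip at the reported RTF of 1.77.
    print(f"Encoding time for 10 s of audio: {processing_time_s(10.0, 1.77):.2f} s")
```

Under these assumptions, 40× compression corresponds to a few kilobits per second, and a real-time factor of 1.77 means the encoder finishes each clip well before the next one of equal length has been recorded.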
ISBN: 978-9-4645-9362-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/364934