
Enhancing Speech Codec Efficiency with Intra-Inter Broad Attention Mechanism

Abstract

This paper introduces a new approach to speech compression that combines advanced attention mechanisms, LSTM integration, and a dual-branch conformer structure to optimize codec efficiency. The study addresses five research questions, covering intra-inter broad attention, multi-head attention networks, LSTM-based sequence modeling, redundancy elimination, and the comparative performance of IBACodec against traditional codecs. A quantitative methodology is used, with performance metrics including bitrate efficiency and quality evaluation. The results confirm that IBACodec significantly improves context awareness, compression efficiency, sequence modeling, and redundancy elimination compared with existing solutions, positioning it as a leading approach to speech compression. Further research is needed to explore real-time applications and broader datasets.
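The abstract does not specify IBACodec's implementation, so the following is only a minimal sketch of how a dual-branch block might pair multi-head self-attention (intra-frame context) with an LSTM (inter-frame sequence modeling), written in PyTorch. All class names, dimensions, and the fusion strategy are illustrative assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class IntraInterBroadAttentionBlock(nn.Module):
    """Hypothetical dual-branch block: multi-head self-attention for
    intra-frame context in parallel with a bidirectional LSTM for
    inter-frame sequence modeling, fused by a learned projection.
    The structure is an assumption for illustration only."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                            bidirectional=True)
        self.fuse = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -- a sequence of speech frame embeddings.
        intra, _ = self.attn(x, x, x)  # attention branch: global frame-to-frame context
        inter, _ = self.lstm(x)        # recurrent branch: temporal sequence modeling
        fused = self.fuse(torch.cat([intra, inter], dim=-1))
        return self.norm(x + fused)    # residual connection, as in conformer-style blocks

# Example: 100 frames of 256-dimensional features from one utterance.
block = IntraInterBroadAttentionBlock()
frames = torch.randn(1, 100, 256)
print(block(frames).shape)  # torch.Size([1, 100, 256])
```

Fusing the two branches with a residual projection follows common conformer-style practice; whether IBACodec fuses them this way, or at a different point in the codec pipeline, is not stated in the abstract.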


How to Cite

Gnanzou, D. (2025). Enhancing Speech Codec Efficiency with Intra-Inter Broad Attention Mechanism. Abhi International Journal of Information Processing Management, Issue 1.