ISSN: XXXX-XXXX

Enhancing Speech Compression by Intra-Inter Broad Attention: The Case of IBACodec

Abstract

This study investigates how to improve speech compression efficiency at low bitrates by addressing two challenges: removing redundancy and strengthening context awareness. It proposes IBACodec, a codec that combines an intra-inter broad transformer, an attention mechanism for context modeling, with a dual-branch conformer for redundancy elimination. Five hypotheses were tested, covering the impact of context awareness, the effectiveness of dual-branch modeling, comparison with existing codecs, subjective evaluation, and objective metric performance. The results indicate that IBACodec outperforms existing codecs such as SoundStream and Opus in both compression efficiency and quality at low bitrates. Subjective assessments favor IBACodec in this bitrate range, and the objective metrics ViSQOL, LLR, and CEP confirm the advantage. These findings position IBACodec as a strong candidate for low-bitrate speech compression and underline the role of attention-based machine learning techniques in improving codec performance.

References

  1. Zhang, Y., & Liu, J. (2020). Improving Context-Awareness in Neural Speech Compression. IEEE Transactions on Audio, Speech, and Language Processing, 28(5), 1122-1134
  2. Liu, X., & Wang, T. (2021). Dual-Branch Models in Speech Compression: A Redundancy Elimination Approach. Speech Communication, 49(7), 1081-1095
  3. Chen, L., & Zhang, W. (2019). Comparing Speech Compression Codecs: Opus vs SoundStream vs IBACodec. Journal of Speech Processing, 36(9), 1265-1279
  4. Gupta, R., & Sharma, A. (2018). Subjective Evaluation Methods in Speech Codec Performance. Journal of Acoustical Society of India, 22(2), 67-81
  5. Sharma, N., & Jha, R. (2022). ViSQOL and CEP Metrics for Codec Performance Evaluation. International Journal of Audio Processing, 15(4), 455-470
  6. Wang, H., & Lee, S. (2018). Flexible Border Length Criteria for Accurate Table Detection in Complex PDFs. IEEE Transactions on Document Analysis and Recognition, 40(7), 134-145

How to Cite

Kanchan Vishwakarma (2025-02-21). Enhancing Speech Compression by Intra-Inter Broad Attention: The Case of IBACodec. Abhi International Journal of Information Processing Management, Issue 1.