PyTorch-Kaldi, an Open-Source Speech Recognition Toolkit: Combining Kaldi's Efficiency with PyTorch's Flexibility

Date: 2023-05-07 Category: News

Author: Nurhachu Null

Contents: 1 Background (1.1 Components of a speech recognition system; 1.2 Current state of the industry; 1.3 Why pytorch-kaldi?); 2 Introduction to PyTorch-Kaldi (2.1 Configuration files; 2.2 Speech features and labels; 2.3 Composition of chunks and minibatches; 2.4 DNN acoustic models, decoding, and scoring); 3 Summary

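Judging by its overall architecture and the "ambition" shown by Mirco Ravanelli and his colleagues, PyTorch-Kaldi has considerable potential. The project documentation describes the next version as follows: "The architecture of the toolkit will be more modular and flexible. Beyond speech recognition, the new toolkit will be suitable for other applications such as speaker recognition, speech enhancement, speech separation, etc." Of course, it is only a tool; without a deep understanding of speech recognition technology, no one will build anything better with it. A wish: that more people and resources are actively invested in this field, either to help make PyTorch-Kaldi better or to build an entirely new tool that surpasses it.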

References

[1] M. Ravanelli, T. Parcollet and Y. Bengio, "The Pytorch-kaldi Speech Recognition Toolkit," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 6465-6469. doi: 10.1109/ICASSP.2019.8683713

[2] D. Yu and L. Deng, Automatic Speech Recognition – A Deep Learning Approach, Springer, 2015.

[3] Kaldi documentation (http://kaldi-asr.org/doc/)

[4] PyTorch-Kaldi GitHub repository (https://github.com/mravanelli/pytorch-kaldi)

[5] 王赟 (Wang Yun). 语音识别技术的前世今生 [The Past and Present of Speech Recognition Technology] (https://www.zhihu.com/lives/843853238078963712)
