Abstract
With the rapid development of speech recognition technology, Chinese speech-to-text (STT) systems have come to play an important role in subtitle production and are often used for instructional videos. However, because of the complexity of the Chinese language and its large number of homophones, the accuracy of existing STT systems still leaves significant room for improvement. In this study, we propose two optimization methods based on large language models (LLMs) to improve the accuracy of Chinese STT: language model-assisted editing and fine-tuned language model-assisted text editing. We verified both methods by producing subtitles for instructional videos in various domains and measuring text accuracy with the Levenshtein distance between two strings, computed by dynamic programming. The results indicate that the fine-tuned language model-assisted text editing approach is significantly more accurate than the language model-assisted editing approach. It also allows fine-tuning strategies to be developed for specific language characteristics, so that language nuances are recognized more efficiently, thus significantly improving the accuracy of Chinese STT systems.
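The evaluation metric named in the abstract, the Levenshtein distance computed with dynamic programming, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `levenshtein` and `char_error_rate`, the normalization by reference length, and the sample strings are all assumptions introduced here. Comparing at the character level is a natural fit for Chinese, where homophone errors typically show up as single-character substitutions.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the first i-1 characters
    # of `reference` and the first j characters of `hypothesis`.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # distance from reference[:i] to the empty string
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: edit distance normalized by reference length."""
    if not reference:
        # Assumption: an empty reference has zero error only if the
        # hypothesis is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative homophone error: two of four characters are wrong.
    reference = "語音辨識"
    hypothesis = "語音便是"
    print(levenshtein(reference, hypothesis))
    print(char_error_rate(reference, hypothesis))
```

Only two rows of the dynamic-programming table are kept at a time, so memory grows with the length of the hypothesis rather than with the product of both lengths, which matters for long subtitle transcripts.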
Author Information
Chih Chang Yang, National Taiwan Normal University, Taiwan
Tzren-Ru Chou, National Taiwan Normal University, Taiwan
Shu Wei Liu, National Taiwan University of Science and Technology, Taiwan