Implementing and evaluating OpenAI Whisper for accurate speaking assessment and skill development in Indonesian EFL classroom

Salsabila Latifa; Nirwanto Maruf

Authors

Salsabila Latifa, Universitas Muhammadiyah Gresik
Nirwanto Maruf, Universitas Muhammadiyah Gresik, https://orcid.org/0000-0002-4077-169X

Keywords:

OpenAI Whisper, Automatic Speech Recognition (ASR) System, Speaking Skills, AI-Driven Assessment, Digital Literacy

Abstract

This study investigates the implementation and effectiveness of an OpenAI Whisper-based automatic speech recognition (ASR) system for evaluating and improving the speaking skills of Indonesian EFL students. Employing a mixed-methods, one-group pretest-posttest design, the research involved 40 undergraduate students from Universitas Muhammadiyah Gresik. Quantitative data were collected through standardized speaking tests rated by both the Whisper system and expert human assessors, focusing on fluency, pronunciation, and coherence. Qualitative insights were obtained from classroom observations and in-depth interviews with students and lecturers, exploring user experiences and contextual factors influencing system performance. The results demonstrate that the Whisper-based assessment system achieved high inter-rater reliability with human experts (Cohen’s Kappa = 0.81; ICC = 0.87) and yielded significant improvements in learners’ speaking skills across all assessed dimensions, with the most notable gains in pronunciation. The system’s immediate, actionable feedback fostered greater learner engagement and self-directed improvement. However, the study also identified critical contextual factors—such as technological infrastructure, digital literacy, and classroom environment—that influenced the system’s effectiveness and reliability. These findings highlight the need for robust infrastructure, comprehensive teacher training, and equitable access to technology to maximize the benefits of AI-driven assessment. This research advances both theory and practice by validating a multidimensional, context-adaptive framework for AI-based speaking evaluation and providing practical guidelines for integrating advanced ASR technology into EFL curricula. The study’s implications inform educators, policymakers, and technologists seeking scalable, objective, and equitable solutions for language assessment in Indonesia and similar educational contexts.
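
The agreement statistic reported above can be illustrated with a small worked example. The sketch below computes unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters who assign categorical band scores to the same speaking samples. The abstract does not say whether the study used a weighted or unweighted kappa or how scores were banded, so the function `cohens_kappa`, the 1-4 score bands, and the ratings themselves are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of samples given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over categories of the product of the two
    # raters' marginal proportions for that category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so report perfect agreement by convention (an assumption here).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical band scores (1-4) for ten speaking samples.
asr_scores = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
human_scores = [3, 2, 4, 2, 1, 2, 3, 4, 3, 3]

kappa = cohens_kappa(asr_scores, human_scores)
print(f"kappa = {kappa:.2f}")
```

Because band scores are ordinal, a weighted kappa or the ICC reported alongside it would also give credit for near-misses, which this unweighted version treats as plain disagreement.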
