Speech to text software in the evaluation stage

Writer(s): 
James W. Henry, III


One of the aims of this column is to suggest new uses for existing tools and technologies. Oral transcription software is used in speech analysis research but has great potential in the classroom. James W. Henry, III, of the Research Institute of English Language Education in Kobe, outlines one possible practical application for teachers: evaluating students’ levels at the beginning of a course. Perhaps we could take it one step further: the software itself could contain an algorithm that generates an automatic score. Any takers on this? Programmers, send your ideas to James, but send them here first!

Speech-to-text software such as Nuance’s Dragon Dictation is designed to capture and sort through utterances, transcribing these sounds into written text. Because the software transcribes the sounds it actually receives, its output reflects what the speaker actually said rather than what they intended to say, so a mispronounced word may show up in the transcript as a different word. This makes the technology potentially highly beneficial for placement testing in oral-based courses. In conjunction with other assessment procedures, a pronunciation/enunciation test based on speech-to-text software could provide objective data on learners’ strengths and weaknesses, which could then be used to stream students into the correct oral performance levels.
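The streaming step could be sketched in Python as follows, assuming a pronunciation score between 0 and 1 (for example, the proportion of target words the software recognized) and one other assessment score on the same scale. The band cut-offs, level names, the 0.4 weighting, and the helper names `composite_score` and `place` are all hypothetical and would need to be set and validated by each programme.

```python
# Hypothetical placement bands: (minimum composite score, level name),
# checked from highest to lowest.
BANDS = [
    (0.85, "Advanced"),
    (0.70, "Upper intermediate"),
    (0.50, "Intermediate"),
    (0.0, "Elementary"),
]

def composite_score(recognition_accuracy: float, other_score: float,
                    weight: float = 0.4) -> float:
    """Blend speech-to-text accuracy (0-1) with another assessment (0-1).

    `weight` is the share given to the speech-to-text result; 0.4 is an
    illustrative assumption, not a recommendation.
    """
    if not (0.0 <= recognition_accuracy <= 1.0 and 0.0 <= other_score <= 1.0):
        raise ValueError("scores must be between 0 and 1")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * recognition_accuracy + (1 - weight) * other_score

def place(recognition_accuracy: float, other_score: float,
          weight: float = 0.4) -> str:
    """Return the first (highest) level whose minimum the composite reaches."""
    score = composite_score(recognition_accuracy, other_score, weight)
    for minimum, level in BANDS:
        if score >= minimum:
            return level
    return BANDS[-1][1]

# Example: 90% of words recognized, 60% on a hypothetical interview rubric.
print(place(0.90, 0.60))
```

Keeping the bands in a single table makes it easy for a teaching team to adjust cut-offs after a pilot without touching the scoring logic.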

To best assess pronunciation, test sentences should contain as wide a variety of vowel and consonant sounds as possible, deal with subject matter that is not overly technical, and ideally be customized to the learners’ phonological challenges based on their L1 or other relevant criteria. In a multicultural setting, sentences might best be preselected for each learner according to his or her L1.

Meaningful sentences of around 10 words would probably be ideal for both speakers and proctors and should suit the design of the software. Longer sentences containing compound ideas might create unnecessary problems with punctuation, while shorter sentences would lack the challenge and phonological variety needed to be useful.
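These selection criteria can be sketched in Python: keep only candidate sentences of roughly ten words, then greedily choose the sentences that add the most sounds not yet covered. Everything here is an illustrative assumption: the 8 to 12 word range, the `Candidate` type and `select_sentences` helper, and the hand-tagged sound sets, which list only a few target sounds per sentence. In practice the sounds might be looked up in a pronouncing dictionary such as CMUdict and filtered to those that matter for a given L1 group.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    text: str
    # A few target sounds the sentence contains, tagged by hand (hypothetical).
    sounds: frozenset[str]

def word_count(text: str) -> int:
    return len(text.split())

def select_sentences(candidates: list[Candidate], k: int,
                     min_words: int = 8, max_words: int = 12) -> list[Candidate]:
    """Greedily choose up to k sentences, each time taking the one that adds
    the most sounds not yet covered. Only sentences of min_words..max_words
    words (an assumed 'around ten words' range) are considered."""
    pool = [c for c in candidates if min_words <= word_count(c.text) <= max_words]
    chosen: list[Candidate] = []
    covered: set[str] = set()
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: len(c.sounds - covered))
        if not best.sounds - covered:
            break  # no remaining sentence adds a new sound
        chosen.append(best)
        covered |= best.sounds
        pool.remove(best)
    return chosen

CANDIDATES = [
    Candidate("Red lorry, yellow lorry.", frozenset({"r", "l"})),  # too short
    Candidate("The little girl really liked reading library books on rainy days.",
              frozenset({"l", "r", "ð"})),
    Candidate("Several very thin thieves vanished through the back of the van.",
              frozenset({"v", "θ", "ð", "r"})),
    Candidate("Fresh fish is sold at the busy harbour every Friday.",
              frozenset({"f", "ʃ", "r", "v"})),
]

for c in select_sentences(CANDIDATES, k=2):
    print(c.text)
```

Greedy selection is not guaranteed to find the best possible set, but it is a simple, common approach to coverage problems like this one; a teacher would still review the chosen sentences and order them by difficulty.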

Rather than reading just one or two sentences, learners should read five to ten sentences of increasing difficulty and reach a set percentage of overall accuracy (i.e., the proportion of their speech the software recognizes correctly) to pass the cut-off. The software’s transcriptions of their utterances can be saved to a text file, where learners and their instructor can compare them with the target sentences and visually assess strengths and weaknesses. The recording would be done on the spot, but the final assessment would be done later, when it could be combined with other assessment tools to give an overall score or analysis.
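One possible answer to the column’s call for an automatic score is sketched below in Python. It compares each target sentence with the software’s transcript word by word using edit distance (the basis of the word error rate commonly used to evaluate speech recognition), ignores capitalization and punctuation, and checks overall accuracy against a cut-off. The 0.8 cut-off, the function names, and the sample transcripts are hypothetical; the transcripts are typed by hand, not produced by Dragon Dictation or any other real tool.

```python
import re

# Hypothetical pass mark: 80% of target words recognized overall.
CUTOFF = 0.8

def normalize(text: str) -> list[str]:
    """Lower-case and keep only word tokens, so punctuation that the
    dictation software adds or leaves out is not counted as an error."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_errors(target: list[str], heard: list[str]) -> int:
    """Word-level edit distance: substitutions + deletions + insertions."""
    prev = list(range(len(heard) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i] + [0] * len(heard)
        for j, h in enumerate(heard, start=1):
            curr[j] = min(
                prev[j] + 1,             # target word missing from transcript
                curr[j - 1] + 1,         # extra word in transcript
                prev[j - 1] + (t != h),  # same word (0) or substituted (1)
            )
        prev = curr
    return prev[-1]

def score_test(pairs: list[tuple[str, str]], cutoff: float = CUTOFF):
    """Score (target sentence, transcript) pairs.

    Returns (overall accuracy, passed, per-sentence report lines), where
    overall accuracy = 1 - total word errors / total target words, floored at 0.
    """
    total_words = total_errors = 0
    report = []
    for target_text, heard_text in pairs:
        target, heard = normalize(target_text), normalize(heard_text)
        if not target:
            raise ValueError(f"empty target sentence: {target_text!r}")
        errors = word_errors(target, heard)
        total_words += len(target)
        total_errors += errors
        accuracy = max(0.0, 1 - errors / len(target))
        report.append(f"{accuracy:4.0%}  target: {target_text} | heard: {heard_text}")
    if total_words == 0:
        raise ValueError("no sentences to score")
    overall = max(0.0, 1 - total_errors / total_words)
    return overall, overall >= cutoff, report

# Hypothetical transcripts, typed by hand for illustration.
pairs = [
    ("The little girl really liked reading library books on rainy days.",
     "The little girl leery liked leading library books on rainy days."),
    ("Fresh fish is sold at the busy harbour every Friday.",
     "Flesh fish is sold at the busy harbour every Friday"),
]
overall, passed, report = score_test(pairs)
print("\n".join(report))
print(f"overall {overall:.0%}, passed: {passed}")
```

The report lines could be saved to a text file (for example with `pathlib.Path.write_text`) for learners and instructors to review later. Word-level scoring shows which words were misrecognized, not which sound caused the problem, so it complements rather than replaces a teacher’s judgement.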