P-CHAT: Formative Self-Assessment using Group Oral Discussion Tasks

Branden Carl Kirchmeyer, Center for Education and Innovation, Sojo University

The group oral discussion task (also known as the group discussion test or group oral test) is a popular, time-efficient, and cost-effective means of evaluating language learners’ speaking abilities: groups of learners discuss a topic in their target language while a rater observes and evaluates the individual speakers simultaneously (Shohamy et al., 1986). The task has been noted for its capacity to detect changes in speaking proficiency over time (Leaper & Brawn, 2019) and to generate positive washback in a communicative curriculum (Bonk & Ockey, 2003). Though implementation procedures and utility vary by context, the outcome for learners is often similar: a score (ideally rubric-based) and some feedback (ideally forward-focused). But what if, instead of an evaluation, learners were immediately provided with quantitative data describing their own individual performances? And what if teachers could administer the discussion task to an entire class simultaneously, evaluate individuals later, and track their progress across similar activities over time? Finally, what if researchers could easily collect a range of data types regarding such a task?

In this article, I introduce P-CHAT, an online tool designed to provide lower-proficiency (CEFR A1–B1) Japanese learners of English with the means to conduct meaningful formative self-assessment of their own speaking performances on a group oral discussion task. It also allows teachers to evaluate individuals asynchronously and monitor their progress over time, and it serves as a research instrument capable of collecting multiple types of data relating to L2 English conversations. Awarded “Best Moodle Innovation of 2020” by the Moodle Association of Japan, P-CHAT is described here in terms of the affordances it provides learners, teachers, and researchers.


What is P-CHAT?

Technically speaking, P-CHAT is a plugin (i.e., supplemental programming which adds specific features and functions to existing software) for the Moodle learning management system. It was funded by a JSPS Kaken Grant (19K13309) and programmed by Poodll Co. Ltd., a certified developer of Moodle-based plugins for language teaching and learning. Pedagogically speaking, P-CHAT is a communicative classroom activity wherein learners are individually guided through a four-step sequence of tasks that center around a group discussion. Though intended for use in face-to-face environments, it has been implemented successfully in tandem with video conferencing technology.

Using the P-CHAT interface on personal or classroom devices, learners first set the conditions for their discussion by confirming their partners’ names, the discussion topic, and the duration of the discussion. As shown in Figure 1, they may also type a personal list of target words or phrases that they can refer to during the conversation. In the second step, learners make individual audio recordings of their own contributions to an unscripted group discussion, conducted in groups of two or three. Figure 2 illustrates the recording interface, in which teachers can also choose to display an image or video to prompt or scaffold the discussion. In the third step, learners listen to their audio recordings and individually transcribe only their own speech, using the transcription interface to divide it into conversational turns (see Figure 3).

Finally, in the fourth step, learners are presented with seven numerical figures that describe their contribution to the discussion in quantitative terms: the total number of words they spoke; the total number of turns they took; their average turn length and their longest turn length (both expressed as a number of words spoken); the number of questions they asked; the number of pre-selected target words or phrases they spoke; and an “AI Accuracy” percentage, which is calculated as the amount of overlap between the speaker’s transcription and a separate transcription generated with automatic speech recognition (ASR) technology (specifically, Amazon Web Services). Alongside these descriptive statistics, an interactive version of their finished transcription is displayed with ASR discrepancies boldfaced. Clicking on a boldfaced word in this window triggers an automatic playback of that section of audio and opens a pop-up window displaying what the ASR system heard. In this final step, learners refer to this automatic feedback to answer three reflective prompts set by the teacher, an example of which can be seen at the bottom of Figure 4.
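To make the seven figures concrete, the following Python sketch shows one way they could be derived from a learner's turn-by-turn transcription. It is an illustration only and does not reproduce P-CHAT's own code: the function names, the rule that a question is marked by a question mark, the rule that each target phrase counts once, and the definition of overlap as the share of the learner's words that align in order with the ASR output are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase a string and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def describe_performance(turns, target_phrases, asr_text):
    """Return seven descriptive figures for one speaker's transcribed turns."""
    turn_tokens = [tokenize(turn) for turn in turns]
    turn_lengths = [len(tokens) for tokens in turn_tokens]
    all_words = [word for tokens in turn_tokens for word in tokens]
    total_words = len(all_words)

    # Assumption: a question is signalled by a question mark in the transcript.
    questions = sum(turn.count("?") for turn in turns)

    # Assumption: a target word or phrase counts once if it occurs anywhere.
    joined = " " + " ".join(all_words) + " "
    targets_used = 0
    for phrase in target_phrases:
        phrase_tokens = tokenize(phrase)
        if phrase_tokens and " " + " ".join(phrase_tokens) + " " in joined:
            targets_used += 1

    # Assumption: "overlap" is the percentage of the learner's words that
    # align, in order, with the words in the ASR transcription.
    matcher = SequenceMatcher(None, all_words, tokenize(asr_text), autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    accuracy = round(100 * matched / total_words) if total_words else 0

    return {
        "total_words": total_words,
        "total_turns": len(turns),
        "average_turn_length": round(total_words / len(turns), 1) if turns else 0.0,
        "longest_turn_length": max(turn_lengths, default=0),
        "questions_asked": questions,
        "target_words_used": targets_used,
        "ai_accuracy_percent": accuracy,
    }


# Hypothetical learner data for illustration.
example_turns = [
    "I like soccer. Do you play sports?",
    "Yes, me too.",
    "What is your favorite team?",
]
example_targets = ["favorite", "me too", "how about you"]
example_asr = "I like soccer do you play sport yes me too what is your favorite team"

stats = describe_performance(example_turns, example_targets, example_asr)
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this invented example the learner spoke 15 words over three turns, and the ASR transcription differs only on "sports", so 14 of the 15 words align.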


Affordances for Learners

The main intended pedagogical affordance of P-CHAT is its capacity to help lower-proficiency learners conduct actionable formative self-assessment. The provision of objective and easily understandable figures allows learners to make concrete statements about their performances (e.g., “I spoke 72 words and only asked one question.”) and set tangible goals for subsequent attempts (e.g., “Next time I will speak at least 100 words and ask at least two questions.”). P-CHAT also offers learners the ability to track and compare their progress over time with straightforward line charts that plot their metrics across P-CHAT attempts. As shown in Figure 5, learners who engage in this activity cycle are rewarded with an ever-increasing stat line and objective proof that they are able to contribute more to an English language discussion with their peers through continued and dedicated practice.

P-CHAT also leverages task sequencing to the advantage of the student through positive washback. Learners may spend weeks engaging in communicative tasks relating to the topic, learning and reviewing specific conversational strategies, practicing conversations with partners, generating target wordlists, and producing language that can be reused during discussions using P-CHAT. Despite the relatively low stakes of the task, audio recordings can encourage active participation and promote accountability. In transcribing themselves, learners may attend to a variety of linguistic features including phonetic production, word selection, intonation, and spelling. Finally, reflective prompts offer opportunities not only to set goals but also to engage in form-focused activities such as the identification and rectification of grammatical or pragmatic errors.


Affordances for Teachers

P-CHAT affords teachers the means to conduct higher-stakes assessments, such as the conventional group oral test on which it was based. While conceding that rater reliability is a valid concern, Van Moere (2006) concluded that the group oral test is “useful for making general inferences about a candidate’s ability to converse in a foreign language” (p. 436). Figure 6 shows the P-CHAT grading interface, which allows teachers to evaluate all members of a group together and asynchronously. Because face-to-face P-CHAT sessions produce individual audio recordings made in close proximity, teachers can choose to listen to one of the recordings and follow along with the three transcriptions, using an interactive and customizable rubric (toggled using the “Grade entry” button) to score each learner. Teachers looking to avoid the scheduling challenges inherent in deploying performance-based speaking assessments can administer P-CHAT in a single session and save scoring for a more convenient time.

Teachers will also find the progress reports (see Figure 5) helpful as portfolio submissions, which can be referenced during consultations with individual learners. In addition to the individual progress reports, teachers have access to similar whole-class progress reports, which can help identify larger-scale trends, such as the accessibility of a given discussion topic (represented by dips in overall production) or the performance trends of different cohorts.
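As a rough illustration of how a dip in overall production might be spotted in whole-class data, the Python sketch below averages one metric per session and flags sessions where the class mean falls sharply. The record layout (`student`, `session`, `total_words`), the function names, and the 15% threshold are hypothetical choices made for this example and are not taken from P-CHAT's reports.

```python
from statistics import mean


def class_means(attempts, metric):
    """Average one metric across the class for each session, in session order."""
    by_session = {}
    for attempt in attempts:
        by_session.setdefault(attempt["session"], []).append(attempt[metric])
    return {session: mean(values) for session, values in sorted(by_session.items())}


def find_dips(means, threshold=0.15):
    """List sessions whose class mean fell by more than `threshold`
    (a fraction of the previous session's mean)."""
    sessions = list(means)
    dips = []
    for previous, current in zip(sessions, sessions[1:]):
        drop = means[previous] - means[current]
        if means[previous] > 0 and drop / means[previous] > threshold:
            dips.append(current)
    return dips


# Hypothetical records: two learners across four discussion sessions.
attempts = [
    {"student": "A", "session": 1, "total_words": 60},
    {"student": "B", "session": 1, "total_words": 80},
    {"student": "A", "session": 2, "total_words": 75},
    {"student": "B", "session": 2, "total_words": 95},
    {"student": "A", "session": 3, "total_words": 50},
    {"student": "B", "session": 3, "total_words": 70},
    {"student": "A", "session": 4, "total_words": 70},
    {"student": "B", "session": 4, "total_words": 90},
]

means = class_means(attempts, "total_words")
print(means)
print(find_dips(means))
```

With these invented numbers, the class mean drops from 85 to 60 words in session 3, which might prompt a teacher to reconsider how accessible that session's topic was.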


Affordances for Researchers

Researchers looking to collect and analyze large amounts of data will be pleased to find exportable CSV reports of individual P-CHAT attempts, including audio recordings, full student- and machine-generated transcriptions, the seven descriptive metrics, scores, and written responses to reflective questions. Several ongoing research projects have made use of P-CHAT as an instrument and are investigating the accuracy of student-generated transcriptions, patterns and correlations between reported metrics and rubric-based rater scores, and learner and teacher perceptions of the tool as a language learning asset. Teachers and researchers interested in using P-CHAT to conduct and participate in research activities are invited to use P-CHAT at no cost on a dedicated Moodle site with consultation from the author.



Conclusion

This article has introduced an award-winning new tool for promoting learner-centered formative self-assessment of L2 English discussions. Described as a modern iteration of a conventional group discussion task, P-CHAT functions as a guided sequence of computer-mediated language learning activities and offers a range of affordances for learners, teachers, and researchers.



Acknowledgements

I would like to thank Justin Hunt of Poodll Co. Ltd. for his enthusiasm and dedication to this project, and my colleagues for their constructive suggestions during the development of this resource.



References

Bonk, W. J., & Ockey, G. J. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20(1), 89–110. https://doi.org/10.1191/0265532203lt245oa

Leaper, D. A., & Brawn, J. R. (2019). Detecting development of speaking proficiency with a group oral test: A quantitative analysis. Language Testing, 36(2), 181–206. https://doi.org/10.1177/0265532218779626

Shohamy, E., Reves, T., & Bejarano, Y. (1986). Introducing a new comprehensive test of oral proficiency. ELT Journal, 40(3), 212–220. https://doi.org/10.1093/elt/40.3.212

Van Moere, A. (2006). Validity evidence in a university group oral test. Language Testing, 23(4), 411–440. https://doi.org/10.1191/0265532206lt336oa