If, as many suggest, ChatGPT-like tools will be central to many work practices in the future, then we need to think about how to design course elements that help today’s students and tomorrow’s professionals learn to use these tools properly. Proper use will not involve humans blindly copying the output of these tools, but rather using that output to enhance their own performance. Hence a simple question: can students properly evaluate, and where necessary correct, the responses provided by ChatGPT (to improve their grade on an assignment, for instance)? Motivated by such considerations, I designed the following assignment in a first-year Master’s-level course at HEC Paris.
Answering vs. correcting
Students were randomly assigned two cases and asked the same question about each. For the first case, students simply had to provide the answer in the traditional way, ‘from scratch’. For the second case, they were given an answer to the question: they were asked whether that answer was fully correct, and told to correct or extend it as required to make it ‘perfect’. They were told that each provided answer had been produced either by ChatGPT or by another student; in reality, the answer had come from ChatGPT in over 60% of cases.
Whilst the former ‘answer’ task is arguably closer to current work practices, the latter ‘correct’ task may correspond more closely to many jobs of the future, if AI tools become as ubiquitous as many predict.
However, the two tasks asked for the same thing – a full reply to the question concerning the case – and the same grading scheme was used for both. The marks for the two tasks counted equally towards the course grade, so students were motivated to put the same effort into both.
On this assignment, students do better without the help of ChatGPT
Nevertheless, students got, on average, a 28% lower grade on the correct task than on the answer task. For a given case, a student correcting an answer provided by ChatGPT scored, on average, 28 marks out of 100 less than a student answering the question by themselves. Students, it turns out, did considerably worse when given a ChatGPT answer to correct than when asked to provide an answer from scratch.
A behavioral bias?
Perhaps these results can be explained by postulating high student trust in ChatGPT’s answers. However, students were explicitly primed to be wary of the responses provided: they had been informed that ChatGPT had been tested on a previous, similar assignment and had done pretty badly. Previous research suggests that such information typically undermines trust in algorithms. Moreover, no significant difference was found between students’ grades on the correct task when they thought they were correcting ChatGPT and when they thought they were correcting another student.
A perhaps more promising explanation is in terms of confirmation bias – the tendency to insufficiently seek out and take into account information contradicting a given belief or position. Inspection of the answers shows a clear tendency among many students to make only small modifications to the provided responses, even where larger corrections were in order. Moreover, there is evidence that this bias tends to persist even when people are warned that the underlying belief has little claim to being correct [1, 2]. Could the tendency to be insufficiently critical of certain positions – a bias that is taught about in business schools worldwide, and at HEC in particular – be behind potential misuses of ChatGPT and its alternatives?
Chatbots have been touted as having a future role in aiding humans in a range of areas; but this assumes that humans will be capable of using them properly. One important task for humans in such interactions will be to evaluate, and where necessary correct, the output of their chatbots.
Our classroom experiment suggests that the professionals of tomorrow may do a considerably worse job when aided by AI than when working alone – perhaps due to behavioral biases that have long been understood, perhaps due to others that remain to be explored.
If anything, this argues for more, rather than fewer, chatbots in the classroom. One of the skills of the future, and one that we will need to learn to teach today, is how to ensure that they actually help.
References
1. Kahneman, D. Thinking, fast and slow. (Macmillan, 2011).
2. Nickerson, R. S. Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Review of General Psychology 2, 175–220 (1998).