Bagtown Economics: Student evaluations can often be misleading

I’m writing this week about an often misunderstood issue: using student evaluations to judge the performance of professors. The university administration has proposed making student evaluations a mandatory part of annual faculty performance reviews, and the Mount Allison Students’ Union (MASU) has officially supported this proposal. In fact, MASU has boldly claimed that “student evaluations of teaching are a widely recognized metric and tool for ensuring that course material and instruction is of high quality.” However, there is significant evidence to refute that claim. I present some of that evidence here because it should better inform the debate, and because it is actually quite amusing.

Let’s start with the most cynically entertaining evidence. There is some research, albeit conflicting, suggesting that gender and attractiveness affect student evaluations: specifically, that female instructors receive lower evaluation scores from male students, and that unattractive professors are evaluated more poorly, all else being equal. Not all studies have found these effects, but the issue is certainly not settled. Given this, it may be awkward to use student evaluations in the performance review of an unattractive female professor who teaches predominantly male students.

Furthermore, I suspect that professors stuck teaching a required course (say, statistics) are evaluated more poorly. It can be hard for students to separate their dislike of a topic from their perception of the professor’s performance.

However, for argument’s sake, let’s set aside the previous two arguments, namely that student evaluations suffer from a number of significant biases. What about the argument that students are simply poor judges of professor quality? While I do not fully endorse this argument, it is not unfounded. I present two pieces of evidence:

Researchers at the University of Southern California and Southern Illinois University hired and coached a confident, articulate actor to present a university-style lecture to eleven trained psychologists, psychiatrists, and social work educators. The actor was introduced as an expert on the application of mathematics to human behaviour and given an impressive but fictitious résumé. The lecture was deliberately designed to include “excessive use of double-talk, neologisms, non sequiturs, and contradictory statements to be interspersed with humour and meaningless reference to unrelated topics.” At the end of the lecture, the eleven ‘students’ completed an evaluation of the actor and the content of the lecture. Comically, the participants reported a high degree of satisfaction with the lecture, despite its being completely fictitious and illogical. A paid actor had duped eleven trained professionals. While Mount Allison presumably does not hire fraudulent professors, it is certainly plausible that a room full of eighteen-year-old students could be impressed by charismatic but unsubstantive lectures.

The Journal of Political Economy recently published a study that followed the performance of 12,597 students over three semesters of calculus at the United States Air Force Academy. The researchers tracked students across all three levels of calculus (1, 2, and 3), along with the student evaluations of professors for every course section. The results were twofold. First, high-performing students were associated with highly evaluated professors, potentially implying that students reward good teaching. But here is the concerning second finding: students taught by a positively evaluated professor in the original course performed worse in follow-up courses. If you studied Calculus 1 with a positively evaluated professor, you would, on average, perform more poorly in Calculus 2. Think about that: students rewarded professors who failed to prepare them well for further coursework. This study is widely respected, and it was conducted in an environment that eliminated the statistical issues typical of comparable studies; in particular, cadets are randomly assigned to course sections, so students cannot choose their professors.

Contrary to MASU’s claim, student evaluations are not a widely recognized metric and tool for judging professor performance. We ask our students, “How would you describe the effectiveness of the instructor?” on a scale of one to five, yet provide no definition of or guidance on what “effective” means. Would students be content to have their performance in a course measured by a single test with eighteen multiple-choice questions and two open-ended questions? The real discussion should focus on how to solicit more valuable student feedback. I hope that other contributors to The Argosy will accept the challenge and start this discussion.
