But it may be much worse than that. Check out the abstract below for a seminar to be presented at Penn Statistics next week by Philip Stark, a Berkeley statistician (and Associate Dean of the Division of Mathematical and Physical Sciences). Paper here.
TEACHING EVALUATIONS (MOSTLY) DO NOT MEASURE TEACHING EFFECTIVENESS
PHILIP STARK - UNIVERSITY OF CALIFORNIA, BERKELEY
Joint work with Anne Boring (SciencesPo) and Kellie Ottoboni (UC Berkeley)
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
· SET are biased against female instructors by an amount that is large and statistically significant
· the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
· the bias varies by discipline and by student gender, among other things
· it is not possible to adjust for the bias, because it depends on so many factors
· SET are more sensitive to students' gender bias and grade expectations than they are to teaching effectiveness
· gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on permutation tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
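For readers unfamiliar with permutation tests, here is a minimal sketch of the general idea in Python. It is not the authors' analysis: the function name `permutation_test`, the difference-in-means statistic, and the toy ratings in the usage note are all assumptions made for illustration, and the actual study presumably respects its own design (for example, permuting within courses or sections). The logic is that if instructor gender had no effect on ratings, the gender labels would be exchangeable, so we shuffle them many times and ask how often a difference at least as large as the observed one arises by chance.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Illustrative only: a plain, unstratified shuffle of the pooled ratings.
    Returns (observed difference in means, approximate p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    observed = sum(group_a) / n_a - sum(group_b) / n_b

    extreme = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random by shuffling the pooled ratings.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

As a usage example with made-up numbers, `permutation_test([4, 5, 4, 5, 4], [3, 2, 3, 3, 2])` compares two hypothetical sets of ratings. A small p-value would mean that a gap this large rarely appears when the labels are shuffled at random. The appeal of the approach is that it makes no assumption that ratings follow a normal distribution.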