Teacher Evaluation Policy Series | Deweaponizing teacher evaluation: Using teacher evaluation for growth (Part One) by Sarah L. Hairston in collaboration with Dr. Thomas W. Hairston

This is the first in our series on teacher evaluation policy. Teacher evaluation has been a major feature of education policy reform in the United States during the past ten years. Three federal initiatives, the Race to the Top funding competition (2009), the No Child Left Behind waivers (2011), and the Every Student Succeeds Act (2015), made it central to that reform. This series will consider emerging areas of growth in teacher evaluation policy. In the two-part discussion “Deweaponizing teacher evaluation: Using teacher evaluation for growth,” Sarah Hairston and a co-developer of the largest teacher evaluation system in the state of Missouri look at the snares of teacher evaluations (part one) and discuss how to build teacher evaluations around a growth model (part two). In “Student Voice in the Teacher Evaluation Process – A lost opportunity,” Seyma Dagistan considers the attempt in Massachusetts to incorporate student voice in teacher evaluation. In “Responsible Teachers: Teacher Evaluation in Finland,” Hansol Woo looks into the Finnish model of teacher evaluation, in which teachers evaluate their own progress. These articles will consider why we evaluate teachers and who is best positioned to do so. We invite you to join the discussion by leaving a comment or submitting an essay of your own.

Depending on your association with public education, the term “teacher evaluation” can elicit a variety of reactions. For administrators and teachers, evaluations take on various meanings. For some, evaluations are meaningless compliance with inaccurate reports that influence one’s livelihood. For others, at the other end of the spectrum, evaluations are a reflective coaching process key to professional and school growth. To assess the culture around teacher evaluation, I asked Dr. Thomas W. Hairston (Tom), who also happens to be my spouse, to join me in a conversation. Dr. Hairston is the Director of Research and Innovation and a co-developer of the Network for Educator Effectiveness (NEE), the largest teacher evaluation system in the state of Missouri, in the Assessment Resource Center at the University of Missouri-Columbia. The network serves 986 buildings in 280 districts, reaching over 35,000 teachers and 370,000 students (NEE, 2019). In this two-part discussion, we will look at the snares of teacher evaluations in part one and then bring readers out on the optimistic side with evaluations as a growth model in part two. Before delving into my conversation with Tom, I will lay out the federal policy context regarding teacher evaluations.

Federal Policy Context

Teacher evaluation systems became federally mandated with No Child Left Behind (NCLB), signed into law in 2002. The act required states to set up teacher evaluation systems, known as Annual Professional Performance Reviews (APPR), which played a significant role in both employment decisions and professional development at the local level (Forman & Markson, 2015). A significant indicator in the APPR was student scores from required standardized assessments in reading and math. Under NCLB, states developed content standards as well as targets for adequate yearly progress (AYP) based on results of standardized assessments, among other indicators. Schools not meeting AYP were subject to increasingly severe consequences.

Race to the Top (RTTT, 2009) strengthened the connection between standardized assessments and teacher evaluations by tying evaluations to funding. To win RTTT funding through teacher evaluations, states had to emphasize student growth as measured through standardized assessments, with extra points awarded to those using the federally released Common Core Standards (CCS, 2010), a universal standard of what a student should know in reading and math (Aguilar & Richerme, 2014). National teacher unions, including both the National Education Association and the American Federation of Teachers, eventually endorsed RTTT specifically because of the substantial funding component (Aguilar & Richerme, 2014).

The Every Student Succeeds Act (ESSA, 2015) addressed some of the “one-size-fits-all” centralized accountability measures established through NCLB by giving much of the authority for accountability and decision-making back to the states and local school districts (National Association of Elementary School Principals, 2016). Teacher evaluations are now in the hands of states.

Missteps of Teacher Evaluation

Over time, teacher evaluations have been used by a variety of stakeholders in a multitude of ways. Politicians have used them to prop up punitive accountability systems and low teacher pay. States have used them to emphasize standardized test scores and advocate for school choice. And school districts have used them to make personnel decisions, support professional development choices, and meet mandated requirements. So, I started by asking Tom to clarify how he understood the purpose of evaluation systems. “I want to begin by going to the antithetical: most current teacher evaluation systems are not set up to evaluate for growth” (T. Hairston, personal communication, September 6, 2019).

By and large, Tom views most teacher evaluation practices as focused on the collection and reporting of data rather than on improvement.

Focusing on improvement and focusing on the collection and report of data are two different processes – one is growth and one is outcome. And so, when policies dictate that school districts do teacher evaluation, the outcome is the focus – and by design in most of those policies, the outcome is quantitative and based on an end result, not on the process, the story, or the many factors that play into teacher evaluation.
(personal communication, September 6, 2019)

Tom laid out three reasons for the lack of a growth model in teacher evaluations: 1) evaluations do not focus on building accuracy within the evaluators; 2) evaluations over-rely on student test scores; and 3) evaluations use subjective labels that create a binary between “good” and “bad” teachers (personal communication, September 6, 2019).

Inaccuracy in Evaluating

Currently, there is not sufficient research to show that the most widely used evaluation measure, classroom observation conducted by principals, is accurate or reliable. Additionally, evaluating human behavior is psychometric in nature and requires a specific skillset. Without that skillset, classroom observations are more likely to be based on biases, subjectivities, and feelings. Tom elaborated on these biases:

There are of course the concerns about gender and racial mismatch – and we do see those biases in our own data. But there are additional biases like subject area and experience mismatch. And yet, teacher evaluation systems are built on the idea of their supervisor being the one doing the evaluation, even as biases cloud what is supposed to be a fair system. (personal communication, September 6, 2019)

Reliance on Student Test Scores

State and federal policies have overemphasized the connection between student achievement and teacher effectiveness. Educational research has pointed to the inaccuracy of teacher evaluation systems in measuring quality, with little benefit to education as a whole (Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012; Marzano, 2012). Additionally, evidence does not support the notion that teacher effectiveness can be reflected through the achievement gains of their students (Darling-Hammond et al., 2012; Marzano, 2012). This false association stretches the data beyond what they should ever be used for, yet many teachers find their livelihoods enmeshed in unfounded practices and policies. Depending on state mandates, districts may have a choice in whether or not standardized scores are attached to teacher evaluations. Some states, such as Missouri, where NEE was developed, give flexibility and localized control, providing an opportunity for a growth model approach.

Subjective Labels

Many teacher evaluation policies and systems attach artificial and subjective language to the teachers they assess. The National Council on Teacher Quality released a report in 2019 on the labels used in teacher evaluation systems, including: ineffective, unsatisfactory, unacceptable, accomplished, skilled, effective, proficient, advanced, distinguished, exemplary, superior, outstanding (Nittler, 2019). So, I asked Tom what happens in conversations when we use those types of terms:

As expected, the derogatory ones produce defensive walls. For the rest, a sense of accomplishment whether it is warranted or not. But ultimately, the focus becomes on the labels and not on the actual teaching practices and the effect they were having on student learning. (personal communication, September 6, 2019)

From a surveillance and marketing perspective, labels work. The public has been conditioned to understand teachers as failing or succeeding. The hegemonic discourse around teacher evaluation, shaped by a combination of media and policy influences, creates a binary of “good” versus “bad” teachers. This binary produces easily consumable data but does little for growth. As Tom points out:

Growth is not possible if your philosophy as a leader, as a school, a district, a state, a nation is still looking to create that binary. It makes any idea of true coaching impossible. The easiest way to stop that from happening is to keep those subjective and derogatory labels out of the system entirely. Those words are weapons, not supports. We have to go in another direction. (personal communication, September 6, 2019)


And with that, we circled back to the original question I posed to Tom: what is the purpose of teacher evaluations?

At NEE, we knew we had the opportunity to develop a system that can move past quantitative outcomes by focusing teacher evaluations on the effectiveness of the teaching practices being implemented in the school. With a repositioning of purpose comes the possibility of doing teacher evaluations in a way that is truly meaningful to the teacher, the school, and the learning environment. (T. Hairston, personal communication, September 6, 2019)


Join us for part two of this discussion when Tom and I converse about what it means to localize teacher evaluations toward a growth model – one that benefits the students, teacher, and district as a whole.

Sarah L. Hairston is a Ph.D. student in Educational Leadership and Policy Analysis at the University of Missouri-Columbia. Previously, she taught theatre and public speaking for 16 years. She holds an Educational Specialist degree in Educational Leadership and a Master’s in Curriculum and Instruction. Her current research interests include educational policy, structural violence in education, student voice, and educational activism.

References

Aguilar, C. E., & Richerme, L. K. (2014). What is everyone saying about teacher evaluation? Framing the intended and inadvertent causes and consequences of Race to the Top. Arts Education Policy Review, 115, 110-120. DOI: 10.1080/10632913.2014.947908

Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, J. (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93(6), 8-15. DOI: 10.1177/003172171209300603

Forman, K., & Markson, C. (2015). Is “effective” the new “ineffective”? A crisis with the New York State teacher evaluation system. Journal for Leadership and Instruction, 14(2), 5-11.

Marzano, R. J. (2012). The two purposes of teacher evaluation. Educational Leadership, 14-19. Retrieved from http://merainc.org/wp-content/uploads/2013/11/Boogren-The-Two-Purposes-of-Teacher-Evaluation.pdf

National Association of Elementary School Principals (NAESP). (2016, May). ESSA 101: New Accountability Measures. Communicator, 39(9). Retrieved from http://www.naesp.org/communicator-may-2016/essa-101-new-accountability-measures

Network for Educator Effectiveness (NEE). (2019). Home. Retrieved from https://neeadvantage.com/

Nittler, K. (2019). Words matter: The language of evaluation ratings. Retrieved from https://www.nctq.org/blog/Words-matter:-the-language-of-evaluation-ratings