Classroom Observation
The systematic watching and recording of teaching and learning. Wajnryb (1992) opens her practitioner manual by calling observation "a multi-faceted tool for learning"; Allwright (1988) treats it as the empirical foundation of language teaching as a profession. Across the literature, observation serves four distinct purposes (training, development, assessment, and research; Malderez 2003), and the methods, ethics, and politics of each differ enough that conflating them is the single largest source of trouble in the practice (Cosh 1999; O'Leary 2014).
Origins
Systematic observation entered mainstream education research in the 1960s with quantitative coding schemes such as Flanders' Interaction Analysis (Flanders 1970), which classified verbal classroom behaviour into a fixed set of teacher and pupil categories and tallied their frequency. Two ELT-specific schemes followed. Fanselow's FOCUS — Foci for Observing Communications Used in Settings (Fanselow 1977) — extended the coding tradition to language classrooms, distinguishing the medium, use, content, and moves of communication with the explicit aim of replacing impressionistic talk about lessons with operationally defined description. Spada and Fröhlich's COLT — Communicative Orientation of Language Teaching (Spada and Fröhlich 1995, building on the original 1984 Allen-Fröhlich-Spada scheme) — was designed to operationalise communicative-classroom features for research, with categories such as participant organisation, content control, and student modality.
The interpretive turn arrived through discourse analysis. Sinclair and Coulthard (1975), analysing teacher–pupil exchanges in British primary classrooms, identified the now-canonical exchange structure of initiation by the teacher, response by the pupil, follow-up or evaluation by the teacher (the IRF or IRE pattern). Mehan (1979) confirmed and extended the pattern through a year of videotaped observation in an inner-city American elementary classroom. Allwright (1988) and Allwright and Bailey (1991) consolidated a shift away from coding-and-counting toward thick description of what actually happens in language classrooms. Walsh's SETT — Self-Evaluation of Teacher Talk (Walsh 2006) — and his later concept of classroom interactional competence (Walsh 2011) carry this lineage into contemporary practice, training teachers to analyse their own classroom discourse.
Today's practitioner literature (Wajnryb 1992; Richards and Lockhart 1994; Malderez 2003; O'Leary 2014) assumes a hybrid: structured tasks for developmental observation, narrative description for research, and only cautious use of summative coding for high-stakes evaluation.
Purposes and the Choice of Method
Malderez (2003) lays out the four canonical reasons to observe a teacher: training (modelling and learning to notice), assessment (judgement against a standard), development (formative reflection on practice), and research (data for analysis). Wallace (1991), Wajnryb (1992), and Bailey (2006) all stress that the four require different stances. A trainer demonstrates and asks; an assessor judges; a developmental observer describes and explores; a researcher abstracts. Conflating them produces the worst of all worlds: teachers feel judged when development was promised, assessors mistake descriptive notes for objective grades, researchers contaminate their data by giving feedback. The point recurs across the literature on teacher supervision (Bailey 2006), on peer observation (Cosh 1999), and on institutional inspection (O'Leary 2014; Shortland 2004).
| Purpose | Who observes | What it produces | Risk |
|---|---|---|---|
| Training | Trainer, course tutor | Demonstration; trainee's noticing log | Trainee mimicking surface, missing rationale |
| Assessment | Manager, examiner, inspector | Pass/fail, grade, rating against criteria | Performativity; observed lesson unrepresentative |
| Development | Mentor, peer, self | Descriptive evidence for reflection | Drift into evaluation; teacher performs rather than teaches |
| Research | Researcher (often non-participant) | Data for analysis (coded counts or transcripts) | Observer effect; ecological validity; consent ethics |
Paradigms
Two methodological traditions persist side by side.
Quantitative coding. Pre-defined categories are applied through time-sampling (every 30 seconds, code the behaviour occurring at that moment) or event-sampling (count every instance of a target behaviour, such as a teacher-initiated question). The output is tallies, percentages, and ratios, e.g. teacher talking time (TTT) vs student talking time (STT), display vs referential question proportions (Long and Sato 1983), or wait time in seconds. Strengths: comparability across observers, replicability, suitability for research at scale. Weaknesses: pre-set categories miss what does not fit them; high-inference codes (e.g. "rapport") inflate observer disagreement; a behaviour count can lose its pedagogical meaning.
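As a rough illustration of how a time-sampling tally becomes the percentages and ratios mentioned above, the following minimal sketch assumes a hypothetical three-code scheme ("T", "S", "O") and an invented list of twelve sampling points. Neither the codes nor the data come from any published instrument.

```python
from collections import Counter

# Hypothetical codes noted at each 30-second sampling point (illustrative data only):
# "T" = teacher talking, "S" = student talking, "O" = other (silence, audio, writing).
samples = ["T", "T", "S", "T", "O", "S", "S", "T", "T", "S", "O", "T"]

def tally_percentages(codes):
    """Return the share of sampling points assigned to each code, in percent."""
    counts = Counter(codes)
    total = len(codes)
    return {code: round(100 * n / total, 1) for code, n in counts.items()}

def ttt_stt_ratio(codes):
    """Return teacher-talk points divided by student-talk points, or None if no student talk."""
    counts = Counter(codes)
    if counts["S"] == 0:
        return None
    return counts["T"] / counts["S"]

shares = tally_percentages(samples)
ratio = ttt_stt_ratio(samples)
print(shares)  # {'T': 50.0, 'S': 33.3, 'O': 16.7}
print(ratio)   # 1.5
```

The figures describe only the sampled moments. With a 30-second interval, a short student turn that falls between two sampling points is never counted, which is one concrete form of the weakness noted above.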
Qualitative narrative. Open-ended description, often supported with audio or video, producing field notes, transcripts, and vignettes. Strengths: captures atmosphere, silence, and mid-task adjustments that coding misses; preserves the lesson's meaning structure; surfaces unanticipated phenomena. Weaknesses: time-intensive; observer subjectivity; difficult to compare lessons or aggregate across classrooms.
Most developmental observation in ELT now combines both: a narrow coded focus on one or two behaviours, plus narrative notes on context. Wajnryb's Classroom Observation Tasks (1992) is the canonical practitioner toolkit for this hybrid approach, with thirty-five structured tasks across the learner, the language, the learning process, the lesson, teaching skills, classroom management, and materials.
The Observation Cycle
The standard structure inherits from Cogan's (1973) clinical supervision model in education: pre-observation, observation, post-observation. Almost every developmental scheme in ELT (CELTA, DELTA, in-service training, mentoring) uses this three-stage cycle (Wallace 1991; Randall and Thornton 2001).
Pre-observation conference
The observed teacher describes the lesson plan, intended outcomes, the learners, and any specific concerns. Observer and teacher negotiate the focus: a single skill or a small cluster (e.g. instructions and instruction-checking questions, not "the whole lesson"). The observation instrument — checklist, tally chart, narrative template, video — is agreed in advance. This stage exists to give the teacher control over what is examined and to protect the observer from the impossible task of noticing everything (Malderez 2003; Wajnryb 1992).
The observation itself
The observer records against the agreed focus. Discipline matters most here: distinguish what happened from what it meant. Wajnryb (1992) and Richards and Lockhart (1994) both recommend separating the descriptive notebook column from any inferential commentary. The observer should not interfere; if circumstances force a deviation (e.g. participating in a group), it is recorded as a feature of the data.
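One way to keep the descriptive column apart from inferential commentary in a digital notebook is sketched below. The `Note` structure, its field names, and the sample entries are all hypothetical; they are not taken from Wajnryb (1992) or Richards and Lockhart (1994).

```python
from dataclasses import dataclass

@dataclass
class Note:
    minute: int                       # minutes elapsed since the lesson began
    description: str                  # what happened, in low-inference terms
    inference: str = ""               # what it might mean; kept in a separate field
    observer_deviation: bool = False  # True if the observer left the non-participant role

# Illustrative entries only.
notes = [
    Note(5, "Four of six pairs start the task; two pairs look at neighbours' sheets."),
    Note(3, "T gives instructions for pair task; no check question asked.",
         inference="Instructions may have been unclear."),
    Note(12, "T asks observer to join a group of three as a fourth member.",
         observer_deviation=True),
]

def descriptive_record(entries):
    """Return only (minute, description) pairs, in time order, leaving inferences out."""
    return [(n.minute, n.description) for n in sorted(entries, key=lambda n: n.minute)]

def deviations(entries):
    """Return the notes where the observer's own involvement is part of the data."""
    return [n for n in entries if n.observer_deviation]

for minute, text in descriptive_record(notes):
    print(f"{minute:>3} min  {text}")
```

Holding the inference in its own field means the post-observation conference can start from `descriptive_record(notes)` alone, with interpretation added only once the teacher has responded to the description.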
Post-observation conference
Begin with the teacher's reflection, classically "How do you think it went?" (Wallace 1991; Bailey 2006). The observer's role is to describe before evaluating: report what was recorded, then explore why. Two or three discussion points are the documented ceiling; more produces overload and no behaviour change (Brookfield 1995; Richards and Farrell 2005). End with concrete next steps and, for developmental schemes, agreement on whether and when to re-observe.
The asymmetry of the conference matters. An evaluator's "what do you think went well?" is not the same speech act as a peer's. Bailey (2006) and Day (1990) both warn that scripting developmental phrasing onto an evaluative relationship does not neutralise the power dynamic; it just makes it harder to read.
Standard Categories of Focus
A non-exhaustive inventory of what observers look at, drawn from Wajnryb (1992), Richards and Lockhart (1994), Walsh (2006, 2011), and the Cambridge English Teaching Framework. Each category typically requires its own micro-instrument; trying to cover all of them in one observation is the most common novice mistake.
| Category | Typical observation tools |
|---|---|
| TTT vs STT | Time-sampling tally; transcript percentages |
| Question types | Display vs referential coding (Long and Sato 1983); wait time in seconds |
| Instructions and instruction-checking questions (ICQs) | Instruction-stage transcript; ICQ count and learner uptake |
| Corrective Feedback | Six-category coding (Lyster and Ranta 1997: explicit correction, recasts, clarification requests, metalinguistic feedback, elicitation, repetition); learner uptake |
| IRF / IRE structure | Discourse-move coding from Sinclair and Coulthard (1975) and Mehan (1979) |
| Interaction Patterns | T-S / S-S / pair / group / individual time slots; seating diagram with arrows |
| Monitoring | Movement map; intervention count; on-task estimates |
| Pacing | Stage-by-stage timings against plan; transition handling |
| Rapport | Critical-incident narrative (high-inference; treat with care) |
| Classroom modes | SETT modes (Walsh 2006): managerial, materials, skills-and-systems, classroom context |
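As an example of what one such micro-instrument might produce, the sketch below tallies corrective feedback episodes under the six Lyster and Ranta (1997) categories and computes the share of each type that was followed by learner uptake. The episode data are invented for illustration, and the function name and data layout are assumptions rather than part of the original coding scheme.

```python
from collections import defaultdict

# The six feedback types named by Lyster and Ranta (1997).
FEEDBACK_TYPES = (
    "explicit correction",
    "recast",
    "clarification request",
    "metalinguistic feedback",
    "elicitation",
    "repetition",
)

# Hypothetical coded episodes: (feedback type, whether learner uptake followed).
episodes = [
    ("recast", False),
    ("recast", True),
    ("recast", False),
    ("elicitation", True),
    ("elicitation", True),
    ("explicit correction", True),
    ("clarification request", True),
    ("recast", False),
]

def uptake_rates(coded):
    """Return, for each feedback type observed, the proportion of episodes with uptake."""
    totals = defaultdict(int)
    with_uptake = defaultdict(int)
    for feedback_type, uptake in coded:
        if feedback_type not in FEEDBACK_TYPES:
            raise ValueError(f"Unknown feedback type: {feedback_type!r}")
        totals[feedback_type] += 1
        with_uptake[feedback_type] += int(uptake)
    return {t: with_uptake[t] / totals[t] for t in totals}

rates = uptake_rates(episodes)
for feedback_type, rate in rates.items():
    print(f"{feedback_type:<24} {rate:.0%}")
```

With so few episodes the proportions are only descriptive; a single lesson rarely yields enough instances of each type to support comparison, so the counts should be read alongside the transcript.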
The Observer Effect
Being watched changes behaviour. The phenomenon is conventionally named after the Hawthorne studies of the 1920s and 1930s (reported in Roethlisberger and Dickson 1939), but the term Hawthorne effect was coined later (French 1953; popularised by Landsberger 1958), and Adair (1984) showed that what most secondary literature calls the Hawthorne effect is methodologically distinct from the original observation. The phenomenon as it concerns classroom observers is therefore better called reactivity or observer effect: subjects modify behaviour because they know they are being studied.
In practice, the observed lesson is often better than the unobserved norm: cleaner staging, more student talk, fewer awkward fillers, and the quiet omission of any activity that would normally have caused trouble. Mitigations include longer observation periods (the effect attenuates with familiarity), video in place of live observers (the effect is less pronounced once the camera is forgotten), repeated visits over a course rather than a one-off visit, and the simple methodological acknowledgement that the lesson is a sample, not the whole.
For evaluative observation, the observer effect is structural: high stakes amplify performance. O'Leary (2014) documents the British further-education inspection regime and shows that graded observation reliably produces graded-observation lessons rather than typical practice — a point Shortland (2004) earlier characterised as the compliance trap of peer-observation schemes co-opted into quality assurance.
Developmental versus Evaluative: the Central Tension
The practitioner literature (Wallace 1991; Cosh 1999; Bailey 2006; O'Leary 2014) is unanimous on a single point: developmental and evaluative observation should be kept structurally separate. When the same person performs both roles, typically a manager or senior teacher, the asymmetry of evaluation contaminates the developmental conversation. Teachers self-censor, present idealised practice, and decline the genuinely useful but risky activities. Even with the most skilled observer, the relationship cannot be neutralised by phrasing alone.
Solutions include separating the developmental observer from the evaluative one (peers for development, manager for assessment); separating the occasions (developmental visits explicitly bracketed off from appraisal cycles); and explicit rules that data from one stream does not cross into the other. None of these is fully clean, but all are better than pretending the tension does not exist. This is part of the case Cosh (1999) makes for peer observation as the developmental backbone of an institution: by removing the hierarchy, it removes the structural contamination.
Self-observation
Recording one's own teaching (audio, video, or stimulated recall against a transcript) was advanced as the most powerful and most uncomfortable form of observation by Richards and Lockhart (1994), and elaborated into a full framework as SETT (Walsh 2006, 2011). Strengths: no observer effect from a stranger; complete access to one's own pedagogical reasoning; the capacity to revisit moments at leisure. Weaknesses: it can be acutely confronting, especially on first viewing, and it is blind in exactly the areas reflection most needs to penetrate. Brookfield's (1995) warning about the autobiographical lens applies: self-observation is a necessary but not sufficient lens for reflection.
Why It Matters for ELT
- Empirical base for the field. Most evidence about what happens in language classrooms — the IRF structure (Sinclair and Coulthard 1975; Mehan 1979), corrective feedback uptake (Lyster and Ranta 1997), the TTT/STT imbalance, wait-time effects, and discourse modes (Walsh 2006) — comes from classroom observation research.
- Bridge between belief and practice. Teacher cognition research (Borg 2006) consistently finds gaps between what teachers say they do and what observation records they do. Observation is the primary instrument for surfacing those gaps.
- Teacher training and assessment. CELTA and DELTA both build observation into qualification: trainee observation of experienced teachers, and assessed observation of trainees by tutors. Practising teachers continue to be observed throughout their careers (Bailey 2006).
- Quality assurance vs developmental learning. Institutions need both, and the way an institution structures them — same observer or different, separate cycles or merged, graded or descriptive — shapes the local culture of teaching far more than any policy document (O'Leary 2014).
- Method into research. Action research and exploratory practice (Allwright and Hanks 2009) are extensions of self-observation made systematic with hypotheses, data, and reflection cycles.
Key References
- Adair, J. G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69(2), 334–345.
- Allwright, D. (1988). Observation in the Language Classroom. London: Longman.
- Allwright, D. & Bailey, K. M. (1991). Focus on the Language Classroom: An Introduction to Classroom Research for Language Teachers. Cambridge: Cambridge University Press.
- Allwright, D. & Hanks, J. (2009). The Developing Language Learner: An Introduction to Exploratory Practice. Basingstoke: Palgrave Macmillan.
- Bailey, K. M. (2006). Language Teacher Supervision: A Case-Based Approach. Cambridge: Cambridge University Press.
- Borg, S. (2006). Teacher Cognition and Language Education: Research and Practice. London: Continuum.
- Brookfield, S. D. (1995). Becoming a Critically Reflective Teacher. San Francisco: Jossey-Bass.
- Cogan, M. L. (1973). Clinical Supervision. Boston, MA: Houghton Mifflin.
- Cosh, J. (1999). Peer observation: a reflective model. ELT Journal, 53(1), 22–27.
- Day, R. R. (1990). Teacher observation in second language teacher education. In Richards, J. C. & Nunan, D. (eds.), Second Language Teacher Education. Cambridge: Cambridge University Press, 43–61.
- Fanselow, J. F. (1977). Beyond Rashomon: Conceptualizing and describing the teaching act. TESOL Quarterly, 11(1), 17–39.
- Flanders, N. A. (1970). Analyzing Teaching Behavior. Reading, MA: Addison-Wesley.
- Long, M. H. & Sato, C. J. (1983). Classroom foreigner talk discourse: Forms and functions of teachers' questions. In Seliger, H. W. & Long, M. H. (eds.), Classroom-Oriented Research in Second Language Acquisition. Rowley, MA: Newbury House, 268–286.
- Lyster, R. & Ranta, L. (1997). Corrective feedback and learner uptake: Negotiation of form in communicative classrooms. Studies in Second Language Acquisition, 19(1), 37–66.
- Malderez, A. (2003). Observation. ELT Journal, 57(2), 179–181.
- Mehan, H. (1979). Learning Lessons: Social Organization in the Classroom. Cambridge, MA: Harvard University Press.
- O'Leary, M. (2014). Classroom Observation: A Guide to the Effective Observation of Teaching and Learning. London: Routledge.
- Randall, M. & Thornton, B. (2001). Advising and Supporting Teachers. Cambridge: Cambridge University Press.
- Richards, J. C. & Farrell, T. S. C. (2005). Professional Development for Language Teachers: Strategies for Teacher Learning. Cambridge: Cambridge University Press.
- Richards, J. C. & Lockhart, C. (1994). Reflective Teaching in Second Language Classrooms. Cambridge: Cambridge University Press.
- Roethlisberger, F. J. & Dickson, W. J. (1939). Management and the Worker: An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works, Chicago. Cambridge, MA: Harvard University Press.
- Shortland, S. (2004). Peer observation: a tool for staff development or compliance? Journal of Further and Higher Education, 28(2), 219–228.
- Sinclair, J. McH. & Coulthard, R. M. (1975). Towards an Analysis of Discourse: The English Used by Teachers and Pupils. London: Oxford University Press.
- Spada, N. & Fröhlich, M. (1995). COLT — Communicative Orientation of Language Teaching Observation Scheme: Coding Conventions and Applications. Sydney: National Centre for English Language Teaching and Research, Macquarie University.
- Wajnryb, R. (1992). Classroom Observation Tasks: A Resource Book for Language Teachers and Trainers. Cambridge: Cambridge University Press.
- Wallace, M. J. (1991). Training Foreign Language Teachers: A Reflective Approach. Cambridge: Cambridge University Press.
- Walsh, S. (2006). Investigating Classroom Discourse. London: Routledge.
- Walsh, S. (2011). Exploring Classroom Discourse: Language in Action. London: Routledge.
See Also
- Peer Observation: the developmental form that removes the evaluator–observer overlap
- Reflective Practice: the cognitive activity that observation feeds
- Mentoring: observation embedded in a long-term developmental relationship
- Teacher Professional Development: observation as one mechanism within TPD
- Action Research: observation made systematic with hypotheses and data
- Teacher Cognition: what observation surfaces about belief–practice gaps
- Discourse Analysis: the analytic tradition behind IRF/IRE and SETT
- Cambridge English Teaching Framework: competency descriptors mapped to observable categories