Methodological Appendix for the Live Theater Experimental Study
Learning from Live Theater
Education Next, Winter 2015
Empirical Strategy
Because the randomized controlled trial approach has the important feature of generating comparable treatment and control groups, we can use a straightforward set of analytic techniques, designed for use in social experiments, to estimate the impact of a school field trip to see live theater on student outcomes. In its simplest form, this technique can estimate mean differences using the following equation for outcome Y of student i in matched pair m:
$$Y_{im} = \alpha + \beta_{1}\,\mathit{Treat}_{i} + \beta_{2}\,\mathit{Match}_{im} + \varepsilon_{im} \tag{1}$$
The binary variable $\mathit{Treat}_i$ equals 1 if the student is in the treatment group, which was randomly assigned to receive free tickets for a field trip to see a live performance of A Christmas Carol or Hamlet, and 0 otherwise. Because the groups were created using a stratified randomization procedure within matched pairs of applicant groups, $\mathit{Match}_{im}$ is also included in the model as a vector of dummy variables; these have the statistical effect of estimating the treatment effect within, rather than across, matched groupings. Finally, $\varepsilon_{im}$ is a stochastic error term, with standard errors clustered at the applicant group level to account for the correlation of outcomes among students nested within the same applicant group.
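As an illustration only, equation (1) can be estimated by ordinary least squares with matched-pair dummies and standard errors clustered by applicant group. The sketch below is a minimal NumPy version under stated assumptions: the data are simulated (12 hypothetical matched pairs, each containing one treated and one control applicant group), the first pair's dummy is omitted to avoid collinearity with the intercept, and the common CR1 small-sample correction is applied; the appendix does not say which correction the authors used. The helper names `design_eq1` and `ols_cluster` are hypothetical.

```python
import numpy as np


def design_eq1(treat, match):
    """Columns: intercept, Treat, and one dummy per matched pair except the first."""
    pairs = np.unique(match)
    dummies = np.column_stack([(match == m).astype(float) for m in pairs[1:]])
    return np.column_stack([np.ones(len(treat)), treat, dummies])


def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' e_g)(X_g' e_g)'
    labels = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)
    G = len(labels)
    # CR1 finite-sample adjustment: G/(G-1) * (n-1)/(n-k)
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Simulated data (hypothetical): 12 matched pairs, each with one treated and
# one control applicant group of 15 students.
rng = np.random.default_rng(0)
n_pairs, n_students = 12, 15
match = np.repeat(np.arange(n_pairs), 2 * n_students)   # matched pair of each student
group = np.repeat(np.arange(2 * n_pairs), n_students)   # applicant group of each student
treat = (group % 2 == 0).astype(float)                  # even-numbered groups are treated
y = (0.5 * treat
     + rng.normal(0, 0.5, n_pairs)[match]               # pair-level shocks
     + rng.normal(0, 0.3, 2 * n_pairs)[group]           # group-level shocks (the clustering)
     + rng.normal(0, 1, match.size))                    # student-level noise

X = design_eq1(treat, match)
beta, se = ols_cluster(y, X, group)
print(f"beta_1 = {beta[1]:.3f}, clustered SE = {se[1]:.3f}")
```

Because every simulated pair contains one treated and one control group of equal size, the coefficient on Treat here equals the simple average of the within-pair differences in means, which is what "estimating within, rather than across, matched groupings" amounts to in a balanced design.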
Proper randomization generates experimental groups that are comparable but not necessarily identical. The basic regression model can, therefore, be improved by adding controls for observable characteristics to increase the reliability of the estimated impact by accounting for minor differences and improving the precision of the overall statistical model. This yields the following equation to be estimated:
$$Y_{im} = \alpha + \beta_{1}\,\mathit{Treat}_{i} + \beta_{2}\,\mathit{Match}_{im} + \beta_{3}\,\mathit{Gender}_{i} + \beta_{4}\,\mathit{Grade}_{i} + \beta_{5}\,\mathit{Minority}_{i} + \varepsilon_{im} \tag{2}$$
where $\mathit{Gender}_i$ is a dummy variable equal to 1 if the student is female and 0 otherwise, $\mathit{Grade}_i$ is a vector of dummy variables indicating the grade level of student $i$, and $\mathit{Minority}_i$ is a dummy variable equal to 1 if the student does not identify as white and 0 otherwise. In this model, $\beta_1$ is the parameter of interest and represents the effect of the live theater field trip for students in the treatment group. Equation (2) is our preferred model for estimating overall impacts.
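The precision argument for equation (2) can be illustrated with a small simulation. This is a sketch under assumptions, not the authors' model: the outcome, coefficient values, and grade range are invented; the matched-pair dummies and clustering are left out for brevity; and the standard errors are the conventional homoskedastic ones. When covariates such as gender, grade, and minority status explain part of the outcome's variance, the residual variance falls and the standard error on Treat shrinks.

```python
import numpy as np


def ols_se(y, X):
    """OLS coefficients with conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


rng = np.random.default_rng(1)
n = 600
treat = rng.integers(0, 2, n).astype(float)          # random assignment
female = rng.integers(0, 2, n).astype(float)
minority = (rng.random(n) < 0.25).astype(float)
grade = rng.integers(9, 13, n)                       # hypothetical grades 9-12
grade_dummies = np.column_stack([(grade == g).astype(float) for g in (10, 11, 12)])

# Hypothetical outcome: the covariates explain a large share of the variance
y = (0.3 * treat + 1.0 * female - 0.8 * minority + 0.6 * (grade - 9)
     + rng.normal(0, 1, n))

X_eq1_like = np.column_stack([np.ones(n), treat])
X_eq2_like = np.column_stack([np.ones(n), treat, female, grade_dummies, minority])
_, se_basic = ols_se(y, X_eq1_like)
_, se_full = ols_se(y, X_eq2_like)
print(f"SE on Treat without covariates: {se_basic[1]:.3f}")
print(f"SE on Treat with covariates:    {se_full[1]:.3f}")
```

Grade enters as dummies for grades 10 through 12 with grade 9 as the omitted category, mirroring the description of $\mathit{Grade}_i$ as a vector of grade-level dummies.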
Comparability of Treatment and Control Groups
Even in a randomized controlled trial, treatment and control groups may differ significantly by chance. To check whether that occurred in our experiment, we compared the observed characteristics of treatment and control group students. We found only one significant difference among the 19 observed characteristics we examined. (See Appendix Table 1.) The treatment group was 5.3 percentage points more likely to be from a minority racial or ethnic group: 27.7% of the treatment group versus 22.4% of the control group. With this many comparisons, a single significant difference could easily arise by chance, so we are not concerned that the lottery failed to produce generally comparable treatment and control groups. In addition, we control for minority status in our model, which should further alleviate concerns that the groups differed at baseline.
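A rough calculation shows why one significant difference among 19 comparisons is unsurprising. The sketch assumes the 19 tests are independent, which they are not (for example, the student-level and school-level minority measures overlap), so the numbers are only indicative.

```python
# Rough multiple-comparisons arithmetic for the balance checks, assuming
# (unrealistically) that the 19 tests at the .05 level are independent.
alpha = 0.05
n_tests = 19
expected_false_positives = alpha * n_tests      # 0.95 differences expected by chance
p_at_least_one = 1 - (1 - alpha) ** n_tests     # roughly 0.62
print(f"expected chance differences: {expected_false_positives:.2f}")
print(f"P(at least one): {p_at_least_one:.2f}")
```

Under this simplification, about one spurious difference is what chance alone would produce, and the probability of seeing at least one is better than even.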
Appendix Table 1: Treatment/Control Balance
                                            Treatment    Control  Difference
Individual
  Average Grade                                   9.3        9.3           0
  Percentage Female                                61       62.1        -1.1
  Percentage Minority                            27.7       22.4       5.3**
  Percentage Agree “I am a good student.”        95.2       97.6        -2.4
  Percentage Agree “School is boring.”           48.5       46.2         2.3
School
  Average Enrollment                            850.8      829.9        20.9
  Percentage Homeless                             3.1        1.6         1.5
  Percentage FRL                                 40.4         44        -3.6
  Average School Poverty Index                   75.2       76.8        -1.6
  Percentage White                               71.7       71.5         0.2
  Percentage Hispanic                            12.8       15.1        -2.3
  Percentage Black                                4.9        4.2         0.7
  Percentage Other Race                          11.1        9.3         1.8
  Percentage Minority                            28.4       28.5        -0.1
  Percentage GT                                   8.6        8.6           0
  Percentage SPED                                 8.1        7.7         0.4
  Percentage LEP                                  9.5       11.4        -1.9
  Average Miles from Theater                     24.4       30.5        -6.1
  Average Minutes from Theater                   29.9       36.2        -6.3
** p < .05, two-tailed.

                                            Treatment    Control
Number of Students
  All                                             330        340
  Christmas Carol                                 101        246
  Hamlet                                          229         94
Applicant Groups
  All                                              22         27
  Christmas Carol                                   6         18
  Hamlet                                           16          9
Outcome Scales
We examined five outcomes for students: knowledge of play plots and vocabulary, tolerance, ability to read others' emotions as measured by the Reading the Mind in the Eyes Test (RMET), desire to participate in theater, and interest in viewing theater. Each outcome is a scale constructed from multiple survey items.
For the knowledge scale, students whose groups had applied to see A Christmas Carol were asked:
1) Who is Jacob Marley?
2) What lesson does Ebenezer Scrooge learn in A Christmas Carol?
3) What does the Ghost of Christmas Yet to Come show Scrooge?
4) In A Christmas Carol who says “God bless us, every one!”
5) Who is Belle?
6) Why does Scrooge become a mean-spirited miser?
They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):
1) Wanting things owned by others (covetous)
2) Very poor (destitute)
3) A ghost or ghostlike image or appearance (apparition)
4) Nonsense, or a trick (humbug)
5) Offensive, or strongly disliked (odious)
Students whose groups applied to see Hamlet were asked,
1) Who are Rosencrantz and Guildenstern?
2) When Hamlet wonders whether “to be, or not to be,” what is he considering?
3) What does the Ghost ask Hamlet to do?
4) Why is Hamlet troubled?
5) What happens to Ophelia?
6) Which of these does Hamlet say? (“What a piece of work is a man!”)
They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):
1) Happiness and laughter (mirth)
2) The appearance of a person’s face (countenance)
3) Why (wherefore)
4) Not working, active, or being used (idle)
5) A roguish or mischievous act (knavery)
The tolerance scale asked students the extent to which they agreed or disagreed with the following statements, with negatively framed items reverse-coded:
1) People who disagree with my point of view bother me.
2) Plays critical of America should not be allowed to be performed in our community.
3) I am not interested in learning about people different than me.
4) I think people can have different opinions about the same thing.
5) Women are equally able to do the same jobs that men can do.
6) I like to hear views different from my own.
7) There are multiple ways to interpret the same work of drama.
The version of RMET used in our study was the one developed for adolescent subjects, as described and validated in Baron-Cohen, S., Wheelwright, S., Scahill, V., Lawson, J., and Spong, A. (2001). “Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome.” Journal of Developmental and Learning Disorders 5: 47-78. Because this test is widely used and has been validated by others, we did not alter it in any way, even though some Britishisms, such as using the word “cross” to mean angry, may have confused our students.
The scale for student interest in participating in theater consisted of the following items:
1) How interested are you in being in a theater performance?
2) If your school were having auditions for a new play, how interested would you be in trying to get a role in that play?
3) How interested are you in taking a drama class?
4) I would be interested in joining a drama club if my school had one.
The scale for student interest in viewing theater consisted of the following items, with negatively framed items reverse-coded:
1) How interested are you in seeing live performances in a theater?
2) If your friends or family wanted to go to a play, how interested would you be in going?
3) Imagine that a friend of yours is going to go on a field trip. Do you think your friend would enjoy these field trips? (Item rated: A theater performance)
4) Would you like more live theater performances in your town?
5) I would tell my friends that they should see a live theater performance.
6) I plan to see live theater performances when I am an adult.
7) Live theater is interesting to me.
8) Trips to see live theater are fun.
9) I feel uneasy in theaters.
10) I feel comfortable talking about theater performances.
All of our scales were built by standardizing and averaging their component items. All effect sizes were computed using the standard deviation of the control group.
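A minimal sketch of scale construction and control-SD effect sizes follows, assuming each item is standardized across the full sample (the appendix does not say which sample supplied the means and standard deviations) and that reverse-coded items are flipped by sign. The function names `build_scale` and `effect_size` are hypothetical, and the effect size here uses a raw difference in means; a regression estimate such as $\beta_1$ could be rescaled by the control-group standard deviation in the same way.

```python
import numpy as np


def build_scale(items, reverse=()):
    """Standardize each item (column) and average across items for each student.

    Columns listed in `reverse` are reverse-coded by flipping their sign, which is
    equivalent to reversing a Likert scale once the item is standardized.
    """
    items = np.asarray(items, dtype=float).copy()
    for j in reverse:
        items[:, j] = -items[:, j]
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    return z.mean(axis=1)


def effect_size(scale, treat):
    """Treatment-minus-control difference in means, in control-group SD units."""
    scale = np.asarray(scale, dtype=float)
    treat = np.asarray(treat, dtype=bool)
    diff = scale[treat].mean() - scale[~treat].mean()
    return diff / scale[~treat].std(ddof=1)


# Toy responses (hypothetical): two 5-point items, the second negatively framed.
items = np.array([[1, 5], [2, 4], [3, 3], [4, 2]])
treat = np.array([0, 0, 1, 1])
scale = build_scale(items, reverse=(1,))
print(scale, effect_size(scale, treat))
```

In the toy data the reversed second item carries exactly the same information as the first, so the scale equals the standardized first item.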
Cronbach’s alpha statistics indicate that the items reliably measure knowledge, tolerance, interest in participating in theater, and interest in viewing theater. (See Appendix Table 2.) The alpha for the RMET scale, however, falls short of conventional standards for reliably measuring a single underlying construct. Because this scale has been validated by other researchers, we are nonetheless comfortable using it in this analysis. We suspect that the Britishisms, together with the fact that we embedded RMET in a larger survey, may have produced a lower alpha than other researchers have found. That we still observe significant effects despite a noisy scale also increases our confidence in using it. None of our scales could be improved substantially by omitting any one item, so we built every scale with all available items that are theoretically connected to its underlying construct.
Appendix Table 2: Cronbach’s Alpha for Outcome Scales
Scale                                   Number of Items  Cronbach’s Alpha
Knowledge – Both plays combined                      11              0.59
Knowledge – Christmas Carol                          11              0.62
Knowledge – Hamlet                                   11              0.55
Tolerance                                             7              0.59
Reading Others’ Emotions                             28              0.42
Interest in Participating in Theater                  4              0.94
Interest in Viewing Theater                          10              0.92
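For reference, Cronbach's alpha for k items is k/(k - 1) × (1 - sum of item variances / variance of the summed score). The sketch below computes it in NumPy, along with the alpha-if-item-deleted check behind the statement that no scale improves substantially when one item is omitted. It uses the raw-score formula on simulated data; whether the values in Appendix Table 2 were computed on raw or standardized items is not stated, so treat this as an illustration. The function names are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed score
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_if_deleted(items):
    """Alpha recomputed with each item dropped in turn."""
    items = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(items.shape[1])]


# Hypothetical data: four noisy items driven by one latent trait.
rng = np.random.default_rng(4)
trait = rng.normal(size=(300, 1))
items = trait + rng.normal(scale=1.0, size=(300, 4))
print(round(cronbach_alpha(items), 2))
print([round(a, 2) for a in alpha_if_deleted(items)])
```

Perfectly redundant items give an alpha of 1, and uncorrelated items give an alpha near 0.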
Analysis without Assuming Weather Events Are Exogenous
Adverse weather prevented several school groups from seeing performances of A Christmas Carol. We treated those events as exogenous and assigned those groups to the control group. Doing so cannot bias the estimated treatment effect, because it left no treatment students within those groups' matched groupings. Those observations do not contribute directly to the estimate of the treatment effect, because there is no variance in the treatment variable within their matched groupings. Leaving them in the analysis, however, improves the precision of the estimates for the other covariates, which in turn yields a more precise estimate of the treatment effect.
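The claim that all-control matched groupings do not contribute directly to the treatment estimate can be checked numerically. In this hypothetical NumPy simulation, groupings 6 and 7 contain only control students; with only Treat and grouping dummies in the model, the coefficient on Treat is the same whether or not those groupings are included. The function name `treat_coef` and all data values are illustrative assumptions.

```python
import numpy as np


def treat_coef(y, treat, strata):
    """Coefficient on Treat from OLS with an intercept and stratum dummies."""
    levels = np.unique(strata)
    dummies = [(strata == s).astype(float) for s in levels[1:]]
    X = np.column_stack([np.ones(len(y)), treat] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(2)
strata = np.repeat(np.arange(8), 20)                 # 8 matched groupings of 20 students
arms = np.tile(np.repeat([1.0, 0.0], 10), 8)         # half treated within each grouping
treat = np.where(strata < 6, arms, 0.0)              # groupings 6-7: everyone in control
y = 0.4 * treat + 0.5 * strata + rng.normal(0, 1, strata.size)

mixed = strata < 6
beta_all = treat_coef(y, treat, strata)
beta_mixed = treat_coef(y[mixed], treat[mixed], strata[mixed])
print(beta_all, beta_mixed)   # equal up to floating-point error
```

Once covariates such as gender or grade enter the model, the all-control observations do help estimate those covariate coefficients, which is the channel through which they sharpen the treatment estimate.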
In this section, we show that we generally obtain similar results when we relax that assumption and use other approaches to handling the cancelled field trips. First, Appendix Table 3 presents the results of our preferred approach, including standard errors.
Appendix Table 3 – Results Treating Snow Days as Exogenous
                                                        Knowledge       Tolerance       Reading Others’ Emotions
Treatment                                               0.63 (0.13)***  0.26 (0.12)**   0.23 (0.11)**
Treatment – Controlling for Reading and Movie-Watching  0.58 (0.13)***  0.31 (0.11)**   0.21 (0.11)*
Read Play or Book for School                            0.01 (0.15)     0.13 (0.12)     0.04 (0.10)
Watched Movie for School                                0.30 (0.12)**   0.22 (0.11)*    0.11 (0.11)
Treatment – Controlling for Interest in Theater         0.61 (0.13)***  0.22 (0.09)**   0.22 (0.11)*
Interest in Theater                                     0.24 (0.04)***  0.37 (0.05)***  0.09 (0.04)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard errors in parentheses.
We could instead use an intention-to-treat approach to estimate our results. This approach has the advantage of guaranteeing an unbiased estimate of the effect of being assigned to treatment, because all groups retain the treatment status awarded by the lottery regardless of whether their performance was cancelled for snow. But it has the significant disadvantage of understating the effect of actually being treated, particularly given the large number of school groups whose field trips were cancelled for weather.
The results of the intention-to-treat analyses, with standard errors, are reported in Appendix Table 4. As one would expect, the point estimates are lower, but the substantive findings are generally the same. The only important difference is that the main Tolerance effect falls short of statistical significance.
Appendix Table 4 – Results Using Intention-to-Treat Approach
                                                        Knowledge       Tolerance       Reading Others’ Emotions
Treatment                                               0.44 (0.14)***  0.17 (0.12)     0.17 (0.09)*
Treatment – Controlling for Reading and Movie-Watching  0.39 (0.14)**   0.22 (0.11)*    0.16 (0.09)*
Read Play or Book for School                            0.01 (0.16)     0.15 (0.13)     0.05 (0.10)
Watched Movie for School                                0.33 (0.12)***  0.20 (0.11)*    0.12 (0.11)
Treatment – Controlling for Interest in Theater         0.43 (0.14)***  0.15 (0.09)     0.16 (0.08)*
Interest in Theater                                     0.24 (0.04)***  0.38 (0.05)***  0.09 (0.04)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard errors in parentheses.
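A back-of-the-envelope way to relate the intention-to-treat estimates to the effect of actually attending is to divide by the share of assigned students who were treated (the Bloom or Wald adjustment). This sketch assumes that all 330 students in the treatment group of Appendix Table 1 attended, that no control students attended, and it ignores covariates and matched groupings, so it will not reproduce the instrumental variable estimates exactly.

```python
# Counts from Appendix Tables 1 and 6 (both plays combined).
assigned = 428            # students assigned to treatment by the lottery (intent to treat)
attended = 330            # assumption: every student in the Table 1 treatment group attended
compliance = attended / assigned

itt_knowledge = 0.44      # Appendix Table 4, Treatment row, Knowledge column
rescaled = itt_knowledge / compliance   # Bloom/Wald adjustment, assuming no control attendees
print(f"compliance = {compliance:.3f}; rescaled Knowledge effect = {rescaled:.2f}")
```

For Knowledge the rescaled value is about 0.57, in the same range as the instrumental variable estimate of 0.62 reported in Appendix Table 5.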
We could also estimate the impact on the treated using a two-stage model in which the intention-to-treat assignment serves as an instrument for whether groups actually received the treatment. The advantage of this approach is that it yields an estimate of the impact on the treated. The disadvantage is that the two-stage model inflates the standard errors, which matters particularly because there was no noncompliance with the intention-to-treat assignment among the Hamlet groups. Adjusting for noncompliance in one play thus inflates the standard errors for both.
The results of the instrumental variable analyses are reported in Appendix Table 5. The point estimates are almost identical to those from the main approach, in which we treat weather as exogenous, but the standard errors are larger, so the main Tolerance result falls short of statistical significance.
Appendix Table 5 – Results Using Instrumental Variable Approach
                                                        Knowledge       Tolerance       Reading Others’ Emotions
Treatment                                               0.62 (0.16)***  0.24 (0.16)     0.24 (0.11)**
Treatment – Controlling for Reading and Movie-Watching  0.55 (0.16)***  0.31 (0.15)**   0.22 (0.11)**
Read Play or Book for School                            0.01 (0.14)     0.13 (0.12)     0.04 (0.09)
Watched Movie for School                                0.31 (0.12)***  0.22 (0.11)**   0.11 (0.11)
Treatment – Controlling for Interest in Theater         0.60 (0.16)***  0.22 (0.12)*    0.23 (0.11)**
Interest in Theater                                     0.24 (0.04)***  0.37 (0.05)***  0.09 (0.03)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard errors in parentheses.
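A minimal sketch of the two-stage (instrumental variable) estimator described above, written in NumPy with simulated data. Lottery assignment `z` instruments actual attendance `d`; `W` holds exogenous controls, which in the study would include the matched-grouping dummies and covariates. With exactly one instrument for one endogenous regressor, two-stage least squares reduces to (Z'X)^-1 Z'y. The sketch returns point estimates only and does not compute the larger standard errors the text discusses. The function name `iv_2sls`, the effect size, and the 25 percent student-level cancellation rate are hypothetical (in the study, cancellations occurred at the group level).

```python
import numpy as np


def iv_2sls(y, d, z, W):
    """Just-identified 2SLS: actual treatment d instrumented by assignment z.

    W holds the exogenous controls (intercept, matched-grouping dummies,
    covariates). Returns [effect of d, coefficients on W...].
    """
    X = np.column_stack([d, W])    # endogenous regressor plus controls
    Z = np.column_stack([z, W])    # instrument plus the same controls
    # With one instrument per endogenous regressor, 2SLS equals (Z'X)^-1 Z'y.
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(3)
n = 400
z = rng.integers(0, 2, n).astype(float)          # lottery assignment (hypothetical)
cancelled = rng.random(n) < 0.25                 # hypothetical weather cancellations
d = z * ~cancelled                               # treated only if assigned and not cancelled
y = 0.6 * d + rng.normal(0, 1, n)
W = np.ones((n, 1))                              # intercept only, for the check below

beta = iv_2sls(y, d, z, W)
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(beta[0], wald)
```

With only an intercept as a control, the estimate equals the Wald ratio: the difference in mean outcomes by assignment divided by the difference in attendance rates.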
Using intention to treat to assess baseline equivalence yields essentially the same result as treating weather as exogenous and reassigning cancelled groups to the control group. The intention-to-treat baseline equivalence comparisons appear in Appendix Table 6. Of the 19 baseline characteristics on which we compare students, those assigned by the lottery to the intention-to-treat condition differ significantly from the control group at the .05 level in only one instance: they are more likely to be from a minority racial or ethnic group, by 8.5 percentage points. (A second difference, in agreement with “I am a good student,” is marginally significant at the .10 level.) Again, this difference could have been produced by chance, and it is controlled for in the regression models.
Appendix Table 6: Intention to Treat/Control Balance
                                            Intent to Treat    Control  Difference
Individual
  Average Grade                                         9.3        9.5        -0.2
  Percentage Female                                    59.5       61.7        -2.2
  Percentage Minority                                  30.4       21.9      8.5***
  Percentage Agree “I am a good student.”              96.0       98.3       -2.3*
  Percentage Agree “School is boring.”                 47.8       45.5         2.3
School
  Average Enrollment                                  969.2      753.4       215.8
  Percentage Homeless                                   3.1        2.0         1.1
  Percentage FRL                                       46.4       49.7        -3.3
  Average School Poverty Index                         84.7       87.8        -3.1
  Percentage White                                     69.9       69.7         0.2
  Percentage Hispanic                                  13.9       16.3        -2.4
  Percentage Black                                     30.9       30.3         0.6
  Percentage Other Race                                10.9        9.2         1.7
  Percentage Minority                                  30.1       30.3        -0.2
  Percentage GT                                         8.6        9.3        -0.7
  Percentage SPED                                       8.9        8.6         0.3
  Percentage LEP                                       10.7       13.5        -2.8
  Average Miles from Theater                           28.7       34.1        -5.4
  Average Minutes from Theater                         33.4       39.4        -6.0
* p < .10, *** p < .01, two-tailed.

                                            Intent to Treat    Control
Number of Students Per Group
  All                                                   428        242
  Christmas Carol                                       199        148
  Hamlet                                                229         94
Applicant Groups
  All                                                    30         19
  Christmas Carol                                        14         10
  Hamlet                                                 16          9