# Methodological Appendix for the Live Theater Experimental Study

By 10/15/2014

Learning from Live Theater
Education Next, Winter 2015

Empirical Strategy

Because the randomized controlled trial approach has the important feature of generating comparable treatment and control groups, we can use a straightforward set of analytic techniques, designed for use in social experiments, to estimate the impact of a school field trip to see live theater on student outcomes. In its simplest form, this technique can estimate mean differences using the following equation for outcome Y of student i in matched pair m:

$$Y_{im} = \alpha + \beta_1\,\mathrm{Treat}_i + \beta_2\,\mathrm{Match}_{im} + \varepsilon_{im} \tag{1}$$

The binary variable $\mathrm{Treat}_i$ is equal to 1 if the student is in the treatment group that was randomly assigned to receive free tickets for a field trip to see live performances of A Christmas Carol or Hamlet, and is equal to 0 otherwise. Because the groups were created using a stratified randomization procedure within matched applicant group pairs, $\mathrm{Match}_{im}$ is also included in the model as a vector of dummy variables that have the statistical effect of estimating within, as opposed to across, matched groupings. Finally, $\varepsilon_{im}$ is a stochastic error term clustered at the applicant group level to take into account the correlation among students nested within the same applicant group.
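To make the role of the matched-pair dummies concrete, here is a minimal Python sketch (not the authors' code) of the within-pair estimate of β1 in equation (1). It relies on the standard result that regressing the outcome on the treatment indicator plus a full set of pair dummies gives the same coefficient as regressing the pair-demeaned outcome on the pair-demeaned treatment indicator. The function name, pair labels, and data are invented for illustration, and the sketch does not compute the clustered standard errors.

```python
from collections import defaultdict

def within_pair_effect(rows):
    """Estimate beta_1 from rows of (pair_id, treat, outcome).

    Equivalent to OLS of outcome on treat plus matched-pair dummies:
    both variables are demeaned within each pair before fitting a slope.
    """
    by_pair = defaultdict(list)
    for pair_id, treat, outcome in rows:
        by_pair[pair_id].append((treat, outcome))

    num = den = 0.0
    for members in by_pair.values():
        t_mean = sum(t for t, _ in members) / len(members)
        y_mean = sum(y for _, y in members) / len(members)
        for t, y in members:
            num += (t - t_mean) * (y - y_mean)
            den += (t - t_mean) ** 2
    return num / den

# Hypothetical data: two matched pairs, two treated and two control students each.
example_rows = [
    ("A", 1, 1.0), ("A", 1, 0.8), ("A", 0, 0.2), ("A", 0, 0.4),
    ("B", 1, 0.5), ("B", 1, 0.7), ("B", 0, 0.1), ("B", 0, 0.3),
]
beta_1 = within_pair_effect(example_rows)

# A pair whose students are all controls has no within-pair variation in
# treatment, so it adds nothing to either sum and leaves the estimate unchanged.
with_all_control_pair = example_rows + [("C", 0, 0.9), ("C", 0, 0.6)]
beta_1_with_extra_pair = within_pair_effect(with_all_control_pair)
```

In this toy data the treatment-control gaps within the two pairs are 0.6 and 0.4, and the estimate is their average, 0.5. Comparisons are made only inside each pair, never across pairs.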

Proper randomization generates experimental groups that are comparable but not necessarily identical. The basic regression model can therefore be improved by adding controls for observable characteristics, which account for minor differences between the groups and improve the precision of the estimated impact. This yields the following equation to be estimated:

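$$Y_{im} = \alpha + \beta_1\,\mathrm{Treat}_i + \beta_2\,\mathrm{Match}_{im} + \beta_3\,\mathrm{Gender}_i + \beta_4\,\mathrm{Grade}_i + \beta_5\,\mathrm{Minority}_i + \varepsilon_{im} \tag{2}$$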

where $\mathrm{Gender}_i$ is a dummy variable equal to 1 if the student is female and 0 otherwise, $\mathrm{Grade}_i$ is a vector of dummy variables indicating the grade level of student i, and $\mathrm{Minority}_i$ is a dummy variable equal to 1 if the student does not identify as white and 0 otherwise. In this model, $\beta_1$ is the parameter of interest and represents the effect of the school field trip to see live theater for students in the treatment group. Equation (2) is our preferred model for estimating overall impacts.
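The following Python sketch illustrates how the regressors in equation (2) could be coded and how β1 is read off the fitted model. It is an illustration under stated assumptions, not the authors' implementation: the helper names `design_row` and `ols`, the student records, and the coefficient values are all hypothetical, the outcomes are generated without noise so the fit can be checked exactly, and clustered standard errors are not computed.

```python
def design_row(student, pairs, grades):
    """Regressors for equation (2): intercept, Treat, pair dummies,
    Gender, grade dummies, Minority. The first pair and the first grade
    are the omitted reference categories."""
    row = [1.0, float(student["treat"])]
    row += [1.0 if student["pair"] == p else 0.0 for p in pairs[1:]]
    row.append(float(student["female"]))
    row += [1.0 if student["grade"] == g else 0.0 for g in grades[1:]]
    row.append(float(student["minority"]))
    return row

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical students: (treat, pair, female, grade, minority).
records = [
    (1, "A", 1, 9, 0), (0, "A", 0, 9, 0), (1, "A", 0, 10, 1), (0, "A", 1, 10, 0),
    (1, "B", 1, 10, 1), (0, "B", 0, 9, 1), (1, "B", 0, 9, 0), (0, "B", 1, 10, 0),
]
keys = ("treat", "pair", "female", "grade", "minority")
students = [dict(zip(keys, rec)) for rec in records]
X = [design_row(s, pairs=["A", "B"], grades=[9, 10]) for s in students]

# Noise-free outcomes built from invented coefficients, so OLS should recover them.
# Order: intercept, Treat, pair B, Gender, grade 10, Minority.
true_coefs = [0.1, 0.5, 0.2, 0.3, -0.1, 0.05]
y = [sum(c * x for c, x in zip(true_coefs, row)) for row in X]
coefs = ols(X, y)
beta_1 = coefs[1]  # coefficient on Treat, the parameter of interest
```

In practice a statistics package that supports cluster-robust standard errors would replace the hand-rolled solver; it is written out here only to keep the example self-contained.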

Comparability of Treatment and Control Groups

Even within randomized controlled trials, treatment and control groups may differ significantly from each other by chance. To explore whether that occurred in our experiment, we compared the observed characteristics of treatment and control group students. We found only one significant difference among the 19 observed characteristics we examined. (See Appendix Table 1.) Treatment group students were 5.3 percentage points more likely to be from a minority racial or ethnic group: 27.7% of the treatment group versus 22.4% of the control group. With so many comparisons, it is possible that this difference was produced by chance, so we are not concerned that the lottery failed to give us generally comparable treatment and control groups. In addition, we controlled for minority status in our model, which should further alleviate concerns that the groups were different at baseline.
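As a rough back-of-the-envelope check on the multiple-comparisons point, assume (purely for illustration) that the 19 comparisons are independent and that there are no true baseline differences. The chance that at least one comparison is significant at the .05 level is then

$$
P(\text{at least one } p < .05) = 1 - (1 - 0.05)^{19} = 1 - 0.95^{19} \approx 0.62 .
$$

The observed characteristics are likely correlated with one another, so independence is an assumption rather than a property of these data, and the figure is only indicative.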

Appendix Table 1: Treatment/Control Balance

| | Treatment | Control | Difference |
|---|---|---|---|
| Individual | | | |
| Average Grade | 9.3 | 9.3 | 0 |
| Percentage Female | 61 | 62.1 | -1.1 |
| Percentage Minority | 27.7 | 22.4 | 5.3** |
| Percentage Agree “I am a good student.” | 95.2 | 97.6 | -2.4 |
| Percentage Agree “School is boring.” | 48.5 | 46.2 | 2.3 |
| School | | | |
| Average Enrollment | 850.8 | 829.9 | 20.9 |
| Percentage Homeless | 3.1 | 1.6 | 1.5 |
| Percentage FRL | 40.4 | 44 | -3.6 |
| Average School Poverty Index | 75.2 | 76.8 | -1.6 |
| Percentage White | 71.7 | 71.5 | 0.2 |
| Percentage Hispanic | 12.8 | 15.1 | -2.3 |
| Percentage Black | 4.9 | 4.2 | 0.7 |
| Percentage Other Race | 11.1 | 9.3 | 1.8 |
| Percentage Minority | 28.4 | 28.5 | -0.1 |
| Percentage GT | 8.6 | 8.6 | 0 |
| Percentage SPED | 8.1 | 7.7 | 0.4 |
| Percentage LEP | 9.5 | 11.4 | -1.9 |
| Average Miles from Theater | 24.4 | 30.5 | -6.1 |
| Average Minutes from Theater | 29.9 | 36.2 | -6.3 |

Note: ** p < .05, two-tailed.

| | Treatment | Control |
|---|---|---|
| Number of Students | | |
| All | 330 | 340 |
| Christmas Carol | 101 | 246 |
| Hamlet | 229 | 94 |
| Applicant Groups | | |
| All | 22 | 27 |
| Christmas Carol | 6 | 18 |
| Hamlet | 16 | 9 |

Outcome Scales

We examined five outcomes for students: knowledge of play plots and vocabulary, tolerance, Reading the Mind in the Eyes Test (RMET), desire to participate in theater, and interest in viewing theater. Each of these outcomes consisted of a scale constructed from multiple items on the survey.

For the knowledge scale, students who had applied to see A Christmas Carol were asked,

1)    Who is Jacob Marley?

2)    What lesson does Ebenezer Scrooge learn in A Christmas Carol?

3)    What does the Ghost of Christmas Yet to Come show Scrooge?

4)    In A Christmas Carol who says “God bless us, every one!”

5)    Who is Belle?

6)    Why does Scrooge become a mean-spirited miser?

They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):

1)    Wanting things owned by others (covetous)

2)    Very poor (destitute)

3)    A ghost or ghost like image or appearance (apparition)

4)    Nonsense, or a trick (humbug)

5)    Offensive, or strongly disliked (odious)

Students whose groups applied to see Hamlet were asked,

1)    Who are Rosencrantz and Guildenstern?

2)    When Hamlet wonders whether “to be, or not to be,” what is he considering?

3)    What does the Ghost ask Hamlet to do?

4)    Why is Hamlet troubled?

5)    What happens to Ophelia?

6)    Which of these does Hamlet say? (“What a piece of work is a man!”)

They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):

1)    Happiness and laughter (mirth)

2)    The appearance of a person’s face (countenance)

3)    Wherefore (why)

4)    Not working, active, or being used (idle)

5)    A roguish or mischievous act (knavery)

The tolerance scale consisted of asking students the extent to which they agreed or disagreed with the following statements, with the negatively framed items reverse-coded:

1)    People who disagree with my point of view bother me.

2)    Plays critical of America should not be allowed to be performed in our community.

3)    I am not interested in learning about people different than me.

4)    I think people can have different opinions about the same thing.

5)    Women are equally able to do the same jobs that men can do.

6)    I like to hear views different from my own.

7)    There are multiple ways to interpret the same work of drama.

The version of RMET used in our study was the one developed for adolescent subjects, as described and validated in Baron-Cohen, S., Wheelwright, S., Scahill, V., Lawson, J., and Spong, A. (2001), “Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome,” Journal of Developmental and Learning Disorders 5:47-78. Because this test is widely used and has been validated by others, we did not alter it in any way, despite the fact that some Britishisms, such as using the word “cross” to mean angry, may have confused our students.

The scale for student interest in participating in theater consisted of the following items:

1)    How interested are you in being in a theater performance?

2)    If your school were having auditions for a new play, how interested would you be in trying to get a role in that play?

3)    How interested are you in taking a drama class?

The scale for student interest in viewing theater consisted of the following items, with the negatively framed items reverse-coded:

1)    How interested are you in seeing live performances in a theater?

2)    If your friends or family wanted to go to a play, how interested would you be in going?

3)    Imagine that a friend of yours is going to go on a field trip. Do you think your friend would enjoy these field trips? A theater performance

4)    Would you like more live theater performances in your town?

5)    I would tell my friends that they should see a live theater performance.

6)    I plan to see live theater performances when I am an adult.

7)    Live theater is interesting to me.

8)    Trips to see live theater are fun.

9)    I feel uneasy in theaters.

10) I feel comfortable talking about theater performances.

All of our scales were built by standardizing and averaging the components of the scales. The effect sizes of results were all computed by using the standard deviation of the control group.
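The scale construction and effect-size calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the 1-5 response range used for reverse-coding is an assumption (the response format is not stated here), and standardizing each item over the full sample with the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

def reverse_code(values, low, high):
    """Flip a negatively framed item so that higher always means 'more'."""
    return [low + high - v for v in values]

def standardize(values):
    """Convert raw item responses to z-scores (sample mean 0, sample SD 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def build_scale(items):
    """items: one list of responses per survey item, students in the same order.
    Returns each student's average z-score across the items."""
    z_items = [standardize(item) for item in items]
    return [mean(per_student) for per_student in zip(*z_items)]

def effect_size(scale, treated):
    """Treatment-control difference in means, in control-group SD units."""
    t = [s for s, flag in zip(scale, treated) if flag]
    c = [s for s, flag in zip(scale, treated) if not flag]
    return (mean(t) - mean(c)) / stdev(c)

# Hypothetical responses from four students on two items (assumed 1-5 scale).
item_1 = [1, 2, 3, 4]
item_2 = reverse_code([5, 4, 3, 2], low=1, high=5)   # becomes [1, 2, 3, 4]
scale = build_scale([item_1, item_2])
d = effect_size(scale, treated=[False, False, True, True])
```

Dividing by the control group's standard deviation, rather than a pooled one, keeps the yardstick unaffected by any change the treatment itself makes to the spread of scores.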

Cronbach’s Alpha tests show that the items reliably measure knowledge, tolerance, interest in participating in theater, and interest in viewing theater. (See Appendix Table 2.) The Cronbach’s Alpha for the RMET scale, however, falls short of conventional standards for reliably measuring the same underlying construct. Because this scale has been validated by other researchers, we nonetheless feel comfortable using it in this analysis. We suspect that some Britishisms and the fact that we incorporated RMET into a larger survey may have produced a lower alpha than other researchers have found. The fact that we still observe significant effects despite a noisy scale also increases our confidence in using it. None of our scales could be improved substantially by omitting any one item, so we built all scales with all available items that are theoretically connected with the underlying constructs.

Appendix Table 2: Cronbach’s Alpha for Outcome Scales

| Scale | Number of Items | Cronbach’s Alpha |
|---|---|---|
| Knowledge – Both plays combined | 11 | 0.59 |
| Knowledge – Christmas Carol | 11 | 0.62 |
| Knowledge – Hamlet | 11 | 0.55 |
| Tolerance | 7 | 0.59 |
| Reading Others’ Emotions | 28 | 0.42 |
| Interest in Participating in Theater | 4 | 0.94 |
| Interest in Viewing Theater | 10 | 0.92 |
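For readers unfamiliar with the statistic, Cronbach's alpha for a scale with k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below computes it, along with the "alpha if item deleted" check that underlies the statement that no scale could be improved substantially by omitting an item. The function names and the scores are hypothetical; the study's item-level data are not reproduced here.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(per_person) for per_person in zip(*items)]
    item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variances / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores from five respondents on three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [1, 3, 2, 5, 4],
]
alpha = cronbach_alpha(items)
drop_one = alpha_if_item_deleted(items)
```

If some entry of `drop_one` were well above `alpha`, dropping that item would improve the scale; the appendix reports that this was not the case for any of its scales.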

Analysis without Assuming Weather Events Are Exogenous

Adverse weather prevented several school groups from seeing performances of A Christmas Carol. We have treated those events as exogenous and assigned those groups to the control group. Doing so cannot bias estimates of the treatment effect because it left no treatment students within their matched groupings: with no variance on the treatment variable within a matched grouping, those observations do not contribute directly to the estimate of the treatment effect. Leaving them in the analysis, however, does improve the precision of the estimates for the other covariates, which results in a more precise estimate of the treatment effect.

In this section, we show that we generally get similar results even if we relax that assumption and use other approaches to handling the fact that some groups had to cancel their field trips. First, we present below in Appendix Table 3 the results of our preferred approach with the inclusion of standard errors.

Appendix Table 3 – Results Treating Snow Days as Exogenous

| | Knowledge | Tolerance | Reading Others’ Emotions |
|---|---|---|---|
| Treatment | 0.63 (0.13)*** | 0.26 (0.12)** | 0.23 (0.11)** |
| Treatment – Controlling for Reading and Movie-Watching | 0.58 (0.13)*** | 0.31 (0.11)** | 0.21 (0.11)* |
| Read Play or Book for School | 0.01 (0.15) | -0.13 (0.12) | -0.04 (0.10) |
| Watched Movie for School | 0.30 (0.12)** | -0.22 (0.11)* | 0.11 (0.11) |
| Treatment – Controlling for Interest in Theater | 0.61 (0.13)*** | 0.22 (0.09)** | 0.22 (0.11)* |
| Interest in Theater | 0.24 (0.04)*** | 0.37 (0.05)*** | 0.09 (0.04)** |

Note: * p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

We could instead use an intention-to-treat approach to estimate our results. That has the advantage of ensuring that there is no bias in our estimate of treatment effects because all groups retain the treatment status they were awarded by the lottery regardless of whether their performance was cancelled for snow. But an intention-to-treat approach has the significant disadvantage of understating the effect of actually being treated, particularly for the large number of school groups whose field trip was cancelled for weather.

The results for the intention-to-treat analyses with standard errors are reported below in Appendix Table 4. As one would expect, the point estimates are lower, but the substantive findings are generally the same. The only important difference is that the effect for the main Tolerance analysis falls short of being statistically significant.

Appendix Table 4 – Results Using Intention-to-Treat Approach

| | Knowledge | Tolerance | Reading Others’ Emotions |
|---|---|---|---|
| Treatment | 0.44 (0.14)*** | 0.17 (0.12) | 0.17 (0.09)* |
| Treatment – Controlling for Reading and Movie-Watching | 0.39 (0.14)** | 0.22 (0.11)* | 0.16 (0.09)* |
| Read Play or Book for School | -0.01 (0.16) | -0.15 (0.13) | -0.05 (0.10) |
| Watched Movie for School | 0.33 (0.12)*** | -0.20 (0.11)* | 0.12 (0.11) |
| Treatment – Controlling for Interest in Theater | 0.43 (0.14)*** | 0.15 (0.09) | 0.16 (0.08)* |
| Interest in Theater | 0.24 (0.04)*** | 0.38 (0.05)*** | 0.09 (0.04)** |

Note: * p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

We could also estimate the impact on the treated using a two-stage model in which the intention to treat is used as an instrument for whether groups actually received the treatment. The advantage of this approach is that we get an estimate of the impact on the treated. The disadvantage is that we inflate the standard errors by using a two-stage model, which is particularly important given that there was no non-compliance with the intention-to-treat assignment for the Hamlet groups. So to adjust for non-compliance for one play, we inflate standard errors for both.
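The relationship between the intention-to-treat estimate and the instrumental-variable estimate can be sketched in Python for the simplest case. With a single binary instrument and no covariates, the two-stage estimate reduces to the Wald ratio: the intention-to-treat effect divided by the difference in attendance rates between those assigned and not assigned. This is a simplification for illustration only; the models reported in this appendix presumably also include matched-pair dummies and student controls, and the function name and data below are invented.

```python
from statistics import mean

def itt_and_wald(assigned, attended, outcome):
    """Return (ITT effect, first-stage compliance gap, Wald/IV estimate).

    assigned: 1 if the lottery awarded the field trip, else 0 (the instrument).
    attended: 1 if the student actually saw the play, else 0.
    """
    won = [i for i, z in enumerate(assigned) if z == 1]
    lost = [i for i, z in enumerate(assigned) if z == 0]
    itt = mean([outcome[i] for i in won]) - mean([outcome[i] for i in lost])
    first_stage = mean([attended[i] for i in won]) - mean([attended[i] for i in lost])
    return itt, first_stage, itt / first_stage

# Hypothetical data: half of the lottery winners lose their trip to weather.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
attended = [1, 1, 0, 0, 0, 0, 0, 0]
outcome = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
itt, first_stage, wald = itt_and_wald(assigned, attended, outcome)
```

In this toy example the intention-to-treat estimate (0.5) understates the effect of actually attending (1.0) because only half of those assigned attended. Dividing by the compliance gap rescales the point estimate and its standard error by the same factor, which is one way to see why the two-stage standard errors are larger.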

The results for the instrumental variable analyses are reported below in Appendix Table 5. The point estimates are almost identical to the main approach where we treat weather as exogenous, but the standard errors get larger so that the main Tolerance result falls short of statistical significance.

Appendix Table 5 – Results Using Instrumental Variable Approach

| | Knowledge | Tolerance | Reading Others’ Emotions |
|---|---|---|---|
| Treatment | 0.62 (0.16)*** | 0.24 (0.16) | 0.24 (0.11)** |
| Treatment – Controlling for Reading and Movie-Watching | 0.55 (0.16)*** | 0.31 (0.15)** | 0.22 (0.11)** |
| Read Play or Book for School | 0.01 (0.14) | -0.13 (0.12) | -0.04 (0.09) |
| Watched Movie for School | 0.31 (0.12)*** | -0.22 (0.11)** | 0.11 (0.11) |
| Treatment – Controlling for Interest in Theater | 0.60 (0.16)*** | 0.22 (0.12)* | 0.23 (0.11)** |
| Interest in Theater | 0.24 (0.04)*** | 0.37 (0.05)*** | 0.09 (0.03)** |

Note: * p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

If we use intention to treat to determine baseline equivalence, the results are basically the same as when we treated weather as exogenous and reassigned groups that had to cancel to the control group. The intention-to-treat baseline equivalence comparisons can be found in Appendix Table 6. Of the 19 baseline characteristics on which we compare the students, those assigned by the lottery to the intention-to-treat condition are not significantly different from the control group in all but one instance. The intention-to-treat students are still more likely to be from a minority racial or ethnic group, by 8.5 percentage points. Again, this is a difference that could have been produced by chance and is controlled for in the regression models.

Appendix Table 6: Intention to Treat/Control Balance

| | Intent to Treat | Control | Difference |
|---|---|---|---|
| Individual | | | |
| Average Grade | 9.3 | 9.5 | -0.2 |
| Percentage Female | 59.5 | 61.7 | -2.2 |
| Percentage Minority | 30.4 | 21.9 | 8.5*** |
| Percentage Agree “I am a good student.” | 96.0 | 98.3 | -2.3* |
| Percentage Agree “School is boring.” | 47.8 | 45.5 | 2.3 |
| School | | | |
| Average Enrollment | 969.2 | 753.4 | 215.8 |
| Percentage Homeless | 3.1 | 2.0 | 1.1 |
| Percentage FRL | 46.4 | 49.7 | -3.3 |
| Average School Poverty Index | 84.7 | 87.8 | -3.1 |
| Percentage White | 69.9 | 69.7 | 0.2 |
| Percentage Hispanic | 13.9 | 16.3 | -2.4 |
| Percentage Black | 30.9 | 30.3 | 0.6 |
| Percentage Other Race | 10.9 | 9.2 | 1.7 |
| Percentage Minority | 30.1 | 30.3 | -0.2 |
| Percentage GT | 8.6 | 9.3 | -0.7 |
| Percentage SPED | 8.9 | 8.6 | 0.3 |
| Percentage LEP | 10.7 | 13.5 | -2.8 |
| Average Miles from Theater | 28.7 | 34.1 | -5.4 |
| Average Minutes from Theater | 33.4 | 39.4 | -6.0 |

Note: * p < .10, *** p < .01, two-tailed.

| | Treatment | Control |
|---|---|---|
| Number of Students Per Group | | |
| All | 428 | 242 |
| Christmas Carol | 199 | 148 |
| Hamlet | 229 | 94 |
| Applicant Groups | | |
| All | 30 | 19 |
| Christmas Carol | 14 | 10 |
| Hamlet | 16 | 9 |