Methodological Appendix for the Live Theater Experimental Study

Learning from Live Theater
Education Next, Winter 2015

Empirical Strategy

Because the randomized controlled trial approach has the important feature of generating comparable treatment and control groups, we can use a straightforward set of analytic techniques, designed for use in social experiments, to estimate the impact of a school field trip to see live theater on student outcomes. In its simplest form, this technique can estimate mean differences using the following equation for outcome Y of student i in matched pair m:

(1) Y_im = α + β₁Treat_i + β₂Match_im + ε_im

The binary variable Treat_i is equal to 1 if the student is in the treatment group that was randomly assigned to receive free tickets for a field trip to see live performances of A Christmas Carol or Hamlet, and is equal to 0 otherwise. Because the groups were created using a stratified randomization procedure within matched applicant group pairs, Match_im is also included in the model as a vector of dummy variables that have the statistical effect of estimating within, as opposed to across, matched groupings. Finally, ε_im is a stochastic error term clustered at the applicant group level to take into account the spatial correlation from students nested within applicant groups.
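To illustrate what "estimating within, as opposed to across, matched groupings" means, the following is a minimal sketch of the point estimate for β₁ in equation (1). It relies on the standard result (Frisch-Waugh-Lovell) that including the pair dummies is equivalent to subtracting each pair's mean from the treatment indicator and the outcome before regressing. The function name and the toy data are hypothetical, and the sketch does not compute the clustered standard errors described above.

```python
from collections import defaultdict


def within_pair_effect(rows):
    """Point estimate of beta_1 in Y = a + b1*Treat + pair dummies + e.

    rows: iterable of (pair_id, treat, outcome) tuples.
    Demeaning Treat and Y within each matched pair gives the same
    coefficient as including a dummy variable for every pair.
    """
    by_pair = defaultdict(list)
    for pair_id, treat, y in rows:
        by_pair[pair_id].append((treat, y))

    numerator = 0.0
    denominator = 0.0
    for members in by_pair.values():
        t_bar = sum(t for t, _ in members) / len(members)
        y_bar = sum(y for _, y in members) / len(members)
        for t, y in members:
            numerator += (t - t_bar) * (y - y_bar)
            denominator += (t - t_bar) ** 2

    if denominator == 0:
        raise ValueError("no within-pair variation in treatment")
    return numerator / denominator


# Hypothetical toy data: (matched pair, Treat, standardized outcome).
rows = [
    ("A", 1, 0.9), ("A", 1, 0.7), ("A", 0, 0.2), ("A", 0, 0.0),
    ("B", 1, 0.4), ("B", 0, -0.5), ("B", 0, -0.3),
]
effect = within_pair_effect(rows)
print(round(effect, 2))
```

In this toy example the within-pair differences in means are 0.7 for pair A and 0.8 for pair B, and the estimate is a weighted average of the two. Differences in outcome levels between pairs play no role.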

Proper randomization generates experimental groups that are comparable but not necessarily identical. The basic regression model can, therefore, be improved by adding controls for observable characteristics to increase the reliability of the estimated impact by accounting for minor differences and improving the precision of the overall statistical model. This yields the following equation to be estimated:

(2) Y_im = α + β₁Treat_i + β₂Match_im + β₃Gender_i + β₄Grade_i + β₅Minority_i + ε_im

where Gender_i is a dummy variable equal to 1 if the student is female and 0 otherwise, Grade_i is a vector of dummy variables indicating the grade level of student i, and Minority_i is a dummy variable equal to 1 if the student does not identify as being white and is 0 otherwise. In this model, β₁ is the parameter of interest and represents the effect of a field trip to see live theater for students in the treatment group. Equation (2) is our preferred model for estimating overall impacts.

Comparability of Treatment and Control Groups

Even within randomized controlled trials, treatment and control groups may differ significantly from each other by chance. To explore whether that occurred in our experiment, we compared the observed characteristics of treatment and control group students. We found only one significant difference among the 19 observed characteristics we examined. (See Appendix Table 1.) Students in the treatment group were 5.3 percentage points more likely to be from a minority racial or ethnic group: 27.7% for the treatment group versus 22.4% for the control group. With so many comparisons, it is possible that this difference could have been produced by chance, so we are unconcerned that the lottery failed to give us generally comparable treatment and control groups. In addition, we controlled for minority status in our model, which should further alleviate concerns that the groups were different at baseline.
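As a rough guide to how likely a chance difference is with this many comparisons, the arithmetic can be sketched as follows. The calculation assumes the 19 tests are independent, which is only an approximation here because several characteristics are closely related (for example, miles and minutes from the theater). The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n independent tests rejects a true null."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_characteristics = 19  # number of comparisons in Appendix Table 1
alpha = 0.05            # significance threshold marked ** in the table

expected_false_positives = n_characteristics * alpha
p_any = prob_at_least_one_false_positive(n_characteristics, alpha)

print(f"expected false positives: {expected_false_positives:.2f}")  # 0.95
print(f"P(at least one): {p_any:.3f}")                              # 0.623
```

Under that independence assumption, roughly one significant difference at the .05 level would be expected even if the groups were perfectly balanced in expectation.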

Appendix Table 1: Treatment/Control Balance

		Treatment	Control	Difference
Individual	Average Grade	9.3	9.3	0
	Percentage Female	61	62.1	-1.1
	Percentage Minority	27.7	22.4	5.3**
	Percentage Agree “I am a good student.”	95.2	97.6	-2.4
	Percentage Agree “School is boring.”	48.5	46.2	2.3
School	Average Enrollment	850.8	829.9	20.9
	Percentage Homeless	3.1	1.6	1.5
	Percentage FRL	40.4	44	-3.6
	Average School Poverty Index	75.2	76.8	-1.6
	Percentage White	71.7	71.5	0.2
	Percentage Hispanic	12.8	15.1	-2.3
	Percentage Black	4.9	4.2	0.7
	Percentage Other Race	11.1	9.3	1.8
	Percentage Minority	28.4	28.5	-0.1
	Percentage GT	8.6	8.6	0
	Percentage SPED	8.1	7.7	0.4
	Percentage LEP	9.5	11.4	-1.9
	Average Miles from Theater	24.4	30.5	-6.1
	Average Minutes from Theater	29.9	36.2	-6.3
** p < .05, two-tailed.

		Treatment	Control
Number of Students	All	330	340
	Christmas Carol	101	246
	Hamlet	229	94
Applicant Groups	All	22	27
	Christmas Carol	6	18
	Hamlet	16	9

Outcome Scales

We examined five outcomes for students: knowledge of play plots and vocabulary, tolerance, Reading the Mind in the Eyes Test (RMET), desire to participate in theater, and interest in viewing theater. Each of these outcomes consisted of a scale constructed from multiple items on the survey.

For the knowledge scale, students who had applied to see A Christmas Carol were asked,

1) Who is Jacob Marley?

2) What lesson does Ebenezer Scrooge learn in A Christmas Carol?

3) What does the Ghost of Christmas Yet to Come show Scrooge?

4) In A Christmas Carol who says “God bless us, every one!”

5) Who is Belle?

6) Why does Scrooge become a mean-spirited miser?

They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):

1) Wanting things owned by others (covetous)

2) Very poor (destitute)

3) A ghost or ghost like image or appearance (apparition)

4) Nonsense, or a trick (humbug)

5) Offensive, or strongly disliked (odious)

Students whose groups applied to see Hamlet were asked,

1) Who are Rosencrantz and Guildenstern?

2) When Hamlet wonders whether “to be, or not to be,” what is he considering?

3) What does the Ghost ask Hamlet to do?

4) Why is Hamlet troubled?

5) What happens to Ophelia?

6) Which of these does Hamlet say? (“What a piece of work is a man!”)

They were also asked to pick the word that best fit the following definitions (with the answers in parentheses):

1) Happiness and laughter (mirth)

2) The appearance of a person’s face (countenance)

3) Wherefore (why)

4) Not working, active, or being used (idle)

5) A roguish or mischievous act (knavery)

The tolerance scale consisted of asking students the extent to which they agreed or disagreed with the following statements, with the negatively framed items reverse-coded:

1) People who disagree with my point of view bother me.

2) Plays critical of America should not be allowed to be performed in our community.

3) I am not interested in learning about people different than me.

4) I think people can have different opinions about the same thing.

5) Women are equally able to do the same jobs that men can do.

6) I like to hear views different from my own.

7) There are multiple ways to interpret the same work of drama.

The version of RMET used in our study was the one developed for adolescent subjects, as described and validated in Baron-Cohen, S., Wheelwright, S., Scahill, V., Lawson, J., and Spong, A. (2001), “Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome,” Journal of Developmental and Learning Disorders 5:47-78. Because this test is widely used and has been validated by others, we did not alter it in any way, despite the fact that some Britishisms, such as using the word “cross” to mean angry, may have confused our students.

The scale for student interest in participating in theater consisted of the following items:

1) How interested are you in being in a theater performance?

2) If your school were having auditions for a new play, how interested would you be in trying to get a role in that play?

3) How interested are you in taking a drama class?

4) I would be interested in joining a drama club if my school had one.

The scale for student interest in viewing theater consisted of the following items, with the negatively framed items reverse-coded:

1) How interested are you in seeing live performances in a theater?

2) If your friends or family wanted to go to a play, how interested would you be in going?

3) Imagine that a friend of yours is going to go on a field trip. Do you think your friend would enjoy these field trips? A theater performance

4) Would you like more live theater performances in your town?

5) I would tell my friends that they should see a live theater performance.

6) I plan to see live theater performances when I am an adult.

7) Live theater is interesting to me.

8) Trips to see live theater are fun.

9) I feel uneasy in theaters.

10) I feel comfortable talking about theater performances.

All of our scales were built by standardizing and averaging the components of the scales. The effect sizes of results were all computed by using the standard deviation of the control group.
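The scale construction and effect-size calculation described above can be sketched as follows. Several details are assumptions made for illustration rather than facts from the study: the 1-to-5 response range used for reverse-coding, standardizing each item against the full sample, and the choice of population versus sample standard deviation. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev, stdev


def reverse_code(value, low, high):
    """Flip a negatively framed item so that higher always means 'more'."""
    return low + high - value


def standardize(values):
    """Convert raw responses to z-scores using the full-sample mean and SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def build_scale(item_columns):
    """Average the standardized items for each student.

    item_columns holds one list per survey item, each with one value per
    student, in the same student order.
    """
    z_columns = [standardize(col) for col in item_columns]
    return [mean(student_items) for student_items in zip(*z_columns)]


def effect_size(treated_scores, control_scores):
    """Treatment-control difference in control-group standard deviations."""
    return (mean(treated_scores) - mean(control_scores)) / stdev(control_scores)


# Hypothetical responses from four students on two items (assumed 1-5 range).
likes_other_views = [5, 4, 2, 3]
bothered_by_disagreement = [reverse_code(v, 1, 5) for v in [1, 2, 4, 4]]
scale = build_scale([likes_other_views, bothered_by_disagreement])
```

Dividing by the control group's standard deviation expresses the treatment effect relative to the spread of outcomes among students who did not attend a performance.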

Cronbach’s Alpha tests show that the items reliably measure knowledge, tolerance, interest in participating in theater, and interest in viewing theater. (See Appendix Table 2.) The Cronbach’s Alpha for the RMET scale, however, falls short of conventional standards for reliably measuring the same underlying construct. Because this scale has been validated by other researchers, however, we feel comfortable using it in this analysis. We suspect that some Britishisms and the fact that we incorporated RMET into a larger survey may have produced a lower alpha than what other researchers have found. The fact that we still observe significant effects despite a noisy scale also increases our confidence in using it. None of our scales could be improved substantially by omitting any one item, so we build all scales with all available items that are theoretically connected with the underlying constructs.

Appendix Table 2: Cronbach’s Alpha for Outcome Scales

Scale	Number of Items	Cronbach’s Alpha
Knowledge – Both plays combined	11	0.59
Knowledge – Christmas Carol	11	0.62
Knowledge – Hamlet	11	0.55
Tolerance	7	0.59
Reading Others’ Emotions	28	0.42
Interest in Participating in Theater	4	0.94
Interest in Viewing Theater	10	0.92
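For readers unfamiliar with the statistic in Appendix Table 2, the following is a minimal sketch of the conventional Cronbach's alpha formula, k/(k-1) × (1 − sum of item variances / variance of the summed score). The source does not say which variant was used (for example, raw versus standardized alpha), so this should be read as an illustration of the general technique. The data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Cronbach's alpha for a set of items.

    item_columns holds one list per item, each with one response per
    student, in the same student order.
    """
    k = len(item_columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variance_sum = sum(variance(col) for col in item_columns)
    totals = [sum(student_items) for student_items in zip(*item_columns)]
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: three items answered by four students.
items = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Alpha approaches 1 when the items move together across students and falls toward 0 when they are unrelated, which is why a low value signals a noisy scale.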

Analysis without Assuming Weather Events Are Exogenous

Adverse weather prevented several school groups from seeing performances of A Christmas Carol, and we have treated those events as exogenous and assigned those groups to the control group. Doing so cannot bias the estimate of the treatment effect because it left no treatment students within the affected matched groupings. Those observations do not contribute directly to the estimate of the treatment effect because there is no variance on the treatment variable within their matched grouping. Leaving them in the analysis, however, does improve the precision of the estimates for the other covariates, which results in a more precise estimate of the treatment effect.
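The claim that groupings without treatment variance do not contribute directly can be seen from the within-grouping form of the estimator. The expression below is a sketch for the simple model in equation (1), without the additional covariates, where the bars denote means within matched grouping m:

```latex
\[
\hat{\beta}_1 =
\frac{\sum_{m}\sum_{i \in m}
      \left(\mathrm{Treat}_i - \overline{\mathrm{Treat}}_m\right)
      \left(Y_{im} - \overline{Y}_m\right)}
     {\sum_{m}\sum_{i \in m}
      \left(\mathrm{Treat}_i - \overline{\mathrm{Treat}}_m\right)^{2}}
\]
```

If every student in grouping m has Treat_i = 0, the grouping mean of Treat is also 0, so every deviation for that grouping is zero and it adds nothing to either the numerator or the denominator. Once covariates are added, as in equation (2), those students still help estimate the covariate coefficients, which is the source of the precision gain noted above.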

In this section, we show that we generally get similar results even if we relax that assumption and use other approaches to handling the fact that some groups had to cancel their field trips. First, we present below in Appendix Table 3 the results of our preferred approach with the inclusion of standard errors.

Appendix Table 3 – Results Treating Snow Days as Exogenous

	Knowledge	Tolerance	Reading Others’ Emotions
Treatment	0.63 (0.13)***	0.26 (0.12)**	0.23 (0.11)**

Treatment – Controlling for Reading and Movie-Watching	0.58 (0.13)***	0.31 (0.11)**	0.21 (0.11)*
Read Play or Book for School	0.01 (0.15)	-0.13 (0.12)	-0.04 (0.10)
Watched Movie for School	0.30 (0.12)**	-0.22 (0.11)*	0.11 (0.11)

Treatment – Controlling for Interest in Theater	0.61 (0.13)***	0.22 (0.09)**	0.22 (0.11)*
Interest in Theater	0.24 (0.04)***	0.37 (0.05)***	0.09 (0.04)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

We could instead use an intention-to-treat approach to estimate our results. That has the advantage of ensuring that there is no bias in our estimate of treatment effects because all groups retain the treatment status they were awarded by the lottery regardless of whether their performance was cancelled for snow. But an intention-to-treat approach has the significant disadvantage of understating the effect of actually being treated, particularly for the large number of school groups whose field trip was cancelled for weather.

The results for the intention-to-treat analyses with standard errors are reported below in Appendix Table 4. As one would expect, the point estimates are lower, but the substantive findings are generally the same. The only important difference is that the effect for the main Tolerance analysis falls short of being statistically significant.

Appendix Table 4 – Results Using Intention-to-Treat Approach

	Knowledge	Tolerance	Reading Others’ Emotions
Treatment	0.44 (0.14)***	0.17 (0.12)	0.17 (0.09)*

Treatment – Controlling for Reading and Movie-Watching	0.39 (0.14)**	0.22 (0.11)*	0.16 (0.09)*
Read Play or Book for School	-0.01 (0.16)	-0.15 (0.13)	-0.05 (0.10)
Watched Movie for School	0.33 (0.12)***	-0.20 (0.11)*	0.12 (0.11)

Treatment – Controlling for Interest in Theater	0.43 (0.14)***	0.15 (0.09)	0.16 (0.08)*
Interest in Theater	0.24 (0.04)***	0.38 (0.05)***	0.09 (0.04)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

We could also estimate the impact on the treated using a two-stage model in which the intention to treat is used as an instrument for whether groups actually received the treatment. The advantage of this approach is that we get an estimate of the impact on the treated. The disadvantage is that we inflate the standard errors by using a two-stage model, which is particularly important given that there wasn’t any non-compliance with the intention-to-treat assignment for the Hamlet groups. So to adjust for non-compliance for one play, we inflate standard errors for both.
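The logic of using lottery assignment as an instrument can be sketched with the simplest case: a single binary instrument and no other regressors. There the two-stage estimate reduces to the intention-to-treat difference divided by the difference in the share actually treated. This sketch leaves out the matched-pair dummies, the covariates, and the clustered standard errors, so it illustrates the idea and is not the model estimated in this appendix. The function name and data are hypothetical.

```python
from statistics import mean


def wald_estimate(assigned, received, outcome):
    """IV estimate with one binary instrument.

    assigned: 1 if the lottery assigned the student to treatment, else 0.
    received: 1 if the student actually saw the performance, else 0.
    outcome:  the student's outcome score.
    Returns (intention-to-treat effect, first stage, IV estimate).
    """
    y1 = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y0 = mean([y for z, y in zip(assigned, outcome) if z == 0])
    d1 = mean([d for z, d in zip(assigned, received) if z == 1])
    d0 = mean([d for z, d in zip(assigned, received) if z == 0])

    itt = y1 - y0
    first_stage = d1 - d0
    if first_stage == 0:
        raise ValueError("assignment did not change treatment receipt")
    return itt, first_stage, itt / first_stage


# Hypothetical data: one assigned student misses the trip (e.g., a snow day).
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 1, 0, 0, 0, 0, 0]
outcome = [0.8, 0.6, 0.7, 0.1, 0.1, 0.0, 0.2, 0.1]

itt, first_stage, iv = wald_estimate(assigned, received, outcome)
```

In this toy example the intention-to-treat difference is 0.45, three of the four assigned students were treated, and the scaled-up estimate is 0.60. The same scaling explains why intention-to-treat point estimates are smaller than estimates of the impact on the treated.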

The results for the instrumental variable analyses are reported below in Appendix Table 5. The point estimates are almost identical to the main approach where we treat weather as exogenous, but the standard errors get larger so that the main Tolerance result falls short of statistical significance.

Appendix Table 5 – Results Using Instrumental Variable Approach

	Knowledge	Tolerance	Reading Others’ Emotions
Treatment	0.62 (0.16)***	0.24 (0.16)	0.24 (0.11)**

Treatment – Controlling for Reading and Movie-Watching	0.55 (0.16)***	0.31 (0.15)**	0.22 (0.11)**
Read Play or Book for School	0.01 (0.14)	-0.13 (0.12)	-0.04 (0.09)
Watched Movie for School	0.31 (0.12)***	-0.22 (0.11)**	0.11 (0.11)

Treatment – Controlling for Interest in Theater	0.60 (0.16)***	0.22 (0.12)*	0.23 (0.11)**
Interest in Theater	0.24 (0.04)***	0.37 (0.05)***	0.09 (0.03)**
* p < .10, ** p < .05, *** p < .01, two-tailed. Standard error in parentheses.

If we use intention to treat to determine baseline equivalence, the results are essentially the same as when we treated weather as exogenous and reassigned groups that had to cancel to the control group. The intention-to-treat baseline equivalence comparisons can be found in Appendix Table 6. Of the 19 baseline characteristics on which we compare the students, those assigned by the lottery to the intention-to-treat condition are not significantly different from the control group in all but one instance. The intention-to-treat students are still more likely to be from a minority racial or ethnic group, by 8.5 percentage points. Again, this is a difference that could have been produced by chance and is controlled for in the regression models.

Appendix Table 6: Intention to Treat/Control Balance


		Intent to Treat	Control	Difference
Individual	Average Grade	9.3	9.5	-0.2
	Percentage Female	59.5	61.7	-2.2
	Percentage Minority	30.4	21.9	8.5***
	Percentage Agree “I am a good student.”	96.0	98.3	-2.3*
	Percentage Agree “School is boring.”	47.8	45.5	2.3
School	Average Enrollment	969.2	753.4	215.8
	Percentage Homeless	3.1	2.0	1.1
	Percentage FRL	46.4	49.7	-3.3
	Average School Poverty Index	84.7	87.8	-3.1
	Percentage White	69.9	69.7	0.2
	Percentage Hispanic	13.9	16.3	-2.4
	Percentage Black	30.9	30.3	0.6
	Percentage Other Race	10.9	9.2	1.7
	Percentage Minority	30.1	30.3	-0.2
	Percentage GT	8.6	9.3	-0.7
	Percentage SPED	8.9	8.6	0.3
	Percentage LEP	10.7	13.5	-2.8
	Average Miles from Theater	28.7	34.1	-5.4
	Average Minutes from Theater	33.4	39.4	-6.0
* p < .10, *** p < .01, two-tailed.

		Treatment	Control
Number of Students Per Group	All	428	242
	Christmas Carol	199	148
	Hamlet	229	94
Applicant Groups	All	30	19
	Christmas Carol	14	10
	Hamlet	16	9
