3 Results

Our search yielded 5,989 citations (see Figure 1). We retained 565 citations (11%) for full-text eligibility assessment. Of the 565 full-texts assessed, we identified 16 eligible systematic reviews (3% of full-texts assessed for eligibility; <1% of all citations identified in our search). These 16 systematic reviews included 326 primary studies, of which 35 (11%) met our eligibility criteria. In summary, we included 16 systematic reviews and 35 primary studies in this overview.

3.1 Characteristics of Included Systematic Reviews

Overall, these 16 reviews were published between 2011 and 2023, with a median publication year of 2017 (see Table 1). Across all reviews, 23 bibliographic databases and search engines were used to identify eligible studies; review authors most commonly searched PsycInfo (16 reviews; 100%), Medline or PubMed (13 reviews; 81%), ERIC (8 reviews; 50%), and Google Scholar (8 reviews; 50%). When reported, the year in which each review last conducted its literature search ranged from 2010 to 2021, with a median of 2018. Regarding research transparency, 11 reviews (69%) included a flow diagram of information through the different phases of a systematic review (Page et al., 2021), 8 reviews (50%) reported a registration number for their review (Booth et al., 2012), and 5 reviews (31%) provided statements about the data and code underlying the review (Page et al., 2022).

3.1.1 Methodological Quality and Risk of Bias of Included Systematic Reviews

AMSTAR-2 assessments (see Table 2) yielded methodological quality ratings of moderate for 2 reviews (13%), low for 6 reviews (38%), and critically low for 8 reviews (50%). Weaknesses of methodological quality in at least half of reviews included not reporting funding sources for included studies (16 reviews; 100%), not justifying the exclusion of individual studies (15 reviews; 94%), not using a comprehensive literature search strategy (12 reviews; 75%), not explaining the selection criteria used to identify included study designs (11 reviews; 69%), not using a satisfactory technique for assessing risk of bias in included studies (11 reviews; 69%), not having publicly available analysis intentions (8 reviews; 50%), and not assessing the impact of risk of bias in individual studies on evidence synthesis results (8 reviews; 50%). ROBIS assessments (see Table 3) yielded ratings of low risk of bias for 4 reviews (25%) and high risk of bias for 12 reviews (75%). Concerns about risk of bias were most commonly related to not assessing risk of bias formally using appropriate criteria (10 reviews; 63%) and not searching an appropriate range of sources for published and unpublished reports (8 reviews; 50%).

3.1.2 Overlap of Primary Studies Across Systematic Reviews

The number of primary studies included in each review ranged from 11 to 136, with a median of 44. Of the 326 primary studies included across all reviews, 155 studies (47.5%) were included in more than one review (see Figure 2). Of the 35 primary studies meeting our eligibility criteria, 26 studies (74.3%) were included in more than one review (see Figure 3). The overall CCA percentage for all primary studies included across reviews (11.1%) was lower than the overall CCA percentage for primary studies meeting our eligibility criteria (24.6%). These numbers indicate high overlap across reviews in all primary studies, with very high overlap across reviews in primary studies eligible for our overview (i.e., school-based anxiety prevention delivered directly to students during school hours). Importantly, no single review identified all primary studies eligible for our overview—including reviews directly focused on school-based anxiety prevention (Caldwell et al., 2019; Feiss et al., 2019; Gallegos et al., 2012; Mychailyszyn, 2011; Werner-Seidler et al., 2021; Zhang et al., 2023) as opposed to reviews that had a more general remit (e.g., anxiety prevention across all ages and settings) or specific remit (only universal or targeted school-based anxiety prevention interventions).
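As a minimal sketch of how a corrected covered area (CCA) percentage of this kind can be computed, the Python below applies the usual formula CCA = (N - r) / (r × c - r), where N is the total number of inclusions summed across reviews, r is the number of unique primary studies, and c is the number of reviews. The citation matrix `toy_matrix` is hypothetical and is not data from this overview, and the cut-offs in `overlap_label` (up to 5% slight, up to 10% moderate, up to 15% high, above 15% very high) are an assumption chosen to match the labels used above.

```python
def corrected_covered_area(inclusion):
    """Return the CCA as a proportion.

    `inclusion` has one row per unique primary study and one 0/1 entry
    per review (1 = the review included that study).
    """
    r = len(inclusion)                      # unique primary studies (rows)
    c = len(inclusion[0])                   # reviews (columns)
    n = sum(sum(row) for row in inclusion)  # total inclusions across reviews
    return (n - r) / (r * c - r)


def overlap_label(cca):
    """Map a CCA proportion to a verbal label (assumed cut-offs)."""
    if cca <= 0.05:
        return "slight"
    if cca <= 0.10:
        return "moderate"
    if cca <= 0.15:
        return "high"
    return "very high"


# Hypothetical matrix: 4 primary studies (rows) by 3 reviews (columns).
toy_matrix = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]

cca = corrected_covered_area(toy_matrix)
print(f"CCA = {cca:.1%} ({overlap_label(cca)} overlap)")
```

Under these assumed cut-offs, the reported CCA values of 11.1% and 24.6% map to "high" and "very high" overlap, respectively.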

3.2 Characteristics of Eligible Primary Studies

The 35 primary studies on interventions delivered during school hours (see Table 4) were published between 1995 and 2021 (with a median of 2011). Most studies (n = 21; 60%) did not report start dates for participant recruitment; among the 14 studies (40%) that did, start dates for participant recruitment ranged from 2000 to 2016 (with a median of 2009). Most studies (n = 23; 66%) also did not report end dates for completing data collection; among the 12 studies (34%) that did, end dates for completing data collection ranged from 2003 to 2017 (with a median of 2009). Unless otherwise indicated, the percentages below include all studies in the denominator (including those with missing data).

3.2.1 Study Design

Regarding the number of interventions and comparisons trialed per study, 27 studies (77%) had two groups, and 8 studies (23%) had three groups. In terms of study design, 3 studies (9%) randomized individual students to interventions, 28 studies (80%) randomized clusters of students to interventions, and 4 studies (11%) used non-random assignment. Of the 28 cluster-randomized trials, 20 studies (71%) randomized schools, 7 studies (25%) randomized classrooms, and 1 study (4%) randomized districts. Among the cluster-randomized trials reporting this information, the average cluster size ranged from 10 to 874 (with a median of 36). Regarding research transparency, 17 studies (49%) included a flow diagram of participants through the different phases of the study, 7 studies (20%) reported a registration number for their study, and 2 studies (6%) provided availability statements about the materials, data, and/or code underlying the results that they reported.

3.2.2 Participants

The 35 eligible primary studies included 17,950 students (median: 303 students per study, range: 20 to 2,288) and at least 380 schools (median: 7 schools per study, range: 1 to 63; one study did not report the number of schools). Only 13 studies (37%) reported the number of classrooms. Of the 27 studies (77%) reporting student age, the average age of students ranged from 8.8 to 16.3 (with a median of 10.3). Regarding school level, 25 studies (71%) included elementary schools, 18 studies (51%) included middle schools, and 6 studies (17%) included high schools. The most common grade levels were 4th grade (15 studies; 43%), 6th grade (12 studies; 34%), 5th grade (11 studies; 31%), 3rd grade (10 studies; 29%), and 7th grade (9 studies; 26%). The average percentage of students who were female ranged from 0% to 100% (with a median of 52%).

3.2.3 Settings

The most common countries in which studies took place were Australia (15 studies; 43%), Canada (5 studies; 14%), the United Kingdom (4 studies; 11%), and Ireland (2 studies; 6%); no included studies took place in the United States. Of the 25 studies (71%) reporting information on area type, 8 studies (32%) took place in rural areas, 4 studies (16%) took place in suburban areas, and 20 studies (80%) took place in urban areas. Of the 21 studies (60%) reporting information on school type, 16 studies (76%) took place in public schools, 4 studies (19%) took place in private schools, and 5 studies (24%) took place in parochial schools.

3.2.4 Experimental Interventions

Across all 35 primary studies, 43 experimental interventions were evaluated. Regarding format of the 43 experimental interventions, 6 (14%) were administered to individuals, 5 (12%) were administered to small groups, and 33 (77%) were administered to whole classes. The duration of interventions ranged from 1 to 20 weeks (with a median of 9), the number of sessions ranged from 2 to 20 (with a median of 9), and the length of each session ranged from 30 to 120 minutes (with a median of 55). Of the 42 interventions for which frequency of intervention delivery was reported, 3 interventions (7%) were delivered 2-4 times a week, 37 interventions (88%) were delivered once a week, and 2 interventions (5%) were delivered less than weekly. Of the 42 interventions for which provider information was reported, 22 interventions (52%) were delivered by teachers, 13 interventions (31%) by behavioral health personnel, 11 interventions (26%) by researchers, 6 interventions (14%) by counselors, 1 intervention (2%) by other school personnel, and 2 interventions (5%) by other providers; 7 interventions (17%) were self-administered. All interventions (100%) were delivered directly to students; in addition, 14 interventions (33%) were also delivered to family, and 1 intervention (2%) each to teachers, staff, and the community. In addition to delivery during school hours, 11 interventions (26%) also involved a component that took place during out-of-school time but on school grounds (e.g., after school), 3 interventions (7%) involved a component that took place at home, and 1 intervention (2%) involved a component that took place in the community. For intervention mode, 40 interventions (93%) involved in-person delivery, and 7 interventions (16%) involved digital delivery. Across the 43 interventions, 33 (77%) reported monitoring intervention implementation to assess delivery as intended, and 11 (26%) reported that there was uncontrolled variation or degradation in intervention implementation.

3.2.5 Comparison Interventions

Overall, the 35 primary studies had 35 comparison interventions. Of these 35 comparison interventions, 15 (43%) involved business-as-usual, 6 (17%) involved no intervention, 10 (29%) involved a wait-list control, 1 (3%) involved a different active intervention, and 3 (9%) involved an attention-control intervention.

3.2.6 Risks of Bias

Of the 3 individually randomized trials, we rated 2 studies (67%) as high risk of bias due to problems in the randomization process and missing outcome data. We rated the other study (33%) as having only some concerns. Of the 28 cluster-randomized trials, we rated 18 studies (64%) as high risk of bias for at least one follow-up, and the other 10 studies (36%) as having only some concerns. Of the 4 non-randomized trials, we rated 3 studies (75%) as critical risk of bias due to confounding and missing outcome data. We rated the other study (25%) as having serious risk of bias.

3.3 Effects of School-Based Anxiety Prevention Interventions

We first summarize the most applicable finding from each eligible review. We then report results for meta-analyses (and narrative findings when meta-analyses were not possible) of primary studies meeting our eligibility criteria.

3.3.1 Findings from Eligible Systematic Reviews

Meta-analyses from previous reviews have generally found that school-based anxiety prevention interventions may have modest positive impacts on anxiety. Several reviews found positive impacts of school-based anxiety prevention interventions for children and adolescents in general (Feiss et al., 2019; Werner-Seidler et al., 2021; Zhang et al., 2023). In addition, Zhelinsky-Denyer (2015) found no difference between school-based and non-school-based anxiety prevention interventions for youth 14 years of age and younger. In terms of stages of prevention, several reviews (Gee et al., 2020; Hugh-Jones et al., 2021; Mychailyszyn, 2011) found that targeted school-based interventions had a significant effect on reducing anxiety symptoms, while Ahlen et al. (2015) and Johnstone et al. (2018) had conflicting findings about the impact of universal programs on anxiety symptoms. In terms of types of intervention, reviews specifically found positive impacts on anxiety for the FRIENDS program (Maggin & Johnson, 2014), cognitive-behavioral prevention interventions (Caldwell et al., 2019; Gallegos et al., 2012), and resilience-focused interventions (Dray et al., 2017). In terms of provider, Teubert and Pinquart (2011) did not detect a statistically significant effect for interventions delivered by teachers.

3.3.2 Anxiety Diagnosis

Eight studies (23%) comprising 5,286 students (29%) reported data on anxiety diagnosis that could be used in meta-analysis. These studies measured anxiety diagnosis through established thresholds for either the Spence Children’s Anxiety Scale (SCAS; 5 studies; 63%) or the Generalized Anxiety Disorder-7 (GAD-7), Multidimensional Anxiety Scale for Children (MASC), or Social Phobia and Anxiety Inventory for Children (SPAI-C; 1 study each, 13% each). Length of follow-up for outcome measurement ranged from 0 to 35 weeks post-intervention (with a median of 0 weeks). Each study contributed 1 effect estimate to the meta-analysis. Students receiving anxiety prevention interventions had a 35% reduced risk of meeting criteria for an anxiety diagnosis relative to students in control groups (risk ratio = 0.65, 95% CI [0.42 to 1.01], 95% PI [0.34 to 1.26], \(I^2\) = 17%, \(\tau^2\) = 0.06). The TOST procedure indicated that the observed effect size (Risk Ratio\(_{\log}\) = -0.426) was not significantly within the equivalence bounds of \(RR_{\log}\) +/- 0.69 (Z = 1.223, p = 0.889). Using 11% (110 per 1,000) for the assumed baseline risk of a current anxiety disorder diagnosis (Ghandour et al., 2019), this relative risk reduction translates into 39 fewer students meeting criteria for an anxiety diagnosis per every 1,000 who received an anxiety prevention intervention rather than a comparator, or a “number needed to treat” of 26. However, the prediction interval indicates that the possible underlying effect in a new study could range from a risk ratio of 0.27 to 1.66 (i.e., 73% reduced risk to 66% increased risk to meet criteria for an anxiety diagnosis). The estimated probability that the true effect of school-based anxiety prevention interventions on anxiety diagnosis will be null or better in a new study was 88%.
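The translation from a relative to an absolute effect can be illustrated with a short calculation. The sketch below is not the analysis code for this overview; it assumes the simple approach of multiplying the assumed baseline risk by the risk ratio to obtain the expected risk under intervention, and the function name `absolute_effect` is hypothetical.

```python
def absolute_effect(risk_ratio, baseline_risk):
    """Convert a risk ratio into absolute terms for a given baseline risk.

    Returns the number of fewer cases per 1,000 students and the
    number needed to treat (the reciprocal of the absolute risk reduction).
    """
    intervention_risk = baseline_risk * risk_ratio
    risk_reduction = baseline_risk - intervention_risk
    fewer_per_1000 = 1000 * risk_reduction
    number_needed_to_treat = 1 / risk_reduction
    return fewer_per_1000, number_needed_to_treat


# Inputs taken from the text: risk ratio of 0.65 and baseline risk of 11%.
fewer, nnt = absolute_effect(risk_ratio=0.65, baseline_risk=0.11)
print(f"Fewer cases per 1,000 students: {fewer:.1f}")
print(f"Number needed to treat: {nnt:.1f}")
```

Because the confidence interval for the risk ratio includes 1, the same calculation applied to its upper limit yields essentially no absolute reduction, so these absolute figures should be read with the same caution as the relative estimate.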

3.3.3 Subsyndromal Anxiety

No studies provided outcome data on subsyndromal anxiety.

3.3.4 Anxiety Symptoms

Twenty-eight studies (80%) comprising 14,844 students (83%) reported data on anxiety symptoms that could be used in meta-analysis. These studies measured anxiety symptoms through the SCAS (18 studies, 64%), MASC (5 studies, 18%), Revised Children's Manifest Anxiety Scale (RCMAS; 5 studies, 18%), GAD-7 (3 studies, 11%), Revised Child Anxiety and Depression Scale (RCADS; 2 studies, 7%), Social Anxiety Scale for Adolescents (SAS-A; 2 studies, 7%), SPAI-C (1 study, 4%), Screen for Child Anxiety Related Disorders (SCARED; 1 study, 4%), and Anxiety Scale for Children (EAN; 1 study, 4%). The length of follow-up for outcome measurement ranged from 0 to 234 weeks post-intervention (with a median of 26 weeks). Each study provided 1 to 18 effect estimates for anxiety symptoms (median = 3, total = 118). We found that students receiving anxiety prevention interventions had lower anxiety symptoms relative to students in control groups (standardized mean difference = -0.09, 95% CI [-0.16, -0.01], 95% PI [-0.38, 0.21], \(I^2\) = 18%, \(\tau^2\) = 0.01). The TOST procedure indicated that the observed effect size (d = -0.09) was not significantly within the equivalence bounds of SMD +/- 0.10 (Z = -0.27, p = 0.394). Using empirically based benchmarks (Tanner-Smith et al., 2018), this effect size represents a small effect equal to the 25th percentile of the distribution of mean effects obtained for universal prevention programs focused on internalizing problems among school-aged youth. However, the prediction interval indicates that the possible underlying effect in a new study could range from a standardized mean difference of -0.38 to 0.21 (i.e., a medium beneficial effect to a small harmful effect). The estimated probability that the true effect of school-based anxiety prevention interventions on anxiety symptoms will be null or lower in a new study was 83%.
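To illustrate how a prediction interval and the probability of a beneficial effect in a new study relate to the pooled estimate and the between-study variance, the sketch below uses a simple normal approximation. This is an assumption for illustration only, not the method used for the results above: the reported interval is wider than this approximation gives, which would be expected if it was based on a t distribution with small-sample corrections. The standard error is back-calculated from the reported confidence interval, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back-calculate a standard error from a symmetric normal-theory CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def approx_prediction_interval(estimate, se, tau2, level=0.95):
    """Normal-approximation prediction interval for a new study's true effect."""
    sd = sqrt(tau2 + se ** 2)  # between-study variance plus estimation uncertainty
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * sd, estimate + z * sd


def prob_effect_at_or_below(estimate, se, tau2, threshold=0.0):
    """Approximate probability that a new study's true effect is <= threshold."""
    sd = sqrt(tau2 + se ** 2)
    return NormalDist(mu=estimate, sigma=sd).cdf(threshold)


# Reported values for anxiety symptoms (tau^2 is rounded in the text).
se = se_from_ci(-0.16, -0.01)
pi_low, pi_high = approx_prediction_interval(-0.09, se, tau2=0.01)
p_benefit = prob_effect_at_or_below(-0.09, se, tau2=0.01)
print(f"Approximate 95% PI: [{pi_low:.2f}, {pi_high:.2f}]")
print(f"Approximate P(effect <= 0 in a new study): {p_benefit:.2f}")
```

The point of the sketch is that even a small between-study variance makes the interval for a new study considerably wider than the confidence interval for the average effect.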

3.3.5 Depression Symptoms

Seventeen studies (49%) comprising 12,007 students (67%) reported data on depression symptoms that could be used in meta-analysis. These studies measured depression symptoms through the Children's Depression Inventory (CDI; 10 studies, 59%), Centre for Epidemiological Studies - Depression Scale (CES-D; 2 studies, 12%), Short Mood and Feelings Questionnaire (SMFQ; 2 studies, 12%), Depression Questionnaire for Children (CDN; 1 study, 6%), Patient Health Questionnaire - short form (PHQ-5; 1 study, 6%), and Revised Child Anxiety and Depression Scale (RCADS; 1 study, 6%). The length of follow-up for outcome measurement ranged from 0 to 234 weeks post-intervention (with a median of 26 weeks). Each study provided 1 to 12 effect estimates for depression symptoms (median = 2, total = 52). We found that students receiving anxiety prevention interventions had lower depression symptoms relative to students in control groups (standardized mean difference = -0.07, 95% CI [-0.13, -0.01], 95% PI [-0.27, 0.13], \(I^2\) = 0%, \(\tau^2\) = 0). The TOST procedure indicated that the observed effect size (d = -0.07) was not significantly within the equivalence bounds of SMD +/- 0.10 (Z = 1.17, p = 0.122). Using empirically based benchmarks (Tanner-Smith et al., 2018), this effect size is potentially trivial, as it is less than the 25th percentile of the distribution of mean effects obtained for universal prevention programs focused on internalizing problems among school-aged youth. In addition, the prediction interval indicates that the possible underlying effect in a new study could range from a standardized mean difference of -0.27 to 0.13 (i.e., a medium beneficial effect to a trivial negative effect).

3.3.6 Educational Achievement

Three studies (9%) comprising 2,402 students (13%) reported outcome data on educational achievement. Skryabina et al. (2016) found national standardized test scores in reading, writing, and math did not differ between groups at 52-week follow-up. Collins et al. (2014) found no significant differences in spelling between the psychologist-led anxiety prevention program and comparison group students, but they did find significant differences between the teacher-led anxiety prevention program and comparison group students. Ahlen et al. (2018) found no significant difference between groups on teacher-rated academic performance at post-intervention and 52-week follow-up.

3.3.7 Self-Harm

No studies provided outcome data on self-harm.

3.3.8 Stress

No studies provided outcome data on stress.

3.3.9 Substance Use

No studies provided outcome data on substance use.

3.3.10 Suicidal Ideation

One study (3%) comprising 2,288 students (13%) reported outcome data on suicidal ideation. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Roberts et al. (2018) found significantly lower incidence rates of suicidal ideation at post-intervention and 52-week follow-up among students receiving anxiety prevention programs compared to control group students.

3.3.11 Well-being

Five studies (14%) comprising 4,388 students (24%) reported meta-analyzable data on well-being. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Two studies (40%) measured well-being using the Warwick-Edinburgh Mental Well-being Scale (WEMWBS), while one study each (20% each) used the Culture-free Self Esteem Questionnaire, Kessler Psychological Distress Scale - short form (K6), and Total Life Satisfaction questionnaire. We did not detect statistically significant differences between groups on student well-being (SMD = -0.06, 95% CI [-0.18, 0.06], 95% PI [-0.18, 0.06], \(I^2\) = 0%, \(\tau^2\) = 0). The TOST procedure indicated that the observed effect size (d = -0.06) was not significantly within the equivalence bounds of SMD +/- 0.10 (Z = -0.64, p = 0.261).
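The logic of the two one-sided tests (TOST) procedure used throughout this section can be sketched as follows. This is a minimal illustration under a normal approximation, with the standard error back-calculated from the reported confidence interval; it is an assumption for exposition rather than the exact procedure behind the reported statistics, so it should only approximately agree with them. The function name `tost_z` is hypothetical.

```python
from statistics import NormalDist


def tost_z(estimate, se, bound):
    """Two one-sided z-tests for equivalence within [-bound, +bound].

    Equivalence is supported only if BOTH one-sided tests reject, so the
    overall p-value is the larger of the two one-sided p-values.
    """
    std_normal = NormalDist()
    z_lower = (estimate + bound) / se      # H0: true effect <= -bound
    z_upper = (estimate - bound) / se      # H0: true effect >= +bound
    p_lower = 1 - std_normal.cdf(z_lower)  # small when estimate is well above -bound
    p_upper = std_normal.cdf(z_upper)      # small when estimate is well below +bound
    return z_lower, z_upper, max(p_lower, p_upper)


# Reported well-being result: SMD = -0.06, 95% CI [-0.18, 0.06], bounds +/- 0.10.
se = (0.06 - (-0.18)) / (2 * NormalDist().inv_cdf(0.975))
z_lower, z_upper, p_tost = tost_z(estimate=-0.06, se=se, bound=0.10)
print(f"z (lower bound) = {z_lower:.2f}, z (upper bound) = {z_upper:.2f}")
print(f"TOST p-value = {p_tost:.3f}")
```

A TOST p-value above the significance level means that equivalence could not be established; it does not by itself show that a meaningful effect exists.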

3.4 Variation in Effects by Methodological, Demographic, and Intervention Characteristics

Tests for residual heterogeneity remained significant in all meta-regression models (i.e., in every model, the variability in the observed effect sizes not accounted for by the moderator included in the model is larger than would be expected based on sampling variability alone). For type of assignment to interventions (randomized or not), the moderator test was statistically significant (F[1, 18.65] = 68.1, p < 0.0001): trials using random assignment had significantly lower effect sizes (i.e., greater reductions in anxiety symptoms) compared to the one study using non-random assignment (non-random SMD = 0.22; random SMD = -0.31, 95% CI [-0.39 to -0.23]). The moderator test was also statistically significant for type of school (F[1, 16.3] = 7.9, p = 0.01): studies including public schools had significantly higher effect sizes (i.e., smaller reductions in anxiety symptoms) compared to studies not including public schools (no public schools SMD = -0.21, 95% CI [-0.36 to -0.06]; public schools SMD = -0.002, 95% CI [-0.08 to 0.09]). Based on these meta-regression results, we conducted the TOST procedure on studies that included public schools, which indicated that the observed effect size was significantly within the equivalence bounds of SMD +/- 0.10 (Z = -2.674, p = 0.004), suggesting a null effect on average in studies that included public schools. As “studies including public schools” involved studies with a mix of public and other school types, we ran an additional exploratory analysis with five moderator levels: studies with public schools only, private schools only, parochial schools only, mix of school types, and “cannot tell” type of school. The moderator test for this analysis was not statistically significant (F[5, 1.17] = 11.2, p = 0.185).
We did not detect statistically significant associations between anxiety symptoms and school level (secondary or not), country (Australia or not), level of prevention (universal or targeted), type of comparator (business as usual or not), years since publication, percent female in the sample, or risk of bias (high or not).