3 Results

Our search yielded 4,784 citations (see Figure 1). We retained 562 citations (11.7%) for full-text eligibility assessment. Of the 562 full-texts assessed, we identified 24 eligible systematic reviews (4.3% of full-texts assessed for eligibility; 0.5% of all citations identified in our search). These 24 systematic reviews included 450 primary studies. After assessing the full texts of these 450 primary studies, we identified 70 (15.6%) that met our eligibility criteria. In summary, we included 24 systematic reviews and 70 primary studies in this overview.

3.1 Characteristics of Included Systematic Reviews

Overall, these 24 reviews were published between 2006 and 2023, with a median publication year of 2017 (see Table 1). To identify primary studies, these reviews most commonly searched APA PsycInfo (23 reviews; 95.8%), MEDLINE only (10 reviews; 41.7%), PubMed (10 reviews; 41.7%), ERIC (9 reviews; 37.5%), and Cochrane Library (8 reviews; 33.3%). The year in which each review last conducted its literature search ranged from 2007 to 2021, with a median of 2017. Regarding research transparency, 20 reviews (83.3%) included a flow diagram of information through the different phases of a systematic review (Page et al., 2021), 8 reviews (33.3%) reported a registration number for their review (Booth et al., 2012), and 4 reviews (16.7%) provided statements about the data and code underlying the review (Page et al., 2022).

3.1.1 Methodological Quality and Risk of Bias of Included Systematic Reviews

AMSTAR-2 assessments (see Table 2) yielded methodological quality ratings of low for 2 reviews (8.3%) and critically low for 22 reviews (91.7%). The most common critical weaknesses in methodological quality were not providing justification for excluding individual studies (20 reviews; 83.3%), not registering a protocol prior to conducting the review (16 reviews; 66.7%), and inadequate literature search process (15 reviews; 62.5%). The most common non-critical weaknesses in methodological quality were not reporting funding sources for included studies (24 reviews; 100%), not assessing the impact of risk of bias in individual studies on evidence synthesis results (15 reviews; 62.5%), and not explaining the selection criteria used to identify included study designs (15 reviews; 62.5%). ROBIS assessments (see Table 3) yielded ratings of low risk of bias for 3 reviews (12.5%), unclear risk of bias for 1 review (4.2%), and high risk of bias for 20 reviews (83.3%). Concerns about risk of bias were most commonly related to study eligibility criteria (21 reviews; 83.3%) and synthesis and findings (19 reviews; 79.2%).

3.1.2 Overlap of Primary Studies Across Systematic Reviews

The number of primary studies included in each review ranged from 9 to 137, with a median of 38. Of the 450 primary studies included across all reviews, 158 studies (35.1%) were included in more than one review (see Figure 2). Of the 70 primary studies meeting our eligibility criteria, 47 studies (67.1%) were included in more than one review (see Figure 3). The overall CCA percentage for all primary studies included across reviews (6.8%) was lower than the overall CCA percentage for primary studies meeting our eligibility criteria (19.2%). These numbers indicate only moderate overlap across reviews in all primary studies, though very high overlap across reviews in primary studies eligible for our overview (i.e., school-based depression prevention delivered directly to students during school hours). Importantly, no single review identified all primary studies eligible for our overview—including reviews directly focused on school-based depression prevention (Caldwell et al., 2019b; Werner-Seidler et al., 2021; Zhang et al., 2022) as opposed to reviews that had a more general remit (e.g., depression prevention across all ages and settings).
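As a reminder of how the corrected covered area (CCA) is typically computed, the following is a minimal sketch of the standard formula, CCA = (N - r) / (r × c - r), where N is the total number of inclusion instances across reviews, r is the number of unique primary studies, and c is the number of reviews. The review labels and study identifiers are hypothetical toy data rather than the studies in this overview, and the cut-offs in `overlap_band` are commonly cited interpretive thresholds that we assume correspond to the labels used above.

```python
def corrected_covered_area(reviews):
    """CCA as a proportion, given a mapping of review -> set of included study IDs."""
    unique_studies = set().union(*reviews.values())
    n_inclusions = sum(len(studies) for studies in reviews.values())  # N
    r = len(unique_studies)  # rows of the citation matrix
    c = len(reviews)         # columns of the citation matrix
    if r == 0 or c < 2:
        raise ValueError("CCA needs at least one study and at least two reviews")
    return (n_inclusions - r) / (r * c - r)


def overlap_band(cca):
    """Map a CCA proportion to an overlap label (assumed thresholds: 5%, 10%, 15%)."""
    pct = 100 * cca
    if pct <= 5:
        return "slight"
    if pct <= 10:
        return "moderate"
    if pct <= 15:
        return "high"
    return "very high"


# Hypothetical toy data: three reviews covering four unique primary studies.
toy_reviews = {
    "Review A": {"s1", "s2", "s3"},
    "Review B": {"s2", "s3"},
    "Review C": {"s3", "s4"},
}

toy_cca = corrected_covered_area(toy_reviews)  # (7 - 4) / (4 * 3 - 4)
print(f"Toy CCA: {toy_cca:.1%} ({overlap_band(toy_cca)})")

# Labels for the two CCA percentages reported in the text.
print(overlap_band(0.068), "|", overlap_band(0.192))
```

Under these assumed thresholds, the two CCA values reported above fall in the "moderate" and "very high" bands, respectively.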

3.1.3 Findings from Eligible Systematic Reviews

Meta-analyses from previous reviews have generally found that school-based depression prevention interventions may have modest positive impacts on depression. Several reviews found positive impacts of school-based depression prevention interventions for children and adolescents in general (Cowen, 2014; Davaasambuu et al., 2020; Mychailyszyn, 2011; van Zoonen et al., 2014; Werner-Seidler et al., 2021), though some cautioned more heavily that effects were modest and the evidence not robust (Caldwell et al., 2021). In terms of school level, Havlik (2020) found school-based mental health interventions to have small effects on depression for secondary students. In terms of stages of prevention, Johnstone et al. (2018) and Ahlen et al. (2015) both found that universal school-based prevention programs led to significantly fewer depressive symptoms, Gee et al. (2020) found that indicated school-based interventions had a small effect on reducing depression symptoms, and Garber et al. (2016) found positive effects in targeted samples. In terms of types of intervention, reviews specifically found positive impacts on depression for resilience-focused interventions (Dray et al., 2017; Ma et al., 2020), third-wave (Caldwell et al., 2021) and mindfulness interventions (Reangsing et al., 2021), and cognitive-behavioral approaches (Kavanagh et al., 2009; Zhang et al., 2022), including in a group format (Ssegonja et al., 2019) with a psycho-education component (Caldwell et al., 2021), or with hopeful elements (Venning et al., 2009). Two reviews had conflicting findings on whether the Penn Resiliency Program specifically did (Brunwasser et al., 2009) or did not (Bastounis et al., 2016) have a positive impact on depression. In terms of participant age, Feiss et al. (2019) found school-based depression prevention programs significantly reduced depressive symptoms for adolescents. Horowitz & Garber (2006) and Stockings et al. (2016) found that universal, selective, and indicated prevention programs for young people reduce depression symptoms and the relative risk for a depression disorder, respectively; while they included programs in non-school settings, the majority of studies took place in schools. Looking across several of these moderators, Caldwell et al. (2021) found weak evidence supporting the use of cognitive-behavioral interventions (alone and with interpersonal therapy) in universal secondary school settings, while Zhang et al. (2022) concluded that secondary schools should strive to implement cognitive-behavioral interventions delivered through clinicians where possible. However, reviews generally noted concerns about study quality, heterogeneity, and (in some instances) publication bias in this body of evidence (Caldwell et al., 2021; Werner-Seidler et al., 2021; Zhang et al., 2022). In addition, no review provided meta-analyses that matched our specific eligibility criteria for primary studies. We therefore present the following information and meta-analyses to assess whether findings were similar for eligible primary studies across all reviews.

3.2 Characteristics of Eligible Primary Studies

The 70 primary studies on interventions delivered during school hours (see Table 4) were published between 1993 and 2020 (with a median of 2011). Start dates for participant recruitment ranged from 1989 to 2016 (with a median of 2008); end dates for completing data collection ranged from 1993 to 2017 (with a median of 2009). Unless otherwise indicated, the percentages below include all studies in the denominator (including those with missing data).

3.2.1 Study Design

Regarding number of interventions and comparisons trialed per study, 53 studies (75.7%) had two groups, 12 studies (17.1%) had three groups, and 5 studies (7.1%) had four groups. In terms of study design, 27 studies (38.6%) randomized individual students to interventions, 36 studies (51.4%) randomized clusters of students to interventions, and 7 studies (10%) used non-random assignment. Of the 36 cluster-randomized trials, 19 studies (52.8%) randomized classrooms, and 17 studies (47.2%) randomized schools. Among the cluster-randomized trials reporting this information, the average cluster size ranged from 11 to 114 (with a median of 22.9). Regarding research transparency, 38 studies (54.3%) included a flow diagram of participants through the different phases of the study, 14 studies (20.0%) reported a registration number for their study, and 12 studies (17.1%) provided statements about the data and code underlying the results that they reported.

3.2.2 Participants

The 70 primary studies included 44,519 students (median: 209 students per study, range: 15 to 5,634), 332 classrooms (median: 13 classrooms per study, range: 0 to 66), and 570 schools (median: 4 schools per study, range: 1 to 63). The average age of students ranged from 8.8 to 17.3 (with a median of 13.6). Regarding school level, 14 studies (20.0%) took place in primary/elementary school, 15 studies (21.4%) took place in intermediate/middle school, and 39 studies (55.7%) took place in secondary/high school. The most common grade levels were 1st grade (32 studies; 45.7%), 9th grade (25 studies; 35.7%), 7th grade (19 studies; 27.1%), 8th grade (18 studies; 25.7%), and 10th grade (16 studies; 22.9%). The percentage of students who were female ranged from 0% to 100% (with a median of 52.5%). Of the 22 studies conducted in the United States, the percentage of students who were white ranged from 0% to 98.9% (with a median of 61.9%). No studies conducted in the United States reported the percentage of English language learners. Two studies reported the percentage of students receiving free or reduced-price lunch: Pössel et al. (2013) had 29% and McLaughlin (2010) had 46%.

3.2.3 Settings

Studies most commonly took place in Australia (25 studies; 35.7%), the United States (22 studies; 31.4%), the United Kingdom (4 studies; 5.7%), and the Netherlands (3 studies; 4.3%). Of the 22 studies conducted in the United States, the most common states were Pennsylvania (5 studies; 27.3%) and Washington (2 studies; 9.1%). Of the 39 studies (55.7%) reporting information on area type, 14 studies (35.9%) took place in rural areas, 26 studies (66.7%) took place in urban areas, and 6 studies (15.4%) took place in suburban areas. Of the 31 studies (44.3%) reporting information on school type, 24 studies (77.4%) took place in public schools, 13 studies (41.9%) took place in private schools, and 1 study (3.2%) took place in charter schools.

3.2.4 Experimental Interventions

Across all 70 primary studies, 84 experimental interventions were evaluated. Regarding format of the 84 experimental interventions, 10 (11.9%) were administered to individual students, 41 (48.8%) were administered to small groups, 32 (38.1%) were administered to whole classes, and 1 (1.2%) was administered to whole schools. The duration of interventions ranged from 0.14 to 104 weeks (with a median of 8), the number of sessions ranged from 1 to 30 sessions (with a median of 9), and the length of each session ranged from 25 to 120 minutes (with a median of 50). Of the 59 interventions for which frequency of intervention delivery was reported, 4 interventions (6.8%) were delivered less than weekly, 50 interventions (84.7%) were delivered once a week, 4 interventions (6.8%) were delivered 2-4 times a week, and 1 intervention (1.7%) involved daily contact. Of the 81 interventions for which provider information was reported, 31 interventions (38.3%) were delivered by behavioral health personnel, 30 interventions (37.0%) by researchers, 24 interventions (29.6%) by teachers, 3 interventions (3.7%) by guidance counselors, 4 interventions (4.9%) by other school personnel, and 8 interventions (9.9%) by other providers; 8 interventions (9.9%) were self-administered. Of the 84 interventions for which recipient information was reported, all interventions (100%) were delivered directly to students, with 10 interventions (11.9%) also delivered to family, 1 intervention (1.2%) to peers, 1 intervention (1.2%) to teachers, 1 intervention (1.2%) to staff, and 1 intervention (1.2%) to the community. In addition to delivery during school hours, 7 interventions (8.3%) also involved a component that took place during out-of-school time but on school grounds (e.g., after school), 6 interventions (7.1%) involved a component that took place at home, and 1 intervention (1.2%) involved a component that took place in the community. Of the 83 interventions for which intervention mode was reported, 78 interventions (94.0%) involved in-person delivery, 9 interventions (10.8%) involved digital delivery, and no interventions (0%) involved phone delivery. Across the 70 primary studies, 49 (70.0%) reported monitoring intervention implementation to assess delivery as intended, and 10 (14.3%) reported that there was uncontrolled variation or degradation in intervention implementation.

3.2.5 Comparison Interventions

Overall, the 70 primary studies had 77 comparison interventions. Of these 77 comparison interventions, 33 (42.9%) involved business-as-usual, 16 (20.8%) involved no intervention, 10 (13.0%) involved a wait-list control, 9 (11.7%) involved a different active intervention, and 7 (9.1%) involved an attention-control intervention.

3.2.6 Risks of Bias

Of the 27 individually-randomized trials, we rated 15 studies (55.6%) as high risk of bias and 12 studies (44.4%) as having some concerns. Concerns about risk of bias were most commonly related to missing outcome data (10 studies; 37.0%), random allocation (5 studies; 18.5%), selective reporting (3 studies; 11.1%), and assignment/adherence deviation (2 studies; 7.4%). Of the 36 cluster-randomized trials, we rated 22 studies (61.1%) as high risk of bias and 14 studies (38.9%) as having some concerns. Concerns about risk of bias were most commonly related to missing outcome data (16 studies; 44.4%), assignment/adherence deviation (10 studies; 27.8%), measurement (5 studies; 13.9%), and recruitment (3 studies; 8.3%). Of the 7 non-randomized trials, we rated 4 studies (57.1%) as serious risk of bias and 3 studies (42.9%) as having moderate risk of bias. Serious or critical concerns about risk of bias were most commonly related to confounding (4 studies; 57.1%).

3.3 Effects of Interventions in Eligible Primary Studies

We were able to conduct meta-analyses for depression diagnosis, depression symptoms, and anxiety symptoms. We provide narrative findings for subsyndromal depression, educational achievement, self-harm, stress, substance use, suicidal ideation, and well-being.

3.3.1 Depression Diagnosis

Twelve studies (17.1%) comprising 9,838 students (20.1%) reported data on depression diagnosis that could be used in meta-analysis. These studies measured depression diagnosis through the Children’s Depression Inventory (3 studies; 25%), Center for Epidemiologic Studies Depression Scale (2 studies; 16.7%), structured clinical interviews (2 studies; 16.7%), Beck Depression Inventory (1 study; 8.3%), Depression Anxiety Stress Scale (1 study; 8.3%), Diagnostic Interview for Children and Adolescents (1 study; 8.3%), Major Depression Inventory (1 study; 8.3%), and Mood and Feelings Questionnaire (1 study; 8.3%). Length of follow-up for outcome measurement ranged from 0 to 78.2 weeks post-intervention (with a median of 26.1 weeks). Each study provided 1 to 6 effect estimates (median = 2, total = 33).

Students receiving depression prevention interventions had a 33% reduced risk of meeting criteria for a depression diagnosis relative to students in control groups (risk ratio = 0.67, 95% CI [0.48 to 0.93], 95% PI [0.27 to 1.66], \(I^2\) = 67%, \(\tau^2\) = 0.14). Using 17% (170 per 1,000) for the assumed baseline risk of experiencing a major depressive episode in the past year (Substance Abuse and Mental Health Services Administration, 2021), this relative risk reduction translates into 56 [12 to 88] fewer students meeting criteria for a depression diagnosis per every 1,000 who received a depression prevention intervention rather than a comparator, or a “number needed to prevent” of 18 [11 to 85]. However, the prediction interval indicates that the possible underlying effect in a new study could range from a risk ratio of 0.27 to 1.66 (i.e., 27% to 166% as likely to meet criteria for a depression diagnosis). The estimated probability that the true effect of school-based depression prevention interventions on depression diagnosis will be null or better in a new study was 82.7%.
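The conversion from a relative to an absolute effect described above can be sketched as follows. This is a minimal illustration that assumes the usual approach of applying the risk ratio (and its confidence limits) to the assumed baseline risk; the function names are hypothetical and not taken from the analysis code for this overview.

```python
def fewer_events_per_1000(risk_ratio, baseline_risk):
    """Events avoided per 1,000 students, given a risk ratio and an assumed control-group risk."""
    return 1000 * baseline_risk * (1 - risk_ratio)


def number_needed_to_prevent(risk_ratio, baseline_risk):
    """Reciprocal of the absolute risk reduction (only defined when the risk ratio is below 1)."""
    absolute_risk_reduction = baseline_risk * (1 - risk_ratio)
    if absolute_risk_reduction <= 0:
        raise ValueError("No risk reduction, so the number needed to prevent is undefined")
    return 1 / absolute_risk_reduction


BASELINE_RISK = 0.17  # assumed past-year risk of a major depressive episode (from the text)

# Point estimate and the two 95% CI limits of the risk ratio reported in the text.
for label, rr in [("point estimate", 0.67), ("upper CI limit", 0.93), ("lower CI limit", 0.48)]:
    fewer = fewer_events_per_1000(rr, BASELINE_RISK)
    nnp = number_needed_to_prevent(rr, BASELINE_RISK)
    print(f"{label}: RR = {rr:.2f}, {fewer:.0f} fewer per 1,000, NNP = {nnp:.1f}")
```

Rounded to whole students, the absolute effects match the 56 [12 to 88] fewer per 1,000 reported above. Rounding conventions for the number needed to prevent vary (for example, rounding up versus rounding to the nearest integer), so its bounds may differ by one from the reported values.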

3.3.2 Subsyndromal Depression

Three studies (4.3%) comprising 1,674 students (3.8%) reported meta-analyzable data on subsyndromal depression. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. These studies measured subsyndromal depression through the Children’s Depression Inventory (2 studies; 66.7%) and the Revised Children’s Attributional Style Questionnaire (1 study; 33.3%). The length of follow-up for outcome measurement ranged from 0 to 104.3 weeks post-intervention (with a median of 16 weeks). Each study provided 2 to 5 effect estimates for subsyndromal depression (median = 2, total = 9). None of the three studies found a statistically significant difference between intervention and comparison group students in subsyndromal depression. O’Kearney et al. (2006) found no significant differences between intervention and comparison groups in the proportion of high vulnerability status at post-intervention (odds ratio = 1.30) and 16-week follow-up (odds ratio = 1.56). Pophillat et al. (2016) reported no significant difference between intervention and comparison groups in the proportion of students who moved from healthy at pre-test to at-risk at post-test (odds ratio = 1.17, 95% CI [0.49, 2.82], p = 0.719) or in the proportion who moved from at-risk at pre-test to healthy at post-test (odds ratio = 1.00, 95% CI [0.42, 2.40], p = 0.997). Tak et al. (2016) did not find a statistically significant difference between intervention and comparison groups in the prevalence of an elevated level of depressive symptoms at post-intervention (odds ratio = 1.65, 95% CI [0.90, 3.02], p = 0.102) or 1-year follow-up (odds ratio = 1.00, 95% CI [0.60, 1.65], p = 0.992). We could not conduct analyses to examine sources of heterogeneity in underlying effects due to the limited number of studies.

3.3.3 Depression Symptoms

Sixty studies (85.7%) comprising 37,705 students (84.7%) reported data on depression symptoms that could be used in meta-analysis. These studies measured depression symptoms through the Children’s Depression Inventory (25 studies; 41.7%), Center for Epidemiologic Studies Depression Scale (16 studies; 26.7%), Reynolds Adolescent Depression Scale (6 studies; 10%), Mood and Feelings Questionnaire (5 studies; 8.3%), Beck Depression Inventory (4 studies; 6.7%), Depression Anxiety Stress Scale (4 studies; 6.7%), Reynolds Child Depression Scale (2 studies; 3.3%), Revised Children’s Depression Rating Scale (1 study; 1.7%), Cuestionario de Depresión para Niños (1 study; 1.7%), Major Depression Inventory (1 study; 1.7%), Patient Health Questionnaire-9 (1 study; 1.7%), Peer Nomination Inventory for Depression (1 study; 1.7%), Revised Children’s Anxiety and Depression Scale (1 study; 1.7%), Selbstbeurteilungsbogen-Depressive Stoerungen (1 study; 1.7%), and Structured Clinical Interview for DSM-IV Disorders (1 study; 1.7%). The length of follow-up for outcome measurement ranged from 0 to 234.6 weeks post-intervention (with a median of 26.1 weeks). Each study provided 1 to 18 effect estimates for depression symptoms (median = 3, total = 239).

We found that intervention group students had lower depression symptoms relative to students in control groups (standardized mean difference = -0.12, 95% CI [-0.20, -0.04], 95% PI [-0.57, 0.33], \(I^2\) = 71%, \(\tau^2\) = 0.05). Using empirically-based benchmarks (Tanner-Smith et al., 2018), this effect size represents a medium effect equal to the 50th percentile of the distribution of mean effects obtained for universal prevention programs focused on internalizing problems among school-aged youth. However, the prediction interval indicates that the possible underlying effect in a new study could range from a standardized mean difference of -0.57 to 0.33 (i.e., a large positive effect to a negative effect). The estimated probability that the true effect of school-based depression prevention interventions on depression symptoms will be null or lower in a new study was 69.8%.
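To illustrate how a prediction interval and the probability of a beneficial effect in a new study relate to the pooled estimate and \(\tau^2\), the following is a rough sketch under a normal approximation. It back-calculates the standard error from the reported confidence interval and uses the rounded \(\tau^2\); the models in this overview presumably rely on robust variance estimation with small-sample corrections, so this is an approximation under stated assumptions and not the analysis itself. All function names are hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back-calculate a standard error from a symmetric confidence interval (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def prediction_interval(mean, se, tau2, level=0.95):
    """Approximate interval for the underlying effect in a new study."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * (tau2 + se ** 2) ** 0.5
    return mean - half_width, mean + half_width


def prob_effect_below(threshold, mean, se, tau2):
    """Approximate probability that the underlying effect in a new study falls below a threshold."""
    predictive_sd = (tau2 + se ** 2) ** 0.5
    return NormalDist(mean, predictive_sd).cdf(threshold)


# Values reported in the text for depression symptoms (tau^2 is rounded in the source).
smd, ci_lower, ci_upper, tau2 = -0.12, -0.20, -0.04, 0.05

se = se_from_ci(ci_lower, ci_upper)
pi_lower, pi_upper = prediction_interval(smd, se, tau2)
p_below_zero = prob_effect_below(0.0, smd, se, tau2)

print(f"Approximate 95% PI: [{pi_lower:.2f}, {pi_upper:.2f}]")
print(f"Approximate P(effect < 0 in a new study): {p_below_zero:.1%}")
```

Under these assumptions the sketch recovers a prediction interval close to the one reported above and a probability of roughly 70%, in line with the reported 69.8%; small differences are expected because of rounding and the different estimation method.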

3.3.4 Anxiety Symptoms

Twenty-three studies (32.9%) comprising 20,386 students (45.8%) reported data on anxiety symptoms that could be used in meta-analysis. These studies measured anxiety symptoms through the Revised Children’s Manifest Anxiety Scale (6 studies; 26.1%), Spence Children’s Anxiety Scale (5 studies; 21.7%), Revised Children’s Anxiety and Depression Scale (3 studies; 13.0%), Depression Anxiety Stress Scale (2 studies; 8.7%), Multidimensional Anxiety Scale for Children (2 studies; 8.7%), State-Trait Anxiety Inventory for Children (2 studies; 8.7%), Escala de Ansiedad para Niños (1 study; 4.3%), Children’s Automatic Thoughts Scale (1 study; 4.3%), and Generalized Anxiety Disorder 7-item scale (1 study; 4.3%). The length of follow-up for outcome measurement ranged from 0 to 234.6 weeks post-intervention (with a median of 26.1 weeks). Each study provided 1 to 32 effect estimates for anxiety symptoms (median = 2, total = 104).

We found that students receiving depression prevention interventions had, on average, lower anxiety symptoms relative to students in control groups (standardized mean difference = -0.06, 95% CI [-0.15, 0.03], 95% PI [-0.34, 0.22], \(I^2\) = 46%, \(\tau^2\) = 0.02). Using empirically-based benchmarks (Tanner-Smith et al., 2018), this effect size is potentially trivial, as it is less than the 25th percentile of the distribution of mean effects obtained for universal prevention programs focused on internalizing problems among school-aged youth. In addition, the data are consistent with an average effect of trivially higher anxiety symptoms among students receiving depression prevention interventions relative to students in control groups (i.e., an upper 95% CI bound of 0.03). Moreover, the prediction interval indicates that the possible underlying effect in a new study could range from a standardized mean difference of -0.34 to 0.22 (i.e., a medium positive effect to a negative effect).

3.3.5 Educational Achievement

Two studies (2.9%) comprising 1,930 students (4.3%) reported meta-analyzable data on educational achievement. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Perry et al. (2017) reported data on the Australian Tertiary Admission Rank and found that academic outcomes did not differ between the intervention and comparison groups at 40 weeks post-intervention (p = 0.41). Tak et al. (2016) reported data on the last grades students obtained on a variety of subject-specific tests. They found that students in the depression prevention program had slightly lower school grades than the comparison group immediately post-intervention (p < 0.001), though no differences were found at two-year follow-up (p = 0.112).

3.3.6 Self-Harm

One study (1.4%) comprising 5,030 students (11.3%) reported meta-analyzable data on self-harm. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Stallard et al. (2012) reported data on self-harming thoughts and self-harming behaviors at 6 and 12 months post-intervention. Among students at high risk of depression at baseline, they found a potentially beneficial effect of classroom-based cognitive behavioral therapy (relative to usual school provision) on self-harming thoughts at 6 months post-intervention, though they found no significant differences at 12 months post-intervention, relative to an attention control, or on self-harming behaviors.

3.3.7 Stress

Three studies (4.3%) comprising 1,305 students (2.9%) reported meta-analyzable data on stress. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Singhal et al. (2014) did not find significant differences between intervention and comparison groups on the Scale for Assessing Academic Stress immediately post-intervention or at 3-month follow-up. Wong et al. (2012) also found no significant differences between intervention and comparison groups on the Depression Anxiety Stress Scale immediately post-intervention. Similarly, Wong et al. (2014) found no significant differences between intervention and comparison groups on the six-item short form of the Kessler Psychological Distress Scale immediately post-intervention.

3.3.8 Substance Use

One study (1.4%) comprising 5,030 students (11.3%) reported meta-analyzable data on substance use. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Among students at high risk of depression at baseline, Stallard et al. (2012) found no significant differences between the intervention and comparison groups on alcohol, cannabis, and street drug misuse at 12 months post-intervention.

3.3.9 Suicidal Ideation

One study (1.4%) comprising 540 students (1.2%) reported meta-analyzable data on suicidal ideation. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. Perry et al. (2017) found no significant differences between the intervention and comparison groups on suicidal ideation immediately, 6 months, or 18 months post-intervention.

3.3.10 Well-being

Three studies (4.3%) comprising 1,018 students (2.3%) reported meta-analyzable data on well-being. However, the degrees of freedom are below the threshold to use robust variance estimation (Tipton, 2015), so results are reported narratively. All three studies measured well-being using the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS). Brown et al. (2019) found the intervention group to score 3.57 (95% CI [1.37 to 5.76]) points higher on the WEMWBS than the control group at 3 months post-intervention; in absolute terms, the average control group score was indicative of low well-being, while the average intervention group score was above this threshold. Both Johnson et al. (2016) and Johnson et al. (2017) found no significant differences between the intervention and comparison groups on well-being immediately and at 3 months post-intervention.

3.4 Variation in Effects by Methodological, Demographic, and Intervention Characteristics

We examined heterogeneity in effect estimates for depression symptoms using the following pre-specified factors: risk of bias, baseline differences in depression symptoms, school level, school type, country, level of prevention, comparator type, and study/publication year. We also examined heterogeneity based on whether a study involved randomization to experimental and comparator groups. Studies did not report race/ethnicity, percent female, and cultural specificity of interventions in sufficient detail for us to examine these pre-specified factors in meta-regressions. No meta-regression reduced the unexplained residual heterogeneity variance or the variability due to between-study heterogeneity, and we did not detect associations between depression symptoms and any of the predictors examined: type of assignment to interventions (SMD = -0.13, 95% CI [-0.40 to 0.14], \(I^2\) = 70%, \(\tau^2\) = 0.05), high risk of bias (SMD = -0.01, 95% CI [-0.16 to 0.14], \(I^2\) = 72%, \(\tau^2\) = 0.05), baseline depression symptoms (SMD = 0.01, 95% CI [-0.92 to 0.95], \(I^2\) = 72%, \(\tau^2\) = 0.05), school level (SMD = -0.14, 95% CI [-0.29 to 0.01], \(I^2\) = 71%, \(\tau^2\) = 0.05), school type (SMD = -0.13, 95% CI [-0.41 to 0.15], \(I^2\) = 78%, \(\tau^2\) = 0.09), country (SMD = 0.03, 95% CI [-0.12 to 0.18], \(I^2\) = 71%, \(\tau^2\) = 0.05), level of prevention (SMD = 0.04, 95% CI [-0.12 to 0.20], \(I^2\) = 71%, \(\tau^2\) = 0.05), type of comparator (SMD = 0.12, 95% CI [-0.08 to 0.31], \(I^2\) = 70%, \(\tau^2\) = 0.05), and publication year (SMD = -0.00, 95% CI [-0.02 to 0.01], \(I^2\) = 71%, \(\tau^2\) = 0.05).