[Interactive visualization by Kristoffer Magnusson]

A very short primer on standardized mean differences for meta-analyses

Calculating effect sizes from the d-family and pretest-posttest-control (PPC) designs


Cohen's d is a widely used standardized effect size for mean differences in psychology, intervention research, and research synthesis. Yet, depending on how researchers calculate standardized mean differences, meta-analyses can come to different conclusions. A fairly recent example of this was part of a great deep dive by Nick Jacobson on the efficacy of digital mental health interventions for anxiety:

A recently published meta-analysis on the same topic by Linardon et al. reported another, different effect size for the exact same study.

So 3 different calculations for the same effect led to 3 different effect size estimates:

1 Weisel et al.: Hedges g = -0.084, variance = 0.062
2 Linardon et al.: Hedges g = 0.028, variance = 0.062
3 Jacobson: d = 0.158 (pooled anxiety, panic, and hyperventilation at post)

When only a few studies are included in a meta-analysis, the influence of each individual effect size estimate is quite high. Therefore, the way d is calculated matters, a lot.

In this blog post I walk through several approaches I tried in order to recreate how these standardized mean differences could have been calculated, and I provide the necessary R code.

Importing the data

To start, I extract the data from the table provided in the study (for anxiety only).

# treatment
M_pre_treatment = 11.5
SD_pre_treatment = 5.05
N_pre_treatment = 31

M_post_treatment = 9.39
SD_post_treatment = 5.21
N_post_treatment = 32

# control
M_pre_control = 10.66
SD_pre_control = 4.63
N_pre_control = 31

M_post_control = 9.53
SD_post_control = 4.79
N_post_control = 32

The sample size post-treatment is LARGER than pre-treatment, which I have not seen before and cannot explain…

The two approaches to calculate standardized mean differences

Let's try the two ways of calculating standardized mean differences that are used most of the time:

  • Posttest-Only with Control (POWC) design: a standardized mean difference between the two groups is calculated at post-intervention only, a.k.a. the d-family.

  • Pretest-Posttest-Control (PPC) design: a standardized mean difference between the two groups is calculated using information from both pre- and post-intervention measurements.

To keep things consistent, positive effect sizes will indicate a better outcome for the treatment group, and negative effect sizes a better outcome for the control group.

The d-Family

Cohen's d

Both Cohen's d and Hedges g belong to the “d family” of effect sizes. They are calculated by dividing the difference between the group means (x1 = mean of the treatment group, x2 = mean of the control group) by the standard deviation of these observations.
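In code, the generic form looks like this (the numbers here are hypothetical, purely to illustrate the formula, not values from the study):

```r
# Generic d: difference between group means divided by a (pooled) SD.
# All values below are hypothetical illustrations.
x1_bar <- 12    # mean of treatment group
x2_bar <- 10    # mean of control group
sd_pooled <- 4  # pooled standard deviation
d_example <- (x1_bar - x2_bar) / sd_pooled
d_example # 0.5
```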

This pooled standard deviation can be calculated in different ways. Cohen defined the pooled standard deviation as:

sd_pool_jc <- sqrt(((N_post_treatment - 1) * SD_post_treatment^2 +
                    (N_post_control - 1) * SD_post_control^2) /
                   (N_post_treatment + N_post_control - 2))
[1] 5.004408

But with equal group sizes it is often approximated as the simple average of the two standard deviations:

sd_pool = (SD_post_treatment + SD_post_control)/2 
[1] 5

We can now divide the mean difference by the pooled standard deviation to calculate d:

d = (M_post_treatment - M_post_control)/sd_pool

d <- d * -1 # positive value for better treatment outcome

[1] 0.028

A great resource for developing an intuition for what Cohen's d actually means and how it should be interpreted is Kristoffer Magnusson's website, where you can play with different d values:

Hedges g

Hedges g corrects Cohen's d for small-sample bias by multiplying it with a correction factor j. This factor can be approximated with the formula provided in Lakens' great paper on effect sizes:

j_lakens = 1 - (3 / (4*(N_post_treatment + N_post_control)-9)) 

Or we could use the formula from Borenstein's great resource on effect sizes:

j = 1 - (3 / (4*(N_post_treatment + N_post_control-2)-1)) 
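As a quick check (with a hypothetical total sample size, just to illustrate): the two formulas are actually algebraically identical, since 4*(n - 2) - 1 = 4*n - 9, so it does not matter which one you use:

```r
# The Lakens and Borenstein formulas for j are the same expression:
# 4 * (n - 2) - 1 == 4 * n - 9, so both give identical values.
n <- 64  # hypothetical total sample size (n1 + n2)
j_lakens_check     <- 1 - (3 / (4 * n - 9))
j_borenstein_check <- 1 - (3 / (4 * (n - 2) - 1))
j_lakens_check == j_borenstein_check  # TRUE
```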

Now, we can finally calculate the Hedges g:

g = d * j

[1] 0.02765992

Variance of g

Calculating it directly

The variance of g can be calculated directly from sample sizes and g:

var_g_1 <- (N_post_treatment + N_post_control)/(N_post_treatment * N_post_control) + g^2 / (2*(N_post_treatment + N_post_control))

[1] 0.06250598

Calculating it indirectly

The variance of g can also be obtained indirectly: we first calculate the variance of d and then apply our correction factor j again. The formulas from Borenstein can be found here

var_d <- (N_post_treatment + N_post_control)/(N_post_treatment * N_post_control) + d^2 / (2*(N_post_treatment + N_post_control))

var_g_2 <- var_d * j^2
[1] 0.06099698

Using this Cohen's d formula for the post comparison, I arrive at the effect size Linardon et al. reported: g = 0.0276599, variance = 0.062506!


You don't have to do this manually!

You can calculate Hedges g and its variance easily with the escalc() function from the metafor package:

library(metafor) # for escalc()

g_anx <- escalc(measure = "SMD", 
       m1i  = M_post_treatment, 
       sd1i = SD_post_treatment, 
       n1i  = N_post_treatment,
       m2i  = M_post_control, 
       sd2i = SD_post_control, 
       n2i  = N_post_control)

g_anx[1] <- g_anx[1] * -1 # positive value for better treatment outcome

      yi     vi 
1 0.0276 0.0625 

As you can see, we come to the exact same values as with the Hedges g formula from above.

Pooling Hedges g

If we want to add the data on panic and hyperventilation, we need to pool these effect sizes:

g_panic <- escalc(measure = "SMD", 
                  m1i  = 15.35, # mean post treatment-group
                  sd1i = 5.76,  # sd post treatment-group
                  n1i  = 32,    # sample size post treatment-group
                  m2i  = 14.13, # mean post control-group
                  sd2i = 6.57,  # sd post control-group
                  n2i  = 32)    # sample post control-group

g_panic[1] <- g_panic[1] * -1   # positive value for better treatment outcome

       yi     vi 
1 -0.1951 0.0628 

g_hyper <- escalc(measure = "SMD", 
                  m1i  = 21.74, # mean post treatment-group
                  sd1i = 11.27, # sd post treatment-group
                  n1i  = 32,    # sample size post treatment-group
                  m2i  = 24.72, # mean post control-group
                  sd2i = 12.75, # sd post control-group
                  n2i  = 32)    # sample post control-group

g_hyper[1] <- g_hyper[1] * -1 # positive value for better treatment outcome

      yi     vi 
1 0.2446 0.0630 

As all sample sizes are equal, we can obtain this pooled Hedges g by adding the effect sizes and dividing by 3:

g_pool_1 <- (g_anx + g_panic + g_hyper)/3
          yi         vi
1 0.02573884 0.06275694

But in a more realistic situation, when sample sizes differ, we should weight by N:

g_pool_2 <- (g_anx  * (N_post_treatment + N_post_control) +
             g_panic * (N_post_treatment + N_post_control) +
             g_hyper * (N_post_treatment + N_post_control)) /
            (3 * (N_post_treatment + N_post_control))
          yi         vi
1 0.02573884 0.06275694
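A caveat: simply averaging the vi column treats the three outcomes as independent, but they were measured on the same participants. Borenstein's composite formula gives the variance of the mean of m correlated outcomes as (1/m^2) times the sum of all variances plus r * sqrt(vi * vj) summed over all pairs i ≠ j. A minimal sketch, assuming a hypothetical between-outcome correlation of r = 0.5 (the paper does not report this value):

```r
# Variance of the mean of m correlated outcomes (Borenstein's composite formula).
# r = 0.5 is an assumed correlation, NOT taken from the paper.
vi <- c(0.0625, 0.0628, 0.0630)  # variances of the three outcome-level g's
r  <- 0.5
m  <- length(vi)
sum_offdiag <- sum(sqrt(outer(vi, vi))) - sum(vi)  # sum of sqrt(vi * vj) over pairs i != j
var_composite <- (sum(vi) + r * sum_offdiag) / m^2
var_composite  # smaller than the naive average of vi whenever r < 1
```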

Pooled effect sizes can change drastically depending on what you pool.

If researchers would like to inflate the effect size estimate, a.k.a. p-hack (which obviously is a big no-no), they could simply ignore panic to get a larger effect size estimate:

g_pool_anx_hyp <- (g_anx + g_hyper)/2
         yi         vi
1 0.1361411 0.06273678

This is why registered reports, preregistration, and transparency are sorely needed in research synthesis, and in science as a whole.

Pretest-Posttest-Control (PPC) Design

As far as I know, dPPC cannot be interpreted the same way as the d family, and the two kinds of effect sizes should NOT be mixed.

dPPC is another standardized mean difference that might have been calculated as an effect size here. It includes the pre-intervention information on means and standard deviations and is described very exhaustively by Morris (2008):

In the PPC design, research participants are assigned to treatment or control conditions, and each participant is measured both before and after the treatment has been administered. The PPC design can provide useful estimates of treatment effects as either an experimental design, where participants are randomly assigned to treatment conditions, or as a quasiexperimental design, where randomization is not feasible (Cook & Campbell, 1979).

It is important to note that this method is not strictly necessary for randomized controlled trials, as the population pretest means can be assumed to be equal across groups. This is why PPC designs are preferred in quasi-experimental research, where the groups cannot be assumed to be equal at baseline.

If we still want to calculate it, we have to choose between many available formulas.

Standardized Mean Change

A common approach would be to calculate the difference between the standardized mean change for the treatment and control groups.

First, we need to calculate the standardized mean change for the treatment group:

smc_t <- (M_post_treatment - M_pre_treatment) / ((SD_post_treatment + SD_pre_treatment)/2)

as well as for the control group:

smc_c <- (M_post_control - M_pre_control) / ((SD_post_control + SD_pre_control)/2)

Then we can calculate the effect size for the standardized mean change for the treatment and control groups:

delta <- smc_t - smc_c

delta <- delta * -1 # positive value for better treatment outcome

[1] 0.171391


Another approach is suggested by Morris (2008). He compared alternate effect size estimates—for studies with repeated measurements in both treatment and control groups—in terms of bias, precision, and robustness to heterogeneity of variance. He concludes with the following recommendation:

The results favored an effect size based on the mean pre-post change in the treatment group minus the mean pre-post change in the control group, divided by the pooled pretest standard deviation.

This can be calculated with his formula (8). Note that here the correction factor j is based on the pretest sample sizes:

j_pre <- 1 - (3 / (4 * (N_pre_treatment + N_pre_control - 2) - 1))

dppc2 <- j_pre * (((M_post_treatment - M_pre_treatment) - (M_post_control - M_pre_control)) / SDpre)

dppc2 <- dppc2 * -1 # positive value for better treatment outcome
[1] 0.1997498

You can see that we again multiply the uncorrected effect size by the correction factor to obtain a bias-corrected version.

For the pooled pretest SD I use his formula (9):

SDpre <- sqrt(((N_pre_treatment - 1) * SD_pre_treatment^2 +
               (N_pre_control - 1) * SD_pre_control^2) /
              (N_pre_treatment + N_pre_control - 2))

Variance of dPPC

To calculate the variance of dPPC, we need to include the pre-test-post-test correlations. This presents a problem, as the paper states:

Even when the values of pre-test-post-test correlations are not presented explicitly one may still be able to estimate them from t statistics computed using change scores or an average estimate may be obtainable from the analysis-of-variance F test.

I am not sure how to do this here, so I use multiple values of rho from -1 to 1 as a form of sensitivity check.
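For completeness, here is a sketch of the approach Morris alludes to: if a change-score t statistic had been reported (t_change below is a hypothetical value, not from the paper), one could back out the SD of the change scores and then solve for rho via SD_change^2 = SD_pre^2 + SD_post^2 - 2 * rho * SD_pre * SD_post:

```r
# Estimate the pre-post correlation from a change-score t statistic.
# t_change is HYPOTHETICAL; the paper does not report it.
t_change <- 2.5
M_pre  <- 11.50; SD_pre  <- 5.05  # treatment group, values from the table above
M_post <-  9.39; SD_post <- 5.21
n <- 31
M_change  <- M_post - M_pre
SD_change <- abs(M_change) * sqrt(n) / t_change  # from t = M_change / (SD_change / sqrt(n))
rho_est <- (SD_pre^2 + SD_post^2 - SD_change^2) / (2 * SD_pre * SD_post)
rho_est
```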

His formula (25) gives us the variance of dPPC2:

If we calculate this for rho values ranging from -1 to 1, we get the following variances:

library(magicfor) # provides magic_for() to capture loop output

magic_for(print, silent = TRUE) # store print() results from the loop in a data frame

j_pre <- 1 - (3 / (4 * (N_pre_treatment + N_pre_control - 2) - 1)) # correction based on pretest sample sizes

for (rho in seq(-1, 1, by = .1)) {
  var_dppc2 <- 2 * j_pre^2 * (1 - rho) *
    ((N_post_treatment + N_post_control) / (N_post_treatment * N_post_control)) *
    ((N_post_treatment + N_post_control - 2) / (N_post_treatment + N_post_control - 4)) *
    (1 + delta^2 / (2 * (1 - rho) *
       ((N_post_treatment + N_post_control) / (N_post_treatment * N_post_control)))) -
    delta^2
  print(var_dppc2)
}

result <- magic_result_as_dataframe()
result
    rho  var_dppc2
1  -1.0 0.25211060
2  -0.9 0.23951617
3  -0.8 0.22692173
4  -0.7 0.21432730
5  -0.6 0.20173286
6  -0.5 0.18913843
7  -0.4 0.17654400
8  -0.3 0.16394956
9  -0.2 0.15135513
10 -0.1 0.13876069
11  0.0 0.12616626
12  0.1 0.11357183
13  0.2 0.10097739
14  0.3 0.08838296
15  0.4 0.07578852
16  0.5 0.06319409
17  0.6 0.05059966
18  0.7 0.03800522
19  0.8 0.02541079
20  0.9 0.01281635
21  1.0        NaN

We can see that the variance of dPPC2 is highly influenced by the pre-test-post-test correlation, so we should be careful when interpreting it. At rho = .5, our variance estimate is close to the variance of d/g.


The effect sizes we tried to recreate were:

1 Weisel et al.: Hedges g = -0.084, variance = 0.062
2 Linardon et al.: Hedges g = 0.028, variance = 0.062
3 Jacobson: d = 0.158 (pooled anxiety, panic, and hyperventilation at post)

So, as you can see, we could only figure out how Linardon et al. calculated their effect size. Depending on how you calculate the mean difference, the standardized effect size could be:

For anxiety:

  • d = 0.028, variance = 0.063
  • g = 0.028, variance = 0.063/0.061
  • dPPC2 = 0.2, variance ranging from 0.013 to 0.252
  • delta = 0.171

For pooled anxiety, panic, and hyperventilation at post:

  • g = 0.026, variance = 0.063

For pooled anxiety and hyperventilation at post:

  • g = 0.136, variance = 0.063

Unfortunately, I could only reproduce 1 out of 3 effect sizes. Please feel free to suggest another approach!