(Multi-) Leveling Up: Part I

Multilevel modeling and data visualisations for two recent meta-analyses on the efficacy of apps for the treatment of depression

Intro

The need for mental health interventions exceeds the supply everywhere. I became painfully familiar with this fact in 2015, when we conducted a study in which we estimated that only a fifth of all people in Lower Austria who need psychotherapy actually get the help they need.

Digital mental health interventions (DMHIs) show promise for helping those who cannot see a therapist, be it because there simply are not enough therapists, because therapy is unaffordable, because of fear of stigma, or for many other reasons I would never have thought of. A worldwide pandemic that leads to billions of people self-isolating, as is the case at the time of writing, is one of them.

Two recent meta-analyses investigated the efficacy of such digital mental health interventions across a broad range of mental disorders. Both focused on smartphone apps, but one examined standalone apps (no human therapist is involved) and the other app-supported interventions (human therapists may be part of the therapy process).

For depression, both meta-analyses come to similar conclusions: these interventions work. There are some mixed findings, but overall they seem to help, which is good news. For anxiety, however, they diverge: one meta-analysis supports their efficacy, the other does not.

To explore these overlapping and diverging results further, I conducted some additional analyses with data from the respective supplementary materials. I am curious whether a multilevel meta-analysis leads to different results and whether the additional analyses reveal anything of interest. If you are interested in a very short primer on multilevel meta-analyses, click here.

You will find a description of and the code for the following analyses in this post:

  1. Meta-analysis with averaged effect sizes
  2. Multilevel meta-analysis
  3. Funnel plot variations
  4. Baujat plot
  5. Cumulative meta-analysis
  6. p-curve
  7. p-uniform*

In this first blog post I will compare the converging results on depression; in the next post I will compare the diverging findings on anxiety. My main goal with these posts is to provide a detailed walkthrough of my code and my interpretation of the output while exploring both meta-analyses “a little deeper”. And when I say “a little deeper”, I mean way too deep.


Meta-analysis on standalone smartphone apps for mental health: Depression

The first meta-analysis on digital mental health interventions I would like to play with, Standalone smartphone apps for mental health—a systematic review and meta-analysis by Weisel et al. 2019, addresses standalone apps, i.e. apps with no human therapist involved in the treatment.

Main result from abstract:

Effects on depression (Hedges’ g = 0.33, 95%CI 0.10–0.57, P = 0.005, NNT = 5.43, I2 = 59%) and on smoking behavior (g = 0.39, 95%CI 0.21–0.57, NNT = 4.59, P ≤ 0.001, I2 = 0%) were significant. No significant pooled effects were found for anxiety, suicidal ideation, self-injury, or alcohol use (g = −0.14 to 0.18). Effect sizes for single trials ranged from g = −0.05 to 0.14 for PTSD and g = 0.72 to 0.84 for insomnia. Although some trials showed potential of apps targeting mental health symptoms, using smartphone apps as standalone psychological interventions cannot be recommended based on the current level of evidence.

So standalone apps are effective for depression and smoking cessation, but not for anxiety, suicidal ideation, self-injury, or alcohol use.
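
As an aside, the NNT values in the abstract appear consistent with the Kraemer & Kupfer (2006) conversion from a standardized mean difference to a number needed to treat. A minimal sketch, assuming this is the conversion the authors used:

# Kraemer & Kupfer (2006): NNT from a standardized mean difference
# (assumption: this conversion is what the paper used)
g_to_nnt <- function(g) 1 / (2 * pnorm(g / sqrt(2)) - 1)

g_to_nnt(0.33)  # depression: ~5.4, close to the reported NNT = 5.43
g_to_nnt(0.39)  # smoking:    ~4.6, close to the reported NNT = 4.59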

Let's have a look at depression first. Here, I walk you through my code in detail so that you can see what I did and how I did it (don't worry, I won't go into that much detail in the next post).


Setup

We will need the following packages for our analyses:

library(readxl)     # importing excel spreadsheets
library(tidyverse)  # data wrangling
library(metafor)    # meta-analyses & multilevel meta-analyses
library(metaviz)    # advanced data visualization
library(dmetar)     # p-curve function
library(puniform)   # puniform


Data wrangling

I created this data frame with values from Supplementary Table 4 and added the study and effect size IDs manually, which I will need later on for the multilevel model. I extracted Hedges' g and its variance to calculate the summary effect size, the sample sizes so we can calculate weighted average effect sizes to address effect size dependency, and the p-values for the p-curve.

weisel_data <- tribble(
  ~study_name   , ~study_id, ~es_id, ~hedges_g, ~N,   ~p,    ~variance,
  "Arean 2016",      1,       1,       0.308,    129,  .088,  .033,
  "Arean 2016",      1,       2,       0.072,    150,  .678,  .030,
  "Birney 2016",     2,       3,       0.098,    199,  .49,   .020,
  "Enock 2014",      3,       4,       0.48,     106,  .034,  .050,
  "Enock 2014",      3,       5,      -0.03,     220,  .86,   .020,
  "Horsch 2017",     4,       6,       0.57,     151,  .05,   .027,
  "Hur 2018",        5,       7,       0.662,     34,  .055,  .119,
  "Kuhn 2017",       6,       8,       0.23,     120,  .211,  .033,
  "Roepke 2015",     7,       9,       0.296,    140,  .099,  .032,
  "Roepke 2015",     7,       10,      0.793,    144,  .001,  .034,
  "Stolz 2018",      8,       11,      0.30,     90,  .18,    .050,
  "Tighe 2017",      9,       12,      0.71,     61,  .0007,  .068
)


Recreating the meta-analysis

Weisel et al. report two summary effects for depression:

  1. When depression was the primary target of the intervention: g = 0.33, 95%CI 0.10–0.57, k = 6, N = 796, I2=59%, 95%CI 0–83%.

  2. When depression was the primary or secondary outcome: g = 0.34, 95%CI 0.18–0.49, k = 12, N = 1544, I2 = 53%, 95%CI 10–76%.

In our analysis, we recreate only the second result in order to compare it to the second meta-analysis more closely, as Linardon et al. compared the treatment effect irrespective of outcome type.

weisel_ma <- rma.uni(data = weisel_data, 
                     yi = hedges_g,
                     vi = variance, 
                     method = "REML",
                     slab = study_name)
weisel_ma

Random-Effects Model (k = 12; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0388 (SE = 0.0318)
tau (square root of estimated tau^2 value):      0.1971
I^2 (total heterogeneity / total variability):   53.06%
H^2 (total variability / sampling variability):  2.13

Test for Heterogeneity:
Q(df = 11) = 23.4361, p-val = 0.0153

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub 
  0.3365  0.0797  4.2246  <.0001  0.1804  0.4926  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimate provides the summary effect size with its standard error (se). The zval is the corresponding test statistic, pval the corresponding p-value, ci.lb the lower bound of the confidence interval, and ci.ub its upper bound. We fit a random-effects model with the REML estimator and get the same result of g = 0.34 in favor of smartphone interventions compared to controls.
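
If you want to pull these numbers out of the model object programmatically, metafor provides accessors for this; predict() additionally returns a prediction interval based on tau. A minimal sketch:

coef(weisel_ma)     # the summary Hedges g
predict(weisel_ma)  # pred with ci.lb/ci.ub, plus pi.lb/pi.ub (prediction interval)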


Estimates of heterogeneity

Here we can see that heterogeneity is moderate to high, with a Higgins I2 of 53% and a 95% CI [6%, 83%] that differs slightly from the one reported in the paper.

weisel_ma_confint <- confint(weisel_ma)

weisel_ma_confint

       estimate  ci.lb   ci.ub 
tau^2    0.0388 0.0020  0.1663 
tau      0.1971 0.0449  0.4079 
I^2(%)  53.0629 5.5399 82.8842 
H^2      2.1305 1.0586  5.8426 


Forest Plot

This is essentially the forest plot from the paper, except that it is depicted as a rainforest plot, which I prefer for several reasons outlined here: Finding your way out of the forest without a trail of bread crumbs: development and evaluation of two novel displays of forest plots.

weisel_forest <- viz_forest(weisel_ma, 
           summary_label = "Summary effect", 
           study_labels = weisel_data$study_name,
           method = "REML",
           xlab = "Hedges g", 
           variant = "rain",
           annotate_CI = TRUE, 
           table_headers = "Hedges g [95% CI]")

weisel_forest

For more info on these types of data visualizations for meta-analyses, check out the metaviz package!


Additional Analyses

Now that we have recreated the analyses from the paper, we can explore the results further with the following additional analyses:

  1. Meta-analysis with averaged effect sizes
  2. Multilevel meta-analysis
  3. Funnel plot variations
  4. Baujat plot
  5. Cumulative meta-analysis
  6. p-curve
  7. p-uniform*


Addressing effect size dependency by averaging

Meta-analysis with averaged effect sizes

Several included studies reported multiple effect sizes, which introduces effect size dependency. This can be addressed either by averaging all effect sizes within each study or by fitting a hierarchical/multilevel model. To check whether this changes the results in a meaningful way, I did both below.

weisel_data_averaged <- 
  weisel_data %>% 
  group_by(study_id) %>%                    # to average all studies per study_id 
  mutate(hedges_g = sum(hedges_g*N)/sum(N), # Hedges g are weighted by sample size
         variance = sum(variance*N)/sum(N), # sample variances are weighted by sample size
         N = mean(N)) %>%                   # sample size is averaged
  distinct(study_id,                        # only 1 effect size per study is kept
           .keep_all = TRUE)

weisel_data_averaged
# A tibble: 9 x 7
# Groups:   study_id [9]
  study_name  study_id es_id hedges_g     N      p variance
  <chr>          <dbl> <dbl>    <dbl> <dbl>  <dbl>    <dbl>
1 Arean 2016         1     1    0.181  140. 0.088    0.0314
2 Birney 2016        2     3    0.098  199  0.49     0.02  
3 Enock 2014         3     4    0.136  163  0.034    0.0298
4 Horsch 2017        4     6    0.570  151  0.05     0.027 
5 Hur 2018           5     7    0.662   34  0.055    0.119 
6 Kuhn 2017          6     8    0.23   120  0.211    0.033 
7 Roepke 2015        7     9    0.548  142  0.099    0.0330
8 Stolz 2018         8    11    0.3     90  0.18     0.05  
9 Tighe 2017         9    12    0.71    61  0.0007   0.068 

Now all effect sizes and variances are weighted by the sample size. I will use this data set for all additional analyses that follow below.

weisel_ma_avg <- rma.uni(data = weisel_data_averaged, 
                     yi = hedges_g,
                     vi = variance, 
                     method = "REML",
                     slab = study_name)
weisel_ma_avg

Random-Effects Model (k = 9; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0170 (SE = 0.0263)
tau (square root of estimated tau^2 value):      0.1306
I^2 (total heterogeneity / total variability):   32.20%
H^2 (total variability / sampling variability):  1.47

Test for Heterogeneity:
Q(df = 8) = 11.5812, p-val = 0.1709

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub 
  0.3347  0.0777  4.3085  <.0001  0.1824  0.4870  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see that the summary effect size is almost identical to the reported effect size from the published meta-analysis!


Estimates of heterogeneity
confint(weisel_ma_avg)

       estimate  ci.lb   ci.ub 
tau^2    0.0170 0.0000  0.1619 
tau      0.1306 0.0000  0.4024 
I^2(%)  32.2001 0.0000 81.8564 
H^2      1.4749 1.0000  5.5116 

Here you can see that heterogeneity has decreased slightly, indicating that effect sizes within studies differed from each other.


Multilevel model

Now we can (multi-)level up! We have everything we need to fit the multilevel meta-analysis: the sampling variation of each effect size forms level 1, the variation of effect sizes within each study level 2, and the variation of effect sizes across studies level 3.
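
In formula terms, one common way to write the three-level model (with i indexing studies and j effect sizes within studies) is:

$$\hat{\theta}_{ij} = \mu + u_i + w_{ij} + e_{ij}, \qquad u_i \sim N(0, \sigma^2_{\text{study}}), \quad w_{ij} \sim N(0, \sigma^2_{\text{es}}), \quad e_{ij} \sim N(0, v_{ij})$$

The known sampling variances v_ij form level 1, the within-study heterogeneity sigma^2_es level 2, and the between-study heterogeneity sigma^2_study level 3; the latter two appear as sigma^2.1 and sigma^2.2 in the output below.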

You might have noticed that we added two IDs: one for each study and one for each effect size within each study. This is what metafor's rma.mv() function needs in order to work!

weisel_mv <- rma.mv(yi = hedges_g, 
                   V = variance, # sampling variances are squared SE 
                   random = list(~ 1 | es_id, # level 2
                                 ~ 1 | study_id), # level 3
                   tdist = TRUE, # Knapp-Hartung adjustment
                   data = weisel_data,
                   method = "REML",
                   slab = study_name)

summary(weisel_mv)

Multivariate Meta-Analysis Model (k = 12; method: REML)

  logLik  Deviance       AIC       BIC      AICc 
 -1.3213    2.6425    8.6425    9.8362   12.0711   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0388  0.1971     12     no     es_id 
sigma^2.2  0.0000  0.0000      9     no  study_id 

Test for Heterogeneity:
Q(df = 11) = 23.4361, p-val = 0.0153

Model Results:

estimate      se    tval    pval   ci.lb   ci.ub 
  0.3365  0.0797  4.2246  0.0014  0.1612  0.5118  ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, we can see that the summary effect size is almost identical to the reported effect size in the meta-analysis! This suggests that the result is robust to how the effect size dependency is handled.


Heterogeneity
mlm.variance.distribution(x = weisel_mv)
        % of total variance    I2
Level 1        4.693717e+01   ---
Level 2        5.306283e+01 53.06
Level 3        2.661847e-08     0
Total I2: 53.06% 

Our estimate of heterogeneity can now be attributed to levels 2 and 3. In this case, virtually all of it falls on level 2 (within studies) and none on level 3 (between studies), so the third level adds little here.
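
If you are curious what mlm.variance.distribution() does under the hood: it distributes the total variance over the three levels, using the “typical” within-study sampling variance of Higgins & Thompson (2002) for level 1, following the approach described by Cheung (2014). A minimal sketch of my own, so treat it as an approximation:

# 'typical' within-study sampling variance (Higgins & Thompson, 2002)
w     <- 1 / weisel_data$variance
k     <- length(w)
v_typ <- (k - 1) * sum(w) / (sum(w)^2 - sum(w^2))

# distribute the total variance over the three levels
total <- sum(weisel_mv$sigma2) + v_typ
weisel_mv$sigma2 / total * 100  # % of variance at level 2 (es_id) and level 3 (study_id)
v_typ / total * 100             # % of variance at level 1 (sampling error), ~47%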


Sunset Funnel Plot

From the metaviz documentation (viz_sunset()):

> The funnel plot is the most widely used diagnostic plot in meta-analysis, primarily to assess small-study effects. The sunset (power-enhanced) funnel plot incorporates study-level power information in the funnel display. This directly allows to examine the power studies had to detect an effect of interest (e.g., the observed meta-analytic summary effect), whether funnel plot asymmetry is driven by under-powered but significant studies, and to visually assess if there is an excess of low-powered significant effects in the meta-analysis (conceptually related to the test of excess significance, Ioannidis & Trikalinos, 2007). For effect sizes assumed to be normally distributed (e.g., Cohen d, log OR), the power corresponding to a given standard error is computed by using a two-sided Wald test and (by default) the meta-analytic summary effect as assumed true effect. Colored regions of different power levels and a second axis with study level power are shown in the funnel plot. In addition, power-related statistics are shown: a) The median power of all studies, b) the true effect size necessary such that the median power of the studies would have been 33% or 66%, c) results of a test of excess significance (Ioannidis & Trikalinos, 2007), and d) the R-Index for expected replicability (Schimmack, 2016).

viz_sunset(weisel_ma_avg, true_effect = .3)

You can see that the included studies had at most around 60% power to detect an effect of g = 0.3, indicating that most studies were severely under-powered.

The median power across all studies was quite low: only 37.9%!
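
In case you wonder where these numbers come from: for normally distributed effect sizes, the power corresponding to a study's standard error is simply the power of a two-sided Wald test at alpha = .05, given an assumed true effect. A minimal sketch, assuming theta = 0.3 as passed to viz_sunset() above:

theta <- 0.3                                  # assumed true effect
se    <- sqrt(weisel_data_averaged$variance)  # study-level standard errors

# power of a two-sided Wald test at alpha = .05
power <- 1 - pnorm(qnorm(.975) - theta / se) + pnorm(qnorm(.025) - theta / se)

median(power)  # ~0.379, matching the median power shown in the plot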


Contour-Enhanced Funnel Plot with Egger's Regression Line & Trim-and-Fill Method

Here, we can see imputed studies by the trim and fill method, the adjusted summary effect (dotted line), as well as Egger’s regression line. This can help to visually assess funnel plot asymmetry.

viz_funnel(weisel_ma_avg,
           trim_and_fill = T,
           egger = T, 
           method = "REML")

The usefulness of the trim-and-fill method is currently heavily debated, but you can still see where studies would have to sit in a file drawer in order to restore symmetry. Additionally, you can see that Egger's regression line (red dotted line) reveals considerable asymmetry, yet the corresponding test does not reach statistical significance (likely because it is under-powered with only nine studies).

regtest(weisel_ma_avg)

Regression Test for Funnel Plot Asymmetry

model:     mixed-effects meta-regression model
predictor: standard error

test for funnel plot asymmetry: z = 1.7311, p = 0.0834
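
For a numeric counterpart to the imputed studies shown in the plot, we could also run metafor's trimfill() on the averaged model; it imputes the presumably missing studies and re-estimates the summary effect (a quick sketch, output not shown here):

trimfill(weisel_ma_avg)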


Baujat Plot

Baujat et al. (2002) proposed a diagnostic plot to detect sources of heterogeneity in meta-analytic data. The plot shows the contribution of each study to the overall Q-test statistic for heterogeneity on the horizontal axis versus the influence of each study (defined as the standardized squared difference between the overall estimate based on a fixed-effects model with and without the ith study included in the model) on the vertical axis.

baujat(weisel_ma_avg, symbol = "slab")

You can see that studies by Horsch and Birney were both highly influential on effect size heterogeneity and the summary effect!


Cumulative Meta-Analysis Over Time

Cumulative meta-analyses can help visualize whether interventions have improved over time.

First, we need to sort the data set by publication year and fit another random-effects model:

weisel_data_year <- weisel_data_averaged %>% 
  separate(study_name, c("author", "year")) %>%     # extract year from study_name 
  mutate(study_name = paste(author,"et al.", year)) %>%  # create new study name 
  mutate(year = as.numeric(year)) %>% 
  arrange(year)

weisel_ma_year <- rma.uni(data = weisel_data_year,
                          yi = hedges_g,
                          vi = variance, 
                          method = "REML",
                          slab = study_name)
weisel_ma_year

Random-Effects Model (k = 9; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0170 (SE = 0.0263)
tau (square root of estimated tau^2 value):      0.1306
I^2 (total heterogeneity / total variability):   32.20%
H^2 (total variability / sampling variability):  1.47

Test for Heterogeneity:
Q(df = 8) = 11.5812, p-val = 0.1709

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub 
  0.3347  0.0777  4.3085  <.0001  0.1824  0.4870  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we can visualize the cumulative meta-analysis, in which the studies are added one at a time, starting from the top, and the summary effect is recalculated at each step.

weisel_forest_year <- viz_forest(weisel_ma_year, 
           summary_label = "Summary effect", 
           study_labels = weisel_data_year$study_name,
           method = "REML",
           xlab = "Hedges g", 
           variant = "rain",
           annotate_CI = TRUE, 
           table_headers = "Hedges g [95% CI]", 
           type = "cumulative")

weisel_forest_year

It appears that studies report larger effect sizes in recent years, which MIGHT indicate that these apps are becoming better over time.
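
The numeric counterpart to this plot is metafor's cumul(), which refits the model while adding one study at a time, here in the order of the year-sorted data (a quick sketch, output not shown here):

cumul(weisel_ma_year)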


p-curve

p-curve, the distribution of statistically significant p-values, can be analyzed to assess whether a set of findings has evidential value (in particular, whether p-hacking and file-drawering can be ruled out).

Best practice would be to follow the p-curve user guide and include only the “main” p-value reported in each paper. Doing so here would be too time consuming, but we should still include only one effect size per study, so we simply pretend that the averaged effect size is the one that would have been included.

Now we need to prepare the data for analysis:

weisel_data_pcurve_avg <- weisel_data_averaged %>% 
  mutate(seTE = sqrt(variance)) %>% 
  rename(TE = hedges_g, 
       studlab = study_name)

Now, pcurve() from the dmetar package calculates the p-values from the effect sizes and standard errors and conducts the p-curve analysis, including graphical output.

weisel_pcurve_avg <- pcurve(weisel_data_pcurve_avg)

P-curve analysis 
 ----------------------- 
- Total number of provided studies: k = 9 
- Total number of p<0.05 studies included into the analysis: k = 3 (33.33%) 
- Total number of studies with p<0.025: k = 3 (33.33%) 
   
Results 
 ----------------------- 
                    pBinomial  zFull pFull  zHalf pHalf
Right-skewness test     0.125 -2.928 0.002 -2.280 0.011
Flatness test           1.000  1.517 0.935  2.356 0.991
Note: p-values of 0 or 1 correspond to p<0.001 and p>0.999, respectively.   
Power Estimate: 80% (29%-97.6%)
   
Evidential value 
 ----------------------- 
- Evidential value present: yes 
- Evidential value absent/inadequate: no 
   
weisel_pcurve_avg
$pcurveResults
                    pBinomial  zFull pFull  zHalf pHalf
Right-skewness test     0.125 -2.928 0.002 -2.280 0.011
Flatness test           1.000  1.517 0.935  2.356 0.991

$Power
 powerEstimate powerLower powerUpper
           0.8       0.29      0.976

$PlotData
  p-value Observed (blue) Power 33% (Green) Flat (Red)
1    0.01             100            44.253         20
2    0.02               0            19.514         20
3    0.03               0            14.443         20
4    0.04               0            11.760         20
5    0.05               0            10.030         20

$Input
                     TE     p ppSkewFull ppSkewHalf ppFlatFull ppFlatHalf
1 Arean 2016  0.1811183 0.307         NA         NA         NA         NA
2 Birney 2016 0.0980000 0.488         NA         NA         NA         NA
3 Enock 2014  0.1358282 0.431         NA         NA         NA         NA
4 Horsch 2017 0.5700000 0.001      0.010      0.021      0.921      0.963
5 Hur 2018    0.6620000 0.055         NA         NA         NA         NA
6 Kuhn 2017   0.2300000 0.205         NA         NA         NA         NA
7 Roepke 2015 0.5480000 0.003      0.051      0.102      0.795      0.904
8 Stolz 2018  0.3000000 0.180         NA         NA         NA         NA
9 Tighe 2017  0.7100000 0.006      0.129      0.259      0.651      0.837
              zSkewFull zSkewHalf zFlatFull zFlatHalf
1 Arean 2016         NA        NA        NA        NA
2 Birney 2016        NA        NA        NA        NA
3 Enock 2014         NA        NA        NA        NA
4 Horsch 2017    -2.310    -2.035     1.415     1.791
5 Hur 2018           NA        NA        NA        NA
6 Kuhn 2017          NA        NA        NA        NA
7 Roepke 2015    -1.633    -1.268     0.823     1.306
8 Stolz 2018         NA        NA        NA        NA
9 Tighe 2017     -1.129    -0.646     0.389     0.983

$EvidencePresent
[1] "yes"

$EvidenceAbsent
[1] "no"

$kInput
[1] 9

$kAnalyzed
[1] 3

$kp0.25
[1] 3

We can see that 3 out of 9 p-values were statistically significant at .05, presenting us with a nicely right-skewed curve, so we should not worry too much about p-hacking and the like.


p-uniform*

This meta-analysis method corrects for publication bias and is an extension of the classic p-uniform method. It now allows for estimation of the average effect size and the between-study variance in a meta-analysis, and uses both the statistically significant and non-significant effect sizes.

Again, we should only include one effect size per study, and include the averaged effect sizes:

weisel_puniform_avg <- puni_star(yi = weisel_data_averaged$hedges_g, 
          vi = weisel_data_averaged$variance, 
          alpha = .05,
          side = "right", 
          method = "ML")

weisel_puniform_avg

Method: ML (k = 9; ksig = 3)

Estimating effect size p-uniform*

       est     ci.lb     ci.ub       L.0    pval
    0.4473    0.2006    0.7025   11.3814   <.001

===

Estimating between-study variance p-uniform*

      tau2   tau2.lb   tau2.ub     L.het      pval
    0.0146         0     0.103    0.7227    0.3952

===

Publication bias test p-uniform*

      L.pb      pval
    0.7951     0.672

This method yields a LARGER effect size estimate than the reported random-effects model or multilevel model, indicating that we might not need to worry about a large file drawer.


Meta-analysis on app‐supported smartphone interventions for mental health: Depression

Let's continue our exploration with the meta-analysis by Linardon et al. 2019, which had an even larger scope, as it included app-supported smartphone studies, i.e. apps that may also be partly supported by a therapist.

Main findings from abstract:

Smartphone interventions significantly outperformed control conditions in improving depressive (g=0.28, n=54) and generalized anxiety (g=0.30, n=39) symptoms, stress levels (g=0.35, n=27), quality of life (g=0.35, n=43), general psychiatric distress (g=0.40, n=12), social anxiety symptoms (g=0.58, n=6), and positive affect (g=0.44, n=6), with most effects being robust even after adjusting for various possible biasing factors (type of control condition, risk of bias rating). Smartphone interventions conferred no significant benefit over control conditions on panic symptoms (g=–0.05, n=3), post-traumatic stress symptoms (g=0.18, n=4), and negative affect (g=–0.08, n=5).

Interestingly, this meta-analysis yields a smaller effect size for depressive symptoms, even though the involvement of a therapist usually leads to better outcomes. Let's explore why this might be the case.


Data wrangling

I created this data frame with values from Figure 2; the code below cleans the data and adds the study and effect size IDs:

linardon_data <- read_excel("data_linardon_depression.xlsx") 

linardon_data <- linardon_data %>% 
  mutate(study_name = 
           str_trim(linardon_data$ref)) %>%     # cleaning whitespace
  arrange(study_name) %>%                                  # arranging alphabetically
  mutate(effect_id = row_number()) %>%                            # adding unique ID
  group_by(study_name) %>%                
  mutate(study_id = group_indices(),                      # adding study id
         variance = as.numeric(variance))  %>%            
  select(study_id, effect_id,                         # reordering data frame
         study_name, year, hedges_g, variance) 

linardon_data
# A tibble: 54 x 6
# Groups:   study_name [40]
   study_id effect_id study_name         year hedges_g variance
      <int>     <int> <chr>             <dbl>    <dbl>    <dbl>
 1        1         1 Arean et al28      2016    0.11     0.098
 2        1         2 Arean et al28      2016    0.417    0.01 
 3        2         3 Bakker et al16     2018   -0.213    0.026
 4        2         4 Bakker et al16     2018   -0.13     0.025
 5        2         5 Bakker et al16     2018    0.225    0.026
 6        3         6 Birney et al32     2016    0.197    0.013
 7        4         7 Boettcher et al38  2018    0.296    0.029
 8        5         8 Bostock et al45    2018    0.47     0.019
 9        6         9 Carissoli et al29  2015    0.126    0.101
10        7        10 Cox et al39        2018    0.315    0.097
# … with 44 more rows


Recreating the meta-analysis

Overall, Linardon et al. found that:

> The pooled effect size for the 54 comparisons between smartphone interventions and control conditions on depressive symptoms was g = 0.28, 95% CI [0.21-0.36], with moderate heterogeneity I2 = 54%, 95% CI [38; 66]

Linardon et al. included 37 standalone smartphone app comparisons (g = 0.23, 95% CI [0.15-0.31]), which is more than Weisel et al. included. It would be interesting to compare their overlap, which I might do in another blog post, especially as they mention:

> Studies that offered professional guidance (e.g., regular supportive text messages, phone calls, or personalized feedback from therapists or research staff) produced larger effect sizes than studies that did not offer guidance.

But let's recreate their meta-analysis first:

linardon_ma <- rma.uni(data = linardon_data, 
                     yi = hedges_g,
                     vi = variance, 
                     method = "REML")
linardon_ma

Random-Effects Model (k = 54; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0411 (SE = 0.0155)
tau (square root of estimated tau^2 value):      0.2028
I^2 (total heterogeneity / total variability):   55.12%
H^2 (total variability / sampling variability):  2.23

Test for Heterogeneity:
Q(df = 53) = 116.9372, p-val < .0001

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub 
  0.2818  0.0395  7.1297  <.0001  0.2043  0.3593  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(linardon_ma)

       estimate   ci.lb   ci.ub 
tau^2    0.0411  0.0147  0.0707 
tau      0.2028  0.1212  0.2659 
I^2(%)  55.1162 30.4900 67.8672 
H^2      2.2280  1.4386  3.1121 

Thus, the reported effect size from the paper could be reproduced. As before, the Higgins I2 estimate and its CI could not be reproduced exactly; I suspect this is due to differences in how they are calculated between R, which I use, and Comprehensive Meta-Analysis, which both author teams used in their analyses.


Forest Plot

Here, again, is the rainforest plot, similar to the one in the publication.

linardon_forest <- viz_forest(linardon_ma, 
           summary_label = "Summary effect", 
           study_labels = linardon_data$study_name,
           method = "REML",
           xlab = "Hedges g", 
           variant = "rain",
           annotate_CI = TRUE, 
           table_headers = "Hedges g [95% CI]")

linardon_forest


Additional Analyses

Addressing effect size dependency by averaging

Effect size dependency was addressed in the paper by conducting sensitivity analyses, as Linardon et al. state that:

In the previous analyses, we included a few trials in which more than one intervention condition was compared with the same control condition (or vice versa). These comparisons were not independent from each other, which may have artificially reduced the heterogeneity estimate and affected the pooled effect size. To deal with this, we ran sensitivity analyses in which the comparison with the smallest effect size was only included in the analysis, and then repeated this again for the comparison with the largest effect size. These sensitivity analyses ensured that only one comparison per study was included in the meta-analysis. These sensitivity analyses yielded a pooled effect size highly similar to the overall effect (Table 1).

Another way to address this issue is to weight the effect sizes and variances by sample size to get one average effect size per study. As no sample sizes were reported, I cannot do this here. Or it could be done, but I won't extract them all… so I simply take the mean of g and of the variances, which should not be done and is only an approximation!

linardon_data_averaged <- linardon_data %>% 
  group_by(study_id) %>%               # to average all studies per study_id 
  mutate(hedges_g = mean(hedges_g),    # mean of Hedges g 
         variance = mean(variance)) %>% # mean of sample variances 
  distinct(study_id,                   # only 1 effect size per study is kept
           .keep_all = TRUE)

linardon_data_averaged
# A tibble: 40 x 6
# Groups:   study_id [40]
   study_id effect_id study_name             year hedges_g variance
      <int>     <int> <chr>                 <dbl>    <dbl>    <dbl>
 1        1         1 Arean et al28          2016   0.264    0.054 
 2        2         3 Bakker et al16         2018  -0.0393   0.0257
 3        3         6 Birney et al32         2016   0.197    0.013 
 4        4         7 Boettcher et al38      2018   0.296    0.029 
 5        5         8 Bostock et al45        2018   0.47     0.019 
 6        6         9 Carissoli et al29      2015   0.126    0.101 
 7        7        10 Cox et al39            2018   0.315    0.097 
 8        8        11 Dennis-Tiwary et al26  2017   0.073    0.131 
 9        9        12 Ebert et al46          2018   0.528    0.02  
10       10        13 Ebert et al47          2016   0.547    0.016 
# … with 30 more rows

Taking this new data set with only one effect size per study, we can see that the summary effect becomes slightly larger…

linardon_ma_avg <- rma.uni(data = linardon_data_averaged, 
                     yi = hedges_g,
                     vi = variance, 
                     method = "REML",
                     slab = study_name)
linardon_ma_avg

Random-Effects Model (k = 40; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.0309 (SE = 0.0154)
tau (square root of estimated tau^2 value):      0.1758
I^2 (total heterogeneity / total variability):   47.37%
H^2 (total variability / sampling variability):  1.90

Test for Heterogeneity:
Q(df = 39) = 72.1007, p-val = 0.0010

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub 
  0.3285  0.0424  7.7418  <.0001  0.2453  0.4116  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

… while estimates of heterogeneity decrease slightly.

confint(linardon_ma_avg)

       estimate   ci.lb   ci.ub 
tau^2    0.0309  0.0064  0.0626 
tau      0.1758  0.0799  0.2502 
I^2(%)  47.3746 15.6836 64.5777 
H^2      1.9002  1.1860  2.8231 

Multilevel model

linardon_mv <- rma.mv(yi = hedges_g, 
                   V = variance, 
                   random = list(~ 1 | effect_id, # level 2
                                 ~ 1 | study_id), # level 3
                   tdist = TRUE, # Knapp-Hartung adjustment
                   data = linardon_data,
                   method = "REML",
                   slab = study_name)

summary(linardon_mv)

Multivariate Meta-Analysis Model (k = 54; method: REML)

  logLik  Deviance       AIC       BIC      AICc 
 -8.1081   16.2161   22.2161   28.1270   22.7059   

Variance Components:

            estim    sqrt  nlvls  fixed     factor 
sigma^2.1  0.0186  0.1364     54     no  effect_id 
sigma^2.2  0.0232  0.1524     40     no   study_id 

Test for Heterogeneity:
Q(df = 53) = 116.9372, p-val < .0001

Model Results:

estimate      se    tval    pval   ci.lb   ci.ub 
  0.2981  0.0422  7.0656  <.0001  0.2135  0.3827  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The effect size increases slightly to g = 0.30 when we (multi-)level up, which indicates that the published effect size is a robust estimate.

mlm.variance.distribution(x = linardon_mv)
        % of total variance    I2
Level 1            44.45717   ---
Level 2            24.69093 24.69
Level 3            30.85190 30.85
Total I2: 55.54% 

The effect size heterogeneity stayed approximately the same as in the random effects model, but we can attribute it to between- and within-study variability!


Sunset Funnel Plot

viz_sunset(linardon_ma_avg, true_effect = .3)

Even though we can see more adequately powered studies here, the median power of all studies was even lower than in the first meta-analysis: only 28%!


Contour-Enhanced Funnel Plot with Egger's Regression Line & Trim-and-Fill Method

Here, we can see imputed studies by the trim and fill method, the adjusted summary effect (dotted line), as well as Egger’s regression line. This can help to visually assess funnel plot asymmetry.

viz_funnel(linardon_ma_avg,
           trim_and_fill = T,
           egger = T, 
           method = "REML")

Interestingly, the asymmetry points away from a null effect and toward even larger effects. A file drawer in this direction seems highly unlikely, so there is not much to worry about here.


Baujat Plot

baujat(linardon_ma_avg, symbol = "slab")

Nobis et al. has a strong influence on both the summary effect size and the overall heterogeneity. A closer look might be interesting.
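
For that closer look, one option is metafor's leave-one-out case diagnostics via influence(), which report, among other things, Cook's distances and the summary estimates with each study removed (a quick sketch, output not shown here):

linardon_inf <- influence(linardon_ma_avg)
linardon_inf        # Cook's distances, leave-one-out estimates, etc.
plot(linardon_inf)  # standard diagnostic panel plot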


Cumulative Meta-Analysis

Now we can look at the change of effect sizes over time.

linardon_data_year <- linardon_data %>% 
  arrange(year)

linardon_ma_year <- rma.uni(data = linardon_data_year,
                          yi = hedges_g,
                          vi = variance, 
                          method = "REML",
                          slab = study_name)

linardon_forest_year <- viz_forest(linardon_ma_year, 
           summary_label = "Summary effect", 
           study_labels = linardon_data_year$study_name,
           method = "REML",
           xlab = "Hedges g", 
           variant = "rain",
           annotate_CI = TRUE, 
           table_headers = "Hedges g [95% CI]", 
           type = "cumulative")

linardon_forest_year

The cumulative effect size has decreased slightly over time but remains stable at around g = 0.3.


p-curve

linardon_data_pcurve_avg <- linardon_data_averaged %>% 
  mutate(seTE = sqrt(variance)) %>% 
  rename(TE = hedges_g, 
       studlab = study_name)

linardon_pcurve_avg <- pcurve(linardon_data_pcurve_avg)

P-curve analysis 
 ----------------------- 
- Total number of provided studies: k = 40 
- Total number of p<0.05 studies included into the analysis: k = 12 (30%) 
- Total number of studies with p<0.025: k = 11 (27.5%) 
   
Results 
 ----------------------- 
                    pBinomial  zFull pFull  zHalf pHalf
Right-skewness test     0.003 -8.800     0 -8.535     0
Flatness test           0.982  5.725     1  7.608     1
Note: p-values of 0 or 1 correspond to p<0.001 and p>0.999, respectively.   
Power Estimate: 96% (87.7%-98.7%)
   
Evidential value 
 ----------------------- 
- Evidential value present: yes 
- Evidential value absent/inadequate: no 
   
linardon_pcurve_avg
$pcurveResults
                    pBinomial  zFull pFull  zHalf pHalf
Right-skewness test     0.003 -8.800     0 -8.535     0
Flatness test           0.982  5.725     1  7.608     1

$Power
 powerEstimate powerLower powerUpper
          0.96      0.877      0.987

$PlotData
  p-value Observed (blue) Power 33% (Green) Flat (Red)
1    0.01          83.333            44.253         20
2    0.02           8.333            19.514         20
3    0.03           0.000            14.443         20
4    0.04           0.000            11.760         20
5    0.05           8.333            10.030         20

$Input
                                    TE     p ppSkewFull ppSkewHalf ppFlatFull
1 Arean et al28             0.26350000 0.257         NA         NA         NA
2 Bakker et al16           -0.03933333 0.806         NA         NA         NA
3 Birney et al32            0.19700000 0.084         NA         NA         NA
4 Boettcher et al38         0.29600000 0.082         NA         NA         NA
5 Bostock et al45           0.47000000 0.001      0.013      0.026      0.910
6 Carissoli et al29         0.12600000 0.692         NA         NA         NA
7 Cox et al39               0.31500000 0.312         NA         NA         NA
8 Dennis-Tiwary et al26     0.07300000 0.840         NA         NA         NA
9 Ebert et al46             0.52800000 0.000      0.004      0.008      0.959
10 Ebert et al47            0.54700000 0.000      0.000      0.001      0.992
11 Ebert et al49            0.59400000 0.000      0.000      0.000      0.998
12 Enock et al23            0.28300000 0.094         NA         NA         NA
13 Faurholt-Jepsen et al22 -0.04500000 0.852         NA         NA         NA
14 Flett et al30            0.19950000 0.233         NA         NA         NA
15 Guo et al37              0.23300000 0.392         NA         NA         NA
16 Hall et al53             0.83100000 0.013      0.260      0.521      0.490
17 Harrer et al50           0.62500000 0.000      0.004      0.008      0.959
18 Heber et al51            0.63900000 0.000      0.000      0.000      0.999
19 Hirsch et al33           0.20400000 0.362         NA         NA         NA
20 Horsch et al48           0.57100000 0.001      0.010      0.020      0.923
21 Howells et al42          0.34700000 0.056         NA         NA         NA
22 Ivanova et al25          0.32550000 0.104         NA         NA         NA
23 Kahn et al44             0.43700000 0.042      0.832         NA      0.083
24 Kauer et al20           -0.10800000 0.626         NA         NA         NA
25 Kollei et al43           0.39000000 0.154         NA         NA         NA
26 Krafft et al18          -0.04025000 0.910         NA         NA         NA
27 Krzystanek et al21      -0.08500000 0.624         NA         NA         NA
28 Kuhn et al34             0.22800000 0.209         NA         NA         NA
29 Lee & Jung35             0.22900000 0.148         NA         NA         NA
30 Ludtke et al27           0.08300000 0.718         NA         NA         NA
31 Mistretta et al31        0.19500000 0.549         NA         NA         NA
32 Moell et al36            0.23000000 0.398         NA         NA         NA
33 Nobis et al54            0.85500000 0.000      0.000      0.000      1.000
34 Oh et al24               0.13200000 0.693         NA         NA         NA
35 Proudfoot et al41        0.38900000 0.001      0.013      0.026      0.910
36 Roepke et al40           0.54750000 0.056         NA         NA         NA
37 Stolz et al5             0.29700000 0.180         NA         NA         NA
38 Teng et al17            -0.14150000 0.593         NA         NA         NA
39 Tighe et al52            0.70700000 0.007      0.134      0.268      0.645
40 Versluis et al19        -0.01950000 0.935         NA         NA         NA
                           ppFlatHalf zSkewFull zSkewHalf zFlatFull zFlatHalf
1 Arean et al28                    NA        NA        NA        NA        NA
2 Bakker et al16                   NA        NA        NA        NA        NA
3 Birney et al32                   NA        NA        NA        NA        NA
4 Boettcher et al38                NA        NA        NA        NA        NA
5 Bostock et al45               0.958    -2.226    -1.943     1.341     1.728
6 Carissoli et al29                NA        NA        NA        NA        NA
7 Cox et al39                      NA        NA        NA        NA        NA
8 Dennis-Tiwary et al26            NA        NA        NA        NA        NA
9 Ebert et al46                 0.981    -2.671    -2.430     1.737     2.070
10 Ebert et al47                0.996    -3.426    -3.233     2.420     2.685
11 Ebert et al49                0.999    -3.876    -3.704     2.833     3.068
12 Enock et al23                   NA        NA        NA        NA        NA
13 Faurholt-Jepsen et al22         NA        NA        NA        NA        NA
14 Flett et al30                   NA        NA        NA        NA        NA
15 Guo et al37                     NA        NA        NA        NA        NA
16 Hall et al53                 0.762    -0.642     0.053    -0.025     0.713
17 Harrer et al50               0.981    -2.674    -2.432     1.739     2.072
18 Heber et al51                1.000    -4.294    -4.138     3.221     3.433
19 Hirsch et al33                  NA        NA        NA        NA        NA
20 Horsch et al48               0.964    -2.318    -2.045     1.423     1.797
21 Howells et al42                 NA        NA        NA        NA        NA
22 Ivanova et al25                 NA        NA        NA        NA        NA
23 Kahn et al44                    NA     0.962        NA    -1.383        NA
24 Kauer et al20                   NA        NA        NA        NA        NA
25 Kollei et al43                  NA        NA        NA        NA        NA
26 Krafft et al18                  NA        NA        NA        NA        NA
27 Krzystanek et al21              NA        NA        NA        NA        NA
28 Kuhn et al34                    NA        NA        NA        NA        NA
29 Lee & Jung35                    NA        NA        NA        NA        NA
30 Ludtke et al27                  NA        NA        NA        NA        NA
31 Mistretta et al31               NA        NA        NA        NA        NA
32 Moell et al36                   NA        NA        NA        NA        NA
33 Nobis et al54                1.000    -5.983    -5.869     4.814     4.964
34 Oh et al24                      NA        NA        NA        NA        NA
35 Proudfoot et al41            0.958    -2.229    -1.946     1.344     1.730
36 Roepke et al40                  NA        NA        NA        NA        NA
37 Stolz et al5                    NA        NA        NA        NA        NA
38 Teng et al17                    NA        NA        NA        NA        NA
39 Tighe et al52                0.834    -1.107    -0.618     0.371     0.970
40 Versluis et al19                NA        NA        NA        NA        NA

$EvidencePresent
[1] "yes"

$EvidenceAbsent
[1] "no"

$kInput
[1] 40

$kAnalyzed
[1] 12

$kp0.25
[1] 11

We can see that 12 out of 40 p-values were statistically significant at .05, presenting us with a nicely right-skewed curve. We should not worry too much about p-hacking and the like.


p-uniform*

linardon_puniform_avg <- puni_star(yi = linardon_data_averaged$hedges_g, 
          vi = linardon_data_averaged$variance, 
          alpha = .05,
          side = "right", 
          method = "ML")

linardon_puniform_avg

Method: ML (k = 40; ksig = 12)

Estimating effect size p-uniform*

       est     ci.lb     ci.ub       L.0    pval
    0.4249    0.2972    0.5526   31.3024   <.001

===

Estimating between-study variance p-uniform*

      tau2   tau2.lb   tau2.ub     L.het      pval
    0.0303    0.0079     0.075    9.6924    0.0019

===

Publication bias test p-uniform*

      L.pb      pval
    2.4671    0.2913

p-uniform* again yields a LARGER effect size estimate (g = 0.42) than the reported random-effects model or multilevel model, indicating that we might not need to worry about a large file drawer.



Conclusion

All in all, both meta-analyses provide solid and robust evidence for the efficacy of apps in the treatment of depression. The effect sizes reported by Linardon et al. can even be seen as conservative estimates of the true effect size.

At the same time, the power-enhanced plots suggest that the entire field would benefit from more adequately powered studies—as is the case for most of psychology.