# Intro

The need for mental health interventions exceeds the supply everywhere. I became painfully familiar with this fact while conducting a study in 2015, in which we estimated that in Lower Austria only about a fifth of all people in need of psychotherapy get the help they need.

Digital mental health interventions (DMHIs) show promise for helping those who cannot see a therapist, whether because there are simply not enough therapists, because therapy is unaffordable, out of fear of stigma, or for many other reasons I would never have thought of. A worldwide pandemic that leads to billions of people self-isolating (as is the case at the time of writing) is one of them.

Two recent meta-analyses investigated the efficacy of such digital mental health interventions for a broad range of mental disorders. Both focused on smartphone apps, but one examined standalone apps (no human therapist is involved) and the other app-supported interventions (human therapists might be part of the therapy process).

For depression, both meta-analyses come to similar conclusions: these interventions work. There are some mixed findings, but overall, they seem to help, which is good news. But for anxiety, they diverge: one meta-analysis provides support for their efficacy, the other does not.

To explore these overlapping and diverging results further, I conducted some additional analyses with data from the respective supplementary materials. I am curious whether a multilevel meta-analysis leads to different results, and whether additional analyses might reveal anything of interest. If you are interested in a very short primer on multilevel meta-analyses, click here.

You will find a description of, and code for, the following analyses in this post:

1. Meta-analysis with averaged effect sizes
2. Multilevel meta-analysis
3. Funnel plot variations
4. Baujat plot
5. Cumulative meta-analysis
6. *p*-curve
7. *p*-uniform*

In this first blog post I will compare the converging results on depression, and in the next post I will compare the diverging findings on anxiety. My main goal with these posts is to provide a detailed walk-through of my code and the interpretation of the output while exploring both meta-analyses “a little deeper”. And when I say “a little deeper”, I mean way too deep.

# Meta-analysis on standalone smartphone apps for mental health: Depression

The first meta-analysis on digital mental health interventions I would like to play with, *Standalone smartphone apps for mental health—a systematic review and meta-analysis* by Weisel et al. 2019, addresses standalone apps, i.e. apps where no human therapist is involved in the treatment.

Main result from abstract:

> Effects on depression (Hedges’ g = 0.33, 95%CI 0.10–0.57, P = 0.005, NNT = 5.43, I2 = 59%) and on smoking behavior (g = 0.39, 95%CI 0.21–0.57, NNT = 4.59, P ≤ 0.001, I2 = 0%) were significant. No significant pooled effects were found for anxiety, suicidal ideation, self-injury, or alcohol use (g = −0.14 to 0.18). Effect sizes for single trials ranged from g = −0.05 to 0.14 for PTSD and g = 0.72 to 0.84 for insomnia. Although some trials showed potential of apps targeting mental health symptoms, using smartphone apps as standalone psychological interventions cannot be recommended based on the current level of evidence.

So standalone apps are effective for depression and smoking cessation, but not for anxiety, suicidal ideation, self-injury, or alcohol use.

Let's have a look at depression first. Here, I walk you through my code in detail so that you can see what I did and how I did it (don't worry, I won't go into that much detail in the next post).

## Setup

We will need the following packages for our analyses:

```
library(readxl) # importing excel spreadsheets
library(tidyverse) # data wrangling
library(metafor) # meta-analyses & multilevel meta-analyses
library(metaviz) # advanced data visualization
library(dmetar) # p-curve function
library(puniform) # puniform
```

## Data wrangling

I created this data frame with values from Supplementary Table 4, and I already added the study and effect size IDs manually, which I will need later on for the multilevel model. I extracted Hedges *g* and the variance to calculate the summary effect size, the sample size in case we need to calculate weighted average effect sizes to address effect size dependency, and *p*-values for the *p*-curve.

```
weisel_data <- tribble(
~study_name , ~study_id, ~es_id, ~hedges_g, ~N, ~p, ~variance,
"Arean 2016", 1, 1, 0.308, 129, .088, .033,
"Arean 2016", 1, 2, 0.072, 150, .678, .030,
"Birney 2016", 2, 3, 0.098, 199, .49, .020,
"Enock 2014", 3, 4, 0.48, 106, .034, .050,
"Enock 2014", 3, 5, -0.03, 220, .86, .020,
"Horsch 2017", 4, 6, 0.57, 151, .05, .027,
"Hur 2018", 5, 7, 0.662, 34, .055, .119,
"Kuhn 2017", 6, 8, 0.23, 120, .211, .033,
"Roepke 2015", 7, 9, 0.296, 140, .099, .032,
"Roepke 2015", 7, 10, 0.793, 144, .001, .034,
"Stolz 2018", 8, 11, 0.30, 90, .18, .050,
"Tighe 2017", 9, 12, 0.71, 61, .0007, .068
)
```

## Recreating the meta-analysis

Weisel et al. report two summary effects for depression:

When depression was the primary target of the intervention:

*g* = 0.33, 95% CI 0.10–0.57, *k* = 6, *N* = 796, I² = 59%, 95% CI 0–83%.

When depression was the primary or secondary outcome:

*g* = 0.34, 95% CI 0.18–0.49, *k* = 12, *N* = 1544, I² = 53%, 95% CI 10–76%.

In our analysis, we recreate only the second result, in order to compare it more closely to the second meta-analysis, as Linardon et al. pooled the treatment effect irrespective of outcome type.

```
weisel_ma <- rma.uni(data = weisel_data,
yi = hedges_g,
vi = variance,
method = "REML",
slab = study_name)
weisel_ma
```

```
Random-Effects Model (k = 12; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0388 (SE = 0.0318)
tau (square root of estimated tau^2 value): 0.1971
I^2 (total heterogeneity / total variability): 53.06%
H^2 (total variability / sampling variability): 2.13
Test for Heterogeneity:
Q(df = 11) = 23.4361, p-val = 0.0153
Model Results:
estimate se zval pval ci.lb ci.ub
0.3365 0.0797 4.2246 <.0001 0.1804 0.4926 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The `estimate` provides the summary effect size, with its standard error (`se`). The *z*-value is the corresponding test statistic, `pval` the corresponding *p*-value, `ci.lb` the lower bound of the confidence interval, and `ci.ub` the upper bound. We conducted a random-effects model with the REML estimator and get the same result of *g* = 0.34 in favor of smartphone interventions compared to controls.
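As a quick sanity check on these components, the Wald-type confidence interval can be recomputed by hand from the estimate and standard error printed above (a sketch, not metafor's internal code):

```
est <- 0.3365                            # summary effect from the output above
se  <- 0.0797                            # its standard error
ci  <- est + c(-1, 1) * qnorm(.975) * se # Wald 95% CI
round(ci, 3)                             # close to the reported 0.1804 and 0.4926
```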

### Estimates of heterogeneity

Here we can see that heterogeneity is moderate-to-high, with a Higgins *I*² = 53% and a slightly different 95% *CI* [6%, 83%].

```
weisel_ma_confint <- confint(weisel_ma)
weisel_ma_confint
```

```
estimate ci.lb ci.ub
tau^2 0.0388 0.0020 0.1663
tau 0.1971 0.0449 0.4079
I^2(%) 53.0629 5.5399 82.8842
H^2 2.1305 1.0586 5.8426
```

### Forest Plot

A forest plot very similar to the one in the paper, except it is depicted as a rain forest plot, which I prefer for several reasons outlined here: *Finding your way out of the forest without a trail of bread crumbs: development and evaluation of two novel displays of forest plots*.

```
weisel_forest <- viz_forest(weisel_ma,
summary_label = "Summary effect",
study_labels = weisel_data$study_name,
method = "REML",
xlab = "Hedges g",
variant = "rain",
annotate_CI = TRUE,
table_headers = "Hedges g [95% CI]")
weisel_forest
```

For more info on these types of data visualization for meta-analyses check out the metaviz package!

## Additional Analyses

Now that we have recreated the analyses from the paper, we can explore their results with these following additional analyses:

- Meta-analysis with averaged effect sizes
- Multilevel meta-analysis
- Funnel plot variations
- Baujat plot
- Cumulative meta-analysis
- *p*-curve
- *p*-uniform*

### Addressing effect size dependency by averaging

#### Meta-analysis with averaged effect sizes

Several included studies reported multiple effect sizes, which introduces effect size dependency. This can be addressed by either averaging all effect sizes for each study, or conducting a hierarchical model/multilevel model. To check whether this changes the results in a meaningful way, I did both below.

```
weisel_data_averaged <-
weisel_data %>%
group_by(study_id) %>% # to average all studies per study_id
mutate(hedges_g = sum(hedges_g*N)/sum(N), # Hedges g are weighted by sample size
variance = sum(variance*N)/sum(N), # sample variances are weighted by sample size
N = mean(N)) %>% # sample size is averaged
distinct(study_id, # only 1 effect size per study is kept
.keep_all = TRUE)
weisel_data_averaged
```

```
# A tibble: 9 x 7
# Groups: study_id [9]
study_name study_id es_id hedges_g N p variance
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Arean 2016 1 1 0.181 140. 0.088 0.0314
2 Birney 2016 2 3 0.098 199 0.49 0.02
3 Enock 2014 3 4 0.136 163 0.034 0.0298
4 Horsch 2017 4 6 0.570 151 0.05 0.027
5 Hur 2018 5 7 0.662 34 0.055 0.119
6 Kuhn 2017 6 8 0.23 120 0.211 0.033
7 Roepke 2015 7 9 0.548 142 0.099 0.0330
8 Stolz 2018 8 11 0.3 90 0.18 0.05
9 Tighe 2017 9 12 0.71 61 0.0007 0.068
```

Now all effect sizes and variances are weighted by the sample size. I will use this data set for all additional analyses that follow below.

```
weisel_ma_avg <- rma.uni(data = weisel_data_averaged,
yi = hedges_g,
vi = variance,
method = "REML",
slab = study_name)
weisel_ma_avg
```

```
Random-Effects Model (k = 9; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0170 (SE = 0.0263)
tau (square root of estimated tau^2 value): 0.1306
I^2 (total heterogeneity / total variability): 32.20%
H^2 (total variability / sampling variability): 1.47
Test for Heterogeneity:
Q(df = 8) = 11.5812, p-val = 0.1709
Model Results:
estimate se zval pval ci.lb ci.ub
0.3347 0.0777 4.3085 <.0001 0.1824 0.4870 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We can see that the summary effect size is almost identical to the reported effect size from the published meta-analysis!

##### Estimates of heterogeneity

`confint(weisel_ma_avg)`

```
estimate ci.lb ci.ub
tau^2 0.0170 0.0000 0.1619
tau 0.1306 0.0000 0.4024
I^2(%) 32.2001 0.0000 81.8564
H^2 1.4749 1.0000 5.5116
```

Here you can see that heterogeneity has decreased slightly, indicating that effect sizes within studies differed from each other.

### Multilevel model

Now, we can (multi-)level up! We now have everything we need to calculate the multilevel meta-analysis: sampling variation for each effect size is on level 1, the variation of effect sizes within each study on level 2, and the variation of effect sizes across studies on level 3.

You might have noticed that we added two IDs: one for each study and one for each effect size within each study. This is what metafor's `rma.mv()` function needs in order to work!

```
weisel_mv <- rma.mv(yi = hedges_g,
V = variance, # sampling variances are squared SE
random = list(~ 1 | es_id, # level 2
~ 1 | study_id), # level 3
tdist = TRUE, # Knapp-Hartung adjustment
data = weisel_data,
method = "REML",
slab = study_name)
summary(weisel_mv)
```

```
Multivariate Meta-Analysis Model (k = 12; method: REML)
logLik Deviance AIC BIC AICc
-1.3213 2.6425 8.6425 9.8362 12.0711
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0388 0.1971 12 no es_id
sigma^2.2 0.0000 0.0000 9 no study_id
Test for Heterogeneity:
Q(df = 11) = 23.4361, p-val = 0.0153
Model Results:
estimate se tval pval ci.lb ci.ub
0.3365 0.0797 4.2246 0.0014 0.1612 0.5118 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Again, we can see that the summary effect size is almost identical to the reported effect size in the meta-analysis! This indicates that no matter how you analyse the data, you will come to the same, robust, result.

##### Heterogeneity

`mlm.variance.distribution(x = weisel_mv)`

```
% of total variance I2
Level 1 4.693717e+01 ---
Level 2 5.306283e+01 53.06
Level 3 2.661847e-08 0
Total I2: 53.06%
```

Our heterogeneity estimate can now be attributed to level 2 and level 3. In this case, virtually all of it lies within studies (level 2) and essentially none between studies (level 3), so the third level adds little benefit for this particular data set.
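To see where these percentages come from, here is a base-R sketch of the variance decomposition, using the variance components from the model output and a "typical" sampling variance following Higgins & Thompson (dmetar's `mlm.variance.distribution()` may differ in details):

```
sigma2_l2 <- 0.0388  # within-study variance component (es_id)
sigma2_l3 <- 0.0000  # between-study variance component (study_id)

# "typical" sampling variance from the 12 sampling variances of the data set
v <- c(.033, .030, .020, .050, .020, .027, .119, .033, .032, .034, .050, .068)
w <- 1 / v
v_typ <- (length(v) - 1) * sum(w) / (sum(w)^2 - sum(w^2))

total <- v_typ + sigma2_l2 + sigma2_l3
round(100 * c(level1 = v_typ, level2 = sigma2_l2, level3 = sigma2_l3) / total, 1)
# roughly 47 / 53 / 0 percent, close to the output above
```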

### Sunset Funnel Plot

From the metaviz documentation:

> The funnel plot is the most widely used diagnostic plot in meta-analysis, primarily to assess small-study effects. The sunset (power-enhanced) funnel plot incorporates study-level power information in the funnel display. This directly allows to examine the power studies had to detect an effect of interest (e.g., the observed meta-analytic summary effect), whether funnel plot asymmetry is driven by under-powered but significant studies, and to visually assess if there is an excess of low-powered significant effects in the meta-analysis (conceptually related to the test of excess significance, Ioannidis & Trikalinos, 2007). For effect sizes assumed to be normally distributed (e.g., Cohen d, log OR), the power corresponding to a given standard error is computed by using a two-sided Wald test and (by default) the meta-analytic summary effect as assumed true effect. Colored regions of different power levels and a second axis with study level power are shown in the funnel plot. In addition, power-related statistics are shown: a) The median power of all studies, b) the true effect size necessary such that the median power of the studies would have been 33% or 66%, c) results of a test of excess significance (Ioannidis & Trikalinos, 2007), and d) the R-Index for expected replicability (Schimmack, 2016).

`viz_sunset(weisel_ma_avg, true_effect = .3)`

You can see that the included studies had at most around 60% power to detect an effect of *g* = .3, indicating that most studies were severely under-powered.

The median power of all studies was quite low with only 37.9%!
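The study-level power values behind this plot can be approximated with the two-sided Wald test described in the quote above; a minimal base-R sketch (not metaviz's exact code):

```
# power of a two-sided z-test given an assumed true effect and a standard error
wald_power <- function(true_effect, se, alpha = .05) {
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(-z_crit - true_effect / se) + 1 - pnorm(z_crit - true_effect / se)
}

# e.g. Horsch 2017 (variance .027) for an assumed true effect of g = .3
wald_power(.3, sqrt(.027))  # roughly 0.45
```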

### Contour-Enhanced Funnel Plot with Egger's Regression Line & Trim-and-Fill Method

Here, we can see imputed studies by the trim and fill method, the adjusted summary effect (dotted line), as well as Egger’s regression line. This can help to visually assess funnel plot asymmetry.

```
viz_funnel(weisel_ma_avg,
trim_and_fill = T,
egger = T,
method = "REML")
```

Currently, the usefulness of the trim-and-fill method is heavily debated, but you can still see where studies would need to sit in a file drawer in order to enhance symmetry. Additionally, you can see that Egger's regression line (red dotted line) reveals considerable asymmetry, yet does not reach statistical significance (as the test is under-powered).

`regtest(weisel_ma_avg)`

```
Regression Test for Funnel Plot Asymmetry
model: mixed-effects meta-regression model
predictor: standard error
test for funnel plot asymmetry: z = 1.7311, p = 0.0834
```

### Baujat Plot

Baujat et al. (2002) proposed a diagnostic plot to detect sources of heterogeneity in meta-analytic data. The plot shows the contribution of each study to the overall Q-test statistic for heterogeneity on the horizontal axis versus the influence of each study (defined as the standardized squared difference between the overall estimate based on a fixed-effects model with and without the ith study included in the model) on the vertical axis.

`baujat(weisel_ma_avg, symbol = "slab")`

You can see that studies by Horsch and Birney were both highly influential on effect size heterogeneity and the summary effect!
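To see where the horizontal axis comes from, each study's contribution to the Q statistic can be computed by hand from the averaged effect sizes and variances (fixed-effect weights; a sketch with rounded inputs, so the values will differ slightly from metafor's):

```
yi <- c(0.181, 0.098, 0.136, 0.570, 0.662, 0.230, 0.548, 0.300, 0.710) # averaged g
vi <- c(0.0314, 0.020, 0.0298, 0.027, 0.119, 0.033, 0.0330, 0.050, 0.068)
w  <- 1 / vi
theta_fe  <- sum(w * yi) / sum(w)  # fixed-effect summary estimate
q_contrib <- w * (yi - theta_fe)^2 # per-study contribution to Q (horizontal axis)
round(q_contrib, 2)
# Birney (2nd) and Horsch (4th) show the largest contributions
```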

### Cumulative Meta-Analysis Over Time

Cumulative meta-analyses can help visualize whether interventions have improved over time.

First, we need to arrange the data set by publication year and conduct another random-effects model:

```
weisel_data_year <- weisel_data_averaged %>%
separate(study_name, c("author", "year")) %>% # extract year from study_name
mutate(study_name = paste(author,"et al.", year)) %>% # create new study name
mutate(year = as.numeric(year)) %>%
arrange(year)
weisel_ma_year <- rma.uni(data = weisel_data_year,
yi = hedges_g,
vi = variance,
method = "REML",
slab = study_name)
weisel_ma_year
```

```
Random-Effects Model (k = 9; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0170 (SE = 0.0263)
tau (square root of estimated tau^2 value): 0.1306
I^2 (total heterogeneity / total variability): 32.20%
H^2 (total variability / sampling variability): 1.47
Test for Heterogeneity:
Q(df = 8) = 11.5812, p-val = 0.1709
Model Results:
estimate se zval pval ci.lb ci.ub
0.3347 0.0777 4.3085 <.0001 0.1824 0.4870 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Now we can visualize the cumulative meta-analysis, in which each individual study is added, starting at the top, to calculate the summary effect, step by step.

```
weisel_forest_year <- viz_forest(weisel_ma_year,
summary_label = "Summary effect",
study_labels = weisel_data_year$study_name,
method = "REML",
xlab = "Hedges g",
variant = "rain",
annotate_CI = TRUE,
table_headers = "Hedges g [95% CI]",
type = "cumulative")
weisel_forest_year
```

It appears that studies report larger effect sizes in recent years, which MIGHT indicate that these apps are becoming better over time.

### *p*-curve

*p*-curve, the distribution of significant *p*-values, can be analyzed to assess if the findings have evidential value (in particular, if *p*-hacking and file-drawering can be ruled out).

Best practice would be to follow the *p*-curve user guide and include only the “main” *p*-value reported in each paper. We cannot do this in our analysis because it would be too time-consuming, but still: we should only include one effect size per study, so we just pretend that the averaged effect size would have been the one included.

Now we need to prepare the data for analysis:

```
weisel_data_pcurve_avg <- weisel_data_averaged %>%
mutate(seTE = sqrt(variance)) %>%
rename(TE = hedges_g,
studlab = study_name)
```

Now, `pcurve()` from the dmetar package calculates *p*-values from the *SE*s and effect sizes and conducts the *p*-curve analysis, with graphical output.
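The underlying two-sided *p*-values follow from the Wald statistic *g*/*SE*; a minimal base-R sketch using the averaged Arean 2016 values (dmetar's internals may differ in detail):

```
g <- 0.1811  # averaged Hedges g for Arean 2016
v <- 0.0314  # its variance
p <- 2 * pnorm(-abs(g / sqrt(v)))  # two-sided Wald p-value
round(p, 3)  # 0.307, matching the value shown in the $Input table of the output
```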

`weisel_pcurve_avg <- pcurve(weisel_data_pcurve_avg)`

```
P-curve analysis
-----------------------
- Total number of provided studies: k = 9
- Total number of p<0.05 studies included into the analysis: k = 3 (33.33%)
- Total number of studies with p<0.025: k = 3 (33.33%)
Results
-----------------------
pBinomial zFull pFull zHalf pHalf
Right-skewness test 0.125 -2.928 0.002 -2.280 0.011
Flatness test 1.000 1.517 0.935 2.356 0.991
Note: p-values of 0 or 1 correspond to p<0.001 and p>0.999, respectively.
Power Estimate: 80% (29%-97.6%)
Evidential value
-----------------------
- Evidential value present: yes
- Evidential value absent/inadequate: no
```

`weisel_pcurve_avg`

```
$pcurveResults
pBinomial zFull pFull zHalf pHalf
Right-skewness test 0.125 -2.928 0.002 -2.280 0.011
Flatness test 1.000 1.517 0.935 2.356 0.991
$Power
powerEstimate powerLower powerUpper
0.8 0.29 0.976
$PlotData
p-value Observed (blue) Power 33% (Green) Flat (Red)
1 0.01 100 44.253 20
2 0.02 0 19.514 20
3 0.03 0 14.443 20
4 0.04 0 11.760 20
5 0.05 0 10.030 20
$Input
TE p ppSkewFull ppSkewHalf ppFlatFull ppFlatHalf
1 Arean 2016 0.1811183 0.307 NA NA NA NA
2 Birney 2016 0.0980000 0.488 NA NA NA NA
3 Enock 2014 0.1358282 0.431 NA NA NA NA
4 Horsch 2017 0.5700000 0.001 0.010 0.021 0.921 0.963
5 Hur 2018 0.6620000 0.055 NA NA NA NA
6 Kuhn 2017 0.2300000 0.205 NA NA NA NA
7 Roepke 2015 0.5480000 0.003 0.051 0.102 0.795 0.904
8 Stolz 2018 0.3000000 0.180 NA NA NA NA
9 Tighe 2017 0.7100000 0.006 0.129 0.259 0.651 0.837
zSkewFull zSkewHalf zFlatFull zFlatHalf
1 Arean 2016 NA NA NA NA
2 Birney 2016 NA NA NA NA
3 Enock 2014 NA NA NA NA
4 Horsch 2017 -2.310 -2.035 1.415 1.791
5 Hur 2018 NA NA NA NA
6 Kuhn 2017 NA NA NA NA
7 Roepke 2015 -1.633 -1.268 0.823 1.306
8 Stolz 2018 NA NA NA NA
9 Tighe 2017 -1.129 -0.646 0.389 0.983
$EvidencePresent
[1] "yes"
$EvidenceAbsent
[1] "no"
$kInput
[1] 9
$kAnalyzed
[1] 3
$kp0.25
[1] 3
```

We can see that 3 out of 9 *p*-values were statistically significant at .05, presenting us with a nicely right-skewed *p*-curve, so we should not worry too much about, e.g., *p*-hacking.

### *p*-uniform*

This meta-analysis method corrects for publication bias and is an extension of the classic *p*-uniform method. It now allows for estimation of the average effect size and the between-study variance in a meta-analysis, and uses both the statistically significant and non-significant effect sizes.

Again, we should only include one effect size per study, and include the averaged effect sizes:

```
weisel_puniform_avg <- puni_star(yi = weisel_data_averaged$hedges_g,
vi = weisel_data_averaged$variance,
alpha = .05,
side = "right",
method = "ML")
weisel_puniform_avg
```

```
Method: ML (k = 9; ksig = 3)
Estimating effect size p-uniform*
est ci.lb ci.ub L.0 pval
0.4473 0.2006 0.7025 11.3814 <.001
===
Estimating between-study variance p-uniform*
tau2 tau2.lb tau2.ub L.het pval
0.0146 0 0.103 0.7227 0.3952
===
Publication bias test p-uniform*
L.pb pval
0.7951 0.672
```

This method yields a LARGER effect size estimate than the reported random-effects model or multilevel model, indicating that we might not need to worry about a large file drawer.

# Meta-analysis on app‐supported smartphone interventions for mental health: Depression

Let's continue our exploration with the meta-analysis by Linardon et al. 2019, which had an even larger scope, as it included app-supported smartphone studies, i.e. apps that may also be partly supported by a therapist.

Main findings from abstract:

> Smartphone interventions significantly outperformed control conditions in improving depressive (g=0.28, n=54) and generalized anxiety (g=0.30, n=39) symptoms, stress levels (g=0.35, n=27), quality of life (g=0.35, n=43), general psychiatric distress (g=0.40, n=12), social anxiety symptoms (g=0.58, n=6), and positive affect (g=0.44, n=6), with most effects being robust even after adjusting for various possible biasing factors (type of control condition, risk of bias rating). Smartphone interventions conferred no significant benefit over control conditions on panic symptoms (g=–0.05, n=3), post-traumatic stress symptoms (g=0.18, n=4), and negative affect (g=–0.08, n=5).

Interestingly, this yields a smaller effect size for depressive symptoms, even though the involvement of a therapist usually leads to better outcomes. Let's explore why this might be the case.

## Data wrangling

I created this data frame with values from Figure 2, then clean the data and add study and effect size IDs with the code below:

```
linardon_data <- read_excel("data_linardon_depression.xlsx")
linardon_data <- linardon_data %>%
mutate(study_name =
str_trim(linardon_data$ref)) %>% # cleaning whitespace
arrange(study_name) %>% # arranging alphabetically
mutate(effect_id = row_number()) %>% # adding unique ID
group_by(study_name) %>%
mutate(study_id = group_indices(), # adding study ID
variance = as.numeric(variance)) %>%
select(study_id, effect_id, # reordering data frame
study_name, year, hedges_g, variance)
linardon_data
```

```
# A tibble: 54 x 6
# Groups: study_name [40]
study_id effect_id study_name year hedges_g variance
<int> <int> <chr> <dbl> <dbl> <dbl>
1 1 1 Arean et al28 2016 0.11 0.098
2 1 2 Arean et al28 2016 0.417 0.01
3 2 3 Bakker et al16 2018 -0.213 0.026
4 2 4 Bakker et al16 2018 -0.13 0.025
5 2 5 Bakker et al16 2018 0.225 0.026
6 3 6 Birney et al32 2016 0.197 0.013
7 4 7 Boettcher et al38 2018 0.296 0.029
8 5 8 Bostock et al45 2018 0.47 0.019
9 6 9 Carissoli et al29 2015 0.126 0.101
10 7 10 Cox et al39 2018 0.315 0.097
# … with 44 more rows
```

## Recreating the meta-analysis

Overall, Linardon et al. found that:

> The pooled effect size for the 54 comparisons between smartphone interventions and control conditions on depressive symptoms was *g* = 0.28, 95% *CI* [0.21–0.36], with moderate heterogeneity, I² = 54%, 95% *CI* [38; 66].

Linardon et al. included 37 standalone smartphone apps (*g* = 0.23, 95% CI [0.15–0.31]), which is more than Weisel et al. included. It would be interesting to compare their overlap, which I might do in another blog post, especially as they mention:

> Studies that offered professional guidance (e.g., regular supportive text messages, phone calls, or personalized feedback from therapists or research staff) produced larger effect sizes than studies that did not offer guidance.

But let's recreate their meta-analysis first:

```
linardon_ma <- rma.uni(data = linardon_data,
yi = hedges_g,
vi = variance,
method = "REML")
linardon_ma
```

```
Random-Effects Model (k = 54; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0411 (SE = 0.0155)
tau (square root of estimated tau^2 value): 0.2028
I^2 (total heterogeneity / total variability): 55.12%
H^2 (total variability / sampling variability): 2.23
Test for Heterogeneity:
Q(df = 53) = 116.9372, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.2818 0.0395 7.1297 <.0001 0.2043 0.3593 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`confint(linardon_ma)`

```
estimate ci.lb ci.ub
tau^2 0.0411 0.0147 0.0707
tau 0.2028 0.1212 0.2659
I^2(%) 55.1162 30.4900 67.8672
H^2 2.2280 1.4386 3.1121
```

Thus, the reported effect size from the paper could be reproduced. Again, the Higgins I² estimate and its *CI* could not be reproduced exactly; I suspect this is due to differences in the calculation between R, which I use, and Comprehensive Meta-Analysis, which both author teams used in their analyses.

### Forest Plot

Here, again, the rain forest plot similar to the publication.

```
linardon_forest <- viz_forest(linardon_ma,
summary_label = "Summary effect",
study_labels = linardon_data$study_name,
method = "REML",
xlab = "Hedges g",
variant = "rain",
annotate_CI = TRUE,
table_headers = "Hedges g [95% CI]")
linardon_forest
```

## Additional Analyses

### Addressing effect size dependency by averaging

Effect size dependency was addressed in the paper by conducting sensitivity analyses, as Linardon et al. state that:

> In the previous analyses, we included a few trials in which more than one intervention condition was compared with the same control condition (or vice versa). These comparisons were not independent from each other, which may have artificially reduced the heterogeneity estimate and affected the pooled effect size. To deal with this, we ran sensitivity analyses in which the comparison with the smallest effect size was only included in the analysis, and then repeated this again for the comparison with the largest effect size. These sensitivity analyses ensured that only one comparison per study was included in the meta-analysis. These sensitivity analyses yielded a pooled effect size highly similar to the overall effect (Table 1).

Another way to address this issue is to weight the effect sizes/variances by sample size, to get one average effect size per study. As no sample sizes were reported, I cannot do this. Or it could be done, but I won't extract them all… so I just take the mean of *g* and of the variances, which should not be done and is only an approximation!
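To illustrate why the plain mean is only an approximation, compare it with a sample-size-weighted mean on invented numbers (both the effect sizes and sample sizes below are hypothetical):

```
g <- c(0.10, 0.50)  # two effect sizes from one hypothetical study
n <- c(40, 200)     # their hypothetical sample sizes
mean(g)             # plain mean: 0.30
sum(g * n) / sum(n) # weighted mean: ~0.43, pulled toward the larger sample
```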

```
linardon_data_averaged <- linardon_data %>%
group_by(study_id) %>% # to average all studies per study_id
mutate(hedges_g = mean(hedges_g), # mean of Hedges g
variance = mean(variance)) %>% # mean of sample variances
distinct(study_id, # only 1 effect size per study is kept
.keep_all = TRUE)
linardon_data_averaged
```

```
# A tibble: 40 x 6
# Groups: study_id [40]
study_id effect_id study_name year hedges_g variance
<int> <int> <chr> <dbl> <dbl> <dbl>
1 1 1 Arean et al28 2016 0.264 0.054
2 2 3 Bakker et al16 2018 -0.0393 0.0257
3 3 6 Birney et al32 2016 0.197 0.013
4 4 7 Boettcher et al38 2018 0.296 0.029
5 5 8 Bostock et al45 2018 0.47 0.019
6 6 9 Carissoli et al29 2015 0.126 0.101
7 7 10 Cox et al39 2018 0.315 0.097
8 8 11 Dennis-Tiwary et al26 2017 0.073 0.131
9 9 12 Ebert et al46 2018 0.528 0.02
10 10 13 Ebert et al47 2016 0.547 0.016
# … with 30 more rows
```

Taking this new data set with only one effect size per study, we can see that the summary effect becomes slightly larger…

```
linardon_ma_avg <- rma.uni(data = linardon_data_averaged,
yi = hedges_g,
vi = variance,
method = "REML",
slab = study_name)
linardon_ma_avg
```

```
Random-Effects Model (k = 40; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.0309 (SE = 0.0154)
tau (square root of estimated tau^2 value): 0.1758
I^2 (total heterogeneity / total variability): 47.37%
H^2 (total variability / sampling variability): 1.90
Test for Heterogeneity:
Q(df = 39) = 72.1007, p-val = 0.0010
Model Results:
estimate se zval pval ci.lb ci.ub
0.3285 0.0424 7.7418 <.0001 0.2453 0.4116 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

… while estimates of heterogeneity decrease slightly.

`confint(linardon_ma_avg)`

```
estimate ci.lb ci.ub
tau^2 0.0309 0.0064 0.0626
tau 0.1758 0.0799 0.2502
I^2(%) 47.3746 15.6836 64.5777
H^2 1.9002 1.1860 2.8231
```

### Multilevel model

```
linardon_mv <- rma.mv(yi = hedges_g,
V = variance,
random = list(~ 1 | effect_id, # level 2
~ 1 | study_id), # level 3
tdist = TRUE, # Knapp-Hartung adjustment
data = linardon_data,
method = "REML",
slab = study_name)
summary(linardon_mv)
```

```
Multivariate Meta-Analysis Model (k = 54; method: REML)
logLik Deviance AIC BIC AICc
-8.1081 16.2161 22.2161 28.1270 22.7059
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0186 0.1364 54 no effect_id
sigma^2.2 0.0232 0.1524 40 no study_id
Test for Heterogeneity:
Q(df = 53) = 116.9372, p-val < .0001
Model Results:
estimate se tval pval ci.lb ci.ub
0.2981 0.0422 7.0656 <.0001 0.2135 0.3827 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The effect size increased slightly to *g* = 0.30 when we multi-level up, which indicates that the published effect size is a robust estimate.

`mlm.variance.distribution(x = linardon_mv)`

```
% of total variance I2
Level 1 44.45717 ---
Level 2 24.69093 24.69
Level 3 30.85190 30.85
Total I2: 55.54%
```

The effect size heterogeneity stayed approximately the same as in the random effects model, but we can attribute it to between- and within-study variability!

#### Sunset Funnel Plot

`viz_sunset(linardon_ma_avg, true_effect = .3)`

Even though we can see more adequately powered studies here, the median power of all studies was even lower than in the first meta-analysis: only 28%!

#### Contour Enhanced Funnel Plot with Eggers Regression Line & Trim and Fill Method

Here, we can see imputed studies by the trim and fill method, the adjusted summary effect (dotted line), as well as Egger’s regression line. This can help to visually assess funnel plot asymmetry.

```
viz_funnel(linardon_ma_avg,
trim_and_fill = T,
egger = T,
method = "REML")
```

Interestingly, the asymmetry points away from a null effect and toward an even larger effect. A file drawer in this direction seems highly unlikely, so there is not too much to worry about here.

#### Baujat Plot

`baujat(linardon_ma_avg, symbol = "slab")`

The study by Nobis et al. has a strong influence on both the summary effect size and overall heterogeneity. A closer look might be interesting.

### Cumulative Meta-Analysis

Now we can look at the change of effect sizes over time.

```
linardon_data_year <- linardon_data %>%
arrange(year)
linardon_ma_year <- rma.uni(data = linardon_data_year,
yi = hedges_g,
vi = variance,
method = "REML",
slab = study_name)
linardon_forest_year <- viz_forest(linardon_ma_year,
summary_label = "Summary effect",
study_labels = linardon_data_year$study_name,
method = "REML",
xlab = "Hedges g",
variant = "rain",
annotate_CI = TRUE,
table_headers = "Hedges g [95% CI]",
type = "cumulative")
linardon_forest_year
```

The effect sizes decreased over time, but remain stable around *g* = 0.3.

### *p*-curve

```
linardon_data_pcurve_avg <- linardon_data_averaged %>%
  mutate(seTE = sqrt(variance)) %>%
  rename(TE = hedges_g,
         studlab = study_name)

linardon_pcurve_avg <- pcurve(linardon_data_pcurve_avg)
```

```
P-curve analysis
-----------------------
- Total number of provided studies: k = 40
- Total number of p<0.05 studies included into the analysis: k = 12 (30%)
- Total number of studies with p<0.025: k = 11 (27.5%)
Results
-----------------------
pBinomial zFull pFull zHalf pHalf
Right-skewness test 0.003 -8.800 0 -8.535 0
Flatness test 0.982 5.725 1 7.608 1
Note: p-values of 0 or 1 correspond to p<0.001 and p>0.999, respectively.
Power Estimate: 96% (87.7%-98.7%)
Evidential value
-----------------------
- Evidential value present: yes
- Evidential value absent/inadequate: no
```

`linardon_pcurve_avg`

```
$pcurveResults
pBinomial zFull pFull zHalf pHalf
Right-skewness test 0.003 -8.800 0 -8.535 0
Flatness test 0.982 5.725 1 7.608 1
$Power
powerEstimate powerLower powerUpper
0.96 0.877 0.987
$PlotData
p-value Observed (blue) Power 33% (Green) Flat (Red)
1 0.01 83.333 44.253 20
2 0.02 8.333 19.514 20
3 0.03 0.000 14.443 20
4 0.04 0.000 11.760 20
5 0.05 8.333 10.030 20
$Input
TE p ppSkewFull ppSkewHalf ppFlatFull
1 Arean et al28 0.26350000 0.257 NA NA NA
2 Bakker et al16 -0.03933333 0.806 NA NA NA
3 Birney et al32 0.19700000 0.084 NA NA NA
4 Boettcher et al38 0.29600000 0.082 NA NA NA
5 Bostock et al45 0.47000000 0.001 0.013 0.026 0.910
6 Carissoli et al29 0.12600000 0.692 NA NA NA
7 Cox et al39 0.31500000 0.312 NA NA NA
8 Dennis-Tiwary et al26 0.07300000 0.840 NA NA NA
9 Ebert et al46 0.52800000 0.000 0.004 0.008 0.959
10 Ebert et al47 0.54700000 0.000 0.000 0.001 0.992
11 Ebert et al49 0.59400000 0.000 0.000 0.000 0.998
12 Enock et al23 0.28300000 0.094 NA NA NA
13 Faurholt-Jepsen et al22 -0.04500000 0.852 NA NA NA
14 Flett et al30 0.19950000 0.233 NA NA NA
15 Guo et al37 0.23300000 0.392 NA NA NA
16 Hall et al53 0.83100000 0.013 0.260 0.521 0.490
17 Harrer et al50 0.62500000 0.000 0.004 0.008 0.959
18 Heber et al51 0.63900000 0.000 0.000 0.000 0.999
19 Hirsch et al33 0.20400000 0.362 NA NA NA
20 Horsch et al48 0.57100000 0.001 0.010 0.020 0.923
21 Howells et al42 0.34700000 0.056 NA NA NA
22 Ivanova et al25 0.32550000 0.104 NA NA NA
23 Kahn et al44 0.43700000 0.042 0.832 NA 0.083
24 Kauer et al20 -0.10800000 0.626 NA NA NA
25 Kollei et al43 0.39000000 0.154 NA NA NA
26 Krafft et al18 -0.04025000 0.910 NA NA NA
27 Krzystanek et al21 -0.08500000 0.624 NA NA NA
28 Kuhn et al34 0.22800000 0.209 NA NA NA
29 Lee & Jung35 0.22900000 0.148 NA NA NA
30 Ludtke et al27 0.08300000 0.718 NA NA NA
31 Mistretta et al31 0.19500000 0.549 NA NA NA
32 Moell et al36 0.23000000 0.398 NA NA NA
33 Nobis et al54 0.85500000 0.000 0.000 0.000 1.000
34 Oh et al24 0.13200000 0.693 NA NA NA
35 Proudfoot et al41 0.38900000 0.001 0.013 0.026 0.910
36 Roepke et al40 0.54750000 0.056 NA NA NA
37 Stolz et al5 0.29700000 0.180 NA NA NA
38 Teng et al17 -0.14150000 0.593 NA NA NA
39 Tighe et al52 0.70700000 0.007 0.134 0.268 0.645
40 Versluis et al19 -0.01950000 0.935 NA NA NA
ppFlatHalf zSkewFull zSkewHalf zFlatFull zFlatHalf
1 Arean et al28 NA NA NA NA NA
2 Bakker et al16 NA NA NA NA NA
3 Birney et al32 NA NA NA NA NA
4 Boettcher et al38 NA NA NA NA NA
5 Bostock et al45 0.958 -2.226 -1.943 1.341 1.728
6 Carissoli et al29 NA NA NA NA NA
7 Cox et al39 NA NA NA NA NA
8 Dennis-Tiwary et al26 NA NA NA NA NA
9 Ebert et al46 0.981 -2.671 -2.430 1.737 2.070
10 Ebert et al47 0.996 -3.426 -3.233 2.420 2.685
11 Ebert et al49 0.999 -3.876 -3.704 2.833 3.068
12 Enock et al23 NA NA NA NA NA
13 Faurholt-Jepsen et al22 NA NA NA NA NA
14 Flett et al30 NA NA NA NA NA
15 Guo et al37 NA NA NA NA NA
16 Hall et al53 0.762 -0.642 0.053 -0.025 0.713
17 Harrer et al50 0.981 -2.674 -2.432 1.739 2.072
18 Heber et al51 1.000 -4.294 -4.138 3.221 3.433
19 Hirsch et al33 NA NA NA NA NA
20 Horsch et al48 0.964 -2.318 -2.045 1.423 1.797
21 Howells et al42 NA NA NA NA NA
22 Ivanova et al25 NA NA NA NA NA
23 Kahn et al44 NA 0.962 NA -1.383 NA
24 Kauer et al20 NA NA NA NA NA
25 Kollei et al43 NA NA NA NA NA
26 Krafft et al18 NA NA NA NA NA
27 Krzystanek et al21 NA NA NA NA NA
28 Kuhn et al34 NA NA NA NA NA
29 Lee & Jung35 NA NA NA NA NA
30 Ludtke et al27 NA NA NA NA NA
31 Mistretta et al31 NA NA NA NA NA
32 Moell et al36 NA NA NA NA NA
33 Nobis et al54 1.000 -5.983 -5.869 4.814 4.964
34 Oh et al24 NA NA NA NA NA
35 Proudfoot et al41 0.958 -2.229 -1.946 1.344 1.730
36 Roepke et al40 NA NA NA NA NA
37 Stolz et al5 NA NA NA NA NA
38 Teng et al17 NA NA NA NA NA
39 Tighe et al52 0.834 -1.107 -0.618 0.371 0.970
40 Versluis et al19 NA NA NA NA NA
$EvidencePresent
[1] "yes"
$EvidenceAbsent
[1] "no"
$kInput
[1] 40
$kAnalyzed
[1] 12
$kp0.25
[1] 11
```

We can see that 12 out of 40 *p*-values were statistically significant at the .05 level, presenting us with a nicely right-skewed curve. We should not worry too much about *p*-hacking here.
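The `pBinomial` value of the right-skewness test can be reproduced in base R: under a flat *p*-curve, each significant *p*-value has a 50% chance of falling below .025, and 11 of the 12 significant studies did.

```
# Right-skewness binomial test: 11 of 12 significant p-values below .025,
# against the null of a 50/50 split under a flat p-curve
binom.test(x = 11, n = 12, p = 0.5, alternative = "greater")$p.value
# ~0.003, matching pBinomial for the right-skewness test above
```

The continuous (Stouffer) versions of the test shown in the output aggregate the *pp*-values instead, but this one-liner already captures the core logic.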

### *p*-uniform*

```
linardon_puniform_avg <- puni_star(yi = linardon_data_averaged$hedges_g,
                                   vi = linardon_data_averaged$variance,
                                   alpha = .05,
                                   side = "right",
                                   method = "ML")
linardon_puniform_avg
```

```
Method: ML (k = 40; ksig = 12)
Estimating effect size p-uniform*
est ci.lb ci.ub L.0 pval
0.4249 0.2972 0.5526 31.3024 <.001
===
Estimating between-study variance p-uniform*
tau2 tau2.lb tau2.ub L.het pval
0.0303 0.0079 0.075 9.6924 0.0019
===
Publication bias test p-uniform*
L.pb pval
2.4671 0.2913
```

*p*-uniform* yields a LARGER effect size estimate (*g* = 0.42) than the reported random-effects model or the multilevel model, indicating that we need not worry about a large file drawer.

# Conclusion

All in all, both meta-analyses provide solid and robust evidence for the efficacy of apps for the treatment of depression. The effect sizes reported by Linardon et al. can even be seen as conservative estimates of the true effect.

At the same time, the power-enhanced plots suggest that the entire field would benefit from more adequately powered studies—as is the case for most of psychology.