In a small clinical study published a few weeks ago, researchers didn’t find much difference between the three treatment groups of depressed subjects they studied — a group that received antidepressant medications, a group that received a specific type of not-commonly-practiced psychodynamic psychotherapy, and a group that received a sugar pill.
But there were some serious issues with this study from the outset, issues that call into question not only the generalizability of the results, but also their validity. It’s a shame that Reuters, which picked up on the study just yesterday, glossed over the study’s methodology problems and instead just repeated the results as a shiny new established fact.
And easily lost in the discussion is the best result of them all — 16 weeks was all that was needed for most people in the study (who completed it) to find improvement in the symptoms of their depression, no matter what the treatment.
Let’s see what went wrong, and what the study actually tells us…
The researchers (Barber et al., 2011) studied three treatment options: a form of short-term dynamic psychotherapy called supportive-expressive therapy; two types of antidepressant medications (first sertraline [Zoloft], and then if no response after 8 weeks, venlafaxine extended release [Effexor ER]); and a sugar pill (otherwise known as a placebo). It was a pretty traditional three-arm study, with the good ol’ Hamilton rating scale used as the measurement of treatment response (“Response at 16 weeks was defined as HRSD17 score ≤ 9 or 50% HRSD17 score reduction and a HRSD17 score ≤ 12.”).
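To make that response definition concrete, here is a minimal sketch of how it could be applied to one subject's scores. The function name and arguments are hypothetical, and the grouping of the "or" and "and" is my reading of the quoted sentence (score of 9 or less, or else at least a 50% drop combined with a score of 12 or less), not something the paper spells out in the passage quoted above.

```python
def is_responder(baseline: float, week16: float) -> bool:
    """Classify a subject as a treatment responder at week 16.

    Assumed reading of the quoted criterion:
      HRSD17 <= 9, OR (>= 50% reduction from baseline AND HRSD17 <= 12).
    """
    if week16 <= 9:
        return True
    halved = week16 <= 0.5 * baseline  # at least a 50% reduction
    return halved and week16 <= 12


# Illustrative scores only, not data from the study.
print(is_responder(baseline=20, week16=9))   # low absolute score
print(is_responder(baseline=24, week16=12))  # halved and at or under 12
print(is_responder(baseline=30, week16=14))  # halved, but still above 12
```

Under this reading, a subject who starts very high can cut their score in half and still not count as a responder, which is one reason responder rates can look low even when average scores drop.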
You know the study’s in trouble right off the bat when the researchers start out in the 6th paragraph by noting the problems with recruiting the number of subjects needed:
A planned sample size of 180 was determined through a method that accounts for increased statistical power in repeated-measures designs. Due to slower-than-anticipated recruitment, 156 patients (SET: n = 51; MED: n = 55; PBO: n = 50) were randomized. This sample afforded detection of a medium effect size of 0.48 with power > 80% when comparing MED or SET to PBO over the longitudinal period.
But it’s worse than the researchers let on… In the two pill groups (medication and placebo), the drop-out rate was 40 percent of subjects, which left much smaller numbers to analyze — just 91 subjects completed the study. This is half the number the researchers themselves said they needed to run the study. Ouch.
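The arithmetic behind those numbers is easy to check. The sketch below uses only the figures quoted above (180 planned, 156 randomized, 91 completers) and computes overall attrition across all three arms; the variable names are my own, and the per-arm drop-out counts are not reproduced here.

```python
planned = 180            # sample size the researchers planned for
randomized = 51 + 55 + 50  # SET + MED + PBO = 156 actually randomized
completed = 91           # subjects who finished the study

dropped = randomized - completed
attrition_rate = dropped / randomized
completed_vs_planned = completed / planned

print(f"Dropped out: {dropped}")
print(f"Overall attrition: {attrition_rate:.1%}")
print(f"Completers as a share of the planned sample: {completed_vs_planned:.1%}")
```

In other words, completers amount to roughly half of the sample the power calculation was built around.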
What this means to science is that the study is less able to detect positive relationships in the data, and is more open to error where a few data points might inadvertently skew the results. The researchers argue that since others have argued you only need a group size of 5 to 7, it’s okay. They also say it’s okay they lost so many subjects due to attrition because, well, that’s what other studies have shown when your subject pool is more ethnically diverse. Neither of these is a very persuasive argument.
Although the researchers didn’t get to their pre-defined target response rates, all groups showed a decline in depression symptoms over time of anywhere from 2 to 8 points on the Hamilton rating scale they used.
About 30 percent of subjects were classified as treatment “responders” in the two treatment groups; 24 percent responded in the placebo group. Although this doesn’t seem to quite jibe with the Reuters headline, “Antidepressant, talk therapy fail to beat placebo,” it does, because the differences between the groups weren’t statistically significant (although the psychotherapy group experienced a little less than half the number of drop-outs of the other two groups, a pretty significant difference if you ask me).
So, rather than poor subject group sizes and large attrition rates, what do the researchers attribute their findings to?
Rather than study design or power issues, the relatively low efficacy and response rates are most likely due to characteristics unique to this sample. Unlike most efficacy trials, our sample comprised economically disadvantaged, highly comorbid, chronic, recurrently depressed, urban patients.
Indeed, that could be a reasonable explanation, since most drug trials are conducted on relatively “clean” and well-filtered patients. Researchers are usually careful to pre-select their patients, to have the greatest likelihood of achieving a positive result.
The recruitment process typically goes something like this… Have more than one diagnosis? You can’t be in my research. Been through multiple prior treatments? Gone. Recurrent depression? Gone.
While this makes a researcher’s data more “pure” (less likely to be tainted by other factors that might affect the results in some unknown manner), it also makes it a lot less like the real world. In the real world, people come to professionals with multiple problems, lots of previously-failed treatments, and other complex issues.
We’re left with a study that didn’t meet its own subject recruitment target, lost another 42 percent of its subjects while the study was underway, and then didn’t really find any differentiation between its three treatment groups.
This research may best demonstrate that when you try to run a “real world” research trial, you shouldn’t be surprised by the less-than-overwhelming results, a fact known by most clinicians and long-term patients for decades. It also demonstrates the difficulty of conducting such “real world” research, and what happens when you’re not paying attention to recruitment and attrition problems as they arise.
Barber, J.P., Barrett, M.S., Gallop, R., Rynn, R.A., Rickels, K. (2011). Short-Term Dynamic Psychotherapy Versus Pharmacotherapy for Major Depressive Disorder: A Randomized, Placebo-Controlled Trial. The Journal of Clinical Psychiatry. doi:10.4088/JCP.11m06831
Read the horrible Reuters Article: Antidepressant, Talk Therapy Fail to Beat Placebo