The Hamilton Depression Rating Scale (Ham-D) is the most widely used clinician-administered depression assessment scale. In use since the 1960s, it is seen as the “gold standard” for assessing depression. As such, it was the assessment tool of choice in antidepressant clinical trials. The only problem was that many, if not most, of the antidepressants that came to market had a statistically significant effect on the Ham-D that was not observable clinically. Irving Kirsch called this a “dirty little secret.” Both the pharmaceutical companies bringing the drugs to market and the FDA knew there was essentially no difference between the effects of the drug and the placebo used in the clinical trial.
Kirsch’s research into the placebo effect with antidepressants has been established and repeatedly replicated; it wasn’t a fluke, one-and-done study. Search his name in Google or start here with “Dirty Little Secret,” “Modern Alchemy with Antidepressants,” or “Do No Harm with Antidepressants.” Kirsch showed clearly that study participants were regularly able to break the double-blind methodology of the clinical trials because the researchers continually used inert placebos. The real drugs given to study participants produced side effects; the inert placebo pills didn’t. All you had to do was pay attention to any side effects you did or did not experience to have about a 75% chance of accurately predicting whether you were in the experimental group or the control group.
But Swedish researchers suggested that the reason SSRI antidepressants haven’t performed better than placebo is that their effects were measured incorrectly. Hieronymus et al. said that if 16 of the 17 items in the Ham-D were ignored and only the single item assessing depressed mood was used, “scientifically valid support for the tested drug being antidepressant” could be shown. They said they focused on depressed mood because it is one of the two key symptoms required by the DSM-IV definition of depression, and because the FDA gives it particular importance when evaluating the efficacy of an antidepressant.
“While not claiming that assessing depressed mood only is the optimal way of recording symptom severity, or that other symptoms are irrelevant, we do suggest that a treatment faithfully outperforming placebo in reducing depressed mood can hardly be regarded as ineffective.”
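To make the difference between the two outcome measures concrete, here is a minimal Python sketch. It assumes a 17-item Ham-D rating stored as a list of integers with the depressed mood item in the first position; the function name, the item ordering, and the example ratings are hypothetical illustrations, not data from any trial or from the Hieronymus et al. analysis.

```python
from typing import Sequence

HAMD_ITEM_COUNT = 17        # the 17-item version discussed above
DEPRESSED_MOOD_INDEX = 0    # assumption: depressed mood is stored first


def summarize_hamd(item_scores: Sequence[int]) -> dict:
    """Return the full-scale total and the single depressed-mood score."""
    if len(item_scores) != HAMD_ITEM_COUNT:
        raise ValueError(
            f"expected {HAMD_ITEM_COUNT} item scores, got {len(item_scores)}"
        )
    if any(score < 0 for score in item_scores):
        raise ValueError("item scores cannot be negative")
    return {
        "total": sum(item_scores),
        "depressed_mood": item_scores[DEPRESSED_MOOD_INDEX],
    }


# Hypothetical ratings for one person before and after treatment.
# Only the first item (depressed mood) changes between the two ratings.
baseline = [3, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 0]
followup = [1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 0]

before = summarize_hamd(baseline)
after = summarize_hamd(followup)
print("change in total score:", before["total"] - after["total"])
print("change in depressed mood item:",
      before["depressed_mood"] - after["depressed_mood"])
```

In this made-up case the same two-point improvement is a small share of the total score (24 down to 22) but a large change on the single item (3 down to 1), which is the kind of contrast the single-item argument depends on.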
Perceived flaws with the Ham-D have been pointed out by previous researchers. In 2004, Bagby et al. looked at the psychometric properties of the Ham-D and found the internal, interrater, and retest reliability estimates overall were mostly good. However, many of the individual scale items were poor contributors when measuring the severity of depression. Some had poor interrater and retest reliability. “For many items, the format for response options is not optimal.” They concluded that:
“Evidence suggests that the Hamilton depression scale is psychometrically and conceptually flawed. The breadth and severity of the problems militate against efforts to revise the current instrument. After more than 40 years, it is time to embrace a new gold standard for assessment of depression.”
They said many of the individual items were poorly designed and added up to a total score whose meaning was unclear. At the very least, they thought the Ham-D needed a complete overhaul of its items. The researchers thought it was time to retire the Ham-D, as it measures a conception of depression that is several decades old. “The field needs to move forward and embrace a new gold standard that incorporates modern psychometric methods and contemporary definitions of depression.” In other words, the new gold standard needs to include current DSM symptoms.
I wonder if the Bagby et al. study may be suggesting we set aside the Ham-D prematurely. A cursory comparison of the Ham-D and the current edition of the DSM, the DSM-5, suggests there is a good bit of overlap. An article by Michael Schreiner, “Major Depressive Disorder DSM 5 Criteria,” gives the DSM-5 diagnostic criteria; the Ham-D is described here in an NIH document.
The DSM-5 lists nine potential symptoms of depression, five of which are required to exist within a two-week period of time for a diagnosis of major depression. The symptoms have to cause clinically significant distress or impairment in social, occupational or other important areas of functioning. Every one of the nine symptoms is mentioned one way or another within the Ham-D.
The nine DSM-5 symptoms are:
1. Depressed mood most of the day, almost every day, indicated by your own subjective report or by the report of others. This mood might be characterized by sadness, emptiness, or hopelessness.
2. Markedly diminished interest or pleasure in all or almost all activities most of the day nearly every day.
3. Significant weight loss when not dieting or weight gain.
4. Inability to sleep or oversleeping nearly every day.
5. Psychomotor agitation or retardation nearly every day.
6. Fatigue or loss of energy nearly every day.
7. Feelings of worthlessness or excessive or inappropriate guilt (which may be delusional) nearly every day.
8. Diminished ability to think or concentrate, or indecisiveness, nearly every day.
9. Recurrent thoughts of death (not just fear of dying), recurrent suicidal ideation without a specific plan, or a suicide attempt or a specific plan for committing suicide.
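The symptom-count rule described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not a diagnostic tool. The symptom labels and the function name are hypothetical. The check that at least one symptom is depressed mood or diminished interest is not stated in the list itself; it is included here on the assumption that the DSM-5 keeps the “two key symptoms” requirement mentioned earlier.

```python
# Hypothetical short labels for the nine symptoms listed above.
DSM5_SYMPTOMS = (
    "depressed_mood",
    "diminished_interest",
    "weight_change",
    "sleep_disturbance",
    "psychomotor_change",
    "fatigue",
    "worthlessness_or_guilt",
    "poor_concentration",
    "thoughts_of_death",
)

# Assumption: at least one of these two key symptoms must be present.
KEY_SYMPTOMS = {"depressed_mood", "diminished_interest"}
REQUIRED_COUNT = 5  # five of the nine symptoms within a two-week period


def meets_symptom_criteria(present: set, causes_impairment: bool) -> bool:
    """Apply the counting rule to a set of symptom labels."""
    unknown = present - set(DSM5_SYMPTOMS)
    if unknown:
        raise ValueError(f"unknown symptom labels: {sorted(unknown)}")
    has_enough = len(present) >= REQUIRED_COUNT
    has_key_symptom = bool(present & KEY_SYMPTOMS)
    # The symptoms must also cause clinically significant distress or impairment.
    return has_enough and has_key_symptom and causes_impairment


# Hypothetical example: five symptoms, including depressed mood.
example = {
    "depressed_mood",
    "sleep_disturbance",
    "fatigue",
    "poor_concentration",
    "worthlessness_or_guilt",
}
print(meets_symptom_criteria(example, causes_impairment=True))
```

Note that the rule is a simple threshold: it records whether symptoms are present, not how severe each one is, which is the job a rating scale like the Ham-D is meant to do.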
There are clear differences between the Ham-D and the DSM-5 as well. The Ham-D scale devoted three items to sleep disturbance; the DSM-5, only one. Some items in the Ham-D, like agitation and retardation (slowness of thought and speech), were mixed into two different symptoms in the DSM-5. Hypochondriasis was in the Ham-D, but not the DSM-5. The Ham-D item on “work and activities” was broken into a symptom of fatigue and also appeared in a separate category B: “Symptoms cause clinically significant distress or impairment in social, occupational or other important areas of functioning.”
The case for a more psychometrically sensitive depression scale, one that has a greater correspondence to how depression is currently diagnosed, makes some sense. But is the problem really that a more effective diagnostic scale needs to be developed? Perhaps the issue is that the Ham-D hasn’t had a very good track record in demonstrating the effectiveness of antidepressant drugs. So researchers and pharmaceutical companies would like a scale that more clearly demonstrates efficacy with medications than the Ham-D. Or maybe the Ham-D is being scapegoated for the failures of antidepressant drugs. Then again, maybe the problem is trying to “treat” a complex human condition like depression by manipulating one or two neurotransmitters with antidepressants.