University of Sydney

Example: Asthma

A large group of infants with mild respiratory problems (but asthma free) is split into those with a family history of hay fever, and those without.

Random samples of 85 from the first group and 405 from the second group are selected for special study. Of these, the numbers diagnosed with asthma by the age of 12 are 25 and 70 respectively.

\[ \begin{array}{c|cc|c} & \mbox{Asthma} & \mbox{No Asthma} & \\ \hline \mbox{Hay fever} & 25 & 60 & \bf{85}\\ \mbox{No hay fever} & 70 & 335 & \bf{405} \\ \hline & 95 & 395 & \\ \end{array} \]

Does a family history of hay fever increase the risk of developing asthma?

Example: Hodgkin's disease

In 1971, Vianna et al. collected data on a group of 101 patients suffering from Hodgkin's disease and a comparable control group of 107 non-Hodgkin's patients. They were interested in the effect of tonsil tissue as a barrier to Hodgkin's disease. They found that in the Hodgkin's disease group, there had been 67 tonsillectomies. The corresponding figure for the non-Hodgkin's patients was 43.

\[ \begin{array}{c|cc|c} & \mbox{Hodgkin's} & \mbox{disease} & \\ \mbox{Tonsillectomy} & \mbox{Yes} & \mbox{No} & \\ \hline \mbox{Yes} & 67 & 43 & 110\\ \mbox{No} & 34 & 64 & 98 \\ \hline & \bf{101} & \bf{107} & \\ \end{array} \]

Does having a tonsillectomy increase your risk of developing Hodgkin's disease?

Prospective and Retrospective studies

How a study is conducted over time will affect which conditional probabilities can be estimated.

To illustrate this we will use the following symbols.

  • \(D^+\) is the event that an individual has a particular disease.
  • \(D^-\) is the event that an individual does not have a particular disease.
  • \(R^+\) is the event that an individual has a risk factor.
  • \(R^-\) is the event that an individual does not have a risk factor.

Prospective (or cohort) studies

A study design in which one or more samples (called cohorts) are followed prospectively, and subsequent status evaluations with respect to a disease or outcome are conducted, to determine which of the participants' initial exposure characteristics (risk factors) are associated with it. As the study is conducted, the outcome for participants in each cohort is measured and relationships with specific characteristics are determined.


More simply:

  • A prospective study is based on subjects who are initially identified as disease-free and classified by presence or absence of a risk factor.

  • A random sample from each group is followed in time (prospectively) until eventually classified by disease outcome.

Retrospective (or case-control) studies

A study that compares patients who have a disease or outcome of interest (cases) with patients who do not have the disease or outcome (controls), and looks back retrospectively to compare how frequently the exposure to a risk factor is present in each group to determine the relationship between the risk factor and the disease.


More simply:

  • A retrospective study is based on random samples from each of the two outcome categories which are followed back (retrospectively) to determine the presence or absence of the risk factor for each individual.

Relative risk and odds ratios

These are different ways to measure the association between a risk factor/treatment and the disease outcome.

How the data are sampled determines which of these measures can be estimated and how they should be interpreted.

Relative Risk – Prospective studies

Consider the Table \[ \begin{array}{c|cc|c} & D^+ & D^- & \mbox{Total} \\ \hline R^+ & a & b & {\bf a+b} \\ R^- & c & d & {\bf c+d} \\ \hline & a+c & b+d & a+b+c+d \\ \end{array} \]

If these are data from a prospective study, or from a sample of completed records, we can estimate the following:

  • \(\displaystyle P(D^+|R^+) = \frac{a}{a+b}\)

  • \(\displaystyle P(D^+|R^-) = \frac{c}{c+d}\)

  • \(\displaystyle \widehat{RR} = \frac{P(D^+|R^+)}{P(D^+|R^-)} = \frac{a(c+d)}{c(a+b)}\)

Relative Risk – Retrospective studies

Consider the Table \[ \begin{array}{c|cc|c} & D^+ & D^- & \mbox{Total} \\ \hline R^+ & a & b & a+b \\ R^- & c & d & c+d \\ \hline & {\bf a+c} & {\bf b+d} & a+b+c+d \\ \end{array} \]

If this is data from a retrospective study we cannot estimate

  • \(\displaystyle P(D^+|R^+)\)

  • \(\displaystyle P(D^+|R^-)\)

  • \(\displaystyle RR = \frac{P(D^+|R^+)}{P(D^+|R^-)}\)

because \(a+c\) and \(b+d\) are fixed in advance. We can estimate \(P(R^+|D^+)\) and \(P(R^-|D^+)\) but these are not used in the calculation of relative risk.
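To see why fixed column totals matter, the following Python sketch applies the row-wise relative risk formula to the Hodgkin's disease case-control table and then to a hypothetical version of the same study with ten times as many controls. The function name `naive_rr` and the tenfold scaling are assumptions made purely for illustration.

```python
def naive_rr(a, b, c, d):
    """Row-wise 'relative risk': a/(a+b) divided by c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

# Hodgkin's disease table: cases (D+) in the first column, controls (D-) in the second.
a, b, c, d = 67, 43, 34, 64
rr_original = naive_rr(a, b, c, d)

# Hypothetical: ten times as many controls, same exposure proportions among controls.
rr_more_controls = naive_rr(a, 10 * b, c, 10 * d)

print(round(rr_original, 2), round(rr_more_controls, 2))  # 1.76 2.67
```

The value changes with the number of controls the investigator chose to sample, so it does not estimate a property of the population.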

Example: Aspirin

The table below is from New England Journal of Medicine, (1984) 318, 262–4. The data come from a 5 year (blind) study into the effect of taking aspirin every second day on the incidence of heart attacks.

\[ \begin{array}{l|cc|r} & \mbox{Myocardial} & \mbox{Infarction} & \\ & \mbox{Yes} (D^+) &\mbox{No} (D^-) & \\ \hline \mbox{Aspirin} (R^+) & 104 &10,933 &11,037 \\ \mbox{Placebo} (R^-) & 189 &10,845 &11,034 \\ \hline &293 &21,778 &22,071 \\ \end{array} \]

The estimates for the proportion of each subpopulation having heart attacks are

\[ \begin{array}{rl} P(D^+|R^+) & = \frac{104}{10,933 + 104} = 0.0094 \\ P(D^+|R^-) & = \frac{189}{10,845 + 189} = 0.0171 \end{array} \]

The estimated relative risk in the above case is \(0.0094/0.0171 = 0.55\). Hence, you are roughly half as likely to have a myocardial infarction if you take aspirin.
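The same calculation can be sketched in Python. The function name `relative_risk` is an illustrative choice; the cell labels follow the \(a, b, c, d\) table above.

```python
def relative_risk(a, b, c, d):
    """Estimated relative risk from a 2x2 table (rows R+, R-; columns D+, D-)."""
    risk_exposed = a / (a + b)      # estimate of P(D+ | R+)
    risk_unexposed = c / (c + d)    # estimate of P(D+ | R-)
    return risk_exposed / risk_unexposed

# Aspirin table: a = 104, b = 10933, c = 189, d = 10845
rr_aspirin = relative_risk(104, 10933, 189, 10845)
print(round(rr_aspirin, 2))  # 0.55
```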

Relative risk – Interpretation

Note that the relative risk: \[ \displaystyle RR = \frac{P(D^+|R^+)}{P(D^+|R^-)} \]

is a ratio of two probabilities. Since probabilities are bounded between 0 and 1 \[ \displaystyle RR = \frac{P(D^+|R^+)}{P(D^+|R^-)} \to \infty \quad \mbox{as} \quad P(D^+|R^-) \to 0, \]

\[ \displaystyle RR= \frac{P(D^+|R^+)}{P(D^+|R^-)} \to 0 \quad \mbox{as} \quad P(D^+|R^+) \to 0, \]

and \[ RR \approx 1 \quad \mbox{when} \quad P(D^+|R^+) \approx P(D^+|R^-) \]

If \(D\) and \(R\) are independent then \(P(D|R) = P(D)\) and so \[ RR = \frac{P(D^+|R^+)}{P(D^+|R^-)} = \frac{P(D^+)}{P(D^+)} = 1 \]

Odds Ratio

Many health workers prefer to work with the Odds Ratio, often denoted \(OR\), rather than the relative risk.

The odds of success is the ratio of the chance of success, say \(p\), to the chance of failure, \(1-p\) and is given by \[ \mbox{Odds of Success} = \frac{p}{1-p}. \]

For example, if the success probability is \(\frac{1}{3}\), the odds of success is \[ \frac{1}{3}\big/\frac{2}{3} = \frac{1}{2} \]

In the risk/disease setting, the chance of disease for \(R^+\) patients is \(p=P(D^+|R^+)\).
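As a small illustration, the odds can be computed exactly with Python's `fractions` module. The helper name `odds` is an assumption for this sketch; the second call uses the aspirin group from the earlier table.

```python
from fractions import Fraction

def odds(p):
    """Odds p / (1 - p) for a probability p with 0 <= p < 1."""
    return p / (1 - p)

# Success probability 1/3 gives odds of 1/2.
print(odds(Fraction(1, 3)))  # 1/2

# Estimated odds of heart attack in the aspirin group: p = 104 / 11037.
aspirin_odds = odds(Fraction(104, 11037))
# The odds reduce to a/b = 104/10933 (Fraction stores this in lowest terms).
print(aspirin_odds == Fraction(104, 10933))  # True
```

In a \(2\times 2\) table the estimated odds of disease in a row are simply the ratio of the two cell counts in that row.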

Equivalent Definitions of Odds Ratio

The ratio of the odds of a disease for \(R^+\) patients to the corresponding odds for \(R^-\) patients is the odds ratio, \(OR:\) \[ \mbox{Definition 1:} \ \ OR = \frac{P(D^+|R^+)}{P(D^-|R^+)} \Big/ \frac{P(D^+|R^-)}{P(D^-|R^-)}. \]

We can show that this ratio is identical to \[ \mbox{Definition 2:} \ \ OR = \frac{P(R^+|D^+)}{P(R^-|D^+)} \Big/ \frac{P(R^+|D^-)}{P(R^-|D^-)}. \]
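One way to see the equivalence, as a sketch, is to write each conditional probability in terms of joint probabilities. Since \(P(D^+|R^+) = P(D^+\cap R^+)/P(R^+)\), and similarly for the other terms, the factors \(P(R^+)\) and \(P(R^-)\) cancel in Definition 1: \[ OR = \frac{P(D^+\cap R^+)}{P(D^-\cap R^+)} \Big/ \frac{P(D^+\cap R^-)}{P(D^-\cap R^-)} = \frac{P(D^+\cap R^+)\,P(D^-\cap R^-)}{P(D^-\cap R^+)\,P(D^+\cap R^-)}. \]

Starting from Definition 2, the factors \(P(D^+)\) and \(P(D^-)\) cancel in the same way and give the same expression, so the two definitions coincide.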

This means that \(OR\) can be found from all three types of study (observational, prospective or retrospective), unlike \(RR\).

Odds Ratios

Consider the table

\[ \begin{array}{c|cc|c} & D^+ & D^- & \mbox{Total} \\ \hline R^+ & a & b & a+b \\ R^- & c & d & c+d \\ \hline & a+c & b+d & a+b+c+d \\ \end{array} \]


\[ \mbox{Def. 1:} \ OR = \frac{P(D^+|R^+)}{P(D^-|R^+)} \Big/ \frac{P(D^+|R^-)}{P(D^-|R^-)} = \left(\frac{\frac{a}{a+b}}{\frac{b}{a+b}}\right) \Big/ \left( \frac{\frac{c}{c+d}}{\frac{d}{c+d}} \right) = \frac{ad}{bc} \]

\[ \mbox{Def. 2:} \ OR = \frac{P(R^+|D^+)}{P(R^-|D^+)} \Big/ \frac{P(R^+|D^-)}{P(R^-|D^-)} = \left(\frac{\frac{a}{a+c}}{\frac{c}{a+c}}\right) \Big/ \left( \frac{\frac{b}{b+d}}{\frac{d}{b+d}} \right) = \frac{ad}{bc} \]

Odds ratio – Interpretation

Note that the odds ratio: \[ OR = \frac{P(D^+|R^+)}{P(D^-|R^+)} \Big/ \frac{P(D^+|R^-)}{P(D^-|R^-)} = \frac{ad}{bc} \]

is a ratio of two odds. If \(D\) and \(R\) are independent then \(P(D|R) = P(D)\) and \[ OR = \frac{P(D^+|R^+)}{P(D^-|R^+)} \Big/ \frac{P(D^+|R^-)}{P(D^-|R^-)} = \frac{P(D^+)}{P(D^-)} \Big/ \frac{P(D^+)}{P(D^-)} = 1 \]

It can be shown that \(OR = 1\) if and only if \(D\) and \(R\) are independent (there is no relationship between risk and disease.)

Odds ratios greater than one (\(OR>1\)) imply an increased risk of disease and odds ratios less than one (\(OR<1\)) imply a decreased risk of disease.

Example: Aspirin

\[ \begin{array}{l|cc|r} & \mbox{Myocardial} & \mbox{Infarction} & \\ & \mbox{Yes} (D^+) &\mbox{No} (D^-) & \\ \hline \mbox{Aspirin} (R^+) & 104 &10,933 &11,037 \\ \mbox{Placebo} (R^-) & 189 &10,845 &11,034 \\ \hline &293 &21,778 &22,071 \\ \end{array} \]

The odds ratio is

\[OR = \frac{104 \times 10,845}{189\times 10,933} = 0.55\]

The estimated odds of heart attack for patients taking the aspirin is \(0.55\) times the estimated odds for those taking the placebo.

Compare this with the relative risk of \(0.55\). These are similar because the disease is rare.

Relationship between RR and OR

Suppose that the disease is rare so that \(P(D^-)\), \(P(D^-|R^+)\) and \(P(D^-|R^-)\) are close to \(1\).

Then \[ \begin{array}{rl} \displaystyle \mbox{OR} & \displaystyle = \frac{P(D^+|R^+)}{P(D^-|R^+)} \times \frac{P(D^-|R^-)}{P(D^+|R^-)} \\ & \\ & \displaystyle \approx \frac{P(D^+|R^+)}{1} \times \frac{1}{P(D^+|R^-)} \\ & \\ & \displaystyle = \frac{P(D^+|R^+)}{P(D^+|R^-)} \\ & \\ & \displaystyle = \mbox{RR} \end{array} \]
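The approximation can be checked numerically. The Python sketch below uses hypothetical probabilities, chosen only for illustration: it holds the relative risk at 2 and shrinks the baseline risk \(P(D^+|R^-)\), printing the odds ratio each time.

```python
def rr_and_or(p_exposed, p_unexposed):
    """Return (RR, OR) given P(D+|R+) and P(D+|R-)."""
    rr = p_exposed / p_unexposed
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return rr, odds_exposed / odds_unexposed

# Hypothetical baseline risks; the exposed risk is always twice the baseline.
for baseline in (0.3, 0.1, 0.01, 0.001):
    rr, odds_ratio = rr_and_or(2 * baseline, baseline)
    print(f"P(D+|R-) = {baseline:<6} RR = {rr:.3f}  OR = {odds_ratio:.3f}")
```

With a baseline risk of 0.3 the odds ratio is 3.5, well above the relative risk of 2; as the disease becomes rarer the odds ratio moves towards 2.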

If \(RR=1\) then \[ \begin{array}{rl} \frac{a(c+d)}{c(a+b)} = 1 & \implies a(c+d) = c(a+b) \\ & \implies ac+ad = ca+cb \\ & \implies ad = cb \\ & \implies \frac{ad}{cb} = 1 \\ & \implies OR = 1 \end{array} \]

If \(OR = 1\) then \[ \frac{ad}{cb} = 1 \implies ad = cb \]

and \[ RR = \frac{a(c+d)}{c(a+b)} = \frac{ac+ad}{ac+cb} = \frac{ac+ad}{ac+ad} = 1 \]

Hence, \(OR=1\) if and only if \(RR=1\).

Standard Errors and Confidence Intervals

The odds ratio estimator \(OR\) has a skewed distribution on \((0,\infty)\), with the neutral value being \(1\). The log odds ratio estimator \(\log(OR)\) has a more symmetric distribution, centred at \(0\) if there is no difference between the two groups.

Note: an odds ratio of \(a\in(0,1)\) is equivalent to a value of \(a^{-1}\in(1,\infty)\) just by relabelling the categories. The log transformation is such that \(\log(a^{-1}) = -\log(a)\).

A large-sample \(95\%\) confidence interval for the log odds ratio is approximately \[ \displaystyle \log(OR) \pm 1.96 \times \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]

from which we can obtain an approximate confidence interval for the odds ratio itself by exponentiating the endpoints:

\[ \left( \exp\left( \log(OR) - 1.96 \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \right), \exp\left( \log(OR) + 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \right) \right) \]

Note that these should only be applied if \(a,b,c\) and \(d\) are reasonably large (so that asymptotics hold).
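A minimal Python sketch of this interval follows. The function name `odds_ratio_ci` and its `z` parameter are illustrative choices, and the example applies it to the aspirin table from earlier.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate large-sample CI from a 2x2 table of counts."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(or_hat)
    # Build the interval on the log scale, then exponentiate the endpoints.
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return or_hat, lower, upper

# Aspirin table: a = 104, b = 10933, c = 189, d = 10845
or_hat, lower, upper = odds_ratio_ci(104, 10933, 189, 10845)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# OR = 0.55, 95% CI = (0.43, 0.69)
```

Because the interval is built on the log scale, it is not symmetric about the estimated odds ratio.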

Example: Hodgkin's disease

In 1971, Vianna et al. collected data on a group of 101 patients suffering from Hodgkin's disease and a comparable control group of 107 non-Hodgkin's patients. They were interested in the effect of tonsil tissue as a barrier to Hodgkin's disease. They found that in the Hodgkin's disease group, there had been 67 tonsillectomies. The corresponding figure for the non-Hodgkin's patients was 43.

\[ \begin{array}{c|cc|c} & \mbox{Hodgkin's} & \mbox{disease} & \\ \mbox{Tonsillectomy} & \mbox{Yes} & \mbox{No} & \\ \hline \mbox{Yes} & 67 & 43 & 110\\ \mbox{No} & 34 & 64 & 98 \\ \hline & \bf{101} & \bf{107} & \\ \end{array} \]

Example: Hodgkin's disease

The data are from a case-control (retrospective) study.

We would like to know if tonsillectomy is related to Hodgkin's disease.

  • Pearson's \(\chi^2\) statistic is \(t = 14.3\) and the \(p\)-value is \(P(\chi_1^2 > 14.3) = 0.00016\).

  • So we reject the null hypothesis of independence and conclude that tonsillectomy is associated with Hodgkin's disease.
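The statistic and \(p\)-value quoted above can be reproduced with the short Python sketch below. It assumes Pearson's statistic without a continuity correction, which is what matches the value 14.3, and uses the fact that for one degree of freedom \(P(\chi_1^2 > x) = \mbox{erfc}(\sqrt{x/2})\). The function name `pearson_chi2_2x2` is an illustrative choice.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value on 1 df."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # A chi-squared variable on 1 df is a squared standard normal,
    # so its upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hodgkin's disease table: a = 67, b = 43, c = 34, d = 64
stat, p_value = pearson_chi2_2x2(67, 43, 34, 64)
print(f"chi2 = {stat:.1f}, p = {p_value:.5f}")  # chi2 = 14.3, p = 0.00016
```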

Note: This does not mean tonsillectomy causes Hodgkin's disease. Suppose, for example, that doctors gave tonsillectomies to the most seriously ill patients. Then the association between tonsillectomies and Hodgkin's disease may be due to the fact that those with tonsillectomies were the most ill patients and hence the more likely to have a serious disease.

Example: Hodgkin's disease

The estimated odds-ratio and log odds-ratio are

\[ OR = \frac{67\times 64}{43 \times 34} = 2.93 \quad \mbox{and} \quad \log(OR) = 1.076 \]

Hence, the estimated odds of Hodgkin's disease for tonsillectomy patients are about three times the odds for patients without a tonsillectomy. The standard error of the log odds-ratio is

\[ \sqrt{\frac{1}{67} + \frac{1}{43} + \frac{1}{34} + \frac{1}{64}} = 0.288. \]

A \(95\%\) confidence interval for the log odds-ratio is \[ 1.076 \pm 1.96 \times 0.288 \approx (0.51,1.64) \]

and a \(95\%\) confidence interval for the odds-ratio is \((e^{0.51},e^{1.64}) \approx (1.67,5.16)\).

Example: Aspirin

The table below is from New England Journal of Medicine, (1984) 318, 262–4. The data come from a 5 year (blind) study into the effect of taking aspirin every second day on the incidence of heart attacks.

\[ \begin{array}{l|cc|r} & \mbox{Myocardial} & \mbox{Infarction} & \\ & \mbox{Yes} (D^+) &\mbox{No} (D^-) & \\ \hline \mbox{Aspirin} (R^+) & 104 &10,933 &11,037 \\ \mbox{Placebo} (R^-) & 189 &10,845 &11,034 \\ \hline &293 &21,778 &22,071 \\ \end{array} \]

The odds-ratio, log odds-ratio and the standard error of the log odds-ratio are

\[OR = \frac{104 \times 10,845}{189\times 10,933} = 0.55 \mbox{ and } \log(OR) = -0.6\]

\[ SE(\log(OR)) = \sqrt{\frac{1}{104} + \frac{1}{189} + \frac{1}{10933} + \frac{1}{10845}} = 0.12. \]

A \(95\%\) confidence interval for the log odds-ratio is \[ -0.6 \pm 1.96 \times 0.12 \approx (-0.84,-0.36) \]

and a \(95\%\) confidence interval for the odds-ratio is \((e^{-0.84},e^{-0.36}) \approx (0.43,0.69)\).

Example: Asthma

A large group of infants with mild respiratory problems (but asthma free) is split into those with a family history of hay fever, and those without.

Random samples of 85 from the first group and 405 from the second group are selected for special study. Of these, the numbers diagnosed with asthma by the age of 12 are 25 and 70 respectively.

\[ \begin{array}{c|cc|c} & \mbox{Asthma} & \mbox{No Asthma} & \\ \hline \mbox{Hay fever} & 25 & 60 & \bf{85}\\ \mbox{No hay fever} & 70 & 335 & \bf{405} \\ \hline & 95 & 395 & \\ \end{array} \]

Does a family history of hay fever increase the risk of developing asthma?

The odds-ratio, log odds-ratio and the standard error of the log odds-ratio are

\[OR = \frac{25 \times 335}{60\times 70} = 1.99 \mbox{ and } \log(OR) = 0.69\]

\[ SE(\log(OR)) = \sqrt{\frac{1}{25} + \frac{1}{60} + \frac{1}{70} + \frac{1}{335}} = 0.27. \]

A \(95\%\) confidence interval for the log odds-ratio is \[ 0.69 \pm 1.96 \times 0.27 \approx (0.16,1.22) \]

and a \(95\%\) confidence interval for the odds-ratio is \((e^{0.16},e^{1.22}) \approx (1.17,3.4)\).
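Because the asthma data come from a prospective design (the row totals 85 and 405 were fixed), the relative risk can also be estimated. The following Python sketch, with variable names chosen for illustration, computes it alongside the odds ratio.

```python
# Asthma table: rows are family history of hay fever (yes / no),
# columns are asthma (yes / no).
a, b = 25, 60     # hay fever: asthma, no asthma
c, d = 70, 335    # no hay fever: asthma, no asthma

risk_hay_fever = a / (a + b)       # estimate of P(D+ | R+)
risk_no_hay_fever = c / (c + d)    # estimate of P(D+ | R-)
rr = risk_hay_fever / risk_no_hay_fever
odds_ratio = (a * d) / (b * c)

print(f"risk {risk_hay_fever:.3f} vs {risk_no_hay_fever:.3f}, "
      f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")
# risk 0.294 vs 0.173, RR = 1.70, OR = 1.99
```

Asthma is not rare in this sample, so the odds ratio (1.99) is noticeably larger than the relative risk (1.70). Both point to an increased risk for infants with a family history of hay fever.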

Summary


  • Relative risk is the ratio of the probability of "developing the disease if you have the risk factor" relative to the probability of "developing the disease if you do not have the risk factor".


  • The odds are the probability of success relative to the probability of failure. The odds ratio is the odds of disease for those with the risk factor relative to the odds of disease for those without it; in the \(2\times 2\) table it is \(ad/bc\).