University of Sydney

Does smoking cause cancer?

According to the Australian Cancer Council:
"Tobacco smoking is the largest preventable cause of cancer, responsible for more cancer deaths in Australia than any other single factor. It is also directly responsible for many heart and lung diseases".


However, in a 2012 hearing before the Australian High Court disputing the introduction of plain cigarette packaging with health warnings, British American Tobacco was prepared to accept that smoking has serious health consequences, while Imperial Tobacco responded "some people say that …"

SMH

The Need for Observational Studies

  • By necessity, many research questions require an observational study, rather than a controlled experiment.
    • For example, in a study on the effects of smoking, investigators cannot choose which subjects will be in the Treatment Group (smoking). Rather, they must observe medical results for the two groups (smokers and non-smokers).
    • Similarly, most educational research is based on observational studies.
  • The conclusions of observational studies require great care.

Precautions

Observational studies can't establish causation.

  • While a good randomised controlled experiment can establish causation, an observational study can only establish association.
    • It may suggest causation, but it can't prove causation.
  • Comparing smokers and non-smokers, there is a higher rate of liver cancer among the smokers.
    • So if you smoke, you are more likely to get liver cancer, but this does not imply that smoking causes liver cancer!

Statistical Thinking

What could explain the fact that smokers have a higher rate of liver cancer?

Observational Studies can have misleading hidden confounders.

  • Confounding occurs when the Treatment Group and Control Group differ by some third variable (other than the treatment) which influences the response that is studied.

  • Confounders can be hard to find, and can mislead about a cause and effect relationship.

  • Confounding (or lurking) variables can be introduced into a randomised study if any of the subjects drop out, causing selection bias or survivor bias. Similarly, if not all subjects keep taking the treatment or placebo, we get the confounding of adherers and non-adherers.

Examples

Statistical Thinking

A study finds that having yellow fingertips is associated with lung cancer. Does having yellow fingertips cause lung cancer?

A study finds that smokers tend to have higher rates of lung cancer. Does smoking cause lung cancer?

Strategy for dealing with confounders

  • Sometimes we can make the groups more comparable by dividing them into subgroups with respect to the confounder.

  • For example, if alcohol consumption is a potential confounding factor for smoking's effect on liver cancer, we can divide our subjects into 3 groups:
    • heavy drinkers
    • medium drinkers
    • light drinkers.
  • This is called controlling for alcohol consumption.

Controlling for confounding

We can control for confounding by making 3 separate comparisons:

  • heavy drinking: smokers vs non-smokers
  • medium drinking: smokers vs non-smokers
  • light drinking: smokers vs non-smokers
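
The three comparisons above can be sketched in Python. This is only an illustration: the function name `rates_by_stratum` and every count in `hypothetical_counts` are made up for the example and do not come from any study.

```python
from typing import Dict, Tuple

# counts[stratum][group] = (cases, total subjects in that group)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def rates_by_stratum(counts: Counts) -> Dict[str, Dict[str, float]]:
    """Return the disease rate for each group within each stratum."""
    return {
        stratum: {group: cases / total for group, (cases, total) in groups.items()}
        for stratum, groups in counts.items()
    }


# HYPOTHETICAL numbers, invented purely to show the shape of the comparison.
hypothetical_counts: Counts = {
    "heavy drinking": {"smokers": (30, 200), "non-smokers": (12, 100)},
    "medium drinking": {"smokers": (20, 200), "non-smokers": (16, 200)},
    "light drinking": {"smokers": (5, 100), "non-smokers": (12, 300)},
}

stratum_rates = rates_by_stratum(hypothetical_counts)

# One smokers vs non-smokers comparison per drinking level.
for stratum, groups in stratum_rates.items():
    print(f"{stratum}: smokers {groups['smokers']:.1%} "
          f"vs non-smokers {groups['non-smokers']:.1%}")
```

Each comparison is made only among subjects with a similar level of alcohol consumption, so a difference within a stratum is less likely to be explained by drinking.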

Statistical Thinking

What are the limitations of this strategy?

Observational studies with a confounding variable can lead to Simpson's Paradox

  • Simpson's Paradox (or the reversal paradox) was first mentioned by British statistician Udny Yule in 1903.
  • It was named after Edward H. Simpson.
  • Sometimes there is a clear trend in individual groups of data that disappears when the groups are pooled together.
  • It occurs when relationships between percentages in subgroups are reversed when the subgroups are combined, because of a confounding or lurking variable.
  • The association between a pair of variables (X,Y) reverses sign upon conditioning on a third variable Z, regardless of the value taken by Z.
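
For a binary treatment X and a binary response Y, the reversal described in the last bullet can be written as follows. The notation (probabilities, the values 0 and 1, and the direction of the inequalities) is an assumption made for illustration; the opposite direction is equally possible.

```latex
\[
P(Y=1 \mid X=1) < P(Y=1 \mid X=0)
\quad\text{while}\quad
P(Y=1 \mid X=1, Z=z) > P(Y=1 \mid X=0, Z=z)
\ \text{ for every } z .
\]
% The pooled rate is a weighted average of the subgroup rates:
\[
P(Y=1 \mid X=x) \;=\; \sum_{z} P(Y=1 \mid X=x, Z=z)\, P(Z=z \mid X=x) .
\]
```

The weights P(Z=z | X=x) differ between the two groups when Z is associated with X, which is how the pooled comparison can point the opposite way to every subgroup comparison.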

Simpson's Paradox and Smoking

  • A famous study shows Simpson's Paradox in analysing the effect of smoking on mortality rates in women.
  • The data came from 2 studies:
    • initial data from a 1 in 6 survey from an electoral roll in a mixed urban and rural area near Newcastle upon Tyne, UK.
    • follow-up data 20 years later.
  • The study concentrated on the 1314 women who were either smokers or non-smokers (in the full data, only 162 had stopped smoking and only 18 did not record their status).

Initial Results

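| Status     | Died | Survived | Total | Mortality Rate |
|------------|------|----------|-------|----------------|
| Smoker     | 139  | 443      | 582   | 23.9%          |
| Non-smoker | 230  | 502      | 732   | 31.4%          |
| Total      | 369  | 945      | 1314  | 28.1%          |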
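
The mortality rates in the table are simply deaths divided by group size. A minimal Python sketch that recomputes them from the counts above (the variable and function names are chosen here for illustration):

```python
# (died, survived) counts copied from the Initial Results table.
initial = {
    "Smoker": (139, 443),
    "Non-smoker": (230, 502),
}


def mortality_rate(died: int, survived: int) -> float:
    """Proportion of the group that died during follow-up."""
    return died / (died + survived)


rates = {status: mortality_rate(*counts) for status, counts in initial.items()}

# Pool both groups to get the overall rate.
total_died = sum(died for died, _ in initial.values())
total_survived = sum(survived for _, survived in initial.values())
rates["Total"] = mortality_rate(total_died, total_survived)

for status, rate in rates.items():
    print(f"{status}: {rate:.1%}")
# Smoker: 23.9%
# Non-smoker: 31.4%
# Total: 28.1%
```

On the pooled data, the smokers have the lower mortality rate.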


Now examine by age group.

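| Age Group | Smokers Died | Smokers Survived | Non-Smokers Died | Non-Smokers Survived |
|-----------|--------------|------------------|------------------|----------------------|
| 18-24     | 2            | 53               | 1                | 61                   |
| 25-34     | 3            | 121              | 5                | 152                  |
| 35-44     | 14           | 95               | 7                | 114                  |
| 45-54     | 27           | 103              | 12               | 66                   |
| 55-64     | 51           | 64               | 40               | 81                   |
| 65-74     | 29           | 7                | 101              | 28                   |
| 75+       | 13           | 0                | 64               | 0                    |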
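
To see the reversal, the mortality rate can be recomputed within each age group. The sketch below uses the counts from the age-group table; the names `by_age`, `rate` and `smokers_higher` are chosen here for illustration.

```python
# age group -> (smokers died, smokers survived,
#               non-smokers died, non-smokers survived)
by_age = {
    "18-24": (2, 53, 1, 61),
    "25-34": (3, 121, 5, 152),
    "35-44": (14, 95, 7, 114),
    "45-54": (27, 103, 12, 66),
    "55-64": (51, 64, 40, 81),
    "65-74": (29, 7, 101, 28),
    "75+": (13, 0, 64, 0),
}


def rate(died: int, survived: int) -> float:
    """Mortality rate within one group."""
    return died / (died + survived)


smokers_higher = []  # age groups where smokers have the higher rate
for age, (s_died, s_surv, n_died, n_surv) in by_age.items():
    s_rate, n_rate = rate(s_died, s_surv), rate(n_died, n_surv)
    if s_rate > n_rate:
        smokers_higher.append(age)
    print(f"{age:>5}: smokers {s_rate:6.1%}   non-smokers {n_rate:6.1%}")

# Share of each group aged 65 or over at the initial survey.
old = ("65-74", "75+")
smokers_total = sum(sd + ss for sd, ss, _, _ in by_age.values())
nonsmokers_total = sum(nd + ns for _, _, nd, ns in by_age.values())
smokers_old = sum(by_age[a][0] + by_age[a][1] for a in old) / smokers_total
nonsmokers_old = sum(by_age[a][2] + by_age[a][3] for a in old) / nonsmokers_total
print(f"aged 65+: smokers {smokers_old:.1%}, non-smokers {nonsmokers_old:.1%}")
```

In this calculation the smokers have the higher mortality rate in five of the seven age groups (the 25-34 group goes the other way, and everyone aged 75+ died in both groups), even though the pooled table shows smokers with the lower rate. The last line shows that a much larger share of the non-smokers were aged 65 or over, which suggests age as the confounder.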