1 Preparation for lab

Please watch the video on tests of independence and homogeneity, which extends your knowledge of chi-square tests to two-by-two contingency tables (if you are feeling rusty with chi-square tests, ask in class and we’ll bring you up to speed). Also, please watch the video on odds ratios and relative risk.

2 Group work

The link to the Google Doc for this week’s group work is here.

2.1 Design thinking in data analytics

In small groups, please discuss:

  • What features in experimental design and analysis are crucial for good hypothesis driven scientific practice?

  • Would an analysis framework using a design thinking process be compatible with good science?

  • When would a design thinking approach be appropriate or inappropriate for data analysis?

2.2 Design thinking for putting results in context

Using the Framingham Heart Study data, choose one interesting question that you formulated in week 1 or that you have generated for your lab reports.

  • Perform a STEEPLE analysis on this question. That is, brainstorm the Social, Technological, Economic, Environmental, Political, Legal and Ethical implications of the information your question will uncover.

Some examples of what you might consider are:

  • Next, identify the stakeholder you feel would be most invested in each of the STEEPLE implications.

  • What would be the best medium for communicating the question, its answer and its implications to each stakeholder?

3 Data exercises

3.1 Respiratory illness

This dataset records the respiratory status of patients recruited to a randomised, multicentre clinical trial. In each of two centres, eligible patients were randomly assigned to active treatment or placebo. During treatment, respiratory status (categorised as poor or good) was determined at each of four monthly visits. The trial recruited 111 participants (54 in the active group, 57 in the placebo group), and there were no missing data for either the responses or the covariates. The aim is to assess whether the treatment is effective and to estimate the size of its effect.

  • Q1: Read in the data and summarise it into a contingency table.
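## Read the tab-delimited data (sep = "\t" is already the default for read.delim)
respiratory <- read.delim("https://wimr-genomics.vip.sydney.edu.au/AMED3002/data/respiratory.txt", sep = "\t")

## Cross-tabulate treatment group (rows) against respiratory status (columns)
tab <- table(respiratory$treatment, respiratory$status)
tab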
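It can also help to add totals and within-group proportions to the Q1 table, since comparing the proportion with good status in each arm is what the later tests formalise. A minimal sketch on a small hypothetical data frame `demo` (counts invented for illustration, standing in for `respiratory`); with the real data you would call `addmargins(tab)` and `prop.table(tab, margin = 1)` instead.

```r
## Hypothetical long-format data (one row per observation), invented for illustration only;
## it mimics the shape of `respiratory` with a `treatment` and a `status` column
demo <- data.frame(
  treatment = rep(c("placebo", "treatment"), times = c(10, 10)),
  status    = c(rep(c("good", "poor"), times = c(4, 6)),
                rep(c("good", "poor"), times = c(7, 3)))
)

## Two-way contingency table with named dimensions
tab_demo <- table(treatment = demo$treatment, status = demo$status)

## Add row and column totals (labelled "Sum")
addmargins(tab_demo)

## Proportion with each status within each treatment group (each row sums to 1)
prop.table(tab_demo, margin = 1)
```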
  • Q2: We would like to test whether there is any evidence that receiving the treatment altered a patient’s chance of having a good respiratory status. Rephrase this question as a null and alternative hypothesis consistent with a chi-square test.

  • Q3: Perform a chi-square test

chisq.test(tab)
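To see what `chisq.test()` is doing, the Pearson statistic can be computed by hand: under the null hypothesis each expected count is (row total x column total) / grand total, and the statistic is the sum of (observed - expected)^2 / expected, compared with a chi-square distribution on (rows - 1)(columns - 1) degrees of freedom. A sketch on a hypothetical table `obs` (counts invented for illustration, not the trial data):

```r
## Hypothetical 2x2 table of counts (invented for illustration, not the trial data)
obs <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("treatment", "placebo"),
                              status = c("good", "poor")))

## Expected counts under independence: (row total x column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

## Pearson chi-square statistic, degrees of freedom and upper-tail p-value
X2 <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(X2, df = dof, lower.tail = FALSE)

## For 2x2 tables chisq.test() uses Yates' continuity correction unless
## correct = FALSE, so turn it off to match the uncorrected statistic above
builtin <- chisq.test(obs, correct = FALSE)
c(manual = X2, builtin = unname(builtin$statistic))
p_value
```

The default call `chisq.test(tab)` in Q3 keeps the continuity correction, so its statistic will be somewhat smaller than the hand-computed one.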
  • Q4: Check the assumptions for a chi-square test
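## Store the test so its components can be inspected
test = chisq.test(tab)
## Rule of thumb: every expected count should be at least 5
test$expected >= 5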
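When some expected counts fall below 5, a common fallback for a 2x2 table is Fisher’s exact test. The sketch below uses a hypothetical small table `small` (invented counts) and switches tests based on the expected counts. Note that expected counts are only one assumption: the chi-square test also assumes independent observations. If `respiratory` has one row per visit rather than one per patient (compare `nrow(respiratory)` with the 111 participants), the same patient contributes several rows and that assumption is questionable.

```r
## Hypothetical small-sample 2x2 table (invented counts) with some expected counts below 5
small <- matrix(c(3, 7,
                  8, 2),
                nrow = 2, byrow = TRUE,
                dimnames = list(group = c("treatment", "placebo"),
                                status = c("good", "poor")))

## suppressWarnings() hides the "approximation may be incorrect" warning
## that chisq.test() gives when expected counts are small
expected <- suppressWarnings(chisq.test(small))$expected
expected

## Use Fisher's exact test if any expected count is below 5, otherwise chi-square
if (any(expected < 5)) {
  result <- fisher.test(small)
} else {
  result <- chisq.test(small)
}
result$method
result$p.value
```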
  • Q5: What is your conclusion for this test?

  • Q6: Interpret the relationship further by calculating a relative risk or odds ratio. Are both appropriate in this case?

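## Odds ratio (cross-product ratio); its direction depends on which levels are in row 1 and column 1 of tab
OR <- (tab[1,1]*tab[2,2])/(tab[2,1]*tab[1,2])
OR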
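Relative risk compares the proportion with a good outcome in each group, whereas the odds ratio compares odds. In a randomised (prospective) design like this trial both are interpretable; in a retrospective, case-control design the group totals are fixed by the sampling of outcomes, so only the odds ratio is meaningful. A sketch on a hypothetical table `tab_demo` (invented counts), with an approximate 95% confidence interval for the odds ratio computed on the log scale (Woolf’s method, one of several possible choices):

```r
## Hypothetical 2x2 table (counts invented for illustration, not the trial data);
## rows = treatment group, columns = respiratory status
tab_demo <- matrix(c(30, 20,
                     15, 35),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("treatment", "placebo"),
                                   status = c("good", "poor")))

## Risk (proportion) of a "good" status within each group
risk <- tab_demo[, "good"] / rowSums(tab_demo)

## Relative risk of a good status: treatment vs placebo
RR_demo <- unname(risk["treatment"] / risk["placebo"])

## Odds ratio via the cross-product ratio: odds(good | treatment) / odds(good | placebo)
OR_demo <- (tab_demo[1, 1] * tab_demo[2, 2]) / (tab_demo[1, 2] * tab_demo[2, 1])

## Approximate 95% CI: standard error of log(OR) is sqrt(sum of 1/cell counts),
## then back-transform the interval with exp()
se_log_or <- sqrt(sum(1 / tab_demo))
CI_demo <- exp(log(OR_demo) + c(-1, 1) * qnorm(0.975) * se_log_or)

RR_demo
OR_demo
CI_demo
```

`fisher.test()` also reports an odds ratio with a confidence interval, but it uses a conditional maximum likelihood estimate, so its value will differ slightly from the simple cross-product ratio.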

3.2 Random data (extension)

Let’s perform a test of independence with randomly generated data.

  • E1: Create two vectors of random data.
## Set a seed so that results are reproducible.
set.seed(51773)

## Generate two random vectors of size n. We can do this with the "sample" function.
n = 50
AB = sample(c('A','B'), n, replace = TRUE)
CD = sample(c('C','D'), n, replace = TRUE)

head(AB)
head(CD)
  • E2: Create a contingency table to view the relationship between AB and CD
tabABCD = table(AB,CD)
tabABCD
  • E3: Test for independence between AB and CD
chisq.test(tabABCD)
  • E4: Check the assumptions of the test
chisq.test(tabABCD)$expected
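  • E5: Conclusion. As the p-value is greater than 0.05, there is no evidence against the null hypothesis that AB and CD are independent. This does not prove independence, although here we know it holds because the two vectors were sampled separately.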
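The E5 reasoning can be checked by simulation: if AB and CD are independent, a test at the 0.05 level should reject only occasionally by chance, typically a little under 5% of the time for 2x2 tables because the default continuity correction makes the test conservative. A minimal sketch that repeats the E1 to E3 steps many times; the number of repetitions `n_sim` is an arbitrary choice.

```r
## Simulation settings (n_sim is an arbitrary, hypothetical choice)
set.seed(51773)
n <- 50
n_sim <- 2000

p_values <- replicate(n_sim, {
  ## Two independently sampled vectors, as in E1
  AB <- sample(c("A", "B"), n, replace = TRUE)
  CD <- sample(c("C", "D"), n, replace = TRUE)
  ## Chi-square p-value, as in E3; small-expected-count warnings are suppressed
  suppressWarnings(chisq.test(table(AB, CD))$p.value)
})

## Share of datasets with p < 0.05 even though AB and CD are independent
mean(p_values < 0.05)
```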

4 Lab report

Only this section needs to be included in your Module 1 Lab Report to be handed in by 23:59 on Friday 10th March.

While I expect you to explore the Framingham data, I am happy for you to submit a report on a dataset that you are particularly engaged with.

Report guidelines

There are no hard and fast guidelines to the final content of your submitted lab reports. For this lab you will be assessed on your ability to generate statistical questions, explore these with graphical summaries and interpret your findings. Your report will also need to be well-presented:

  • Think about how your report might be structured (e.g. Summary, Introduction, Results, Conclusion).
  • Do your figures have captions?
  • Are your figures legible?
  • Are you using sub-headings effectively?

Your report is expected to construct and communicate an interesting story in roughly 4 to 6 paragraphs. To do this, you should be a ‘bad’ scientist and explore the data until you find something that you think is interesting, or that you can use to address the marking criteria. When preparing your report, always ask yourself: “Is this something I would be proud to show my friends?”, “Would my family be interested in my conclusions?” and “Would they find it easy to read?”

Marking criteria

  • 1 mark - Visualisation – At least 2 plots.
  • 0.5 mark - Communication – Context around results.
  • 0.5 mark - Communication – Insightful context around results.
  • 1 mark - Presentation – No extra R output, use of headings.
  • 1 mark - Innovation – Try at least one of captions, table of contents, embedded numbers in text, a plot that isn’t a bar-plot, boxplot or histogram etc.

Lab instructions

4.1 Week 3

Continue with the Framingham data from week 1. If you are comfortable, feel free to:

  • Formulate a question that looks for an association between two categorical variables.
    • Formally test your hypothesis with a chi-square test.
    • Communicate the association with an odds ratio or relative risk.
  • In the Framingham data, every subject is observed three times. Think about how you could use this structure to construct pseudo-retrospective and pseudo-prospective studies. Discuss this on the discussion board if needed.

4.2 Rmarkdown report

Provide your data analytics code as well as a summary of your findings in a reproducible report (e.g. an Rmarkdown report).