University of Sydney
Soft drink contents in a pack of six cans (in milliliters) are: 374.8, 375, 375.3, 374.8, 374.4, 374.9.
Is the soft drink content less than the 375 mL claimed on the label?
  \(H_0 : \theta = \theta_0\)
    vs
  \(H_1 : \theta > \theta_0\) (upper-side alternative)
  \(H_1 : \theta < \theta_0\) (lower-side alternative)
  \(H_1 : \theta \ne \theta_0\) (two-sided alternative)
  \(p\)-value \(= P(T \ge t_0)\) for \(H_1: \theta > \theta_0\)
  \(p\)-value \(= P(T \le t_0)\) for \(H_1: \theta < \theta_0\)
  \(p\)-value \(= 2P(T \ge |t_0|)\) for \(H_1: \theta \ne \theta_0\)
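As a minimal sketch of how these tail probabilities can be computed in R, assume the test statistic \(T\) follows a \(t\) distribution under \(H_0\). The values of `t0` and `df` below are hypothetical and chosen only for illustration; `pt()` is the \(t\) distribution function from base R.

```r
# Illustrative values only: t0 and df are hypothetical, not taken from any data set.
t0 = 1.8   # observed value of the test statistic
df = 10    # degrees of freedom of the assumed t null distribution

p_upper = pt(t0, df, lower.tail = FALSE)           # P(T >= t0), for H1: theta > theta0
p_lower = pt(t0, df)                               # P(T <= t0), for H1: theta < theta0
p_two = 2 * pt(abs(t0), df, lower.tail = FALSE)    # 2 P(T >= |t0|), for H1: theta != theta0

c(upper = p_upper, lower = p_lower, two.sided = p_two)
```

Because the \(t\) distribution is continuous and symmetric about zero, the upper and lower tail probabilities sum to one, and the two-sided value is twice the smaller of the two.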
An observed large positive or negative value of \(t_0\) and hence small \(p\)-value is taken as evidence of poor agreement with \(H_0\).
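If the \(p\)-value is small, then either \(H_0\) is true and the poor agreement is due to an unlikely event, or \(H_0\) is false. Therefore, the smaller the \(p\)-value, the stronger the evidence against \(H_0\).
A large \(p\)-value does not mean that there is evidence that \(H_0\) is true; it means only that the data are consistent with \(H_0\).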
The level of significance, \(\alpha\), is the strength of evidence needed to reject \(H_0\) (often \(\alpha = 0.05\)).
Suppose we have a sample \(X_1, X_2, \dots, X_n\) of size \(n\) drawn from a normal population with unknown variance \(\sigma^2\). Let \(x_1, x_2, \dots, x_n\) be the observed values. We want to test a hypothesis about the population mean \(\mu\).
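For reference, the usual one-sample \(t\) statistic for this setting can be written as follows. The symbol \(\mu_0\) for the hypothesised mean is an assumption introduced here for illustration (it plays the role of \(\theta_0\) in the general framework), as is \(s\) for the sample standard deviation.

\[
t_0 = \frac{\bar x - \mu_0}{s / \sqrt{n}}, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2 .
\]

Under \(H_0: \mu = \mu_0\) and the normality assumption, the corresponding random variable \(T\) follows a \(t_{n-1}\) distribution, so the \(p\)-value is a tail probability of \(t_{n-1}\) evaluated at \(t_0\).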
Soft drink contents in a pack of six cans (in milliliters) are:
Is the soft drink content less than the 375 mL claimed on the label?
x = c(374.8, 375, 375.3, 374.8, 374.4, 374.9)
mean(x)
## [1] 374.8667
sd(x)
## [1] 0.294392
t.test(x, mu = 375, alternative = "less")
## 
##  One Sample t-test
## 
## data:  x
## t = -1.1094, df = 5, p-value = 0.1589
## alternative hypothesis: true mean is less than 375
## 95 percent confidence interval:
##      -Inf 375.1088
## sample estimates:
## mean of x 
##  374.8667
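The `t.test()` output can be checked by hand. The sketch below assumes the standard one-sample statistic \((\bar x - \mu_0)/(s/\sqrt{n})\) with \(n - 1\) degrees of freedom; the names `mu0`, `t0` and `p_value` are introduced here for illustration only.

```r
x = c(374.8, 375, 375.3, 374.8, 374.4, 374.9)
mu0 = 375                                  # hypothesised mean (value claimed on the label)
n = length(x)

t0 = (mean(x) - mu0) / (sd(x) / sqrt(n))   # observed test statistic
p_value = pt(t0, df = n - 1)               # lower tail, since H1 is mu < 375

c(t0 = t0, p_value = p_value)
```

These agree with the reported `t = -1.1094` and `p-value = 0.1589`. Since the \(p\)-value exceeds \(\alpha = 0.05\), the data are consistent with \(H_0\): this sample gives no evidence that the mean content is below 375 mL.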
Sometimes we want to test whether the means of two populations differ, using a sample from each.
Here there are two possible scenarios: the two samples are independent, or the two samples are dependent (paired).
Discuss and write down a few examples of situations where the two samples would be independent or dependent.
Blood samples are taken from \(11\) smokers and \(11\) non-smokers to measure aggregation of blood platelets.
Non.Smokers = c(25, 25, 27, 44, 30, 67, 53, 53, 52, 60, 28)
Smokers = c(27, 29, 37, 36, 46, 82, 57, 80, 61, 59, 43)
c(mean(Smokers), mean(Non.Smokers), sd(Smokers), sd(Non.Smokers))
## [1] 50.63636 42.18182 18.89589 15.61293
Is the aggregation affected by smoking?
library(ggplot2)
Blood.platelet.levels = c(Non.Smokers, Smokers)
Group = rep(c("Non-smokers", "Smokers"), c(length(Non.Smokers), length(Smokers)))
qplot(Group, Blood.platelet.levels, geom = "boxplot")
Hypotheses: \(H_0: \mu_X = \mu_Y\) vs \(H_1: \mu_X > \mu_Y\), \(\mu_X < \mu_Y\) or \(\mu_X \ne \mu_Y\).
Test statistic: \(\tau_0 = \frac{{\bar x} - {\bar y}}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}\), where \(s^2_p = \frac{(n_x-1) s_{x}^2 + (n_y-1) s_{y}^2}{n_x+n_y-2}\) is the pooled sample variance.
Assumptions: \(X_1,\dots,X_{n_x}\) are iid \(N(\mu_X,\sigma^2)\), \(Y_1,\dots,Y_{n_y}\) are iid \(N(\mu_Y,\sigma^2)\), and the \(X_i\) are independent of the \(Y_j\). Hence, under \(H_0\), \(\tau_0 \sim t_{n_x+n_y-2}\).
\(P\)-value: \(P(t_{n_x+n_y-2}\ge \tau_{0})\), \(P(t_{n_x+n_y-2}\le \tau_{0})\) or \(2P(t_{n_x+n_y-2} \ge |\tau_{0}|)\), respectively.
Decision: If \(p\mbox{-value}<\alpha\), there is evidence against \(H_0\). If \(p\mbox{-value}> \alpha\), the data are consistent with \(H_0\).
t.test(Smokers, Non.Smokers, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  Smokers and Non.Smokers
## t = 1.144, df = 19.313, p-value = 0.2666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.997031 23.906122
## sample estimates:
## mean of x mean of y 
##  50.63636  42.18182
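Note that the output is labelled a Welch test: by default `t.test()` does not assume equal variances. The equal-variance (pooled) two-sample test can be requested with `var.equal = TRUE`. The sketch below computes the pooled statistic by hand and compares it with that call; the names `s2_p`, `tau0`, `p_value` and `pooled` are introduced here for illustration.

```r
Non.Smokers = c(25, 25, 27, 44, 30, 67, 53, 53, 52, 60, 28)
Smokers = c(27, 29, 37, 36, 46, 82, 57, 80, 61, 59, 43)
n_x = length(Smokers)
n_y = length(Non.Smokers)

# Pooled variance: weighted average of the two sample variances
s2_p = ((n_x - 1) * var(Smokers) + (n_y - 1) * var(Non.Smokers)) / (n_x + n_y - 2)

# Pooled two-sample t statistic and two-sided p-value on n_x + n_y - 2 df
tau0 = (mean(Smokers) - mean(Non.Smokers)) / sqrt(s2_p * (1 / n_x + 1 / n_y))
p_value = 2 * pt(abs(tau0), df = n_x + n_y - 2, lower.tail = FALSE)

# The same test via t.test(), assuming equal variances
pooled = t.test(Smokers, Non.Smokers, alternative = "two.sided", var.equal = TRUE)

c(tau0 = tau0, p_value = p_value)
pooled
```

Because both groups contain 11 observations, the pooled statistic coincides with the Welch statistic (1.144). Only the degrees of freedom differ: 20 for the pooled test against 19.313 for Welch.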
Blood samples from \(11\) individuals before and after they smoked a cigarette are used to measure aggregation of blood platelets.
Before = c(25, 25, 27, 44, 30, 67, 53, 53, 52, 60, 28)
After = c(27, 29, 37, 36, 46, 82, 57, 80, 61, 59, 43)
Is the aggregation affected by smoking?
df = data.frame(Difference = After - Before, group = 1)
p1 = ggplot(df, aes(group, Difference)) +
  geom_boxplot() +
  theme_bw() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  ylab("Difference in blood platelet levels") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
p1
t.test(Before - After)
## 
##  One Sample t-test
## 
## data:  Before - After
## t = -2.9065, df = 10, p-value = 0.01566
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -14.93577  -1.97332
## sample estimates:
## mean of x 
## -8.454545
t.test(Before, After, paired = TRUE)
## 
##  Paired t-test
## 
## data:  Before and After
## t = -2.9065, df = 10, p-value = 0.01566
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.93577  -1.97332
## sample estimates:
## mean of the differences 
##               -8.454545
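The two calls give identical results because a paired \(t\)-test is a one-sample \(t\)-test on the within-individual differences. A minimal sketch of that calculation by hand, assuming the differences are normally distributed (the names `d`, `t0` and `p_value` are introduced here for illustration):

```r
Before = c(25, 25, 27, 44, 30, 67, 53, 53, 52, 60, 28)
After = c(27, 29, 37, 36, 46, 82, 57, 80, 61, 59, 43)

d = Before - After                      # within-individual differences
n = length(d)

t0 = mean(d) / (sd(d) / sqrt(n))        # one-sample t statistic for H0: mean difference = 0
p_value = 2 * pt(abs(t0), df = n - 1, lower.tail = FALSE)   # two-sided p-value

c(mean_diff = mean(d), t0 = t0, p_value = p_value)
```

The same 22 numbers analysed as two independent samples gave a \(p\)-value of 0.2666, whereas the paired analysis gives 0.01566. Pairing removes the variation between individuals, which is why identifying the correct design matters.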