This lesson introduces joint hypothesis tests: ANOVA and the F-test.
Why can’t we just use the t-test?
Recall that for the t-test, we can take the difference between the sample means of two groups and use it as the null hypothesis value. The key is that we can rewrite the hypothesis in terms of a single new variable (the mean difference), which allows us to compute a single t-statistic for the test. But once we have more than two groups, we can no longer rewrite the null as a function of a single variable: with three groups, for example, we would need to test at least two mean differences.
How do we test for differences in means across multiple groups?
Assume that we have observations from \(g\) different groups, and we want to test whether the population means of the groups are statistically different. Formally, we can formulate the following hypothesis:
\[\begin{eqnarray*} H_0 & : & \mu_1 = \mu_2 = ... = \mu_g \\ H_a & : & \text{At least one is different.} \end{eqnarray*}\]where \(\mu_i\) is the mean of group \(i\), \(\forall i = 1,2,...,g\)
Imagine that we collect data about a variable \(X\) for different groups (\(g\)), like in the image below.
Each group sample has its own mean and standard deviation.
Whether the mean of one group (\(\mu_i\)) is statistically different from another depends on the spread of each distribution (\(\sigma_i\)).
- The greater the mean difference among groups, the greater the probability that the samples are drawn from different population distributions.
- But the more spread out each group is, the less different the groups are likely to be. In other words, the higher the standard deviation (the more dispersed the sample data), the harder it will be to statistically prove that the means are different, i.e. the harder it will be to reject the null hypothesis.
Therefore, we need a ratio as a measure of the total probability that the group means are different:
\[\text{F-statistic} = \dfrac{\text{average variance between groups}}{\text{average variance within groups}}\]
The greater the numerator, the more different the means of the distributions are; but the higher the denominator, the less significant that difference is. This ratio, the F-statistic, increases when the sample averages of the different groups are widely different, and decreases with the dispersion of the data.
Between-Group Variance is an overall measure of how different the means of all the groups are from each other. Formally, for \(g\) groups:
\[\text{Between Variance} = \dfrac{n_1 (\bar{y_1}- \bar{y})^2 + n_2 (\bar{y_2}- \bar{y})^2 + ... + n_g (\bar{y_g}- \bar{y})^2 }{g-1}\]
Where \(g\) is the number of groups, \(n_i\) is the sample size of group \(i\), \(\bar{y_i}\) is the sample mean of group \(i\), and \(\bar{y}\) is the overall mean, i.e. the mean that results from combining the observations of all samples.
The between variance can be calculated in five simple steps:
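As a sketch, the formula above can be implemented directly in R. The group samples below are made up purely for illustration:

```r
# Between-group variance, following the formula above
# (groups is a list of numeric vectors; the data are made up)
between_variance <- function(groups) {
  g <- length(groups)                  # number of groups
  y_bar <- mean(unlist(groups))        # overall mean across all observations
  ss <- sum(sapply(groups, function(y) length(y) * (mean(y) - y_bar)^2))
  ss / (g - 1)                         # divide by numerator degrees of freedom
}

groups <- list(c(2, 4, 6), c(1, 3, 5), c(7, 9, 11))
between_variance(groups)               # ≈ 31 for these made-up samples
```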
Within-Group Variance is an overall measure of the dispersion of different samples. Formally, for \(g\) groups:
\[\text{Within Variance} = \dfrac{(n_1-1)s_1^2 +(n_2-1)s_2^2 + ... + (n_g-1)s_g^2 }{N-g}\]
Where \(g\) is the number of groups, \(n_i\) is the sample size of group \(i\), \(s_i\) is the standard deviation of group \(i\), and \(N\) is the overall sample size.
The within variance can be calculated in just four steps:
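This formula, too, translates almost line-by-line into R. The group samples below are again made up for illustration:

```r
# Within-group variance, following the formula above
# (groups is a list of numeric vectors; the data are made up)
within_variance <- function(groups) {
  g <- length(groups)                        # number of groups
  N <- sum(sapply(groups, length))           # overall sample size
  ss <- sum(sapply(groups, function(y) (length(y) - 1) * var(y)))
  ss / (N - g)                               # divide by denominator degrees of freedom
}

groups <- list(c(2, 4, 6), c(1, 3, 5), c(7, 9, 11))
within_variance(groups)                      # ≈ 4 for these made-up samples
```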
Now that we know how to compute both the between- and within- group variance, we can compute the F-statistic using the following formula:
\[\begin{eqnarray*} \text{F-statistic} & = & \dfrac{\text{average variance between groups}}{\text{average variance within groups}} \\ & = & \dfrac{\dfrac{n_1 (\bar{y_1}- \bar{y})^2 + n_2 (\bar{y_2}- \bar{y})^2 + ... + n_g (\bar{y_g}- \bar{y})^2 }{g-1}}{\dfrac{(n_1-1)s_1^2 +(n_2-1)s_2^2 + ... + (n_g-1)s_g^2 }{N-g}} \\ & = & \left(\dfrac{N-g}{g-1} \right)\left(\dfrac{n_1 (\bar{y_1}- \bar{y})^2 + n_2 (\bar{y_2}- \bar{y})^2 + ... + n_g (\bar{y_g}- \bar{y})^2}{(n_1-1)s_1^2 +(n_2-1)s_2^2 + ... + (n_g-1)s_g^2} \right) \\ & = & \left(\dfrac{N-g}{g-1} \right)\left(\dfrac{\Sigma_i^g n_i (\bar{y_i}- \bar{y})^2}{\Sigma_i^g(n_i-1)s_i^2} \right) \end{eqnarray*}\]

Note that there are two degrees of freedom expressions in the F statistic:

- Degrees of freedom of the numerator: \(g-1\)
- Degrees of freedom of the denominator: \(N-g\)
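Putting the two pieces together, the F statistic can be computed from raw group samples and cross-checked against R's built-in ANOVA. The samples below are made up for illustration:

```r
# F statistic = between-group variance / within-group variance
# (groups is a list of numeric vectors; the data are made up)
f_statistic <- function(groups) {
  g <- length(groups)                  # number of groups
  N <- sum(sapply(groups, length))     # overall sample size
  y_bar <- mean(unlist(groups))        # overall mean
  between <- sum(sapply(groups, function(y) length(y) * (mean(y) - y_bar)^2)) / (g - 1)
  within  <- sum(sapply(groups, function(y) (length(y) - 1) * var(y))) / (N - g)
  between / within
}

groups <- list(c(2, 4, 6), c(1, 3, 5), c(7, 9, 11))
f_statistic(groups)   # between ≈ 31, within ≈ 4, so F ≈ 7.75

# Cross-check against R's built-in ANOVA: the F value should match
d <- data.frame(y = unlist(groups),
                group = rep(c("a", "b", "c"), times = sapply(groups, length)))
summary(aov(y ~ group, data = d))
```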
When computing the F-statistic threshold in R, you need to specify each degree of freedom separately.
Consider that we conduct a survey and ask individuals to provide their Party ID and to rank themselves on a 1-7 ideology scale (1 means far left, and 7 means far right).
We then pose the following Research Question: On an ideology scale (Liberal vs Conservative), does average ideology differ by Party ID (Democrat, Independent, or Republican)?
The table below shows the results from the survey:
Are the means of each party group (Democrats, Independents, and Republicans) different?
To answer this question we cannot rely on a simple t-test. This is a job for the F-statistic.
Recall that F-statistic is equal to:
\[\text{F-statistic} = \dfrac{\text{average variance between groups}}{\text{average variance within groups}}\]
How do we know if 26.1 is large enough to reject the null hypothesis?
Last week, we learned that the t distribution has one parameter (besides the mean and se) that affects its shape, the degrees of freedom, which determine how much of the mass of the distribution is in the tails versus the middle.
Degrees of freedom in the F distribution:
Instead of one, there are now two shape parameters, reflecting two different degrees of freedom:
Recall that the degrees of freedom of the numerator, call it \(df_1\), is equal to \(g-1\), and the degrees of freedom of the denominator, call it \(df_2\), is equal to \(N-g\). \(df_1\) and \(df_2\) together determine the shape of the F distribution, and thus whether the F test statistic is large enough to reject the null. In the figure above you can see different representations of the shape of the F distribution as a function of the two degrees of freedom.
As with the t distribution, we test whether the null hypothesis is true (i.e. whether observed variation between groups is simply due to chance).
The F distribution defines the probability density function of the test statistic in an ANOVA.
We want to know if the calculated F statistic is sufficiently unlikely to have been drawn by chance. If so, we reject the null, that all the means are the same, in favor of the alternative, that at least one is different from the rest.
Unlike the t, the F is always non-negative – we are adding non-negative numbers (squared numbers). Thus the F test is always one-tailed.
To reject the null we need a large F statistic: recall that the formula implies that the larger the differences among the group means, the larger the F statistic. A big F-stat reflects very different means, whereas the smallest possible value, 0, reflects means that are all identical (the null).
Then, we’ll reject the null hypothesis if:
\[F_\text{calculated} > F_\text{threshold}\]
Returning to our example, the two degrees of freedom are:

- \(df_1 = g - 1 = 3 - 1 = 2\)
- \(df_2 = N - g = 276 - 3 = 273\)

And the calculated F statistic is \(F_\text{calculated} = 26.1\).
In R, if the significance level is \(\alpha = 0.05\), the threshold is the value of the F distribution that leaves an area of \(\alpha\) in the right tail; we reject the null if the calculated F-statistic exceeds that threshold.
#Parameters
N <- 276 # Total number of observations
g <- 3 #Three groups, DEM, REP, IND
alpha <- 0.05 #Significance Level
# Degrees of freedom
df1 <- g - 1 #Degrees of freedom of numerator
df2 <- N - g #Degrees of freedom of denominator
# F threshold
fVal <- qf(1 - alpha, df1, df2) #F value
fVal
## [1] 3.028847
The F threshold, 3.028847, is less than the calculated F, 26.1. This means that we can reject the null that the means of all three groups are the same, in favor of the alternative hypothesis that at least one of them is different. Substantively, average ideology differs across Party ID groups.
Again as with the t test, we can also directly calculate the p-value for the statistic we got (26.1), which is the probability of getting a value that large or larger assuming the null is true.
1 - pf(26.1, df1, df2)
## [1] 4.242806e-11
This is clearly much lower than the significance level (\(\alpha = 0.05\)), so once again we can reject the null.
Research Question: We are interested in “alertness” measurements for three different groups, each of which received a different dosage of some drug. Are there any differences in alertness levels across these three groups that received different dosages (coded as a, b, and c)?
For that we are going to use data from the personality-project website.
datafilename <- "http://personality-project.org/r/datasets/R.appendix1.data"
data.ex1 <- read.table(datafilename, header = TRUE)
head(data.ex1)
## Dosage Alertness
## 1 a 30
## 2 a 38
## 3 a 35
## 4 a 41
## 5 a 27
## 6 a 24
The way to conduct the F-test in R is with the aov command; aov stands for Analysis of Variance, or ANOVA for short. Using the data from the personality-project dataset, we can simply run the following commands in R:
aov.ex1 <- aov(Alertness ~ Dosage, data = data.ex1)
summary(aov.ex1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Dosage 2 426.2 213.12 8.789 0.00298 **
## Residuals 15 363.8 24.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The test gives us:
- Degrees of freedom: \(df_1 = 2\) (there are three groups) and \(df_2 = 15\) (there are 18 observations, so \(N - g = 18 - 3 = 15\)).
- The between variance (213.12) and the within variance (24.25).
- The F statistic (between variance/within variance) = 8.789.
- The p-value, which equals 1 - pf(8.789, 2, 15) = 0.00298.
Then, at \(\alpha = 0.05\) we reject the null hypothesis that average alertness is equal across the dosage groups.
If you want to see the mean differences among groups, you can make a box-plot using ggplot:
library(ggplot2)
ggplot(data = data.ex1, aes(y = Alertness, fill = Dosage, x = Dosage)) + geom_boxplot()
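A numeric complement to the box-plot is to tabulate the group means directly with aggregate(). A sketch, using a small made-up data frame standing in for data.ex1:

```r
# Hypothetical stand-in for data.ex1 (three Dosage groups a/b/c);
# the Alertness values are invented for illustration
d <- data.frame(Dosage = rep(c("a", "b", "c"), each = 3),
                Alertness = c(30, 38, 35, 25, 22, 28, 16, 18, 20))

# Mean Alertness per Dosage group: one row per group with its mean
aggregate(Alertness ~ Dosage, data = d, FUN = mean)
```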