Statistical Analysis 5.2: The testing procedure

The testing procedure
The testing procedure summarized
Example
- What is our conclusion?
- Calculate P-value in R directly
Calculating in R

The testing procedure

Overview

This lesson shows the basic procedure of the one-sample hypothesis test.

Objectives

After completing this module, students should be able to:

1. Formulate a null and a research hypothesis.
2. Select and explain the test type, p-value threshold, and number of tails.
3. Calculate a test statistic, p-value, and 95% CI.
4. Determine whether to reject or fail to reject the null based on 1-3.

Readings

Lander, Chapter 15.3.1. Schumacker, Ch. 10

The testing procedure summarized

Here is the basic procedure:

1. Formulate the null hypothesis and the research hypothesis.
2. Select the test type (one- or two-tailed) and the critical p-value.
3. Gather data and calculate the test statistic.
4. Reject the null or not, and write up the results: “we reject the null hypothesis in favor of the research hypothesis” or “we cannot reject the null hypothesis”.

1. Hypotheses

Let’s continue with the SEI scores example. Let’s assume that in 2016 our sample gives the following data: \(\bar{x} = 47\), \(s = 22\), and \(n = 100\). From a survey in a previous year, we know that \(\mu = 42\). We want to test whether the mean SEI in 2016 is significantly higher than in that previous survey.

Research hypothesis \(H_{a}: \mu > 42\). (One-tailed.)

Research hypothesis \(H_{a}: \mu \neq 42\). (Two-tailed.)

Null hypothesis \(H_{0}: \mu = 42\)

Note that these are still just examples, and observe that the value under the null hypothesis doesn’t have to be \(0\).

2. Choose the test type and critical p-value

Do we apply a one-tailed or two-tailed test?

Our research hypothesis suggests one-tailed.
But a test should usually be two-tailed because that is more conservative. A two-tailed test has two rejection regions and two rejection thresholds, a high and a low.
The p-value threshold, denoted \(\alpha\) (alpha), is conventionally 0.05. If you choose 0.05 and a two-tailed test, remember that each rejection region is now 0.025 (that is, 2.5%) of the total.

3-4. Data, test statistic, and rejecting the null

Our data can usually be summarized with three numbers: \(\bar{x}\), \(s\), and \(n\).

Three equivalent ways to conduct our test:

1. Construct a 95% CI around the sample mean, as we saw in the previous Module (using the t distribution), and reject the null if \(\mu_{0}\) falls outside it. If your p-value threshold (\(\alpha\)) is other than 0.05, you just construct the appropriate \((1-\alpha)\) confidence interval.
2. Calculate the test statistic and see whether it falls into the rejection region. The test statistic is as usual:

\[\textrm{Test statistic } = \frac{\bar{x} - \mu_{0}}{se}\]

The test statistic shows how many standard errors the sample mean is from the value under the null hypothesis. For a p-value threshold of 0.05 and a two-tailed test, the rejection region is any test statistic larger than qt(.975,99) or less than qt(.025,99) (assuming your \(n\) is 100, as in this example, so that there are \(n-1=99\) degrees of freedom).
For a one-tailed test with a p-value threshold of 0.01, the rejection region would be any test statistic larger than qt(.99,99) if your research hypothesis was that the mean is greater than the null value, and any test statistic less than qt(.01,99) if your research hypothesis was that it is less than the null value.

3. Calculate the test statistic, and from that, calculate the precise p-value for your data: that is, the chance, if the null hypothesis were true, of getting a test statistic as extreme as yours or more so. Reject the null if that p-value is below \(\alpha\).
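The qt() thresholds quoted above can be collected in a small helper. This is only a sketch: rejection_region is a hypothetical function name introduced here for illustration, not something from R or from the readings. It relies only on qt(), base R's quantile function for the t distribution.

```r
# Hypothetical helper: critical t value(s) for a given alpha, df, and alternative
rejection_region <- function(alpha, df,
                             alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = c(lower = qt(alpha / 2, df), upper = qt(1 - alpha / 2, df)),
    greater   = c(upper = qt(1 - alpha, df)),
    less      = c(lower = qt(alpha, df))
  )
}

rejection_region(0.05, df = 99)                           # qt(.025,99) and qt(.975,99)
rejection_region(0.01, df = 99, alternative = "greater")  # qt(.99,99)
rejection_region(0.01, df = 99, alternative = "less")     # qt(.01,99)
```

A two-tailed test splits \(\alpha\) evenly between the two tails, which is why its thresholds use \(\alpha/2\), while a one-tailed test puts all of \(\alpha\) in a single tail.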

Which approach to choose?

The second is the most standard. Generally one does a two-tailed test with an \(\alpha\) of 0.05, constructs the t statistic from the data, and rejects the null if that statistic is greater than the upper critical value or less than the lower one (i.e., if it falls in the rejection region).

Example

Research hypothesis, \(H_{a}: \mu > 42\) (one-tailed). Null hypothesis, \(H_{0}: \mu = 42\).
Test type: The research hypothesis above would suggest a one-tailed test, but it is nevertheless better practice in most cases to go with the more stringent two-tailed test (which is technically: \(H_{a}: \mu \neq 42\)). Associated critical p-value: 0.025 in each tail.
Data: \(\bar{x} = 47\), \(s = 22\), \(n = 100\).

\[\textrm{Test statistic } = \frac{\bar{x} - \mu_{0}}{se}\]

What is our test statistic?

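One way to work this out in R, as a minimal sketch using the Example's numbers: the standard error of the mean is \(s/\sqrt{n}\), and the test statistic divides the distance between the sample mean and the null value by that standard error. The variable names are chosen here for illustration.

```r
xbar <- 47   # sample mean
s    <- 22   # sample standard deviation
n    <- 100  # sample size
mu0  <- 42   # value under the null hypothesis

se     <- s / sqrt(n)        # standard error: 22 / 10 = 2.2
t_stat <- (xbar - mu0) / se  # (47 - 42) / 2.2, about 2.27
t_stat
```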

What is our critical value (rejection region)?

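A short sketch of how the critical values could be obtained in R, assuming the two-tailed test with \(\alpha = 0.05\) chosen above and \(n - 1 = 99\) degrees of freedom. The variable names are illustrative.

```r
alpha <- 0.05
df    <- 100 - 1                              # n - 1 degrees of freedom
crit  <- qt(c(alpha / 2, 1 - alpha / 2), df)  # about -1.98 and 1.98
crit
```

The rejection region is any test statistic below the first value or above the second. Because the t distribution is symmetric around zero, the two values differ only in sign.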

What is our conclusion?

Is our sample mean in the rejection region so that we can reject the null hypothesis in favor of the research hypothesis?

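Note that if the test statistic passes the two-tailed test, it would also have passed a one-tailed test at the same \(\alpha\) in the direction of the research hypothesis, because the one-tailed critical value is smaller in absolute value.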

Calculate P-value in R directly

We could have directly calculated our p-value, which would be the area in the right tail greater than our test statistic of 2.27 plus the symmetrical area in the left tail, or

2*(1-pt(2.27,99))

[1] 0.02537549

(Make sure you understand this calculation!) This is obviously less than 0.05, so again we would reject the null.
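The confidence-interval approach described earlier should lead to the same decision. Here is a minimal sketch using the Example's summary statistics, with variable names chosen for illustration:

```r
xbar <- 47; s <- 22; n <- 100; mu0 <- 42
se <- s / sqrt(n)                                    # standard error of the mean
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se   # roughly 42.6 to 51.4
ci
mu0 < ci[1] || mu0 > ci[2]   # TRUE means 42 lies outside the 95% CI
```

Since 42 falls below the lower end of the interval, this check agrees with the p-value: we reject the null at \(\alpha = 0.05\).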

Calculating in R

Calculating in R with the GSS data:

From http://gss.norc.org/About-The-GSS: “For more than four decades, the General Social Survey (GSS) has studied the growing complexity of American society. It is the only full-probability, personal-interview survey designed to monitor changes in both social characteristics and attitudes currently being conducted in the United States.”

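# Set the working directory to the folder holding the GSS file (this path is machine-specific; adjust it for your own computer)
setwd("/Users/econphd/Dropbox/PPUA5301 Compstats Material - Share with Carlos/GSS")
gss <- readRDS("GSS2016.rds")
gss$SEI10[gss$SEI10 == -1] <- NA # See GSS codebook: -1 was used for "Not applicable"
# Two-tailed one-sample t-test of H0: mu = 42 (t.test drops NA values)
t.test(gss$SEI10, alternative = "two.sided", mu = 42)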


    One Sample t-test

data:  gss$SEI10
t = 10.412, df = 2747, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 42
95 percent confidence interval:
 45.64076 47.33020
sample estimates:
mean of x 
 46.48548

R calculates the t statistic, degrees of freedom, and the p-value, although it leaves it to you to interpret this result.

What if I really want to do a one-sided test instead?
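In t.test, the alternative argument accepts "greater" or "less" as well as "two.sided". The sketch below uses a simulated vector as a stand-in so that it runs without the GSS file. The name sei and the simulated values are assumptions for illustration only, not GSS results.

```r
set.seed(1)                              # for reproducibility
sei <- rnorm(100, mean = 47, sd = 22)    # simulated stand-in, NOT the GSS data

two_sided <- t.test(sei, alternative = "two.sided", mu = 42)
one_sided <- t.test(sei, alternative = "greater",   mu = 42)

one_sided$p.value        # upper-tail area only
two_sided$p.value / 2    # the same number when the t statistic is positive
```

With the real data, the corresponding call would be t.test(gss$SEI10, alternative = "greater", mu = 42). When the sample mean lies on the side predicted by the research hypothesis, the one-sided p-value is half the two-sided one.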