StatAnalytica

Hypothesis Testing: A Complete Guide for Beginners

In this blog, we’ll explain statistical hypothesis testing from the basics to more advanced ideas, making it easy to understand even for 10th-grade students.

By the end of this blog, you’ll be able to understand hypothesis testing and how it’s used in research.

What is a Hypothesis?

A hypothesis is a statement that can be tested. It’s like a guess you make after observing something, and you want to see if that guess holds when you collect more data.

For example:

  • “Eating more vegetables improves health.”
  • “Students who study regularly perform better in exams.”

These statements are testable because we can gather data to check if they are true or false.

What is Hypothesis Testing?

Hypothesis testing is a statistical process that helps us make decisions based on data. Suppose you collect data from an experiment or survey. Hypothesis testing helps you decide whether the results are significant or could have happened by chance.

For example, if you believe a new teaching method helps students score better, hypothesis testing can help you decide if the improvement is real or just a random fluctuation.

Null and Alternative Hypothesis

Hypothesis testing usually involves two competing hypotheses:

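  • Null Hypothesis (H₀): The default assumption that there is no effect or no difference. Example: “There is no difference in exam scores between students using the new method and those who don’t.”
  • Alternative Hypothesis (H₁): The claim that there is an effect or a difference. Example: “Students using the new method perform better in exams than those who don’t.”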

Key Terms in Hypothesis Testing

Before diving into the details, let’s understand some important terms used in hypothesis testing:

1. Test Statistic

The test statistic is a number calculated from your data that is compared against a known distribution (like the normal distribution) to test the null hypothesis. It tells you how much your sample data differs from what’s expected under the null hypothesis.

2. P-Value

The p-value is the probability of observing the sample data or something more extreme, assuming the null hypothesis is true. A smaller p-value means the observed data would be more surprising if the null hypothesis were true, so it counts as stronger evidence against the null hypothesis. In many studies, a p-value of 0.05 or less is considered statistically significant.

3. Significance Level (α)

The significance level is the threshold at which you decide to reject the null hypothesis. Commonly, this level is set at 5% (α = 0.05), meaning there’s a 5% chance of rejecting the null hypothesis even when it is true.

4. Critical Value

The critical value is the boundary that defines the region where we reject the null hypothesis. It is calculated based on the significance level and tells us how extreme the test statistic needs to be to reject the null hypothesis.

5. Type I and Type II Errors

  • Type I Error (False Positive): Rejecting the null hypothesis when it’s true.
  • Type II Error (False Negative): Failing to reject the null hypothesis when it’s false.

In simpler terms:

  • Type I error is like thinking something has changed when it hasn’t.
  • Type II error is like thinking nothing has changed when it actually has.
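
To make the Type I error rate concrete, here is a minimal simulation sketch. It assumes a two-tailed z-test with a known standard deviation; the population values (mean 50, standard deviation 5), the sample size, and the number of trials are hypothetical choices for illustration only. Because every sample is drawn with the null hypothesis true, each rejection is a false positive, and the rejection rate should land near α.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical setup: H0 is true (the population mean really is mu0).
mu0, sigma, n, alpha = 50, 5, 30, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value

trials = 5000
false_positives = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:  # rejecting a true H0 is a Type I error
        false_positives += 1

type_i_rate = false_positives / trials
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The observed rate will not equal 0.05 exactly, but it should stay close to it as the number of trials grows.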

Types of Hypothesis Testing

1. One-Tailed Test

A one-tailed test checks for an effect in a single direction. For example, if you are only interested in testing whether students who study 2 hours daily score higher than those who don’t, that’s a one-tailed test.

2. Two-Tailed Test

A two-tailed test checks for an effect in both directions. This means you’re testing if the scores are different, regardless of whether they are higher or lower. For example, “Do students who study 2 hours daily score differently than those who don’t?” That’s a two-tailed test.
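
The practical difference between the two kinds of test shows up in the p-value. The sketch below assumes a z-test and uses a hypothetical test statistic of z = 1.8; the same statistic is significant at α = 0.05 in a one-tailed (upper) test but not in a two-tailed test, because the two-tailed p-value counts both tails.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution
z = 1.8                    # hypothetical test statistic
alpha = 0.05

# One-tailed (upper): probability of a value at least as large as z.
p_one = 1 - std_normal.cdf(z)
# Two-tailed: probability of a value at least as extreme in either direction.
p_two = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-tailed p = {p_one:.4f}, reject H0: {p_one < alpha}")
print(f"two-tailed p = {p_two:.4f}, reject H0: {p_two < alpha}")
```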

Steps in Hypothesis Testing

Step 1: Define Hypotheses

Start by defining the:

  • Null Hypothesis (H₀): The status quo or no change.
  • Alternative Hypothesis (H₁): The hypothesis you believe in, suggesting that something has changed.

Step 2: Set the Significance Level (α)

Next, set the significance level, typically 0.05. This means you’re willing to accept a 5% risk of incorrectly rejecting the null hypothesis.

Step 3: Collect and Analyze Data

Conduct your experiment or survey and collect data. Then, analyze this data to calculate the test statistic. The formula you use depends on the type of test you’re conducting (e.g., Z-test, T-test).

Step 4: Calculate the P-value or Critical Value

Compare the test statistic to a standard distribution (such as the normal distribution). If you calculate a p-value, compare it to the significance level. If the p-value is less than the significance level, reject the null hypothesis.

Alternatively, you can compare your test statistic to a critical value from statistical tables to determine if you should reject the null hypothesis.

Step 5: Make a Decision

Based on your calculations:

  • If the p-value is less than the significance level (e.g., p < 0.05), reject the null hypothesis.
  • If the p-value is greater than the significance level, do not reject the null hypothesis.

Step 6: Interpret the Results

Finally, interpret the results in context. If you reject the null hypothesis, you have evidence to support the alternative hypothesis. If not, the data does not provide enough evidence to reject the null.

P-Value and Significance

The p-value is a key part of hypothesis testing. It tells us the likelihood of getting results as extreme as the observed data, assuming the null hypothesis is true. In simple terms:

  • A low p-value (≤ 0.05) suggests strong evidence against the null hypothesis, so you reject it.
  • A high p-value (> 0.05) means the data is consistent with the null hypothesis, and you don’t reject it.

Here’s a table to summarize:

Common Hypothesis Tests

There are different types of hypothesis tests depending on the data and what you are testing for.

Example of Hypothesis Testing

Let’s say a nutritionist claims that a new diet increases the average weight loss for people by 5 kg in a month.

  • Null Hypothesis (H₀): The average weight loss is 5 kg or less (no improvement beyond 5 kg).
  • Alternative Hypothesis (H₁): The average weight loss is greater than 5 kg.

Suppose we collect data from 30 people and find that the average weight loss is 5.5 kg. Now we follow these steps:

  • Significance level: Set α = 0.05 (5%).
  • Calculate the test statistic: Use the T-test formula.
  • Find the p-value: Calculate the p-value for the test statistic.
  • Make a decision: Compare the p-value to the significance level.

If the p-value is less than 0.05, we reject the null hypothesis and conclude that the new diet results in more than 5 kg of weight loss.
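
As a rough sketch of the calculation, the snippet below plugs the numbers from this example into the one-sample t formula. The example does not give a sample standard deviation, so the value 1.2 kg used here is purely hypothetical, and the conclusion depends on it. The critical value 1.699 is the standard t-table entry for an upper-tail area of 0.05 with 29 degrees of freedom.

```python
import math

n = 30              # sample size (from the example)
sample_mean = 5.5   # average weight loss in kg (from the example)
mu0 = 5.0           # value stated in the null hypothesis
sample_sd = 1.2     # HYPOTHETICAL: not given in the example
t_critical = 1.699  # t-table value, df = 29, upper-tail area 0.05

# One-sample t statistic: (sample mean - mu0) / standard error
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
reject = t_stat > t_critical  # right-tailed test

print(f"t = {t_stat:.3f}, critical value = {t_critical}, reject H0: {reject}")
```

With a larger assumed standard deviation the same 0.5 kg difference could fail to reach significance, which is why the spread of the data matters as much as the mean.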

Statistical hypothesis testing is an essential method in statistics for making informed decisions based on data. By understanding the basics of null and alternative hypotheses, test statistics, p-values, and the steps in hypothesis testing, you can analyze experiments and surveys effectively.

Hypothesis testing is a powerful tool for everything from scientific research to everyday decisions, and mastering it can lead to better data analysis and decision-making.

What is the difference between the null hypothesis and the alternative hypothesis?

The null hypothesis (H₀) is the default assumption that there is no effect or no difference. It’s what we try to disprove. The alternative hypothesis (H₁) is what you want to prove. It suggests that there is a significant effect or difference.

What is the difference between a one-tailed test and a two-tailed test?

A one-tailed test looks for evidence of an effect in one direction (either greater or smaller). A two-tailed test checks for evidence of an effect in both directions (whether greater or smaller), making it a more conservative test.

Can we always reject the null hypothesis if the p-value is less than 0.05?

Typically yes: if the p-value is less than the chosen significance level (commonly 0.05), we reject the null hypothesis. However, this does not guarantee that the alternative hypothesis is true; it simply indicates that the data provide strong evidence against the null hypothesis.

Understanding Hypothesis Testing

Hypothesis testing is a fundamental statistical method employed in various fields, including data science, machine learning, and statistics, to make informed decisions based on empirical evidence. It involves formulating assumptions about population parameters using sample statistics and rigorously evaluating these assumptions against collected data. At its core, hypothesis testing is a systematic approach that allows researchers to assess the validity of a statistical claim about an unknown population parameter. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

A hypothesis is an assumption or idea, specifically a statistical claim about an unknown population parameter. For example, a judge assumes a person is innocent and verifies this by reviewing evidence and hearing testimony before reaching a verdict.

Hypothesis testing is a statistical method used to make a statistical decision from experimental data. It starts from an assumption about a population parameter and evaluates two mutually exclusive statements about the population to determine which statement is better supported by the sample data.

To test the validity of the claim or assumption about the population parameter:

  • A sample is drawn from the population and analyzed.
  • The results of the analysis are used to decide whether the claim is true or not.
Example: Suppose you claim that the average height in a class is 30, or that boys are taller than girls. These are assumptions, and we need a statistical way to test them: a mathematical procedure for concluding whether what we assume is supported by the data.

This structured approach to hypothesis testing in data science, machine learning, and statistics is crucial for making informed decisions based on data.

  • By employing hypothesis testing in data analytics and other fields, practitioners can rigorously evaluate their assumptions and derive meaningful insights from their analyses.
  • Understanding hypothesis generation and testing is also essential for effectively implementing statistical hypothesis testing in various applications.

Defining Hypotheses

  • Null hypothesis (H₀): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no difference among groups. In other words, it is the baseline assumption, made on the basis of knowledge of the problem. Example: A company’s mean production is 50 units per day, i.e. H₀: [Tex]\mu = 50[/Tex].
  • Alternative hypothesis (H₁): The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. Example: A company’s mean production is not equal to 50 units per day, i.e. H₁: [Tex]\mu \ne 50[/Tex].

Key Terms of Hypothesis Testing

  • Level of significance: The probability threshold at which we reject the null hypothesis. Because 100% certainty is not possible when judging a hypothesis, we select a level of significance, denoted [Tex]\alpha[/Tex], that is usually 0.05 or 5%. This means we accept a 5% risk of rejecting the null hypothesis when it is actually true.
  • P-value: The P-value, or calculated probability, is the probability of finding results at least as extreme as those observed when the null hypothesis (H₀) is true. If your P-value is less than the chosen significance level, you reject the null hypothesis, i.e. you conclude that your sample supports the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value, or converted to a p-value, to decide whether the observed results are statistically significant.
  • Critical value: The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom describe how many values are free to vary when estimating a parameter. They are related to the sample size and determine the shape of the reference distribution (for example, the t-distribution).

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.

Understanding hypothesis testing in statistics is essential for data scientists and machine learning practitioners, as it provides a structured framework for statistical hypothesis generation and testing. This methodology can also be applied in Python, enabling data analysts to perform robust statistical analyses efficiently. When many hypotheses are tested at once, as often happens in machine learning, multiple-testing techniques help researchers obtain more reliable results and avoid the pitfalls of drawing conclusions from many separate tests.

One-Tailed Test

A one-tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the test statistic falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

There are two types of one-tailed test:

  • Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the value stated in the null hypothesis. Example: H₀: [Tex]\mu \geq 50[/Tex] and H₁: [Tex]\mu < 50[/Tex]
  • Right-Tailed (Right-Sided) Test: The alternative hypothesis asserts that the true parameter value is greater than the value stated in the null hypothesis. Example: H₀: [Tex]\mu \leq 50[/Tex] and H₁: [Tex]\mu > 50[/Tex]

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H₀: [Tex]\mu = 50[/Tex] and H₁: [Tex]\mu \neq 50[/Tex]

What are Type 1 and Type 2 Errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: We reject the null hypothesis although it is true. The probability of a Type I error is denoted by alpha ([Tex]\alpha[/Tex]).
  • Type II error: We fail to reject the null hypothesis although it is false. The probability of a Type II error is denoted by beta ([Tex]\beta[/Tex]).

How does Hypothesis Testing work?

Step 1: Define Null and Alternative Hypothesis

State the null hypothesis ([Tex]H_0[/Tex]), representing no effect, and the alternative hypothesis ([Tex]H_1[/Tex]), suggesting an effect or difference.

We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses must contradict one another. Here we assume normally distributed data.

Step 2: Choose Significance Level

Select a significance level ([Tex]\alpha[/Tex]), typically 0.05, to determine the threshold for rejecting the null hypothesis. It fixes, in advance, how much risk of a Type I error we are willing to accept. The significance level is chosen before the test is run; the p-value computed from the data is later compared against it.

Step 3: Collect and Analyze Data

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4: Calculate Test Statistic

In this step the data are evaluated and a score is computed based on the characteristics of the data. The choice of test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal. The test could be a Z-test, Chi-square test, T-test, and so on.

  • Z-test: If the population standard deviation is known, the Z-statistic is commonly used.
  • t-test: If the population standard deviation is unknown and the sample size is small, then the t-statistic is more appropriate.
  • Chi-square test: Used for categorical data or for testing independence in contingency tables.
  • F-test: Often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

We have a smaller dataset, so the T-test is more appropriate to test our hypothesis.

For a comparison of two groups, the T-statistic measures the difference between the group means relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

Step 5: Compare Test Statistic

In this stage, we decide whether to reject the null hypothesis or fail to reject it. There are two ways to make this decision.

Method A: Using Critical Values

Compare the test statistic with the tabulated critical value. The rule below is for an upper-tailed test; for a two-tailed test, compare the absolute value of the test statistic with the critical value.

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined threshold values that are used to make a decision in hypothesis testing. To determine critical values for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables.

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If the p-value is less than or equal to the significance level, i.e. ([Tex]p \leq \alpha[/Tex]), you reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.
  • If the p-value is greater than the significance level, i.e. ([Tex]p > \alpha[/Tex]), you fail to reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables.
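
Methods A and B are two views of the same decision. The sketch below checks this for a two-tailed z-test at α = 0.05, using a handful of hypothetical z values chosen for illustration: comparing |z| with the critical value and comparing the p-value with α give the same answer each time.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test

results = []
for z in [-2.5, -1.0, 0.3, 1.9, 2.1]:  # hypothetical test statistics
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_critical = abs(z) > z_crit  # Method A
    reject_by_p = p_value < alpha         # Method B
    results.append((z, reject_by_critical, reject_by_p))
    print(f"z = {z:5.2f}  p = {p_value:.4f}  A: {reject_by_critical}  B: {reject_by_p}")
```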

Step 6: Interpret the Results

Finally, we state the conclusion of the experiment using Method A or Method B.

Calculating test statistic

To validate our hypothesis about a population parameter we use statistical functions. For normally distributed data, we use the z-score, the p-value, and the level of significance (alpha) to build evidence for or against our hypothesis.

1. Z-statistics:

Used when the population mean and standard deviation are known.

[Tex]z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}[/Tex]

  • [Tex]\bar{x}[/Tex] is the sample mean,
  • [Tex]\mu[/Tex] represents the population mean,
  • [Tex]\sigma[/Tex] is the population standard deviation,
  • and n is the size of the sample.

2. T-Statistics

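The t-test is used when the population standard deviation is unknown, typically with a small sample (n < 30).

The t-statistic is given by:

[Tex]t = \frac{\bar{x} - \mu}{s/\sqrt{n}}[/Tex]

  • t = t-score,
  • [Tex]\bar{x}[/Tex] = sample mean,
  • [Tex]\mu[/Tex] = population mean,
  • s = standard deviation of the sample,
  • n = sample size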
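
A minimal sketch of the t-statistic formula, using only the Python standard library. The five measurements and the hypothesized mean of 50 are hypothetical values chosen so the arithmetic is easy to follow; the function name `one_sample_t` is likewise just illustrative.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))

sample = [51, 49, 52, 50, 53]  # hypothetical measurements
t_stat = one_sample_t(sample, mu0=50)
print(f"t = {t_stat:.4f} with df = {len(sample) - 1}")
```

Here the sample mean is 51 and the sample variance is 2.5, so t = 1 / sqrt(2.5 / 5), which is the square root of 2.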

3. Chi-Square Test

The Chi-Square test for independence is used for categorical (non-normally distributed) data:

[Tex]\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}[/Tex]

  • [Tex]O_{ij}[/Tex] is the observed frequency in cell [Tex]{ij}[/Tex],
  • i and j are the row and column indices respectively,
  • [Tex]E_{ij}[/Tex] is the expected frequency in cell [Tex]{ij}[/Tex], calculated as: [Tex]\frac{\text{Row total} \times \text{Column total}}{\text{Total observations}}[/Tex]
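
The chi-square formula can be sketched in a few lines of plain Python. The 2×2 table of observed counts below is hypothetical; the critical value 3.841 is the standard chi-square table entry for 1 degree of freedom at α = 0.05.

```python
# Hypothetical 2x2 contingency table of observed counts.
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        # Expected count: (row total * column total) / total observations
        e_ij = row_totals[i] * col_totals[j] / total
        chi2 += (o_ij - e_ij) ** 2 / e_ij

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_critical = 3.841  # chi-square table value for df = 1, alpha = 0.05
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > chi2_critical}")
```

Every expected count in this table is 25, so each cell contributes 25/25 = 1 and the statistic is 4.0.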

Real life Examples of Hypothesis Testing

Let’s examine hypothesis testing using two real-life situations.

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1: Define the Hypothesis

  • Null Hypothesis (H₀): The new drug has no effect on blood pressure.
  • Alternate Hypothesis (H₁): The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let’s set the significance level at 0.05: we reject the null hypothesis if the evidence suggests less than a 5% chance of observing results this extreme due to random variation alone.

Step 3: Compute the test statistic

Using a paired T-test, analyze the data to obtain a test statistic and a p-value.

The test statistic (here, the T-statistic) is calculated from the differences between blood pressure measurements before and after treatment.

t = m/(s/√n)

  • m = mean of the differences [Tex]d_i = X_{after,i} - X_{before,i}[/Tex],
  • s = standard deviation of the differences d,
  • n = sample size.

Then m = -3.9, s ≈ 1.37 (sample variance ≈ 1.88) and n = 10.

Substituting into the paired t-test formula gives T-statistic = -3.9 / (1.37 / √10) ≈ -9.

Step 4: Find the p-value

With a calculated t-statistic of -9 and degrees of freedom df = n - 1 = 9, you can find the p-value using statistical software or a t-distribution table.

Thus, p-value = 8.538051223166285e-06

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Case A

Let’s carry out hypothesis testing in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired T-test from the scipy.stats library.

SciPy is a Python library for scientific and numerical computing.

We will implement our first real-life problem in Python.
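
A minimal sketch of the paired t-test on the data above, written with only the standard library so it runs without extra packages. The helpers `t_pdf` and `two_sided_p` are illustrative names, and the p-value is approximated by numerically integrating the t density with Simpson's rule. With SciPy installed, `scipy.stats.ttest_rel(after, before)` is the usual route and should report the same statistic and p-value.

```python
import math
import statistics

before = [120, 122, 118, 130, 125, 128, 115, 121, 123, 119]
after = [115, 120, 112, 128, 122, 125, 110, 117, 119, 114]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|],
    approximated with Simpson's rule (steps must be even)."""
    lo, hi = -abs(t), abs(t)
    h = (hi - lo) / steps
    area = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    return 1 - area * h / 3

diffs = [a - b for a, b in zip(after, before)]  # d_i = after - before
n = len(diffs)
m = statistics.mean(diffs)    # mean difference
s = statistics.stdev(diffs)   # sample standard deviation of the differences
t_stat = m / (s / math.sqrt(n))
p_value = two_sided_p(t_stat, df=n - 1)

alpha = 0.05
decision = "Reject" if p_value <= alpha else "Fail to reject"
print(f"m = {m}, s = {s:.4f}, t = {t_stat:.4f}, p = {p_value:.3e}")
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
```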

T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug has a significant effect on lowering blood pressure.
  • The negative T-statistic indicates that the mean blood pressure after treatment is significantly lower than the mean before treatment.

Case B: Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (μ): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL (given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H₀): The average cholesterol level in the population is 200 mg/dL.
  • Alternate Hypothesis (H₁): The average cholesterol level in the population is different from 200 mg/dL.

Step 2: Define the Significance level

As the direction of deviation is not given, we assume a two-tailed test. Based on a normal distribution table, the critical values for a significance level of 0.05 (two-tailed) are approximately -1.96 and 1.96.

Step 3: Compute the test statistic

The sample mean of the 25 measurements is 202.04 mg/dL. The test statistic is calculated using the z formula, Z = [Tex](202.04 - 200) / (5 \div \sqrt{25})[/Tex], which gives Z ≈ 2.04.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Python Implementation of Case B
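
A minimal sketch of the z-test for Case B, using `statistics.NormalDist` from the standard library for the critical value and p-value. Variable names are illustrative; the data, the hypothesized mean, and σ come from the case description above.

```python
import math
from statistics import NormalDist, mean

cholesterol = [205, 198, 210, 190, 215, 205, 200, 192, 198, 205,
               198, 202, 208, 200, 205, 198, 205, 210, 192, 205,
               198, 205, 210, 192, 205]

mu0 = 200     # hypothesized population mean (mg/dL)
sigma = 5     # population standard deviation, given for this problem
alpha = 0.05

n = len(cholesterol)
x_bar = mean(cholesterol)
z = (x_bar - mu0) / (sigma / math.sqrt(n))

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # two-tailed critical value
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # two-tailed p-value

print(f"sample mean = {x_bar}, z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}")
if abs(z) > z_crit:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```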

Reject the null hypothesis. There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Limitations of Hypothesis Testing

Although hypothesis testing is a useful technique in data science, it does not offer a comprehensive grasp of the topic being studied.

  • Lack of Comprehensive Insight: Hypothesis testing in data science often focuses on specific hypotheses, which may not fully capture the complexity of the phenomena being studied.
  • Dependence on Data Quality: The accuracy of hypothesis testing results relies heavily on the quality of available data. Inaccurate data can lead to incorrect conclusions, particularly in machine learning.
  • Overlooking Patterns: Sole reliance on hypothesis testing can result in the omission of significant patterns or relationships in the data that are not captured by the tested hypotheses.
  • Contextual Limitations: Hypothesis testing in statistics may not reflect the broader context, leading to oversimplification of results.
  • Complementary Methods Needed: To gain a more holistic understanding, it’s essential to complement hypothesis testing with other analytical approaches, especially in data analytics and data mining.
  • Misinterpretation Risks: Poorly formulated hypotheses or inappropriate statistical methods can lead to misinterpretation, emphasizing the need for careful consideration in hypothesis testing in Python and related analyses.
  • Multiple Hypothesis Testing Challenges: Multiple hypothesis testing in machine learning poses additional challenges, as it can increase the likelihood of Type I errors, requiring adjustments to maintain validity.
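
One common adjustment for the multiple-testing problem in the last point is the Bonferroni correction, which compares each p-value with α divided by the number of tests. The text above does not name a specific method, so this is only one illustrative option, and the four p-values below are hypothetical.

```python
alpha = 0.05
p_values = [0.001, 0.02, 0.04, 0.30]  # hypothetical p-values from four tests

m = len(p_values)
bonferroni_threshold = alpha / m  # stricter per-test threshold

naive = [p < alpha for p in p_values]                    # no correction
corrected = [p < bonferroni_threshold for p in p_values]  # Bonferroni

print(f"per-test threshold: {bonferroni_threshold}")
print(f"rejections without correction: {naive}")
print(f"rejections with Bonferroni:    {corrected}")
```

The correction keeps the overall chance of at least one Type I error at or below α, at the cost of making each individual test harder to pass.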

Hypothesis testing is a cornerstone of statistical analysis, allowing data scientists to navigate uncertainties and draw credible inferences from sample data. By defining null and alternative hypotheses, selecting significance levels, and employing statistical tests, researchers can validate their assumptions effectively.

This article emphasizes the distinction between Type I and Type II errors, highlighting their relevance in hypothesis testing in data science and machine learning. A practical example involving a paired T-test to assess a new drug’s effect on blood pressure underscores the importance of statistical rigor in data-driven decision-making.

Ultimately, understanding hypothesis testing in statistics, alongside its applications in data mining, data analytics, and hypothesis testing in Python, enhances analytical frameworks and supports informed decision-making.

Understanding Hypothesis Testing- FAQs

What is hypothesis testing in data science?

In data science, hypothesis testing is used to validate assumptions or claims about data. It helps data scientists determine whether observed patterns are statistically significant or could have occurred by chance.

How does hypothesis testing work in machine learning?

In machine learning, hypothesis testing helps assess the effectiveness of models. For example, it can be used to compare the performance of different algorithms or to evaluate whether a new feature significantly improves a model’s accuracy.

What is hypothesis testing in ML?

It is a statistical method for evaluating the performance and validity of machine learning models. It tests specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.

What is the difference between Pytest and hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python that generates test cases from specified properties of the code.

What is the difference between hypothesis testing and data mining?

Hypothesis testing focuses on evaluating specific claims or hypotheses about a dataset, while data mining involves exploring large datasets to discover patterns, relationships, or insights without predefined hypotheses.

How is hypothesis generation used in business analytics?

In business analytics, hypothesis generation involves formulating assumptions or predictions based on available data. These hypotheses can then be tested using statistical methods to inform decision-making and strategy.

What is the significance level in hypothesis testing?

The significance level, often denoted as alpha (α), is the threshold for deciding whether to reject the null hypothesis. Common significance levels are 0.05, 0.01, and 0.10, indicating the probability of making a Type I error in statistical hypothesis testing.


Hypothesis Testing Calculator

The first step in hypothesis testing is to calculate the test statistic. The formula for the test statistic depends on whether the population standard deviation (σ) is known or unknown. If σ is known, our hypothesis test is known as a z test and we use the z distribution. If σ is unknown, our hypothesis test is known as a t test and we use the t distribution. Use of the t distribution relies on the degrees of freedom, which is equal to the sample size minus one. Furthermore, if the population standard deviation σ is unknown, the sample standard deviation s is used instead.

Next, the test statistic is used to conduct the test using either the p-value approach or the critical value approach. The particular steps taken in each approach largely depend on the form of the hypothesis test: lower tail, upper tail or two-tailed. The form can easily be identified by looking at the alternative hypothesis (Hₐ). If there is a less-than sign in the alternative hypothesis, it is a lower tail test; a greater-than sign indicates an upper tail test; and a not-equal sign indicates a two-tailed test.

In the p-value approach, the test statistic is used to calculate a p-value. If the test is a lower tail test, the p-value is the probability of getting a value for the test statistic at least as small as the value from the sample. If the test is an upper tail test, the p-value is the probability of getting a value for the test statistic at least as large as the value from the sample. In a two-tailed test, the p-value is the probability of getting a value for the test statistic at least as unlikely as the value from the sample.

To test the hypothesis in the p-value approach, compare the p-value to the level of significance. If the p-value is less than or equal to the level of significance, reject the null hypothesis. If the p-value is greater than the level of significance, do not reject the null hypothesis. This method remains unchanged regardless of whether it's a lower tail, upper tail, or two-tailed test. To change the level of significance, click on $\boxed{.05}$. Note that if the test statistic is given, you can calculate the p-value from the test statistic by clicking on the switch symbol twice.
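The p-value rules for the three forms of test can be sketched for a z test as follows. This is a minimal illustration using Python's standard-library `NormalDist`; the function names and the `tail` labels are assumptions made for this example, and a t test would need the t distribution instead.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value for a z test statistic; tail is 'lower', 'upper' or 'two'."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return std_normal.cdf(z)            # P(Z <= z)
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)      # P(Z >= z)
    if tail == "two":
        return 2.0 * std_normal.cdf(-abs(z))  # both tails, by symmetry
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def decide(p_value, alpha=0.05):
    """The decision rule is the same for every form of test."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Hypothetical test statistic, for illustration only.
p = z_p_value(2.0, "two")
print(f"two-tailed p-value = {p:.4f}: {decide(p)}")
```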

In the critical value approach, the level of significance ($\alpha$) is used to calculate the critical value. In a lower tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the lower tail of the sampling distribution of the test statistic. In an upper tail test, the critical value is the value of the test statistic providing an area of $\alpha$ in the upper tail of the sampling distribution of the test statistic. In a two-tailed test, the critical values are the values of the test statistic providing areas of $\alpha / 2$ in the lower and upper tail of the sampling distribution of the test statistic.

To test the hypothesis in the critical value approach, compare the critical value to the test statistic. Unlike the p-value approach, the method we use to decide whether to reject the null hypothesis depends on the form of the hypothesis test. In a lower tail test, if the test statistic is less than or equal to the critical value, reject the null hypothesis. In an upper tail test, if the test statistic is greater than or equal to the critical value, reject the null hypothesis. In a two-tailed test, if the test statistic is less than or equal to the lower critical value or greater than or equal to the upper critical value, reject the null hypothesis.
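A minimal sketch of the critical value approach for a z test is shown below. The function names are hypothetical, and the standard normal quantiles come from the standard library's `NormalDist.inv_cdf`.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return (lower, upper) critical values; None where a tail is unused."""
    std_normal = NormalDist()
    if tail == "lower":
        return std_normal.inv_cdf(alpha), None
    if tail == "upper":
        return None, std_normal.inv_cdf(1.0 - alpha)
    if tail == "two":
        upper = std_normal.inv_cdf(1.0 - alpha / 2.0)  # area alpha/2 in each tail
        return -upper, upper
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


def reject_h0(z, alpha, tail):
    """Apply the rejection rule that matches the form of the test."""
    lower, upper = critical_values(alpha, tail)
    below = lower is not None and z <= lower
    above = upper is not None and z >= upper
    return below or above


# Hypothetical test statistic, for illustration only.
print(critical_values(0.05, "two"))
print(reject_h0(2.0, 0.05, "two"))
```

For a given test and α, this rule and the p-value rule always lead to the same decision; they are two ways of checking whether the test statistic falls in the tail area of size α.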

When conducting a hypothesis test, there is always a chance that you come to the wrong conclusion. There are two types of errors you can make: Type I Error and Type II Error. A Type I Error is committed if you reject the null hypothesis when the null hypothesis is true. Ideally, we would not reject the null hypothesis when it is true. A Type II Error is committed if you fail to reject the null hypothesis when the alternative hypothesis is true. Ideally, we'd like to reject the null hypothesis when the alternative hypothesis is true.

Hypothesis testing is closely related to the statistical area of confidence intervals. If the hypothesized value of the population mean is outside of the confidence interval, we can reject the null hypothesis. Confidence intervals can be found using the Confidence Interval Calculator. The calculator on this page does hypothesis tests for one population mean. Sometimes we're interested in hypothesis tests about two population means. These can be solved using the Two Population Calculator. The probability of a Type II Error can be calculated by clicking on the link at the bottom of the page.
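The link to confidence intervals can be sketched for the σ-known case. The correspondence shown here applies to a two-tailed test whose significance level is one minus the confidence level; the function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z_star * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def reject_by_interval(mu0, interval):
    """Reject H0: mu = mu0 when mu0 lies outside the interval."""
    low, high = interval
    return mu0 < low or mu0 > high


# Hypothetical numbers, for illustration only.
interval = z_confidence_interval(sample_mean=52.0, sigma=6.0, n=36)
print(interval)
print(reject_by_interval(50.0, interval))
```

With these numbers the test statistic is z = 2.0, just beyond the two-tailed critical value of about 1.96, so the hypothesized mean of 50 falls just outside the 95% interval.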

Module 9: Hypothesis Testing With One Sample

Distribution Needed for Hypothesis Testing

Learning Outcomes

  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation known
  • Conduct and interpret hypothesis tests for a single population mean, population standard deviation unknown

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student’s t-distribution. (Remember, use a Student’s t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually n is large).

If you are testing a single population mean, the distribution for the test is for means:

[latex]\displaystyle\overline{X}\sim N\left(\mu_{X},\ \frac{\sigma_{X}}{\sqrt{n}}\right)\quad\text{or}\quad t_{df}[/latex]

The population parameter is [latex]\mu[/latex]. The estimated value (point estimate) for [latex]\mu[/latex] is [latex]\displaystyle\overline{{x}}[/latex], the sample mean.

If you are testing a single population proportion, the distribution for the test is for proportions or percentages:

[latex]\displaystyle P'\sim N\left(p,\ \sqrt{\frac{pq}{n}}\right)[/latex]

The population parameter is [latex]p[/latex]. The estimated value (point estimate) for [latex]p[/latex] is [latex]p'[/latex], where [latex]\displaystyle p'=\frac{x}{n}[/latex], [latex]x[/latex] is the number of successes, and [latex]n[/latex] is the sample size.

Assumptions

When you perform a hypothesis test of a single population mean μ using a Student’s t-distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.)

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are as follows: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with μ = p and [latex]\displaystyle\sigma=\sqrt{\frac{pq}{n}}[/latex]. Remember that q = 1 – p.
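The conditions for the proportion test can be sketched as a guard in front of the normal approximation. The function names and the sample numbers below are hypothetical; the standard error uses the hypothesized proportion, matching σ = √(pq/n) above.

```python
import math
from statistics import NormalDist


def normal_approx_ok(n, p):
    """Check the rule of thumb np > 5 and nq > 5, with q = 1 - p."""
    q = 1.0 - p
    return n * p > 5 and n * q > 5


def proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    if not normal_approx_ok(n, p0):
        raise ValueError("np and nq must both exceed 5 for the normal approximation")
    p_hat = successes / n                        # point estimate p' = x / n
    std_error = math.sqrt(p0 * (1.0 - p0) / n)   # sigma = sqrt(pq / n) under H0
    z = (p_hat - p0) / std_error
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_value


# Hypothetical numbers, for illustration only.
p_hat, z, p_value = proportion_z_test(successes=58, n=100, p0=0.5)
print(f"p' = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```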

Concept Review

In order for a hypothesis test’s results to be generalized to a population, certain requirements must be satisfied.

When testing for a single population mean:

  • A Student’s t-test should be used if the data come from a simple random sample, the population standard deviation is unknown, and either the population is approximately normally distributed or the sample size is large.
  • The normal test will work if the data come from a simple random sample, the population standard deviation is known, and either the population is approximately normally distributed or the sample size is large.

When testing a single population proportion, use a normal test if the data come from a simple random sample, meet the requirements for a binomial distribution, and the mean number of successes and the mean number of failures satisfy the conditions np > 5 and nq > 5, where n is the sample size, p is the probability of a success, and q is the probability of a failure.

Formula Review

If no significance level α is given, use α = 0.05.

Types of Hypothesis Tests

  • Single population mean, known population variance (or standard deviation): Normal test.
  • Single population mean, unknown population variance (or standard deviation): Student’s t-test.
  • Single population proportion: Normal test.
  • For a single population mean, we may use a normal distribution with the following mean and standard deviation. Means: [latex]\displaystyle\mu=\mu_{\overline{x}}\quad\text{and}\quad\sigma_{\overline{x}}=\frac{\sigma_{x}}{\sqrt{n}}[/latex]
  • For a single population proportion, we may use a normal distribution with the following mean and standard deviation. Proportions: [latex]\displaystyle\mu=p\quad\text{and}\quad\sigma=\sqrt{\frac{pq}{n}}[/latex].

Candela Citations

  • Distribution Needed for Hypothesis Testing. Provided by : OpenStax. Located at : . License : CC BY: Attribution
  • Introductory Statistics. Authored by: Barbara Illowsky, Susan Dean. Provided by: OpenStax. Located at: http://cnx.org/contents/[email protected] . License: CC BY: Attribution. License Terms: Download for free at http://cnx.org/contents/[email protected]

9.3 Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's t-distribution. (Remember, use a Student's t-distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) We perform tests of a population proportion using a normal distribution (usually n is large).

Assumptions

When you perform a hypothesis test of a single population mean μ using a Student's t-distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.

When you perform a hypothesis test of a single population mean μ using a normal distribution (often called a z-test), you take a simple random sample from the population. The population you are testing is normally distributed or your sample size is sufficiently large. You know the value of the population standard deviation which, in reality, is rarely known.

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution, which are the following: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). Then the binomial distribution of a sample (estimated) proportion can be approximated by the normal distribution with μ = p and σ = √(pq/n). Remember that q = 1 – p.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/9-3-distribution-needed-for-hypothesis-testing

Probability and Statistics for Scientists and Engineers

18 Hypothesis testing with known distributions

18.1 Objectives

Know and properly use the terminology of a hypothesis test, to include: permutation test, exact test, null hypothesis, alternative hypothesis, test statistic, \(p\)-value, and power.

Conduct all four steps of a hypothesis test using probability models.

18.2 Hypothesis testing using probability models

As a lead into the Central Limit Theorem in Chapter 19 and mathematical sampling distributions, we will look at a class of hypothesis testing where the null hypothesis specifies a probability model. In some cases, we can get an exact answer, and in others, we will use simulation to get an empirical \(p\)-value. By the way, a permutation test is an exact test; by this we mean we are finding all possible permutations in the calculation of the \(p\)-value. However, since the complete enumeration of all permutations is often difficult, we approximate it with randomization (simulation). Thus, the \(p\)-value from a randomization test is an approximation of the exact (permutation) test.

Let’s use three examples to illustrate the ideas of this chapter.

18.3 Tappers and listeners

Here’s a game you can try with your friends or family. Pick a simple, well-known song. Tap that tune on your desk, and see if the other person can guess the song. In this simple game, you are the tapper, and the other person is the listener.

A Stanford University graduate student named Elizabeth Newton conducted an experiment using the tapper-listener game. In her study, she recruited 120 tappers and 120 listeners. About 50% of the tappers expected that the listener would be able to guess the song. Newton wondered, is 50% a reasonable expectation?

18.3.1 Step 1- State the null and alternative hypotheses

Newton’s research question can be framed into two hypotheses:

\(H_0\): The tappers are correct, and, in general, 50% of listeners are able to guess the tune. \(p = 0.50\)

\(H_A\): The tappers are incorrect, and either more than or less than 50% of listeners are able to guess the tune. \(p \neq 0.50\)

Exercise : Is this a one-sided or two-sided hypothesis test? How many variables are in this model?

The tappers think that listeners will guess the song correctly 50% of the time, and this is a two-sided test since we don’t know beforehand if listeners will be better or worse than 50%.

There is only one variable of interest, whether the listener is correct.

18.3.2 Step 2 - Compute a test statistic

In Newton’s study, only 42 (we changed the number to make this problem more interesting from an educational perspective) out of 120 listeners ( \(\hat{p} = 0.35\) ) were able to guess the tune! From the perspective of the null hypothesis, we might wonder, how likely is it that we would get this result from chance alone? That is, what’s the chance we would happen to see such a small fraction if \(H_0\) were true and the true correct-guess rate is 0.50?

Now before we use simulation, let’s frame this as a probability model. The random variable \(X\) is the number of correct guesses out of 120. If the observations are independent and the probability of success is constant (each listener has the same probability of guessing correctly), then we could use a binomial model. We can’t assess the validity of these assumptions without knowing more about the experiment, the subjects, and the data collection. For educational purposes, we will assume they are valid. Thus, our test statistic is the number of successes in 120 trials. The observed value is 42.

18.3.3 Step 3 - Determine the \(p\) -value

We now want to find the \(p\) -value as \(2 \cdot \mbox{P}(X \leq 42)\) where \(X\) is a binomial random variable with \(p = 0.5\) and \(n = 120\) . Again, the \(p\) -value is the probability of the observed data or something more extreme, given the null hypothesis is true. Here, the null hypothesis being true implies that the probability of success is 0.50. We will use R to get the one-sided \(p\) -value and then double it to get the two-sided \(p\) -value for the problem. We selected \(\mbox{P}(X \leq 42)\) because “more extreme” means the observed values and values further from what you would get if the null hypothesis were true, which is 60 for this problem.

That is a small \(p\) -value.
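The original analysis uses R; the calculation described above can be sketched in Python with only the standard library. The helper name `binom_cdf` is an assumption of this sketch, not a function from the source.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Observed: 42 correct guesses out of 120; null hypothesis p = 0.5.
one_sided = binom_cdf(42, 120, 0.5)
two_sided = 2 * one_sided  # double the lower tail for the two-sided test
print(f"one-sided p-value: {one_sided:.5f}")
print(f"two-sided p-value: {two_sided:.5f}")
```

Doubling is reasonable here because a binomial distribution with \(p = 0.5\) is symmetric, so the upper tail \(\mbox{P}(X \geq 78)\) equals the lower tail \(\mbox{P}(X \leq 42)\).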

18.3.4 Step 4 - Draw a conclusion

Based on our data, if the listeners were guessing correctly 50% of the time, the probability that 42 or fewer (or 78 or more) listeners would guess correctly is less than \(0.0013\). This is the probability of what we observed or something more extreme, given the null hypothesis is true. This probability is much less than 0.05, so we reject the null hypothesis that the listeners are guessing correctly half of the time and conclude that the correct-guess rate is different from 50%.

This decision region looks like the pmf in Figure 18.1 . Any observed values inside the red boundary lines would be consistent with the null hypothesis. That is, any observed values inside the red boundary lines would result in a \(p\) -value larger than 0.05. Any values at the red line or more extreme would be in the rejection region, resulting in a \(p\) -value smaller than 0.05. We also plotted the observed value in black.

Figure 18.1: Binomial pmf

18.3.5 Repeat using simulation

We will repeat the analysis using an empirical (observed from simulated data) \(p\) -value. Step 1, stating the null and alternative hypothesis, is the same.

18.3.6 Step 2 - Compute a test statistic

We will use the proportion of listeners that get the song correct instead of the number of listeners that get it correct. This is a minor change since we are simply dividing by 120.

18.3.7 Step 3 - Determine the \(p\) -value

To simulate 120 games under the null hypothesis where \(p = 0.50\) , we could flip a coin 120 times. Each time the coin comes up heads, this could represent the listener guessing correctly, and tails would represent the listener guessing incorrectly. For example, we can simulate 5 tapper-listener pairs by flipping a coin 5 times:

\[ \begin{array}{ccccc} H & H & T & H & T \\ Correct & Correct & Wrong & Correct & Wrong \\ \end{array} \]

After flipping the coin 120 times, we got 56 heads for a proportion of \(\hat{p}_{sim} = 0.467\) . As we did with the randomization technique, seeing what would happen with one simulation isn’t enough. In order to evaluate whether our originally observed proportion of 0.35 is unusual or not, we should generate more simulations. Here, we’ve repeated this simulation 10,000 times:

Note, we could simulate it a number of ways. Here is a way using do() that will look like what we’ve done for other randomization tests.

Figure 18.2: The estimated sampling distribution.

Notice in Figure 18.2 how the sampling distribution is centered at 0.5 and looks symmetrical.

The \(p\) -value is found using the prop1 function. In this problem, we really need the observed value to be included to prevent a \(p\) -value of zero.
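The simulation and the empirical \(p\)-value can be sketched in Python as a stand-in for the R code, which is not reproduced here. The function names and the seed are assumptions of this sketch. Adding one to the numerator and denominator counts the observed result as one of the outcomes, which is what keeps the estimate from being exactly zero.

```python
import random


def simulate_correct_counts(n_listeners=120, n_sims=10_000, seed=2024):
    """Simulate the number of correct guesses under H0: p = 0.5.

    Each of the n_listeners random bits is one fair coin flip, so the number
    of 1-bits is the number of listeners who guess correctly.
    """
    rng = random.Random(seed)
    return [bin(rng.getrandbits(n_listeners)).count("1") for _ in range(n_sims)]


def empirical_two_sided_p_value(sim_counts, observed_count):
    """Lower-tail proportion, counting the observed value itself, then doubled."""
    as_extreme = sum(1 for c in sim_counts if c <= observed_count)
    one_sided = (as_extreme + 1) / (len(sim_counts) + 1)
    return 2 * one_sided


sim_counts = simulate_correct_counts()
p_value = empirical_two_sided_p_value(sim_counts, observed_count=42)
print(f"mean simulated proportion: {sum(sim_counts) / len(sim_counts) / 120:.3f}")
print(f"empirical two-sided p-value: {p_value:.4f}")
```

Because the result depends on the random draws, the printed value will vary with the seed and will only approximate the exact binomial answer.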

18.3.8 Step 4 - Draw a conclusion

In these 10,000 simulations, we see very few results close to 0.35. Based on our data, if the listeners were guessing correctly 50% of the time, the probability that 35% or fewer, or 65% or more, of the listeners would get it right is less than \(0.0012\). This \(p\)-value is much less than 0.05, so we reject the null hypothesis that the listeners are guessing correctly half of the time and conclude that the correct-guess rate is different from 50%.

Exercise: In the context of the experiment, what is the \(p\)-value for the hypothesis test?
Exercise: Do the data provide statistically significant evidence against the null hypothesis? State an appropriate conclusion in the context of the research question.

18.4 Cardiopulmonary resuscitation (CPR)

Let’s return to the CPR example from last chapter. As a reminder, we will repeat some of the background material.

Cardiopulmonary resuscitation (CPR) is a procedure sometimes used on individuals suffering a heart attack. It is helpful in providing some blood circulation to keep a person alive, but CPR chest compressions can also cause internal injuries, which complicate additional treatment efforts. For instance, blood thinners may be used to help release a clot that is causing the heart attack, but blood thinners negatively affect internal injuries.

Patients who underwent CPR for a heart attack and were subsequently admitted to a hospital were randomly assigned to either receive a blood thinner (treatment group) or not receive a blood thinner (control group). The outcome variable of interest was whether the patient survived for at least 24 hours.

18.4.1 Step 1- State the null and alternative hypotheses

We want to understand whether blood thinners are helpful or harmful. We’ll consider both of these possibilities using a two-sided hypothesis test.

\(H_0\): Blood thinners do not have an overall survival effect; survival rate is independent of experimental treatment group. \(p_c - p_t = 0\).

\(H_A\): Blood thinners have an impact on survival, either positive or negative, but not zero. \(p_c - p_t \neq 0\).

Let’s put the data in a table.

18.4.2 Step 2 - Compute a test statistic.

In this example, we can think of the data as coming from a hypergeometric distribution. This is really a binomial from a finite population. We can calculate the \(p\) -value using this probability distribution. The random variable is the number of control patients that survived from a total population of 90 patients, where 50 are control patients and 40 are treatment patients, and where a total of 25 survived.

18.4.3 Step 3 - Determine the \(p\) -value.

In this case, we want to find \(\mbox{P}(X \leq 11)\) (the observed number of control patients that survived) and double it since it is a two-sided test.

Note: We could have picked the lower right cell as the reference cell. In that case we want \(\mbox{P}(X \geq 14)\) (the observed number of treatment patients that survived) with the appropriate change in parameter values. Notice we get the same answer.

We could do the same thing for the other two cells. Here we find \(\mbox{P}(X \leq 26)\) (the observed number of treatment patients that did not survive).

Here we find \(\mbox{P}(X \geq 39)\) (the observed number of control patients that did not survive).
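The four equivalent calculations can be sketched in Python with exact fractions, using only the counts stated in the text (90 patients, 50 control, 40 treatment, 25 survivors, and therefore 65 non-survivors). The function names are assumptions of this sketch and it is not the R code from the original.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, population, marked, draws):
    """P(X = k): k marked items among `draws` items drawn without replacement."""
    unmarked = population - marked
    if k < 0 or k > marked or draws - k < 0 or draws - k > unmarked:
        return Fraction(0)  # k is outside the support
    return Fraction(comb(marked, k) * comb(unmarked, draws - k), comb(population, draws))


def prob_at_most(k, population, marked, draws):
    return sum(hypergeom_pmf(i, population, marked, draws) for i in range(0, k + 1))


def prob_at_least(k, population, marked, draws):
    return sum(hypergeom_pmf(i, population, marked, draws) for i in range(k, draws + 1))


# One calculation per cell of the 2 x 2 table.
control_survived = prob_at_most(11, population=90, marked=50, draws=25)
treatment_survived = prob_at_least(14, population=90, marked=40, draws=25)
treatment_died = prob_at_most(26, population=90, marked=40, draws=65)
control_died = prob_at_least(39, population=90, marked=50, draws=65)

print(float(control_survived), float(treatment_survived))
print(float(treatment_died), float(control_died))
print("doubled:", float(2 * control_survived))
```

The four probabilities agree exactly because, once the row and column totals are fixed, any one cell determines the other three.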

R also has a built in function, fisher.test() , that we could use. This function calculates Fisher’s exact test, where \(p\) -values are obtained using the hypergeometric distribution.

The \(p\)-value is slightly different since the hypergeometric distribution is not symmetric. For this reason, doubling the \(p\)-value from the single-sided result is not quite right. The algorithm in fisher.test() finds and adds all probabilities less than or equal to the value of \(\mbox{P}(X = 11)\) (see Figure 18.3). Using fisher.test() gives the correct \(p\)-value.

Figure 18.3: Hypergeometric pmf showing the cutoff for \(p\) -value calculation.

This is how fisher.test() is calculating the \(p\) -value:
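The rule described above (add every probability that is no larger than the probability of the observed table) can be sketched in Python. This follows the description in the text and is not a port of R's implementation; the function names are assumptions. Exact fractions are used so that ties are compared without rounding error.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, population, marked, draws):
    """P(X = k): k marked items among `draws` items drawn without replacement."""
    unmarked = population - marked
    if k < 0 or k > marked or draws - k < 0 or draws - k > unmarked:
        return Fraction(0)  # k is outside the support
    return Fraction(comb(marked, k) * comb(unmarked, draws - k), comb(population, draws))


def two_sided_p_value(observed, population, marked, draws):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    lowest = max(0, draws - (population - marked))
    highest = min(draws, marked)
    p_observed = hypergeom_pmf(observed, population, marked, draws)
    all_probs = [hypergeom_pmf(k, population, marked, draws) for k in range(lowest, highest + 1)]
    return sum(p for p in all_probs if p <= p_observed)


# 11 control survivors among 25 survivors; 50 of the 90 patients are control.
p_value = two_sided_p_value(11, population=90, marked=50, draws=25)
print(f"two-sided p-value: {float(p_value):.4f}")
```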

The randomization test in the last chapter yielded a \(p\) -value of 0.257 so all tests are consistent.

18.4.4 Step 4 - Draw a conclusion

Since this \(p\) -value is larger than 0.05, we fail to reject the null hypothesis. That is, we do not find statistically significant evidence that the blood thinner has any influence on survival of patients who undergo CPR prior to arriving at the hospital. Once again, we can discuss the causal conclusion since this is an experiment.

Notice that in these first two examples, we had a test of a single proportion and a test of two proportions. The single proportion test did not have an equivalent randomization test since there is not a second variable to shuffle. We were able to get answers since we found a probability model that we could use instead.

18.5 Golf Balls

Our last example will be interesting because the distribution has multiple parameters and a test metric is not obvious at this point.

The owners of a residence located along a golf course collected the first 500 golf balls that landed on their property. Most golf balls are labeled with the make of the golf ball and a number, for example “Nike 1” or “Titleist 3”. The numbers are typically between 1 and 4, and the owners of the residence wondered if these numbers are equally likely (at least among golf balls used by golfers of poor enough quality that they lose them in the yards of the residences along the fairway.)

We will use a significance level of \(\alpha = 0.05\) since there is no reason to favor one decision error over the other.

18.5.1 Step 1- State the null and alternative hypotheses

We think that the numbers are not all equally likely. The question of one-sided versus two-sided is not relevant in this test. You will see this when we write the hypotheses.

\(H_0\): All of the numbers are equally likely. \(\pi_1 = \pi_2 = \pi_3 = \pi_4\), or \(\pi_1 = \frac{1}{4}, \pi_2 =\frac{1}{4}, \pi_3 =\frac{1}{4}, \pi_4 =\frac{1}{4}\)

\(H_A\): There is some other distribution of the numbers in the population. At least one population proportion is not \(\frac{1}{4}\).

Notice that we switched to using \(\pi\) instead of \(p\) for the population parameter. There is no reason other than to make you aware that both are used.

This problem is an extension of the binomial. Instead of two outcomes, there are four outcomes. This is called a multinomial distribution. You can read more about it if you like, but our methods will not make it necessary to learn the probability mass function.

Out of the 500 golf balls collected, 486 of them had a number between 1 and 4. We will deal with only these 486 golf balls. Let’s get the data from golf_balls.csv.

18.5.2 Step 2 - Compute a test statistic.

If all numbers were equally likely, we would expect to see 121.5 golf balls of each number. This is a point estimate and thus not an actual value that could be realized. Of course, in a sample we will have variation and thus departure from this state. We need a test statistic that will help us determine if the observed values are reasonable under the null hypothesis. Remember that the test statistic is a single number metric used to evaluate the hypothesis.

Exercise : What would you propose for the test statistic?

With four proportions, we need a way to combine them. This seems tricky, so let’s just use a simple approach. Let’s take the maximum number of balls across all cells of the table and subtract the minimum. This is called the range and we will denote the parameter as \(R\) . Under the null hypothesis, this should be zero. We could re-write our hypotheses as:

\(H_0\) : \(R=0\) \(H_A\) : \(R>0\)

Notice that \(R\) will always be non-negative, thus this test is one-sided.

The observed range is \(138 - 104 = 34\).

18.5.3 Step 3 - Determine the \(p\) -value.

We don’t know the distribution of our test statistic, so we will use simulation. We will simulate data from a multinomial distribution under the null hypothesis and calculate a new value of the test statistic. We will repeat this 10,000 times and this will give us an estimate of the sampling distribution.

We will use the sample() function again to simulate the distribution of numbers under the null hypothesis. To help us understand the process and build the code, we are only initially using a sample size of 12 to keep the printout reasonable and easy to read.

Notice this is not using tidyverse coding ideas. We don’t think we need tibbles or data frames so we went with straight nested R code. You can break this code down by starting with the code in the center.

We are now ready to ramp up to the full problem. Let’s simulate the data under the null hypothesis. We are sampling 486 golf balls (instead of 12) with the numbers 1 through 4 on them. Each number is equally likely. We then find the range, our test statistic. Finally, we repeat this 10,000 times to get an estimate of the sampling distribution of our test statistic.
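The simulation can be sketched in Python as a stand-in for the R code, which is not reproduced here. The function names and seeds are assumptions of this sketch; the only observed value used is the range of 34 stated above.

```python
import random
from collections import Counter

FACES = (1, 2, 3, 4)


def count_range(numbers, faces=FACES):
    """Largest cell count minus smallest cell count (a face never drawn counts as 0)."""
    tally = Counter(numbers)
    per_face = [tally.get(face, 0) for face in faces]
    return max(per_face) - min(per_face)


def simulate_ranges(n_balls=486, n_sims=10_000, seed=2024):
    """Sampling distribution of the range when all four numbers are equally likely."""
    rng = random.Random(seed)
    return [count_range(rng.choices(FACES, k=n_balls)) for _ in range(n_sims)]


def upper_tail_p_value(sim_stats, observed):
    """Proportion of simulated statistics at least as large as the observed one."""
    return sum(1 for s in sim_stats if s >= observed) / len(sim_stats)


# Warm-up with 12 balls to see what one simulated sample looks like.
small_sample = random.Random(1).choices(FACES, k=12)
print(small_sample, "range:", count_range(small_sample))

ranges = simulate_ranges()
p_value = upper_tail_p_value(ranges, observed=34)
print(f"estimated p-value: {p_value:.3f}")
```

The estimate depends on the random draws, so it will be close to, but not exactly, the value reported in the text.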

Figure 18.4 is a plot of the sampling distribution of the range.

Figure 18.4: Sampling distribution of the range.

Notice how this distribution is skewed to the right. The \(p\) -value is 0.14.

18.5.4 Step 4 - Draw a conclusion

Since this \(p\)-value is larger than 0.05, we fail to reject the null hypothesis. That is, based on our data, we do not find statistically significant evidence against the claim that the numbers on the golf balls are equally likely. We can’t say that the proportion of golf balls with each number differs from 0.25.

18.6 Repeat with a different test statistic

The test statistic we developed was helpful, but it seems weak because we did not use the information in all four cells. So let’s devise a metric that does this. The hypotheses are the same, so we will jump to step 2.

18.6.1 Step 2 - Compute a test statistic.

If each number were equally likely, we would have 121.5 balls in each bin. We can find a test statistic by looking at the deviation in each cell from 121.5.

Now we need to collapse these into a single number. Simply adding the deviations will always give a value of 0 (why?). So let’s take the absolute value of each deviation and then add them together.
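One way to see why the plain sum is always zero: write \(O_i\) for the observed count in cell \(i\) (notation introduced here for illustration). The four counts must add to 486, so

\[ \sum_{i=1}^{4} \left(O_i - 121.5\right) = \sum_{i=1}^{4} O_i - 4(121.5) = 486 - 486 = 0 \]

no matter how the balls are distributed. Taking absolute values prevents positive and negative deviations from cancelling, which gives the statistic

\[ \sum_{i=1}^{4} \left|O_i - 121.5\right| \]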

This will be our test statistic.

18.6.2 Step 3 - Determine the \(p\) -value.

We will use similar code from above with our new metric. Now we sample 486 golf balls with the numbers 1 through 4 on them, and find our test statistic, the sum of the absolute deviations of each cell of the table from the expected count, 121.5. We repeat this process 10,000 times to get an estimate of the sampling distribution of our test statistic.
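A Python sketch of this second simulation follows, again as a stand-in for the R code. The function names and seed are assumptions. The observed cell counts are not reproduced in this text, so the sketch builds the null sampling distribution and leaves the observed statistic as an argument rather than hard-coding a value.

```python
import random
from collections import Counter

FACES = (1, 2, 3, 4)


def sum_abs_deviation(counts):
    """Sum over cells of |count - expected|, where expected = total / number of cells."""
    expected = sum(counts) / len(counts)
    return sum(abs(c - expected) for c in counts)


def simulate_null(n_balls=486, n_sims=10_000, seed=2024):
    """Sampling distribution of the statistic when all four numbers are equally likely."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        tally = Counter(rng.choices(FACES, k=n_balls))
        stats.append(sum_abs_deviation([tally.get(face, 0) for face in FACES]))
    return stats


def upper_tail_p_value(sim_stats, observed_stat):
    """Proportion of simulated statistics at least as large as the observed one."""
    return sum(1 for s in sim_stats if s >= observed_stat) / len(sim_stats)


null_stats = simulate_null()
cutoff = sorted(null_stats)[int(0.95 * len(null_stats))]
print(f"mean under H0: {sum(null_stats) / len(null_stats):.1f}")
print(f"approximate 95th percentile under H0: {cutoff}")
# With the real cell counts in hand:
# upper_tail_p_value(null_stats, sum_abs_deviation(observed_counts))
```

Because this statistic uses the deviation in every cell rather than only the largest and smallest counts, it can detect departures from the null hypothesis that the range misses.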

Figure 18.5 is a plot of the sampling distribution of the absolute value of deviations.

Figure 18.5: Sampling distribution of the absolute deviations.

Notice how this distribution is skewed to the right and our test statistic seems to be more extreme.

The \(p\) -value is 0.014. This value is much smaller than our previous result. The test statistic matters in our decision process as nothing about this problem has changed except the test statistic.

18.6.3 Step 4 - Draw a conclusion

Since this \(p\) -value is smaller than 0.05, we reject the null hypothesis. That is, based on our data, we find statistically significant evidence against the claim that the numbers on the golf balls are equally likely. We conclude that the numbers on the golf balls are not all equally likely, or that at least one is different.

18.7 Summary

In this chapter, we used probability models to help us make decisions from data. This chapter is different from the randomization section in that randomization had two variables (one of which we could shuffle) and a null hypothesis of no difference. In the case of a single proportion, we were able to use the binomial distribution to get an exact \(p\) -value under the null hypothesis. In the case of a \(2 \times 2\) table, we were able to show that we could use the hypergeometric distribution to get an exact \(p\) -value under the assumptions of the model.

We also found that the choice of test statistic has an impact on our decision. Even though we get valid \(p\) -values and the desired Type 1 error rate, if the information in the data is not used to its fullest, we will lose power. Note: power is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true.

In the next chapter, we will learn about mathematical solutions to finding the sampling distribution. The key difference in all these methods is the selection of the test statistic and the assumptions made to derive a sampling distribution.

18.8 Homework Problems

Repeat the analysis of the yawning data from last chapter, but this time use the hypergeometric distribution.

Is yawning contagious?

An experiment conducted by MythBusters, a science entertainment TV program on the Discovery Channel, tested whether a person can be subconsciously influenced into yawning if another person near them yawns. Fifty people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment) and 16 to a group where there wasn’t a person yawning near them (control). The following table shows the results of this experiment.

\[ \begin{array}{cc|ccc} & & & \textbf{Group} & \\ & & \text{Treatment} & \text{Control} & \text{Total} \\ \hline & \text{Yawn} & 10 & 4 & 14 \\ \textbf{Result} & \text{Not Yawn} & 24 & 12 & 36 \\ & \text{Total} & 34 & 16 & 50 \\ \end{array} \]

The data are in the file yawn.csv.

What are the hypotheses?

Choose a cell, and calculate the observed statistic.

Find the \(p\) -value using the hypergeometric distribution.

Plot the sampling distribution.

Determine the conclusion of the hypothesis test.

Compare your results with the randomization test.

Repeat the analysis of the golf ball data using a different test statistic.

Use a level of significance of 0.05.

State the null and alternative hypotheses.

Compute a test statistic.

Determine the \(p\) -value.

Draw a conclusion.

  • Body Temperature

Shoemaker 93 cites a paper from the American Medical Association 94 that questions conventional wisdom that the average body temperature of a human is 98.6 degrees Fahrenheit. One of the main points of the original article is that the traditional mean of 98.6 is, in essence, 100 years out of date. The authors cite problems with the original study’s methodology, diurnal fluctuations (up to 0.9 degrees F per day), and unreliable thermometers. The authors believe the average human body temperature is less than 98.6. Conduct a hypothesis test.

State the significance level that will be used.

Load the data from the file “temperature.csv” and generate summary statistics and a boxplot of the temperature data. We will not be using gender or heart rate for this problem.

Compute a test statistic. We are going to help you with this part. We cannot do a randomization test since we don’t have a second variable. It would be nice to use the mean as a test statistic but we don’t yet know the sampling distribution of the sample mean.

Let’s get clever. If the distribution of the sample is symmetric (this is an assumption but look at the boxplot and summary statistics to determine if you are comfortable with it), then under the null hypothesis, the observed values should be equally likely to either be greater or less than 98.6. Thus, our test statistic is the number of cases that have a positive difference between 98.6 and the observed value. This will be a binomial distribution with a probability of success (having a positive difference) of 0.5. You must also account for the possibility that there are observations of 98.6 in the data.
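One way this test statistic and its \(p\) -value might be sketched in Python is shown below. The temperatures are made up for illustration (the real values are in “temperature.csv”), the function name is our own, and dropping observations equal to 98.6 is just one common way to handle ties. Because the alternative is that the average is less than 98.6, a large count of positive differences is treated as evidence against the null hypothesis.

```python
from math import comb

def sign_test_p_value(values, null_value=98.6):
    """One-sided sign test: count observations below null_value and return
    (count, n, P(X >= count)) for X ~ Binomial(n, 0.5), with ties dropped."""
    kept = [v for v in values if v != null_value]  # discard ties
    n = len(kept)
    below = sum(1 for v in kept if null_value - v > 0)  # positive differences
    # Under the null each kept observation is below 98.6 with probability 0.5.
    p_value = sum(comb(n, x) for x in range(below, n + 1)) / 2**n
    return below, n, p_value

# Made-up temperatures, not the homework data.
temps = [98.0, 98.2, 98.6, 99.0, 97.9]
print(sign_test_p_value(temps))  # (3, 4, 0.3125)
```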
