Understanding Hypothesis Tests

Lesson Priority: VIP Knowledge

  • Learn why hypothesis tests exist and what they are intended to do
  • Define the Null Hypothesis and the Alternative Hypothesis
  • Understand the concept of statistical significance
  • Learn how test statistics are usually calculated and how they are used in hypothesis tests
  • Know what the p-value result is and what it means
  • Begin to understand the difference between measuring single populations and measuring the difference between two populations
Lesson Description

Hands down, one of the most powerful tools in Statistics is inference testing, which gives us mathematical evidence of a hypothesis. This method is used in real world medical and science applications every single day. This introductory lesson will help us understand how hypothesis tests generally work, and how to use them appropriately.

Judgement Day

Many students find the concept of hypothesis testing a bit convoluted. This lesson explains how it works, while the following lessons walk through actual mathematical examples (one lesson for each type of hypothesis test). I highly recommend reading this lesson with a mind to understand the concept, working on one or two of the next lessons, and then giving this one another read. The back-and-forth of concept and application will greatly help your understanding of hypothesis testing. There is a lot of conceptual stuff here and practice will help!

As we continue to see, the purpose of statistics is to understand the unknown through the lens of probability. Hypothesis tests give us a formal way to determine whether or not we have reason to believe a claim based on statistical likelihood, given some observed data.

In this introductory lesson, our goal will be to understand and set up hypothesis test statements. We'll look at executing specific tests over the next few lessons, one type of test at a time.

Simply stated, hypothesis tests are a mathematical way to make an evidence-based decision, much like in a court of law. We start out with a claim, we collect some sample data, we use that data and statistical knowledge as evidence, and then we draw our verdict.

A First Example

If you've just seen hypothesis tests in class for the first time and feel overwhelmed, don't panic - you certainly aren't alone! Some of the details we eventually need to master make it harder to initially understand what the concept is all about.

Let's look at a high-level example.

Hypothesis Test - First Example

In a very large metropolitan school district, parents and students have complained that at least $30 \%$ of school windows are significantly damaged and in need of replacement. The local government agrees to fund the replacement of all windows in all schools if this claim appears true.

Since there are at least $100,000$ windows, it is impractical to test each one. Instead, they send out a surveyor to inspect a simple random sample of $100$ windows. The survey was designed so that every window had an equal chance of being selected.

The surveyor came back and found that $33$ windows were damaged. The government needs to determine whether this result statistically suggests that at least $30 \%$ of all windows are truly damaged.

Let's dissect this situation to understand and learn some important ideas and vocab.

The Claim

The most important part of understanding and setting up a hypothesis test is identifying the claim. Here, the claim is that at least thirty percent of all school windows are damaged.

The Hypothesis

Statistical hypotheses are often viewed as backward from what you might expect, because we're always going to assume the claim is wrong unless there is evidence to suggest otherwise. This assumption is referred to as the null hypothesis, and is notated as an H (which stands for hypothesis) with a zero subscript: $H_{0}$.

For this scenario, the null hypothesis will be that the true proportion of damaged school windows is less than $30 \%$:

$$H_{0}: \;\; p < 0.30$$

where $p$ represents the true proportion of damaged windows.

Underneath a null hypothesis, we will always write down the alternative hypothesis. It will be the opposite of the null hypothesis mathematically. This hypothesis will represent the suspected claim, and is notated with an H and an A subscript: $H_{A}$ (some texts notate it as $H_{1}$ but you shouldn't lose points either way).

For this scenario, the alternative hypothesis will be that the true proportion of damaged school windows is at least $30 \%$:

$$H_{A}: \;\; p \ge 0.30$$
You Should Know
While the alternative hypothesis will always represent the claim, some textbooks tell you that the null hypothesis should always have an equal sign no matter what, even when the alternative hypothesis has an inequality sign. I find this confusing and unintuitive, but if you encounter this way of doing it and your teacher insists, of course do what he or she says.

For example, in this problem we would represent the hypotheses as follows, if our teacher follows this convention (note that the alternative then uses a strict inequality, so the two hypotheses do not overlap):

$$H_{0}: \;\; p = 0.30$$
$$H_{A}: \;\; p > 0.30$$
The Test

We will soon learn what specific statistical calculations to perform to determine whether or not we have reason to doubt the null hypothesis.

Generally speaking, we will calculate a test statistic (often notated $T_{\mathrm{stat}}$) from our sample data, and measure its cumulative likelihood on a probability distribution. We will cite its location on the probability distribution as evidence against the null hypothesis. Depending on the test, very high or low locations of $T_{\mathrm{stat}}$ will be strong evidence that we should doubt the null hypothesis.

The Conclusion

The test statistic's location on the probability curve is considered against a pre-determined max or min value called the critical value. If $T_{\mathrm{stat}}$ is further from the middle than the critical value, we will reject the null hypothesis and believe the alternative hypothesis.

It is very important to note that we never say the null hypothesis is true - instead we either say we have reason to doubt it, or we do not have reason to doubt it. It may seem like we're simply mincing words but it is an important difference and we'll continue to practice this concept.

Additionally, if we reject the null hypothesis at our chosen significance level, we instead choose to believe the alternative hypothesis (the original claim). This doesn't mean we know it to be true, just that we believe it to be true. Again it may seem like semantics but the distinction can be important.

For example, suppose we determine that $33$ damaged windows out of $100$ is statistically strong enough evidence (say, at the $95 \%$ level) for us to believe the claim. We still don't actually know if the claim is true. The only way to know if the claim is true is to go out and survey every single one of the $100,000$ windows - which is exactly what we are trying to avoid by using statistics.

In that case, we would instead say that the $33$ damaged windows make us substantially doubt the null hypothesis that less than $30 \%$ are damaged, and therefore believe that at least $30 \%$ are damaged.
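As a rough sketch of how the test and conclusion could play out for the window survey, the snippet below assumes a one-sided, one-proportion z-test using the normal approximation. That is only one possible choice of test (the specific tests are covered in later lessons), and the function name `one_proportion_z` and the $5 \%$ significance level are illustrative assumptions rather than something the problem states.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Return (z, upper-tail p-value) for the alternative p > p0.

    Assumes the normal approximation to the sampling distribution of
    the sample proportion (an assumption of this sketch).
    """
    p_hat = successes / n                  # observed sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)    # standard error at the null boundary
    z = (p_hat - p0) / std_error           # test statistic
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value

z, p_value = one_proportion_z(successes=33, n=100, p0=0.30)
alpha = 0.05                               # assumed significance level
critical_value = NormalDist().inv_cdf(1 - alpha)

print(f"z = {z:.3f}, critical value = {critical_value:.3f}, p-value = {p_value:.3f}")
print("reject H0" if z > critical_value else "fail to reject H0")
```

Under these assumptions the test statistic comes out near $0.65$, short of the critical value of about $1.645$, so this particular sketch would fail to reject $H_{0}$. A different test or significance level could lead to a different conclusion.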
Pro Tip
Think of this null / alternative hypothesis process in the context of a court case. The plaintiff makes a claim but the defendant is innocent until proven guilty. Similarly, we assume the null hypothesis is true until the evidence suggests otherwise, and we always set the null hypothesis to the opposite of the claim.

If you follow this idea, then all you have to do to get comfortable setting up hypothesis tests is become masterful at identifying the claim when you read a problem, which you'll pick up on quickly after doing a few problems.

Significance Level

In any hypothesis test, we will choose how strong the evidence needs to be to convince us that we should doubt the null hypothesis. After all, if someone told you there was only a $50 \%$ chance that the claim was correct, you wouldn't have as much conviction in believing that claim as you would if someone told you there was a $99 \%$ chance that the claim is correct.

Problems will often dictate this number to us, and it is usually $95 \%$. The way in which we state the required strength of evidence is with the probabilistic complement of the assurance likelihood, or $1$ minus the probability. For $95 \%$ assurance, we would be told to use $5 \%$ significance. This is referred to as the significance level.
You Should Know
When a problem doesn't specify a significance level, you should pick one and state it. Picking $5 \%$ is a safe bet.
The reason that we use the complement is based on how we interpret the result we get when we measure our test statistic against the critical value.

One of the ways to interpret $T_{\mathrm{stat}}$ is what's called the p-value. The p-value is a probability that represents how likely it is that we would observe sample data at least as extreme as ours, if the null hypothesis were really true. This is because the p-value comes from measuring $T_{\mathrm{stat}}$, and $T_{\mathrm{stat}}$ is calculated based on the sample data.

If the p-value is $3 \%$, it means that there is only a $3 \%$ chance that we would have observed sample data as extreme as what we observed, if the null were true. Loosely speaking, this gives us $97 \%$ assurance that we should doubt the null and believe the alternative hypothesis (the claim).
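To make the p-value idea concrete, here is a minimal sketch that reuses the window survey numbers ($33$ damaged out of $100$) and computes the chance of seeing $33$ or more damaged windows if the true proportion were exactly $0.30$. It assumes an exact binomial model for the count, and the helper name `binomial_upper_tail` and the $5 \%$ significance level are made up for illustration.

```python
from math import comb

def binomial_upper_tail(n, k_observed, p_null):
    """P(X >= k_observed) when X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(k_observed, n + 1)
    )

# Window survey: 33 damaged out of 100, evaluated at the null boundary p = 0.30.
p_value = binomial_upper_tail(n=100, k_observed=33, p_null=0.30)
alpha = 0.05  # assumed significance level

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The decision rule is the last line: the null is rejected only when the p-value falls below the chosen significance level.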
Pro Tip
The test statistic value and the resulting p-value can be used interchangeably to draw the conclusion, because the p-value is calculated using the test statistic. Teachers who explain this ambiguously often leave students thinking they are separate measures. The critical value cutoff is chosen such that any larger $T_{\mathrm{stat}}$ would give a p-value that is small enough to reject the null. Therefore, getting a $T_{\mathrm{stat}}$ large enough to reject the null is equivalent to getting a p-value that is small enough to reject the null.
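The equivalence described in this tip can be checked with a short sketch. It assumes an upper-tail test whose statistic follows a standard normal distribution, which is an assumption of the example and not true of every test. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def decide_by_critical_value(t_stat, alpha):
    """Reject when the statistic is beyond the upper-tail critical value."""
    critical_value = STANDARD_NORMAL.inv_cdf(1 - alpha)
    return t_stat > critical_value

def decide_by_p_value(t_stat, alpha):
    """Reject when the upper-tail p-value is smaller than alpha."""
    p_value = 1 - STANDARD_NORMAL.cdf(t_stat)
    return p_value < alpha

alpha = 0.05
for t_stat in (0.5, 1.0, 1.6, 1.7, 2.5):
    by_cv = decide_by_critical_value(t_stat, alpha)
    by_p = decide_by_p_value(t_stat, alpha)
    print(f"T_stat = {t_stat}: critical value says {by_cv}, p-value says {by_p}")
```

For every statistic tried, the two functions agree on whether to reject, which is the point of the tip: they are two views of the same comparison.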

Other Concepts

After a test is conducted, we often need to answer follow-up questions or further explain the results. The focus of this follow-up is the two types of mistakes that could be made, relative to the truth of the claim.

Recall that earlier we pointed out that the truth is unknowable. We have a claim, and we conduct a probabilistic test using sample data, but we aren't able to know whether the claim is actually true or false - we only get to make a conclusion based on likelihood. Even if we can be $99.99 \%$ sure we know the truth, we can never be $100 \%$.

This means that there are two types of errors that can be made in a hypothesis test, depending on the outcome.

If we reject the null hypothesis based on the test results, meaning we believe the claim is true, it is possible that in reality, the claim is false. After all, we don't actually know the truth. This is always going to be a relatively small probability, and we refer to it as the probability of a Type I error.

If we don't reject the null hypothesis, it is possible that the claim is true even though we do not have the statistical evidence to support it. The probability that this happens is referred to as the probability of a Type II error.

The probability of making a Type I error is easy to find, because it is simply equal to the significance level. This will never be a mystery or a difficult number to find.

We will learn and practice how to calculate Type II error probabilities in each specific hypothesis test lesson that follows. Type II probabilities are a nuisance to calculate, but while they are part of the syllabus and show up on tests, you'll still do well grade-wise if you can do everything else correctly, so don't lose heart if Type II errors become a challenge for you.

We refer to the probability of a Type II error as $\beta$, and the probability $1-\beta$ is referred to as the power of the test because it tells us the probability that we correctly reject the null hypothesis (i.e. the probability that we find evidence to support the claim when the claim is in fact true).
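As a hedged illustration of $\beta$ and power, the sketch below assumes an upper-tail one-proportion z-test with the window survey's $n = 100$ and a null boundary of $0.30$. It also assumes a hypothetical "true" proportion of $0.40$, which is not from the lesson and would be unknown in practice. The function name is made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_proportion(p0, p_true, n, alpha):
    """Approximate power of an upper-tail one-proportion z-test.

    p_true is a hypothetical 'real' proportion chosen for illustration;
    in practice it is unknown.
    """
    std_normal = NormalDist()
    # Smallest sample proportion that leads to rejecting H0 at level alpha.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of the sample proportion if p_true were the truth.
    sampling_dist = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))
    beta = sampling_dist.cdf(cutoff)   # P(fail to reject | p_true): Type II error
    return 1 - beta                    # P(reject | p_true): power

power = power_one_sided_proportion(p0=0.30, p_true=0.40, n=100, alpha=0.05)
beta = 1 - power
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

With these made-up inputs the power comes out at roughly $0.69$. Setting `p_true` equal to `p0` returns the significance level itself, which matches the statement that the Type I error probability equals $\alpha$.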

Final Vocab Recap

You should know the terms we discussed, including which errors Type I and Type II refer to. Additionally, you should know the symbols commonly used.

I strongly recommend that you go over these somewhat quickly, and then revisit them once you've learned one or two specific types of hypothesis tests, such as proportion tests or mean value tests.

Claim
A statement about a hypothesized state of being that we wish to obtain mathematical evidence about, to decide whether we believe it to be true or false.

Null Hypothesis
Denoted $H_{0}$. This represents the baseline assumption that we have in the absence of the claim, or if the claim wasn't made, and therefore is the mathematical opposite of the claim.

Alternative Hypothesis
Denoted $H_{A}$. This represents the claim, which we will believe to be true if we find probability evidence against the null. It is the mathematical equivalent of the claim, and therefore the mathematical opposite of the null.

Test Statistic
Denoted $T_{\mathrm{stat}}$. This is a statistical value derived from the sample data we observed.

Critical Value
This is the maximum or minimum limit that we use to determine whether the test statistic is too extreme, dependent on the significance level. If the test statistic is beyond this limit, the p-value will be small and we will reject the null hypothesis.

Significance Level
This measures how strong we desire the evidence against the null hypothesis to be before saying it is strong enough to reject the null hypothesis. Often denoted $\alpha$, this is a given value and is usually $0.05$.

p-value
This is a measure of the likelihood of observing the test statistic, given that the null hypothesis is true. In other words, it is the probability that we would have observed sample data at least as extreme as what we observed, if the null were true. We reject the null if this value is less than the significance level.

Type I Error
This occurs when we mistakenly reject the null hypothesis. The probability of a Type I error is equal to the significance level, $\alpha$.

Type II Error
This occurs when we mistakenly fail to reject the null hypothesis. The probability of a Type II error is $\beta$, and we'll learn how to calculate it in future lessons on specific types of tests.

Power of the Test
This is $1-\beta$, and represents the probability that we correctly reject the null hypothesis when the claim is true.

Put It To The Test

Each of the following examples is focused on the setup and on drawing conclusions based on the general ideas of hypothesis tests. Try to answer each question, and remember that many students find it helpful to revisit these concepts after working on the actual tests that you'll execute in the following lessons.
Example 1

Many people believe scientists are generally introverted. A magazine article recently claimed that $40 \%$ of scientists are self-proclaimed introverts.

One outgoing scientist found this hard to believe, thinking the real number would be lower, and consequently sampled $250$ scientist friends, asking whether or not they would self-describe as an introvert.

Write the claim and corresponding hypotheses that belong in this hypothesis test.

Solution

$\blacktriangleright$ The claim being made by the disbelieving outgoing scientist is that less than $40 \%$ of scientists are self-proclaimed introverts. We know this because he is "thinking the real number would be lower."

The alternative hypothesis should match the claim, and the null hypothesis will be the mathematical opposite. It is also important to state what your symbols mean, if none were given to you.

Let $p$ represent the true proportion of scientists who self-identify as introverts. We will test the following hypotheses:

$$H_{0}: \;\;\;\; p \ge 0.4$$
$$H_{A}: \;\;\;\; p < 0.4$$

$\blacksquare$
Example 2

Jack's high school has a lot of musicians, but so does his friend Bill's in the next town over. Jack claims that the proportion of all students who are musicians is higher at his high school than at Bill's. They will conduct an appropriate sample survey at each of their schools and compare. Let $\mu_{J}$ represent the true proportion of students at Jack's high school who are musicians, and let $\mu_{B}$ represent the true proportion of students at Bill's high school who are musicians.

Write the correct pair of hypotheses for the test that Jack wants to conduct.

Solution

$\blacktriangleright$ The claim is explicitly stated in the problem for us, so we only need to interpret it as the alternative hypothesis and write the correct null. Make sure to use the same symbols as the problem has already defined.

$$H_{0}: \;\;\;\; \mu_{J} \le \mu_{B}$$
$$H_{A}: \;\;\;\; \mu_{J} > \mu_{B}$$

We often prefer to write hypotheses that involve comparing two quantities as a difference. Obtain this equivalent version of the hypotheses by subtracting $\mu_{B}$ from each side of the inequality in each hypothesis.

$$H_{0}: \;\;\;\; \mu_{J} - \mu_{B} \le 0$$
$$H_{A}: \;\;\;\; \mu_{J} - \mu_{B} > 0$$

It makes the actual testing process easier to use this form, as we'll soon learn.

$\blacksquare$
Example 3

One particularly proud dairy farmer believed his cows to be the biggest and best in all the land. That year, dairy cows in the United States weighed an average of $1,650$ lbs, with a standard deviation of $164$ lbs. He owns several thousand cows, so he does not have the time or resources to weigh them all to prove his claim.

Instead, he will conduct a hypothesis test using a sample of cows. He selects $53$ cows at random and records their weights. These cows weigh $1,690$ lbs on average.

Write a claim that you think the farmer would make, and the accompanying hypothesis pair for the test.

Solution

$\blacktriangleright$ If the farmer believes that he has larger than average cows, then he would want to test the claim that his cows are statistically larger than the dairy cow population average.

Let $\mu_{C}$ represent the average weight of the farmer's dairy cow population.

The corresponding hypotheses are

$$H_{0}: \;\;\;\; \mu_{C} \le 1,650$$
$$H_{A}: \;\;\;\; \mu_{C} > 1,650$$

$\blacksquare$
Example 4

Orange Home Store claims that its prices are better (that is, lower) than those of its competitor, Blue Home Store, on average, for all products that both stores sell. The average price is defined by the per item average price of every possible item - hundreds of thousands of items are on this list.

A hypothesis test is to be conducted to test this claim by selecting $300$ random products that both stores sell, averaging the price at each store, and comparing the results. Let $\mu_{O}$ represent the population average price at Orange Home Store, and let $\mu_{B}$ represent the population average price at Blue Home Store.

Write the null and alternative hypotheses that correspond to Orange Home Store's claim.

Solution

$\blacktriangleright$ For this comparison test, the claim is that $\mu_{O}$ is smaller than $\mu_{B}$. Another way of saying this is

$$\mu_{O} - \mu_{B} < 0$$

This will be our alternative hypothesis, so the null will be the opposite.

$$H_{0}: \;\;\;\; \mu_{O} - \mu_{B} \ge 0$$
$$H_{A}: \;\;\;\; \mu_{O} - \mu_{B} < 0$$

Recall that setting up our hypotheses differently by saying something like

$$H_{0}: \;\;\;\; \mu_{O} \ge \mu_{B}$$
$$H_{A}: \;\;\;\; \mu_{O} < \mu_{B}$$

is not wrong, but as mentioned earlier, it is going to be convenient to use the version with the zeroes. The version without the zeroes is arguably easier to digest and more intuitive, so it's ok if you start out with that to make sure you've got it down properly, but you should ultimately end up with the "zeroes" version.

$\blacksquare$
Example 5

Given

$$H_{0}: \;\;\; \mu < 50$$
$$H_{A}: \;\;\; \mu \ge 50$$

A sample is collected and a test statistic is created from the data. The resulting p-value is $0.07$ and the significance level is $0.05$.

Write the appropriate conclusion of the test.

Solution

$\blacktriangleright$ Unless the p-value is smaller than the significance level, we do not have sufficient evidence to reject the null hypothesis.

Therefore, for this test, we fail to reject the null hypothesis that $\mu$ is less than $50$. The data we collected are not statistically significant enough to believe otherwise.

(And for the millionth time, remember that this does not mean that we "accept" the null hypothesis, only that we cannot refute it!)

$\blacksquare$
Example 6

In Example 4 we looked at a claim from Orange Home Store. Suppose that the hypothesis test was conducted at the $5 \%$ significance level and found that the test statistic was an extreme value, leading to a very small p-value for the test - less than $5 \%$.

State the conclusion of the hypothesis test and write a full in-context sentence giving the interpretation of that conclusion.

Solution

$\blacktriangleright$ Because the p-value is smaller than the significance level, we say the sample data has statistical significance. In light of this, we reject the null hypothesis $H_{0}$ and instead accept the alternative hypothesis $H_{A}$.

As requested (and this is commonly asked for), write this conclusion using the context of the specific claim from the problem:

Based on the observed data, I reject $H_{0}$ at the $5 \%$ significance level. There is sufficient evidence to doubt that Orange Home Store's prices are at least as high as Blue Home Store's, so we instead accept the claim that Orange Home Store has lower prices.

$\blacksquare$
Example 7

In Example 1, the introverted scientist hypothesis test was conducted and the null hypothesis that the proportion of introverted scientists was at least $40 \%$ was consequently rejected.

What type of error could have occurred? State what happened in context of the problem, if the error did occur.

Solution

$\blacktriangleright$ A Type I error is possible when we conclude that we should reject the null hypothesis. If we mistakenly reject the null, that means we mistakenly believed the claim had strong enough evidence - which means the claim was not true despite the fact that there was evidence.

In context, if a Type I error occurred, then the true proportion of scientists who self-identify as introverts is $40 \%$ or more in actuality, even though our sample data provided strong evidence that it is lower than $40 \%$.

$\blacksquare$
Example 8

In Example 3, a farmer was about to conduct a hypothesis test to test his claim that his cows were statistically above average. The test was conducted and there was insufficient evidence to reject the null hypothesis.

What type of error could have occurred, and, in context of the situation, what would have had to have been true for that to happen?

Solution

$\blacktriangleright$ If we fail to reject the null hypothesis, it is possible that in reality the claim is true, and we should have rejected the null. This is called a Type II error.

In context, if a Type II error occurred in the farmer's hypothesis test, it is because the sample of his cows did not weigh enough on average to provide strong enough evidence that his whole cow population weighs measurably more than the national average, even though in actuality, his whole cow population does weigh more than the national average.

$\blacksquare$
Lesson Takeaways
  • Know what a hypothesis test is and what it is intended to do
  • Be able to identify a claim in a problem, and write the corresponding null and alternative hypotheses
  • Begin to understand the specific evidence-based process of rejecting or not rejecting the null hypothesis
  • Be familiar with the terms we use to describe the conclusion of a hypothesis test, and what they mean
  • Remember the difference between Type I and Type II errors, and be able to state their meaning in context
