Intro to Hypothesis Testing with MATLAB
24 Sep 2008 Rob Slazas
This is the first post on Hypothesis Testing. Click here to see the full list of statistics posts.
Hypothesis testing is one of the fundamental tools of experimentation. It is also sometimes called difference testing, because the question underlying most experiments is whether (or not) there is a difference between the behavior of two groups. In this post we’ll get familiar with the moving parts of a hypothesis test, and survey the tests that MATLAB and the Statistics Toolbox have to offer.
Contents
- Contemplating Quan
- The Basic Setup of A Hypothesis Test
- Picking the Right Test
- Am I Really Better Than Quan In Long Jumping?
- Wrapping Up
Contemplating Quan

So I was thinking about challenging Quan to another contest, but this time in the standing long jump. I’ve heard he is good, but I’m much better at long jumping than at holding my breath. I have some inside information from Dan that Quan averages 180cm per jump and that his jumps are normally distributed. I took some practice jumps and found that mine are normally distributed too. They look like this (download the data file here).
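% Load my practice jumps (the code below expects the variable robjumps)
load robpracticejumps.mat;
h1 = figure('Position',[100 100 300 300],'Color','w');
% Plot every jump at x = 1 so the points stack in a single column
h2 = scatter(ones(numel(robjumps),1),robjumps);
hold all;
% Green reference line at Quan's mean of 180 cm
plot([0 2],[180 180],'g-','linewidth',2);
set(gca,'ylim',[140 220],'xtick',[]);
grid on;
title('Rob''s jumps vs. Quan''s mean','fontweight','bold');
xlabel('Jumps');
ylabel('distance (cm)');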

Looking at this data, can I conclude that I’m better than Quan? Well, a simple analysis might be to compare the two means. In this case it looks like my mean is higher. But if I reach that conclusion from the means alone, I won’t know how confident I can be in it. How much higher than Quan’s mean should mine be to conclude that it really is higher, and not just the result of random chance in our sample jumps? So, to build in some scientific rigor, I need to use another method. Yes, you guessed right. It’s time for a hypothesis test.
The Basic Setup of A Hypothesis Test
When you want to test your hypothesis that there is (or isn’t) a difference between the behavior of 2 groups, a standardized method to do that has already been devised. Here are the players:
| H0 = the null hypothesis | H0 is the assumed conclusion unless the data provide enough evidence against it. For most tests, H0 is that the two groups are equal. If the data show that they are not equal, then you get to “reject the null hypothesis”. In our example, failing to reject H0 means that my average jump is not statistically different from Quan’s. |
| H1 = the alternative hypothesis | H1 is what you conclude to be true IF you can reject H0. For most tests, H1 is that the two groups are not equal. H1 can also be made more specific, such as group1 > group2 or group1 < group2. In our example, rejecting H0 and accepting H1 means that my average jump IS statistically different from Quan’s. |
| alpha = the risk of incorrectly rejecting H0 | Before performing the test, you need to decide how much risk of being wrong you can accept. alpha describes that risk and ranges from 0 to 1. The closer alpha is to 0, the less chance there is of rejecting H0 when you should not reject it. The trade-off is that lower values of alpha require more samples to detect the same difference, so the experiments are more expensive. The default for most tests is 0.05, or a 5% chance of this error. |
| Side note on risks | There is a second, complementary risk described by beta: the risk of failing to reject H0 when it is actually false. It is taken care of when determining sample size, and is not a direct input into the hypothesis test. We’ll cover alpha, beta, and sample size more deeply in a later post. |
| The assumptions | Many hypothesis tests make assumptions about the groups being tested, such as what their underlying distribution is. This is both valuable and limiting. Valuable since the test can be more powerful with fewer samples if it leverages what is already known about a distribution. Limiting since it only applies to groups from the assumed distribution. Be careful not to apply tests to groups of data that don’t fit the assumptions! |
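To see how these pieces fit together, here is a minimal sketch of a one sample t-test worked by hand and then cross-checked against ttest. The jump values are made up for illustration (they are not the data file from this post), and the name-value syntax for ttest assumes a reasonably recent release of the Statistics Toolbox.

```matlab
% Hypothetical practice jumps in cm (illustrative values only,
% NOT the contents of robpracticejumps.mat)
x     = [178 185 190 181 176 188 183 179 186 184];
mu0   = 180;    % the given mean under H0 (Quan's reported average)
alpha = 0.10;   % accepted risk of incorrectly rejecting H0

n     = numel(x);
xbar  = mean(x);                        % sample mean
s     = std(x);                         % sample standard deviation (n-1)
tstat = (xbar - mu0) / (s / sqrt(n));   % t statistic
df    = n - 1;                          % degrees of freedom
p     = 2 * tcdf(-abs(tstat), df);      % two-sided p-value
h     = p <= alpha;                     % 1 = reject H0, 0 = fail to reject

% Cross-check against the toolbox function
[hT, pT] = ttest(x, mu0, 'Alpha', alpha);
fprintf('manual: t = %.4f, p = %.4f, h = %d\n', tstat, p, h);
fprintf('ttest : p = %.4f, h = %d\n', pT, hT);
```

The decision comes down to one comparison: if the p-value is no larger than alpha, H0 is rejected. Everything else is bookkeeping to get that p-value from the data.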
Picking the Right Test
MATLAB offers many hypothesis tests (some rather obscure), each designed to answer a different kind of question. Here is a guide to the most commonly used tests so we can select the proper one for our question above: On average, am I better at the standing long jump than Quan?
| Answers this question | Assumptions | Hypothesis Test |
| Do the values in this data group appear in random order? | none | runstest |
| Does this data group come from a Standard Normal distribution? | distribution parameters are known (not estimated) | kstest - Kolmogorov-Smirnov normality test |
| Does this data group come from a Normal distribution? | none | lillietest - Lilliefors normality test |
| Is the mean of this data different than a given mean? | data is Normally distributed, variance of data is unknown | ttest - one sample t-test |
| Does the mean of this data differ before/after a treatment? | data is Normally distributed, variance of data is unknown | ttest - paired t-test |
| Are the means of these two groups of data different? | data is Normally distributed, variance of data is unknown | ttest2 - two sample t-test |
| Is the variance of this data different than a given variance? | data is Normally distributed | vartest - one sample chi-square test |
| Are the variances of these two groups of data different? | data is Normally distributed | vartest2 - two sample F-test |
One notable omission is the Anderson-Darling normality test. It is a commonly used test to see if a sample comes from a normally distributed population. File Exchange to the rescue! Credit goes to Antonio Trujillo-Ortiz for supplying this important function.
And for completeness, since I only listed the most common tests above, here is a list of all the hypothesis tests in the Statistics Toolbox. You’ll find it is pretty extensive.
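Since the t-tests in the table assume normally distributed data, it can be worth checking that assumption first. The following is a rough sketch using lillietest from the table; the jump values are again hypothetical and stand in for whatever data you actually have.

```matlab
% Hypothetical practice jumps in cm (illustrative values only)
x = [178 185 190 181 176 188 183 179 186 184];

% Lilliefors test. H0: the data come from a normal distribution
% whose mean and variance are not specified in advance.
% hL = 0 means the test found no evidence against normality.
[hL, pL] = lillietest(x);

if hL == 0
    disp('No evidence against normality; a t-test looks reasonable.');
else
    disp('Normality is rejected; reconsider using a t-test.');
end
```

Keep in mind that failing to reject normality is not proof that the data are normal, especially with a small sample like this one.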
Am I Really Better Than Quan In Long Jumping?
Now that we have reviewed the basics of hypothesis tests, which test did you select to answer our question? If you picked the one sample t-test, nice job! We want to compare the mean of my jump data to the given mean of Quan’s jumps.
To call the ttest function for a one sample t-test, we need our data (got it), the given mean (got it), and the alpha risk we can tolerate. For this test we will accept 10% risk, more than the default of 5%, so alpha will be 0.10 in the input arguments.
[h,p,ci] = ttest(robjumps,180,0.10)
h =
1
p =
0.0692
ci =
180.3339
186.3661
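In the output, h = 1 means that we reject the null hypothesis H0 in favor of the alternative H1. The optional outputs p and ci are the p-value and the confidence interval around my mean. The p-value of 0.0692 is below our alpha of 0.10, which is why the test rejects H0; at the default alpha of 0.05 it would not. From the confidence interval you can say with 90% confidence (1 - alpha) that my average long jump is between 180.3cm and 186.4cm. That whole interval sits above Quan’s 180cm, so, on average, I am slightly better than Quan at the standing long jump! WOO-HOO!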
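The call above is a two-sided test (is my mean different from 180?). Because the question is really directional (is my mean greater than 180?), a one-sided test is another option. Here is a sketch under some assumptions: the data are hypothetical, and the 'Alpha' and 'Tail' name-value arguments require a release of the Statistics Toolbox that supports them.

```matlab
% Hypothetical practice jumps in cm (illustrative values only)
x = [178 185 190 181 176 188 183 179 186 184];

% Two-sided test, H1: mean is not equal to 180
[h2s, p2s] = ttest(x, 180, 'Alpha', 0.10);

% Right-tailed test, H1: mean is greater than 180
[h, p, ci, stats] = ttest(x, 180, 'Alpha', 0.10, 'Tail', 'right');

fprintf('two-sided p = %.4f, right-tailed p = %.4f\n', p2s, p);
fprintf('t = %.4f with %d degrees of freedom\n', stats.tstat, stats.df);
fprintf('one-sided 90%% lower bound on the mean: %.2f cm\n', ci(1));
```

When the sample mean is above the given mean, the right-tailed p-value is half of the two-sided one, and the confidence interval becomes a lower bound with an infinite upper end. Decide on the direction before looking at the data, otherwise the stated alpha no longer describes your real risk.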
Wrapping Up
This overview of Hypothesis Testing should get you started. If you are using a test for the first time, please take a quick look at your stats book or the MATLAB help to check the assumptions and usage. If there are some specifics of the tests you would like to see covered in a future post, please let us know in the comments below.