# F-Test: Compare Two Variances in R

The **F-test** is used to assess whether the **variances** of two populations (A and B) are equal.


## When do you use the F-test?

Comparing two variances is useful in several cases, including:

- When you want to perform a two-sample t-test and need to check, beforehand, that the variances of the two samples are equal.
- When you want to compare the variability of a new measurement method to an old one. Does the new method reduce the variability of the measure?

## Research questions and statistical hypotheses

Typical research questions are:

- whether the variance of group A (\(\sigma^2_A\)) *is equal* to the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) *is less than* the variance of group B (\(\sigma^2_B\))?
- whether the variance of group A (\(\sigma^2_A\)) *is greater than* the variance of group B (\(\sigma^2_B\))?

In statistics, we can define the corresponding *null hypotheses* (\(H_0\)), in the same order as the research questions, as follows:

- \(H_0: \sigma^2_A = \sigma^2_B\)
- \(H_0: \sigma^2_A \geq \sigma^2_B\)
- \(H_0: \sigma^2_A \leq \sigma^2_B\)

The corresponding *alternative hypotheses* (\(H_a\)) are as follows:

- \(H_a: \sigma^2_A \ne \sigma^2_B\) (different)
- \(H_a: \sigma^2_A < \sigma^2_B\) (less)
- \(H_a: \sigma^2_A > \sigma^2_B\) (greater)

Note that:

- Hypotheses 1) are called **two-tailed tests**
- Hypotheses 2) and 3) are called **one-tailed tests**

## Formula of F-test

The test statistic can be obtained by computing the ratio of the two variances \(S_A^2\) and \(S_B^2\).

\[F = \frac{S_A^2}{S_B^2}\]

The degrees of freedom are \(n_A - 1\) (for the numerator) and \(n_B - 1\) (for the denominator).

Note that the more this ratio deviates from 1, the stronger the evidence for unequal population variances.

Note also that the F-test requires both samples to come from normally distributed populations.
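As a minimal sketch of the formula above, the statistic and its two-sided p-value can be computed by hand with base R and compared with the result of `var.test()`. The variable names (`oj`, `vc`, `f_stat`, `p_two_sided`) are illustrative and not part of the original tutorial; the data come from the built-in ToothGrowth data set used later in this article.

```r
# Split tooth length by supplement type (built-in ToothGrowth data)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Test statistic: ratio of the two sample variances
f_stat <- var(oj) / var(vc)

# Degrees of freedom for the numerator and the denominator
df_num <- length(oj) - 1
df_den <- length(vc) - 1

# Two-sided p-value from the F distribution
p_lower <- pf(f_stat, df_num, df_den)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

# Both comparisons should return TRUE if the manual computation is right
res <- var.test(oj, vc)
all.equal(f_stat, unname(res$statistic))
all.equal(p_two_sided, res$p.value)
```

Doubling the smaller tail probability is the usual way to obtain a two-sided p-value for a ratio statistic, since the F distribution is not symmetric around 1.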

## Compute F-test in R

### R function

The R function **var.test**() can be used to compare two variances as follows:

```
# Method 1: formula interface
var.test(values ~ groups, data = data,
         alternative = "two.sided")
# Method 2: two numeric vectors
var.test(x, y, alternative = "two.sided")
```

- **x, y**: numeric vectors
- **alternative**: the alternative hypothesis. The allowed values are "two.sided" (default), "greater" or "less".
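To illustrate a one-tailed call, here is a small sketch that asks whether a new measurement method is less variable than an old one. The two vectors `old_method` and `new_method` hold made-up values chosen for illustration only; they are not data from this tutorial.

```r
# Hypothetical measurements of the same quantity (illustrative values only)
old_method <- c(10.2, 9.8, 11.5, 8.9, 10.7, 12.1, 9.3, 10.9)
new_method <- c(10.1, 10.0, 10.4, 9.7, 10.2, 10.5, 9.9, 10.3)

# H_a: the variance of new_method is less than the variance of old_method.
# The first argument is the numerator of the variance ratio.
res_less <- var.test(new_method, old_method, alternative = "less")
res_less$statistic
res_less$p.value
```

With `alternative = "less"`, the alternative hypothesis is that the ratio var(x)/var(y) is below 1, so the order of the two vectors matters.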

### Import your data into R and check it

To import your data, use the following R code:

```
# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())
```

Here, we’ll use the built-in R data set named ToothGrowth:

```
# Store the data in the variable my_data
my_data <- ToothGrowth
```

To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function **sample_n**() [in the **dplyr** package]:

```
library("dplyr")
sample_n(my_data, 10)
```

```
    len supp dose
43 23.6   OJ  1.0
28 21.5   VC  2.0
25 26.4   VC  2.0
56 30.9   OJ  2.0
46 25.2   OJ  1.0
7  11.2   VC  0.5
16 17.3   VC  1.0
4   5.8   VC  0.5
48 21.2   OJ  1.0
37  8.2   OJ  0.5
```

We want to test the equality of variances between the two groups OJ and VC in the column “supp”.
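Before running the test, it can help to look at the two sample variances directly. The sketch below uses base R only; the name `group_var` is illustrative.

```r
my_data <- ToothGrowth

# Sample variance of tooth length within each supplement group
group_var <- tapply(my_data$len, my_data$supp, var)
group_var

# Number of observations per group
table(my_data$supp)
```

The ratio of the two values in `group_var` (OJ over VC) is the quantity that the F-test evaluates.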

### Preliminary test to check F-test assumptions

The F-test is very sensitive to departures from normality. You need to check whether the data in each group are normally distributed before using the F-test.

The Shapiro-Wilk test can be used to test whether the normality assumption holds. It's also possible to use a **Q-Q plot** (quantile-quantile plot) to graphically evaluate the normality of a variable. A Q-Q plot draws the correlation between a given sample and the normal distribution.
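A minimal sketch of the normality check, applying the Shapiro-Wilk test separately to each group with base R. The name `shapiro_p` is illustrative.

```r
my_data <- ToothGrowth

# Shapiro-Wilk test applied to each supplement group;
# a small p-value suggests a departure from normality
shapiro_p <- tapply(my_data$len, my_data$supp,
                    function(v) shapiro.test(v)$p.value)
shapiro_p
```

For the graphical check, the base R functions `qqnorm()` and `qqline()` can be called on the values of each group.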

If there is doubt about normality, a better choice is **Levene's test** or the **Fligner-Killeen test**, which are less sensitive to departures from normality.
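As a sketch of the more robust alternative, the Fligner-Killeen test is available in base R as `fligner.test()` and accepts the same formula interface. The object name `res.fligner` is illustrative.

```r
my_data <- ToothGrowth

# Fligner-Killeen test of homogeneity of variances (stats package)
res.fligner <- fligner.test(len ~ supp, data = my_data)
res.fligner

# The p-value is read the same way as for the F-test
res.fligner$p.value
```

Levene's test is not part of base R; one commonly used implementation is `leveneTest()` in the **car** package, which is not shown here.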

### Compute F-test

```
# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest
```

```
F test to compare two variances
data: len by supp
F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3039488 1.3416857
sample estimates:
ratio of variances
0.6385951
```

### Interpretation of the result

The p-value of the **F-test** is p = 0.2331433, which is greater than the significance level 0.05. We therefore fail to reject the null hypothesis: there is no significant difference between the two variances.
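The same decision can be written out in code. This is a sketch only: `alpha`, `reject_h0` and `ci_contains_one` are illustrative names, and 0.05 is the significance level assumed in the text.

```r
res.ftest <- var.test(len ~ supp, data = ToothGrowth)
alpha <- 0.05  # significance level, as assumed in the text

# Decision based on the p-value
reject_h0 <- res.ftest$p.value < alpha
reject_h0

# Equivalent check for the two-sided test:
# does the 95% confidence interval for the variance ratio contain 1?
ci <- res.ftest$conf.int
ci_contains_one <- ci[1] < 1 && 1 < ci[2]
ci_contains_one
```

With the output shown above, the p-value (0.2331) exceeds 0.05 and the interval from 0.30 to 1.34 contains 1, so both checks lead to the same conclusion.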

### Access the values returned by the var.test() function

The function **var.test**() returns a list containing the following components:

- **statistic**: the value of the F test statistic.
- **parameter**: the degrees of freedom of the F distribution of the test statistic.
- **p.value**: the p-value of the test.
- **conf.int**: a confidence interval for the ratio of the population variances.
- **estimate**: the ratio of the sample variances.

The **R** code for getting these values is as follows:

```
# ratio of variances
res.ftest$estimate
```

```
ratio of variances
0.6385951
```

```
# p-value of the test
res.ftest$p.value
```

`[1] 0.2331433`
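The remaining components listed above can be extracted in the same way. The sketch below recomputes `res.ftest` so that it runs on its own.

```r
res.ftest <- var.test(len ~ supp, data = ToothGrowth)

# F statistic
res.ftest$statistic

# Degrees of freedom (numerator and denominator)
res.ftest$parameter

# Confidence interval for the ratio of the population variances
res.ftest$conf.int

# Confidence level, stored as an attribute of the interval
attr(res.ftest$conf.int, "conf.level")
```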

## Info

This analysis has been performed using **R software** (ver. 3.3.2).
