One-Proportion Z-Test in R
What is the one-proportion z-test?
The one-proportion z-test is used to compare an observed proportion to an expected (theoretical) proportion, when there are only two categories.
For example, suppose we have a population of mice that is half male and half female (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer: 95 males and 65 females.
We want to know whether cancer affects males more than females.
In this setting:
- the number of successes (males with cancer) is 95
- the observed proportion of males (\(p_o\)) is 95/160
- the observed proportion of females (\(q\)) is \(1 - p_o\)
- the expected proportion of males (\(p_e\)) is 0.5 (50%)
- the number of observations (\(n\)) is 160
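For illustration, these quantities can be stored in R before running the test. This is just a minimal sketch; the object names x, n, p_e and p_o are arbitrary choices, not required by any R function.
# data for the mouse example
x   <- 95        # number of successes (males with cancer)
n   <- 160       # number of observations (mice with cancer)
p_e <- 0.5       # expected proportion of males
p_o <- x / n     # observed proportion of males
p_o
[1] 0.59375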
Research questions and statistical hypotheses
Typical research questions are:
- whether the observed proportion of males (\(p_o\)) is equal to the expected proportion (\(p_e\))?
- whether the observed proportion of males (\(p_o\)) is less than the expected proportion (\(p_e\))?
- whether the observed proportion of males (\(p_o\)) is greater than the expected proportion (\(p_e\))?
In statistics, we can define the corresponding null hypotheses (\(H_0\)) as follows:
- \(H_0: p_o = p_e\)
- \(H_0: p_o \geq p_e\)
- \(H_0: p_o \leq p_e\)
The corresponding alternative hypotheses (\(H_a\)) are as follows:
- \(H_a: p_o \ne p_e\) (different)
- \(H_a: p_o < p_e\) (less)
- \(H_a: p_o > p_e\) (greater)
Note that:
- Hypothesis 1) defines a two-tailed test
- Hypotheses 2) and 3) define one-tailed tests
Formula of the test statistic
The test statistic (the z-statistic) can be calculated as follows:
\[ z = \frac{p_o-p_e}{\sqrt{p_e(1-p_e)/n}} \]
where,
- \(p_o\) is the observed proportion
- \(p_e\) is the expected proportion
- \(q = 1-p_o\) is the observed proportion of failures
- \(n\) is the sample size
At the 5% significance level:
- if \(|z| < 1.96\), then the difference is not significant
- if \(|z| \geq 1.96\), then the difference is significant
The p-value corresponding to the z-statistic can be read from the standard normal (z) table. We'll see how to compute it in R.
The 95% confidence interval of \(p_o\) is defined as follows:
\[ p_o \pm 1.96\sqrt{\frac{p_oq}{n}} \]
Note that the formula of the z-statistic is valid only when the sample size (\(n\)) is large enough: \(np_e\) and \(n(1-p_e)\) should both be \(\geq 5\). For example, if \(p_e = 0.1\), then \(n\) should be at least 50.
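As a quick check of these formulas, the z-statistic, its two-tailed p-value and the (Wald) confidence interval can be computed by hand, reusing the objects x, n, p_e and p_o defined above. This is only a sketch: the prop.test() function used below reports a slightly different (Wilson) confidence interval, so the intervals will not match exactly.
# z-statistic, with the expected proportion in the denominator
z <- (p_o - p_e) / sqrt(p_e * (1 - p_e) / n)
z
[1] 2.371708
# two-tailed p-value from the standard normal distribution
2 * pnorm(-abs(z))
[1] 0.01770607
# 95% Wald confidence interval for p_o
p_o + c(-1, 1) * 1.96 * sqrt(p_o * (1 - p_o) / n)
[1] 0.5176483 0.6698517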
Compute the one-proportion z-test in R
R functions: binom.test() & prop.test()
The R functions binom.test() and prop.test() can be used to perform a one-proportion test:
- binom.test(): computes an exact binomial test. Recommended when the sample size is small.
- prop.test(): can be used when the sample size is large (n > 30). It uses a normal approximation to the binomial distribution.
The syntax of the two functions is essentially the same. The simplified format is as follows:
binom.test(x, n, p = 0.5, alternative = "two.sided")
prop.test(x, n, p = NULL, alternative = "two.sided",
          correct = TRUE)
- x: the number of successes
- n: the total number of trials
- p: the probability of success to test against (the expected proportion).
- correct: a logical indicating whether Yates’ continuity correction should be applied where possible.
Note that, by default, the function prop.test() uses Yates' continuity correction, which is really important if either the expected successes or the expected failures is < 5. If you don't want the correction, use the additional argument correct = FALSE in the prop.test() function. (This option must be set to FALSE to make the test mathematically equivalent to the uncorrected z-test of a proportion described above.)
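To illustrate these options, the sketch below runs the three variants on the mouse data and extracts only the p-values with $p.value (the exact value for the uncorrected test is shown in the next section).
# exact binomial test
binom.test(x = 95, n = 160, p = 0.5)$p.value
# normal approximation with Yates' continuity correction (the default)
prop.test(x = 95, n = 160, p = 0.5)$p.value
# normal approximation without correction (the classical z-test)
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)$p.value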
Compute one-proportion z-test
We want to know whether cancer affects males more than females.
We'll use the function prop.test():
res <- prop.test(x = 95, n = 160, p = 0.5,
                 correct = FALSE)
# Printing the results
res
1-sample proportions test without continuity correction
data: 95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5163169 0.6667870
sample estimates:
p
0.59375
The function returns:
- the value of Pearson's chi-squared test statistic (equal to \(z^2\) when correct = FALSE)
- a p-value
- a 95% confidence interval
- an estimated probability of success (the proportion of males with cancer)
Note that:
- if you want to test whether the proportion of males with cancer is less than 0.5 (one-tailed test), type this:
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
          alternative = "less")
- Or, if you want to test whether the proportion of males with cancer is greater than 0.5 (one-tailed test), type this:
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
          alternative = "greater")
Interpretation of the result
The p-value of the test is 0.01771, which is less than the significance level alpha = 0.05. We can conclude that the proportion of males with cancer is significantly different from 0.5 (p-value = 0.01771). Because the observed proportion (0.59) is above 0.5, the data suggest that cancer affects males more often than females.
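The same decision can also be made programmatically, by comparing the returned p-value to the chosen significance level. A minimal sketch, reusing the res object created above:
# decision at the 5% significance level
alpha <- 0.05
if (res$p.value < alpha) {
  print("Reject H0: the proportion differs significantly from 0.5")
} else {
  print("Fail to reject H0 at the 5% level")
}
[1] "Reject H0: the proportion differs significantly from 0.5"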
Access to the values returned by prop.test()
The result of prop.test() function is a list containing the following components:
- statistic: the value of the chi-squared test statistic
- parameter: the degrees of freedom of the approximate chi-squared distribution
- p.value: the p-value of the test
- conf.int: a confidence interval for the probability of success.
- estimate: the estimated probability of success.
The format of the R code to use for getting these values is as follows:
# printing the p-value
res$p.value
[1] 0.01770607
# printing the estimated proportion
res$estimate
p
0.59375
# printing the confidence interval
res$conf.int
[1] 0.5163169 0.6667870
attr(,"conf.level")
[1] 0.95
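Note also that, without the continuity correction, the reported X-squared statistic is simply the square of the z-statistic defined earlier, so |z| can be recovered from the statistic component (a sketch; the sign of z is that of \(p_o - p_e\)):
# printing the chi-squared statistic
res$statistic
X-squared
    5.625
# recovering |z| from X-squared
sqrt(res$statistic)
X-squared
 2.371708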
Infos
This analysis has been performed using R software (ver. 3.2.4).