Naive Bayes Classifier Essentials
The Naive Bayes classifier is a simple and powerful method that can be used for binary and multiclass classification problems.
The naive Bayes classifier predicts the class membership probability of observations using Bayes' theorem, which is based on conditional probability, that is, the probability of an event given that another event has already occurred. It is called "naive" because it assumes that the predictors are independent of each other within each class.
Observations are assigned to the class with the largest probability score.
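As a minimal sketch of the rule described above, the following R snippet applies Bayes' theorem to two classes and picks the one with the largest posterior probability. The prior and likelihood values are hypothetical numbers chosen only for illustration; they are not estimated from any data set used in this chapter.

```r
# Hypothetical numbers, chosen only for illustration
prior      <- c(neg = 0.7, pos = 0.3)   # P(class)
likelihood <- c(neg = 0.2, pos = 0.6)   # P(evidence | class)

# Bayes' theorem:
# P(class | evidence) = P(evidence | class) * P(class) / P(evidence)
evidence  <- sum(likelihood * prior)    # P(evidence), summed over the classes
posterior <- likelihood * prior / evidence

# Posterior probability of each class (sums to 1)
posterior

# The observation is assigned to the class with the largest posterior
predicted_class <- names(posterior)[which.max(posterior)]
predicted_class
```

With these numbers the posterior is 0.4375 for "neg" and 0.5625 for "pos", so the observation would be assigned to "pos" even though "neg" has the larger prior.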
In this chapter, you’ll learn how to perform naive Bayes classification in R using the klaR and caret packages.
Loading required R packages
tidyverse for easy data manipulation and visualization
caret for easy machine learning workflow
library(tidyverse)
library(caret)
Preparing the data
The input predictor variables can be categorical and/or numeric variables.
Here, we’ll use the PimaIndiansDiabetes2 data set [in the mlbench package], introduced in Chapter @ref(classification-in-r), for predicting the probability of being diabetes positive based on multiple clinical variables.
We’ll randomly split the data into a training set (80%, for building the predictive model) and a test set (20%, for evaluating the model). Make sure to set the seed for reproducibility.
# Load the data and remove NAs
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
# Inspect the data
sample_n(PimaIndiansDiabetes2, 3)
# Split the data into training and test set
set.seed(123)
training.samples <- PimaIndiansDiabetes2$diabetes %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- PimaIndiansDiabetes2[training.samples, ]
test.data <- PimaIndiansDiabetes2[-training.samples, ]
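When the outcome is a factor, createDataPartition() samples within each class, so both subsets keep roughly the same class proportions. The base R sketch below illustrates that idea on simulated labels. The vector y and the 65/35 class balance are assumptions made for the example; this is not caret's implementation.

```r
# Simulated outcome labels (hypothetical; stands in for the diabetes column)
set.seed(123)
y <- factor(sample(c("neg", "pos"), size = 200, replace = TRUE,
                   prob = c(0.65, 0.35)))

# Sample 80% of the row indices within each class (stratified sampling)
idx_by_class <- split(seq_along(y), y)
train_idx <- sort(unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}), use.names = FALSE))

train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Class proportions should be close in the two subsets
round(prop.table(table(train_y)), 2)
round(prop.table(table(test_y)), 2)
```

A plain random split without stratification can, by chance, leave one class under-represented in the test set, which makes the accuracy estimate less stable.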
Computing Naive Bayes
library("klaR")
# Fit the model
model <- NaiveBayes(diabetes ~., data = train.data)
# Make predictions
predictions <- model %>% predict(test.data)
# Model accuracy
mean(predictions$class == test.data$diabetes)
## [1] 0.821
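To make the computation behind such a model more concrete, here is a minimal base R sketch of a Gaussian naive Bayes classifier: it estimates a prior per class and a normal density per class and predictor, then combines them with Bayes' theorem. The helper names gnb_fit() and gnb_predict() are hypothetical, and the built-in iris data is used only so that the example runs on its own. It is an illustration of the technique, not klaR's code.

```r
# Minimal Gaussian naive Bayes sketch in base R (illustrative only).
# gnb_fit() and gnb_predict() are hypothetical helper names.
gnb_fit <- function(x, y) {
  y <- factor(y)
  list(
    levels = levels(y),
    prior  = as.vector(table(y)) / length(y),                # P(class)
    mean   = sapply(x, function(col) tapply(col, y, mean)),  # classes x predictors
    sd     = sapply(x, function(col) tapply(col, y, sd))     # classes x predictors
  )
}

# newdata must contain the same predictors, in the same order, as x above
gnb_predict <- function(fit, newdata) {
  n_class <- length(fit$levels)
  log_joint <- matrix(0, nrow = nrow(newdata), ncol = n_class)
  for (k in seq_len(n_class)) {
    # log P(class) + sum over predictors of log N(x_j | mean, sd)
    log_joint[, k] <- log(fit$prior[k])
    for (j in seq_along(newdata)) {
      log_joint[, k] <- log_joint[, k] +
        dnorm(newdata[[j]], fit$mean[k, j], fit$sd[k, j], log = TRUE)
    }
  }
  # Normalise to posterior probabilities (row maximum subtracted for stability)
  post <- exp(log_joint - apply(log_joint, 1, max))
  post <- post / rowSums(post)
  colnames(post) <- fit$levels
  list(
    class = factor(fit$levels[max.col(post, ties.method = "first")],
                   levels = fit$levels),
    posterior = post
  )
}

# Usage on the built-in iris data (an assumption made for this example)
set.seed(123)
train_rows <- sample(nrow(iris), 120)
fit  <- gnb_fit(iris[train_rows, 1:4], iris$Species[train_rows])
pred <- gnb_predict(fit, iris[-train_rows, 1:4])

mean(pred$class == iris$Species[-train_rows])  # test-set accuracy
head(round(pred$posterior, 3))                 # posterior probability per class
```

Working on the log scale avoids numerical underflow when many small densities are multiplied. The returned list mirrors the class and posterior components used above, so the two outputs can be read the same way.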
Using the caret R package
The caret R package can automatically train the model and assess the model accuracy using k-fold cross-validation (Chapter @ref(cross-validation)).
library(klaR)
# Build the model
set.seed(123)
model <- train(diabetes ~., data = train.data, method = "nb",
               trControl = trainControl("cv", number = 10))
# Make predictions
predicted.classes <- model %>% predict(test.data)
# Model accuracy
mean(predicted.classes == test.data$diabetes)
Discussion
This chapter introduces the basics of naive Bayes classification and provides practical examples in R using the klaR and caret packages.