Undersampling the Dataset Using R

# Load the dataset from a local CSV file
diabetes <- read.csv("diabetes-dataset.csv", sep = ",", header = TRUE)
Summary of the dataset
diabetes$Outcome <- as.factor(diabetes$Outcome)
Summary of the dataset after converting the Outcome variable to a factor
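Once Outcome is a factor, summary() reports a count per class instead of numeric quantiles. That makes the imbalance easy to see. The minimal sketch below checks the class counts and proportions directly with table() and prop.table(). The small data frame `toy` is hypothetical and only stands in for diabetes-dataset.csv.

```r
# Hypothetical toy data standing in for diabetes-dataset.csv
toy <- data.frame(
  Glucose = c(148, 85, 183, 89, 137, 116),
  Outcome = c(1, 0, 0, 0, 1, 0)
)

# Convert the numeric 0/1 outcome to a factor, as in the article
toy$Outcome <- as.factor(toy$Outcome)

# Absolute and relative class frequencies
class_counts <- table(toy$Outcome)
class_props <- prop.table(class_counts)
print(class_counts)
print(class_props)

# On a factor column, summary() reports counts per level rather than quantiles
print(summary(toy$Outcome))
```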
# Rows where Outcome is 1
diabetes_true <- diabetes[diabetes$Outcome == 1, ]
# Rows where Outcome is 0
diabetes_false <- diabetes[diabetes$Outcome == 0, ]
# Undersample the majority class (Outcome == 0) to reduce the imbalance:
# draw 1026 of its rows at random, without replacement
diabetes_false <- diabetes_false[sample(nrow(diabetes_false), 1026, replace = FALSE, prob = NULL), ]
# Combine both classes into the final, more balanced dataset
diabetes_final <- rbind(diabetes_true, diabetes_false)
Before sampling (imbalanced data)
After undersampling
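The code above hard-codes the number of retained rows (1026). Suppose instead that the goal is an exact 50/50 balance. In that case, the same random undersampling can be wrapped in a function that does three things:

- derives the target size from the minority class,
- optionally fixes a seed so the draw is reproducible,
- shuffles the combined rows, since rbind() otherwise leaves every Outcome == 1 row ahead of the Outcome == 0 rows.

This is only a sketch under those assumptions. The function name `undersample_majority` and the synthetic `demo` data are hypothetical. The before/after table() calls mirror the comparison shown above.

```r
# Hypothetical helper: randomly undersample every class down to the size
# of the smallest class, then shuffle the rows of the result.
undersample_majority <- function(df, outcome_col, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)              # reproducible random draw
  y <- droplevels(as.factor(df[[outcome_col]]))   # ignore unused factor levels
  n_min <- min(table(y))                          # size of the minority class
  row_groups <- split(seq_len(nrow(df)), y)       # row indices grouped by class
  # Keep n_min randomly chosen rows from each class, without replacement
  keep <- unlist(lapply(row_groups, function(idx) idx[sample.int(length(idx), n_min)]),
                 use.names = FALSE)
  keep <- keep[sample.int(length(keep))]          # shuffle so classes are interleaved
  df[keep, , drop = FALSE]
}

# Synthetic, deliberately imbalanced data (hypothetical)
set.seed(42)
demo <- data.frame(
  Glucose = round(rnorm(100, mean = 120, sd = 30)),
  Outcome = factor(c(rep(1, 30), rep(0, 70)))
)
print(table(demo$Outcome))      # before: 70 rows of class 0, 30 of class 1
balanced <- undersample_majority(demo, "Outcome", seed = 1)
print(table(balanced$Outcome))  # after: 30 rows of each class
```

Passing `seed` calls set.seed(), which also resets R's global random number stream. Leave the argument as NULL if that side effect is unwanted.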

Greeshma Lakshmi

An enthusiast in the field of Data Science and Technology