Load Dataset

We will continue using the simulated dataset credit for demonstration. The dataset contains information on a large number of credit card holders.¹

credit <- read.csv('https://albums.yuanting.lu/sta126/data/credit.csv')

Dataset Overview

  1. How large is this dataset?
dim(credit) # The dimensions (rows by columns) of the dataset
  2. What are the variables (columns) in the dataset?
names(credit)
  3. Which of the variables are categorical? Which ones are numerical? We can take a peek at the dataset by viewing its first few rows (e.g., the first 10 rows).
head(credit, 10) # The head function shows the first few rows of the dataset.
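
Scanning rows by eye works for a small table. As a complement, R can report the storage type of each column directly. The sketch below uses a tiny made-up data frame, credit_demo (a hypothetical stand-in, not the real credit data), so that it runs on its own; with the real data, one would pass credit instead.

```r
# Hypothetical stand-in for the credit data (made-up values, illustration only)
credit_demo <- data.frame(
  Income  = c(14.9, 106.0, 104.6, 148.9),
  Student = c("No", "Yes", "No", "No"),
  stringsAsFactors = FALSE  # keep text columns as character in every R version
)

# Compact overview: one line per variable, showing its type and first values
str(credit_demo)

# Named character vector holding the class of each column
var_types <- sapply(credit_demo, class)
var_types
```

Columns reported as numeric or integer are numerical. Columns reported as character or factor are usually categorical, although a numerically coded category would still need to be spotted by reading the variable description.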

Dataset Summary

  1. Create a frequency table for a variable.²
table(credit$Student)

To convert the frequency table to a relative frequency table, we divide the counts in the table by the total number of customers in the dataset. If you remember that the total number of customers is 400, use table(credit$Student) / 400. Otherwise, let the length function calculate the total: table(credit$Student) / length(credit$Student).
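
As a side note, base R also provides prop.table, which divides a table by its own total, so the total never has to be typed or looked up. The sketch below compares it with the manual division on a short made-up vector, student_demo (hypothetical values, not the real data).

```r
# Hypothetical student-status values (made up for illustration)
student_demo <- c("No", "Yes", "No", "No", "No")

counts <- table(student_demo)                # frequency table

rel_manual  <- counts / length(student_demo) # manual division, as described above
rel_builtin <- prop.table(counts)            # same result using prop.table

rel_builtin
```

With the real data, prop.table(table(credit$Student)) would play the same role as table(credit$Student) / length(credit$Student).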

  2. Create a pie chart.
freq <- table(credit$Student) 
pie(freq, main = "Student Status")

  • The code is equivalent to pie(table(credit$Student), main = "Student Status").
  • The assignment arrow (<-) in the first line of this code chunk stores the value on its right-hand side (i.e., the student status frequency table) in the R variable on its left-hand side (i.e., freq). From now on, whenever we need the frequency table, we can simply refer to freq instead of typing table(credit$Student) over and over again.
  • The option main = "Student Status" adds a title to the graph.
  3. Create a bar graph. We have already assigned freq to be the frequency table in the previous code chunk, so we can continue using it.
barplot(freq, main = "Student Status")

Practice: Make the y-axis display relative frequency.


[Plus] Graphic Skills

  • Customized colors.
    • The col = c("lightblue", "lightcoral") option assigns one color to each group in the graph.
    • The c function combines its arguments (e.g., "lightblue", "lightcoral") into a vector.
  • Customized group tags.
    • The labels option allows us to change the group tags in a pie chart.
    • The names.arg option (which R also accepts in the abbreviated form names) allows us to change the group tags in a bar graph.
  • Customized text labels on the y-axis.
    • The ylab option allows us to change the text label on the y-axis.
  • Use lines to shade bars and slices.
    • The density option sets the density of the shading lines (in lines per inch) for each bar or slice.
    • The angle option sets the angle of the shading lines (in degrees, measured counter-clockwise).
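# Relative frequencies: counts divided by the total number of customers
freq <- table(credit$Student) / length(credit$Student)

# Pie chart with custom slice colors and group tags
pie(freq,
    col = c("lightblue", "lightcoral"),
    labels = c("Non-student", "Student"),
    main = "Student Status")

# Bar graph shaded with lines instead of solid colors
barplot(freq,
        density = c(20, 10),
        angle = 60,
        names.arg = c("Non-student", "Student"),
        ylab = "Relative Frequency",
        main = "Student Status")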



  1. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. (2013). An Introduction to Statistical Learning: with Applications in R. New York: Springer.

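  2. summary(credit$Student) also works when Student is stored as a factor; for factors, the summary and table functions report the same counts. In R 4.0.0 and later, read.csv reads text columns as character by default, so convert the column first, e.g., summary(factor(credit$Student)).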