R and RStudio, continued

Follow the instructions below to get comfortable using R/RStudio for statistical and scientific computing.

Working directory

The working directory is the default folder in which R reads and saves files.

In the Files tab, find or create a folder for your STA100 work. Click More, then Set As Working Directory.
Open the R script you worked on last week and run the code.
The working directory does not persist after you close R/RStudio. Remember to set the working directory each time you open R/RStudio.
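
The same steps can also be done from the console or a script, which helps because the setting is lost when RStudio closes. Below is a minimal sketch using the base R functions getwd() and setwd(). Here tempdir() is only a placeholder for your own STA100 folder, whose path is an assumption you would need to fill in:

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# tempdir() is only a placeholder; replace it with the path to your STA100 folder
setwd(tempdir())

# Confirm the change and list the files R can now see without a full path
current_wd <- getwd()
print(current_wd)
print(list.files())

# Go back to where we started
setwd(old_wd)
```

Putting a setwd() line at the top of your script is one way to avoid clicking through the Files tab every session.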

Read data

R has many built-in functions to read data, for example, read.table().

Let’s try reading Fisher’s famous Iris data set.
Go to the UCI Machine Learning Repository page for the Iris Data Set. Click the Data Folder link, download the two files iris.data and iris.names, and save them in your working directory. Then read iris.data into R:

lab2_data <- read.table("iris.data", sep = ",")

Good job! You have loaded the data into R. Look at the Environment tab. You can see we have just created a variable named lab2_data. Click on it and take a peek at what is inside!

Oops, what are the columns V1, V2, V3, V4 and V5? Open the text file iris.names and read the description! They are:
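
You can also inspect a freshly read data frame from the console instead of clicking in the Environment tab. Below is a minimal sketch. To keep it self-contained it writes a three-row sample (an assumption, standing in for the downloaded iris.data) to a temporary file and reads it back with the same read.table() call:

```r
# Three rows in the same comma-separated layout as iris.data (illustration only)
sample_lines <- c("5.1,3.5,1.4,0.2,Iris-setosa",
                  "7.0,3.2,4.7,1.4,Iris-versicolor",
                  "6.3,3.3,6.0,2.5,Iris-virginica")
tmp <- tempfile(fileext = ".data")
writeLines(sample_lines, tmp)

# Same call as in the lab, pointed at the temporary file
sample_data <- read.table(tmp, sep = ",")

print(dim(sample_data))   # number of rows and columns
str(sample_data)          # type of each column
print(head(sample_data))  # first rows
```

With the real file you would call dim(lab2_data), str(lab2_data) and head(lab2_data) in the same way.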

   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

Rename the columns:

colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

Check lab2_data again to see the change.
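
As an alternative, read.table() accepts a col.names argument, so the columns can be named while the file is read. Below is a sketch under the assumption of a two-row sample file written to a temporary location. The variable names iris_cols and named_data are made up for this example:

```r
# Two rows in the same comma-separated layout as iris.data (illustration only)
tmp <- tempfile(fileext = ".data")
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "7.0,3.2,4.7,1.4,Iris-versicolor"), tmp)

iris_cols <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

# col.names sets the column names at read time instead of afterwards
named_data <- read.table(tmp, sep = ",", col.names = iris_cols)
print(colnames(named_data))
```

Either approach gives the same column names. Renaming afterwards with colnames() is useful when you only discover the meaning of the columns after loading.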

Put the code

lab2_data <- read.table("iris.data", sep = ",")
colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

into your R script and save it! Now you can shut down your computer and go out to enjoy the sunshine! See you all next week!

Let’s do some statistics!

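To access a column in a data frame, for example the sepal_length column in lab2_data, type lab2_data$sepal_length or lab2_data[["sepal_length"]]. Both return the column as a vector. Single brackets, lab2_data["sepal_length"], return a one-column data frame instead.

R has lots of built-in functions for statistics. For example, the function summary() shows the minimum, quartiles, mean and maximum of each numeric column: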
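
Here is a short sketch of how the different ways of selecting a column behave. Single brackets return a one-column data frame rather than a vector, so functions such as mean() expect the $ or [[ ]] form. The toy data frame below is hypothetical:

```r
# A tiny stand-in data frame; the values are for illustration only
toy <- data.frame(sepal_length = c(5.1, 7.0, 6.3),
                  class = c("a", "b", "c"))

by_dollar <- toy$sepal_length       # numeric vector
by_double <- toy[["sepal_length"]]  # the same numeric vector
by_single <- toy["sepal_length"]    # data frame with one column

print(class(by_dollar))
print(class(by_single))

# mean() works on the vector form
print(mean(by_dollar))
```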

summary(lab2_data)

##   sepal_length    sepal_width     petal_length    petal_width       class          
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   Length:150        
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   Class :character  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300   Mode  :character  
##  Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.199                     
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                     
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
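
For the character column class, summary() only reports its length and type. To count how many rows belong to each class, table() is a common choice. Below is a minimal sketch with a hypothetical vector of labels standing in for lab2_data$class:

```r
# Hypothetical class labels standing in for lab2_data$class
class_labels <- c("Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica")

# table() counts how many times each distinct value occurs
class_counts <- table(class_labels)
print(class_counts)

# With the real data this would be: table(lab2_data$class)
```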

Let’s compute the median, mean and standard deviation of sepal length with the functions median(), mean() and sd():

median(lab2_data$sepal_length)

## [1] 5.8

mean(lab2_data$sepal_length)

## [1] 5.843333

sd(lab2_data$sepal_length)

## [1] 0.8280661
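
Note that sd() computes the sample standard deviation, which divides by n - 1 rather than n. Below is a small sketch that checks this by hand on a made-up vector (the values are an assumption for illustration, not the full data set):

```r
# Made-up measurements for illustration
x <- c(4.3, 5.1, 5.8, 6.4, 7.9)
n <- length(x)

# Sample standard deviation computed from its definition (divide by n - 1)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

print(manual_sd)
print(sd(x))  # should agree with manual_sd
```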

Let’s plot a histogram.

hist(lab2_data$sepal_length,
     main = "Iris Data Set: Sepal Length in cm",
     xlab = "cm")

Let’s standardize the sepal length, that is, subtract the mean and divide by the standard deviation:

mu <- mean(lab2_data$sepal_length)
sigma <- sd(lab2_data$sepal_length)
standardized_sepal_length <- (lab2_data$sepal_length - mu)/sigma
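
A quick way to check a standardization is to confirm that the result has mean 0 and standard deviation 1. The sketch below does this on a made-up vector (an assumption for illustration) and compares the result with the base R function scale(), which performs the same centering and scaling by default:

```r
# Made-up measurements for illustration
x <- c(4.3, 5.1, 5.8, 6.4, 7.9)

# Standardize by hand, as in the lab
z <- (x - mean(x)) / sd(x)

print(mean(z))  # approximately 0, up to rounding error
print(sd(z))    # 1, up to rounding error

# scale() returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.vector(scale(x))
print(all.equal(z, z_scale))
```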

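# Stack two plots vertically so their shapes can be compared
par(mfrow = c(2, 1))
# 12 bins covering 3 standard deviations on each side of the mean
hist(lab2_data$sepal_length,
     breaks = seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 13),
     main = "Iris Data Set: Sepal Length in cm",
     xlab = "cm")
# The same 12 bins expressed in standardized units
hist(standardized_sepal_length,
     breaks = seq(-3, 3, length.out = 13),
     main = "Iris Data Set: Sepal Length standardized",
     xlab = "Sepal Length standardized")
# Restore the default one-plot layout
par(mfrow = c(1, 1))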
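
The two histograms have the same shape because standardizing only shifts and rescales the axis. One way to see this without looking at the plots is to compare the bin counts, which hist() returns when called with plot = FALSE. Below is a sketch on a made-up vector (the values are an assumption for illustration):

```r
# Made-up measurements for illustration
x <- c(4.3, 5.1, 5.8, 6.4, 7.9, 5.0, 6.3)
mu <- mean(x)
sigma <- sd(x)
z <- (x - mu) / sigma

# plot = FALSE returns the histogram object instead of drawing it
h_raw <- hist(x, breaks = seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 13),
              plot = FALSE)
h_std <- hist(z, breaks = seq(-3, 3, length.out = 13), plot = FALSE)

print(h_raw$counts)
print(h_std$counts)
```

Note that hist() stops with an error if the breaks do not cover the whole range of the data, which is why the breaks here extend 3 standard deviations in each direction.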

Help/Documentation

Read the R documentation for the functions you learned today. Enter the following commands one at a time, and read the corresponding documentation in the Help tab.

?read.table

?colnames

?summary

?median

?mean

?sd

?hist

?par
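
Beyond ?, base R offers a few other ways to look things up. Below is a minimal sketch. The help calls are wrapped in interactive() so the snippet also runs as a script, and the choice of sd as the example function is arbitrary:

```r
# args() prints a function's arguments and their defaults
print(args(sd))

# formals() gives the same information as a list you can work with
sd_arg_names <- names(formals(sd))
print(sd_arg_names)

if (interactive()) {
  help("sd")                  # same as ?sd
  help.search("histogram")    # same as ??histogram, searches all help pages
  example(mean)               # runs the examples from the help page of mean()
}
```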

How does Po know all those built-in functions in R?

Way 1: Google it
Way 2: Read R Help/Documentation

Give a man a fish and you feed him for a day; Teach a man to fish and you feed him for a lifetime.

Po	Code in labs	Google	R Documentation
Fisherman	Fish	Fishing rod	Sea

STA100 Lab4

Po

4/20/2021
