Go though Lab2Assignment.pdf
Lab2Assignment.pdf (Yup, 2 at middle.)
Download and open Lab2Assignment.Rmd
Lab2Assignment.Rmd in RStudio and run codes inside.
Download, open LabAssignment2_otg.Rmd
Lab2Assignment_otg.Rmd and save as a new Rmd file for your Lab Assignment submission. Knit this Rmd file to see if you can make a html, pdf or words. You should be able to get a document. If you couldn’t, just use RStudio Cloud or the Binder RStudio in the lab note homepage. Po likes RStudio. Everything you need is on the cloud.
Run/add/edit/delete the codes in your own copy of Lab2Assignment_otg.Rmd
and add/edit/delete some writings. Knit to see how the .pdf looks like.
Tidy up the codes and writings. Knit and submit!
Follow the below instructions to make yourself comfortable in using R/RStudio for statistical/scientific computing.
Working directory is the default folder in which files are saved and read.
In the Files tab, find/create a folder for your STA100 work. Click More then Set As Working Directory.
Open the R script you worked on last week. Run the codes.
Working directory is not persistent after you you close the R/RStudio. Remember to set working directory each time you open a R/RStudio.
R has many built-in functions to read data, for example, read.table()
.
Let’s try to read the infamous Fisher’s Iris data set.
Go to the UCI Machine Learning Repository page of Iris Data Set. Click the link Data Folder and then download the two files iris.data
and iris.names
, and then save them into your working directory. T
lab2_data <- read.table("iris.data", sep = ",")
Good job! You have load the data into R. Look the the Environment tab. You can see we have just created a variable named lab2_data
. Click on it and peek what is inside!
Ops, what are the columns V1
, V2
, V3
, V4
and V5
? Open then text file iris.names
and read the description! They are actually
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
Rename the columns:
colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")
Check lab2_data
again to see the change.
lab2_data <- read.table("iris.data", sep = ",")
colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")
into your R script and save it! Now you can shut down your computer and go out to enjoy the sunshine! See you all next week!
To access a column in a data frame, for example the sepal_length
column in lab2_data
, type lab2_data$sepal_length
or lab2_data["sepal_length"]
.
R has lots of built-in functions for statistics. For example, the function summary()
show you:
summary(lab2_data)
## sepal_length sepal_width petal_length petal_width class
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 Length:150
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 Class :character
## Median :5.800 Median :3.000 Median :4.350 Median :1.300 Mode :character
## Mean :5.843 Mean :3.054 Mean :3.759 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Let’s compute the mean and standard deviation of “sepal length” with functions median
, mean()
and sd()
:
median(lab2_data$sepal_length)
## [1] 5.8
mean(lab2_data$sepal_length)
## [1] 5.843333
sd(lab2_data$sepal_length)
## [1] 0.8280661
Let’s plot a histogram.
hist(lab2_data$sepal_length,
main = "Iris Data Set: Sepal Length in cm",
xlab = "cm")
mu <- mean(lab2_data$sepal_length)
sigma <- sd(lab2_data$sepal_length)
standardized_sepal_length <- (lab2_data$sepal_length - mu)/sigma
par(mfrow=c(2,1))
hist(lab2_data$sepal_length,
breaks = seq(mu-3*sigma, mu+3*sigma,length.out=13),
main = "Iris Data Set: Sepal Length in cm",
xlab = "cm")
hist(standardized_sepal_length,
breaks = seq(-3, 3, length.out=13),
main = "Iris Data Set: Sepal Length standardized",
xlab = "Sepal Length standardized")
Read the R Documentation of the functions you learn today. Enter the following codes, one by one, and read the corresponding documentation/help in the Help tab.
?read.table
?colnames
?summary
?median
?mean
?sd
?hist
?par
How Po knows all those built-in functions in R?
Way 1: Google it
Way 2: Read R Help/Documentation
Give a man a fish and you feed him for a day; Teach a man to fish and you feed him for a lifetime.
Po | Codes in labs | R Documentation | |
---|---|---|---|
Fisherman | Fish | Fish Rod | Sea |
Interactive R course lessons swirl https://swirlstats.com/students.html
Install swirl with the following commands.
# If you haven't installed the swirl package yet
install.packages("swirl")
library(swirl)
install_course_github("swirldev", "R_Programming_E")
After swirl is installed (you only need to do it once), you can start/resume the R tutorial with the commands:
library(swirl)
swirl()
Schedule some time to go though all the lessons by yourself! Almost all of the R codes you need to know are here in those lessons.
If you prefer more static readings (Po did read them to learn R!):
If you like RStudio’s data science things,