Lab 2 Assignment (Lab projects?)

  1. Go through Lab2Assignment.pdf (yup, the 2 is in the middle).

  2. Download and open Lab2Assignment.Rmd in RStudio and run the code inside.

  3. Download and open Lab2Assignment_otg.Rmd, then save it as a new Rmd file for your Lab Assignment submission. Knit this Rmd file to check that you can produce an HTML, PDF, or Word document. You should be able to get a document; if you can't, just use RStudio Cloud or the Binder RStudio linked on the lab note homepage. Po likes RStudio. Everything you need is on the cloud.

  4. Run, add, edit, or delete code in your own copy of Lab2Assignment_otg.Rmd, and add, edit, or delete some of the writing. Knit to see what the PDF looks like.

  5. Tidy up the code and the writing. Knit and submit!

R and RStudio, continued

Follow the instructions below to get comfortable using R/RStudio for statistical and scientific computing.

Working directory

The working directory is the default folder from which R reads files and to which it saves them.

  • In the Files tab, find or create a folder for your STA100 work. Click More, then Set As Working Directory.

  • Open the R script you worked on last week and run the code.

  • The working directory does not persist after you close R/RStudio. Remember to set it each time you open R/RStudio.
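The menu steps above also have code equivalents: getwd() prints the current working directory and setwd() changes it. The sketch below is a minimal illustration; the STA100 folder it creates under tempdir() is a hypothetical stand-in so the example runs on any machine, and you would replace it with the path to your own STA100 folder.

```r
# Where R currently reads and saves files
print(getwd())

# Hypothetical stand-in for your STA100 folder, placed under a temporary
# directory so this sketch runs anywhere; use your own path instead
sta100_dir <- file.path(tempdir(), "STA100")
dir.create(sta100_dir, showWarnings = FALSE)

setwd(sta100_dir)
print(getwd())

# A relative file name now resolves inside sta100_dir
writeLines("hello", "hello.txt")
print(file.exists(file.path(sta100_dir, "hello.txt")))  # TRUE
```

Because the working directory resets when R restarts, keeping a setwd() line with your own folder path at the top of your script is one way to avoid repeating the menu steps every session.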

Read data

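R has many built-in functions to read data, for example, read.table(). The values in iris.data are separated by commas, so we tell read.table() to split each line on commas with sep = ",":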

lab2_data <- read.table("iris.data", sep = ",")

Good job! You have loaded the data into R. Look at the Environment tab: you can see we have just created a variable named lab2_data. Click on it and peek at what is inside!
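You can also peek at a data frame with code instead of clicking (View(lab2_data) opens the same viewer). The sketch below is self-contained: it assumes iris.data holds four numbers and a class label per line, separated by commas, with no header row, and writes three lines in that layout to a temporary file so it runs without the real file.

```r
# Three sample lines in the assumed iris.data layout:
# four measurements and a class label, comma-separated, no header row
sample_lines <- c("5.1,3.5,1.4,0.2,Iris-setosa",
                  "4.9,3.0,1.4,0.2,Iris-setosa",
                  "4.7,3.2,1.3,0.2,Iris-setosa")
sample_file <- tempfile(fileext = ".data")
writeLines(sample_lines, sample_file)

# Same call as in the lab, pointed at the sample file
sample_data <- read.table(sample_file, sep = ",")

# Code equivalents of clicking on the variable in the Environment tab
print(dim(sample_data))   # number of rows and columns
print(head(sample_data))  # first rows; columns are named V1 to V5
str(sample_data)          # type of each column
```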

  • Oops, what are the columns V1, V2, V3, V4 and V5? Open the text file iris.names and read the description! They are actually:

       1. sepal length in cm
       2. sepal width in cm
       3. petal length in cm
       4. petal width in cm
       5. class: 
          -- Iris Setosa
          -- Iris Versicolour
          -- Iris Virginica

    Rename the columns:

colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

Check lab2_data again to see the change.

  • Put the code
lab2_data <- read.table("iris.data", sep = ",")
colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

into your R script and save it! Now you can shut down your computer and go out to enjoy the sunshine! See you all next week!
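As an alternative, read.table() also accepts a col.names argument, so reading and naming the columns can happen in a single call. In this sketch a two-line temporary file stands in for iris.data (same assumed comma-separated, headerless layout) so it runs on its own; with the real file you would pass "iris.data" instead of sample_file.

```r
# Stand-in for iris.data: two rows in the same comma-separated, headerless layout
sample_file <- tempfile(fileext = ".data")
writeLines(c("5.1,3.5,1.4,0.2,Iris-setosa",
             "7.0,3.2,4.7,1.4,Iris-versicolor"), sample_file)

iris_names <- c("sepal_length", "sepal_width", "petal_length",
                "petal_width", "class")

# Read the file and name its columns in one step
lab2_data <- read.table(sample_file, sep = ",", col.names = iris_names)
print(colnames(lab2_data))
```

Both versions end with the same column names; the two-line version from the lab is just as good.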

Let’s do some statistics!

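  • To access a column of a data frame, for example the sepal_length column of lab2_data, type lab2_data$sepal_length or lab2_data[["sepal_length"]]; both return the column as a vector. With single brackets, lab2_data["sepal_length"] returns a one-column data frame instead, which mean() does not handle (it returns NA with a warning).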

  • R has lots of built-in functions for statistics. For example, summary() shows the minimum, quartiles, median, mean, and maximum of each numeric column:

    summary(lab2_data)
    ##   sepal_length    sepal_width     petal_length    petal_width       class          
    ##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   Length:150        
    ##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   Class :character  
    ##  Median :5.800   Median :3.000   Median :4.350   Median :1.300   Mode  :character  
    ##  Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.199                     
    ##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                     
    ##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

  • Let’s compute the median, mean, and standard deviation of sepal length with median(), mean(), and sd():

    median(lab2_data$sepal_length)
    ## [1] 5.8
    mean(lab2_data$sepal_length)
    ## [1] 5.843333
    sd(lab2_data$sepal_length)
    ## [1] 0.8280661

  • Let’s plot a histogram of sepal length:

hist(lab2_data$sepal_length,
     main = "Iris Data Set: Sepal Length in cm",
     xlab = "cm")
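If iris.data is not at hand, the column-access, statistics, and histogram steps above can be tried on R's built-in iris data frame. This is an assumption worth flagging: the built-in copy names its columns differently (Sepal.Length, ..., Species) and may differ from iris.data in a few values, although its sepal lengths give the same median, mean, and standard deviation shown above. The sketch also contrasts $, [[ ]], and [ ], and uses hist(plot = FALSE) to see the bins behind a histogram without drawing it.

```r
# Built-in copy of the iris data (columns Sepal.Length, ..., Species)
data(iris)

# $ and [[ ]] give a numeric vector; [ ] gives a one-column data frame
print(class(iris$Sepal.Length))       # "numeric"
print(class(iris[["Sepal.Length"]]))  # "numeric"
print(class(iris["Sepal.Length"]))    # "data.frame"

sepal_length <- iris$Sepal.Length
print(median(sepal_length))  # 5.8
print(mean(sepal_length))    # 5.843333
print(sd(sepal_length))      # 0.8280661

# hist() with plot = FALSE returns the bins without drawing anything
h <- hist(sepal_length, plot = FALSE)
print(h$breaks)  # bin edges chosen by R
print(h$counts)  # number of flowers in each bin
```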

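  • Let’s standardize the sepal length: subtract the mean and divide by the standard deviation, so the standardized values have mean 0 and standard deviation 1. Then compare the two histograms, one above the other.

mu <- mean(lab2_data$sepal_length)   # sample mean
sigma <- sd(lab2_data$sepal_length)  # sample standard deviation
standardized_sepal_length <- (lab2_data$sepal_length - mu)/sigma

# Stack two plots in one column; par() returns the old setting so it can be restored
old_par <- par(mfrow = c(2, 1))
hist(lab2_data$sepal_length,
     breaks = seq(mu - 3*sigma, mu + 3*sigma, length.out = 13),  # 12 bins over mu +/- 3 sigma
     main = "Iris Data Set: Sepal Length in cm",
     xlab = "cm")
hist(standardized_sepal_length,
     breaks = seq(-3, 3, length.out = 13),  # the same 12 bins on the standardized scale
     main = "Iris Data Set: Sepal Length standardized",
     xlab = "Sepal Length standardized")
par(old_par)  # back to one plot per page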
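A quick way to sanity-check the standardization: the result should have mean 0 (up to rounding) and standard deviation 1, and base R's scale() performs the same centering and scaling. The sketch uses the built-in iris$Sepal.Length as an assumed stand-in for lab2_data$sepal_length. It also checks that every value lies in [-3, 3], because hist() stops with an error when some values fall outside the breaks.

```r
x <- iris$Sepal.Length  # built-in stand-in for lab2_data$sepal_length

mu <- mean(x)
sigma <- sd(x)
z <- (x - mu) / sigma

# After standardizing, the mean is 0 up to rounding and the sd is 1
print(round(mean(z), 10))  # 0
print(sd(z))               # 1

# scale() centers by the mean and divides by the sd; it returns a matrix
z_scale <- as.vector(scale(x))
print(all.equal(z, z_scale))  # TRUE

# Breaks of seq(-3, 3, ...) only work if every z lies inside [-3, 3]
print(range(z))
print(all(z >= -3 & z <= 3))  # TRUE
```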

Help/Documentation

Read the R documentation for the functions you learned today. Enter the following commands one by one and read the corresponding documentation in the Help tab.

?read.table
?colnames
?summary
?median
?mean
?sd
?hist
?par
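Besides ?name, a few base R functions help when you only need part of a help page or cannot remember a function's name. In the sketch below, args() and formals() show a function's arguments, and apropos() lists functions whose names contain a word. Two related tools are left out of the code because they open help pages or can be slow: example(mean) runs the examples at the bottom of a help page, and ??"standard deviation" (short for help.search()) searches the installed documentation by topic.

```r
# A function's arguments, without opening the full help page
print(args(sd))

# Argument names and default values as an R object
print(formals(sd))

# Functions whose names contain "mean" (handy when you forget a name)
print(apropos("mean"))
```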

How does Po know all those built-in functions in R?

  • Way 1: Google it

  • Way 2: Read R Help/Documentation

Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.

Po          Codes in labs   Google        R Documentation
Fisherman   Fish            Fishing rod   Sea

More R tutorials