R and RStudio

Follow the instructions below to get comfortable using R/RStudio for statistical and scientific computing.

Scientific computing

  • In the Console, type 1+1 and press Enter. You will get

    1+1
    ## [1] 2
  • Try 2.1*(3.7+4.9)/5.2, sin(pi/3), exp(0.8), log(1.0), and anything else you like. You can now throw away your TI-who-cares calculator!
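
The expressions above can be collected into a short sketch that uses only base R. Two details are easy to miss: sin() expects its argument in radians, and log() is the natural logarithm. The log10() and log2() lines are additions for comparison and are not part of the lab.

```r
# Arithmetic follows the usual order of operations
2.1 * (3.7 + 4.9) / 5.2

# sin() expects radians; pi is a built-in constant
sin(pi / 3)

# exp() and log() are the natural exponential and the natural logarithm
exp(0.8)
log(1.0)     # 0, because e^0 = 1

# Logarithms in other bases have their own helpers
log10(1000)  # 3
log2(8)      # 3
```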

Working directory

The working directory is the default folder in which R saves files and from which it reads them.

  • In the Files tab, find or create a folder for your STA100 work. Click More, then Set As Working Directory.

  • Download template.Rmd from Files/templates on the STA100 Canvas site and save it into your working directory. Then open it by clicking on template.Rmd in the Files tab.

  • Knit template.Rmd into .pdf and then into .html. The .pdf and .html files will appear alongside template.Rmd.
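
The working directory can also be inspected and changed from the Console with getwd() and setwd(). The sketch below is an illustration only: the folder name STA100 under a temporary directory is an assumption chosen so the example does not touch your real files.

```r
# Remember where we started so we can go back at the end
old_wd <- getwd()

# Hypothetical course folder, created in a temporary location for this example
sta100_dir <- file.path(tempdir(), "STA100")
dir.create(sta100_dir, showWarnings = FALSE)

# Same effect as More > Set As Working Directory in the Files tab
setwd(sta100_dir)
getwd()

# A file written with a relative name lands in the working directory
writeLines("hello", "note.txt")
list.files()

# Restore the original working directory
setwd(old_wd)
```

With your real course folder, replace sta100_dir with the path to that folder.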

R scripts

  • Click the “little green plus icon” and select R Script. This opens an editor where you can write R code. Type some code, for example

    x <- 10.5
    y <- 8.1
    z <- x + y
    print(z)
    ## [1] 18.6

    Click Source to run the whole script and see the result 18.6 in the Console.

  • You can also Run line by line. Put your cursor on the line you want to run and click Run. You can also use the keyboard shortcut Command+Return on a Mac (or Control+Enter on Windows).

  • Click the floppy icon to save your R script in your working directory. We usually save R scripts with the file extension .R, for example, the_best_script_ever.R. Now you can shut down your computer and go out to enjoy the sunshine! See you all next week!

  • Next week, open your script and work on it again.
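
A saved script can also be run from the Console with source(), which is what the Source button does. In the sketch below, the script is written to a temporary folder so the example is self-contained; that location is an assumption, and in practice the file would sit in your working directory.

```r
# Hypothetical location for the script, used only for this example
script_path <- file.path(tempdir(), "the_best_script_ever.R")

# Write the four lines from the example above into the script file
writeLines(c(
  "x <- 10.5",
  "y <- 8.1",
  "z <- x + y",
  "print(z)"
), script_path)

# Run the whole file, top to bottom
source(script_path)

# Variables created by the script are now available in the Console
z
```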

Read data

R has many built-in functions to read data, for example, read.table().

  • Let’s read Fisher’s famous Iris data set.

  • Go to the UCI Machine Learning Repository page for the Iris Data Set. Click the Data Folder link, download the two files iris.data and iris.names, and save them into your working directory. Then run:

    lab2_data <- read.table("iris.data", sep = ",")

    Good job! You have loaded the data into R. Look at the Environment tab. You can see that we have just created a variable named lab2_data. Click on it and take a peek at what is inside!

  • Oops, what are the columns V1, V2, V3, V4 and V5? Open the text file iris.names and read the description! They are actually

       1. sepal length in cm
       2. sepal width in cm
       3. petal length in cm
       4. petal width in cm
       5. class: 
          -- Iris Setosa
          -- Iris Versicolour
          -- Iris Virginica

    Rename the columns:

    colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

    Check lab2_data again to see the change.

  • Put the code

    lab2_data <- read.table("iris.data", sep = ",")
    colnames(lab2_data) <- c("sepal_length", "sepal_width", "petal_length", "petal_width", "class")

    into your R script and save it! Now you can shut down your computer and go out to enjoy the sunshine! See you all next week!
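
The two steps above can also be done in one call, because read.table() accepts a col.names argument. The sketch below is an illustration, not part of the lab: to stay self-contained it reads four sample rows through the text argument instead of the downloaded file, and the names sample_rows, iris_cols and lab2_sample are made up for the example.

```r
# Four sample rows in the same comma-separated layout as iris.data
sample_rows <- c(
  "5.1,3.5,1.4,0.2,Iris-setosa",
  "4.9,3.0,1.4,0.2,Iris-setosa",
  "7.0,3.2,4.7,1.4,Iris-versicolor",
  "6.3,3.3,6.0,2.5,Iris-virginica"
)

iris_cols <- c("sepal_length", "sepal_width",
               "petal_length", "petal_width", "class")

# text = reads from a character vector instead of a file.
# With the real file, the call would be:
#   read.table("iris.data", sep = ",", col.names = iris_cols)
lab2_sample <- read.table(text = sample_rows, sep = ",", col.names = iris_cols)

# Check the column names and types, then look at the first rows
str(lab2_sample)
head(lab2_sample, 2)
```

Calling str() right after reading a file is a quick way to confirm that the numeric columns were read as numbers and not as text.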

Mean and standard deviation

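  • To access a column in a data frame, for example the sepal_length column in lab2_data, type lab2_data$sepal_length or lab2_data[["sepal_length"]]. Both return the column as a vector. Single brackets, lab2_data["sepal_length"], return a one-column data frame instead, which functions such as mean() do not accept.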

  • Let’s compute the mean and standard deviation of “sepal length” with functions mean() and sd():

    mean(lab2_data$sepal_length)
    ## [1] 5.843333
    sd(lab2_data$sepal_length)
    ## [1] 0.8280661
  • R has lots of built-in functions for statistics. For example, the function summary() shows the minimum, quartiles, mean and maximum of every numeric column:

    summary(lab2_data)
    ##   sepal_length    sepal_width     petal_length    petal_width       class          
    ##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   Length:150        
    ##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   Class :character  
    ##  Median :5.800   Median :3.000   Median :4.350   Median :1.300   Mode  :character  
    ##  Mean   :5.843   Mean   :3.054   Mean   :3.759   Mean   :1.199                     
    ##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                     
    ##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
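
To see what mean() and sd() compute, the sketch below reproduces both by hand on a small vector holding the first five sepal lengths of the data set. The variable names are made up for the example. Note that sd() is the sample standard deviation, which divides by n - 1 and not by n.

```r
# First five sepal lengths from the Iris data, used as a small example
x <- c(5.1, 4.9, 4.7, 4.6, 5.0)
n <- length(x)

# Mean: the sum divided by the number of values
mean_by_hand <- sum(x) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1
sd_by_hand <- sqrt(sum((x - mean_by_hand)^2) / (n - 1))

# Both comparisons should print TRUE
all.equal(mean_by_hand, mean(x))
all.equal(sd_by_hand, sd(x))
```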

Histogram

  • Let’s plot a histogram.

    hist(lab2_data$sepal_length,
         main = "Iris Data Set: Sepal Length in cm",
         xlab = "cm")
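
The histogram can be adjusted with further arguments to hist(). The sketch below is one possible variation, not the lab's required plot. So that it runs without the downloaded file, it uses the copy of the Iris data that ships with R under the name iris, where the sepal length column is called Sepal.Length.

```r
# Built-in copy of the data; its sepal length column is named Sepal.Length
sepal_length <- iris$Sepal.Length

# breaks is a suggested number of bins; R may adjust it to get round cut points
h <- hist(sepal_length,
          breaks = 12,
          col = "lightblue",
          main = "Iris Data Set: Sepal Length in cm",
          xlab = "cm")

# Mark the sample mean with a dashed vertical line
abline(v = mean(sepal_length), lty = 2)

# hist() also returns the bin edges and the counts it used
h$breaks
h$counts
```

Changing breaks and re-running is a quick way to see how much the shape of a histogram depends on the bin width.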

Help/Documentation

Read the R documentation of the functions you learned today. Enter the following commands, one by one, and read the corresponding documentation in the Help tab.

?sin
?exp
?log
?read.table
?colnames
?mean
?sd
?summary
?hist
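
Besides ?, base R has a few other ways to explore a function. The sketch below shows three of them; the variable name read_functions is made up for the example.

```r
# Show only a function's arguments and their defaults
args(sd)

# Run the examples from the bottom of a help page
example(mean)

# List objects whose names match a pattern, here everything starting with "read."
read_functions <- apropos("^read\\.")
read_functions
```

When you do not know the name of a function, typing ?? followed by a quoted phrase, for example ??"standard deviation", searches the installed help pages for it.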

How does Po know all those built-in functions in R?

  • Way 1: Google it

  • Way 2: Read R Help/Documentation

Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime.

Po           Code in labs    Google         R Documentation
Fisherman    Fish            Fishing rod    Sea