MANAGING OUR DATA!

Well well well, perfect. We now know about some of the data we can work with in R. We even know that we can really easily take a look at our data by calling them directly by their name (because we are well educated!). Let’s take a look at a couple of ways to manage them (e.g. editing, saving, importing).

First of all, let’s take a look at how to look at our data.

The function “print()” in its simplest form does the same thing as calling an object by its name. But it can also offer more refined formatting options, depending on the type of object it’s dealing with.

Example:

bear
     [,1] [,2] [,3] [,4]
[1,] 2.00 2.17 2.57 3.40
[2,] 1.50 1.98 2.14 2.93
[3,] 2.75 3.02 4.44 4.46

print(bear)
     [,1] [,2] [,3] [,4]
[1,] 2.00 2.17 2.57 3.40
[2,] 1.50 1.98 2.14 2.93
[3,] 2.75 3.02 4.44 4.46
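To give an idea of those formatting options, here is a minimal sketch. It rebuilds a matrix with the same values as “bear” (an assumption, in case the object is no longer in your session), and the vector “animals” is made up for the occasion. It uses two arguments of “print()”: “digits” for numbers and “quote” for character strings.

```r
# Rebuild a matrix with the values shown above (assumed to match "bear")
bear <- matrix(c(2.00, 2.17, 2.57, 3.40,
                 1.50, 1.98, 2.14, 2.93,
                 2.75, 3.02, 4.44, 4.46),
               nrow = 3, byrow = TRUE)

# Ask for fewer significant digits when printing (the object itself is unchanged)
print(bear, digits = 2)

# Character vectors can be printed without the surrounding quotes
animals <- c("bear", "wolf", "lynx")   # hypothetical example vector
print(animals, quote = FALSE)
```

Only the display changes here: “bear” still holds the full values after the call.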

But what if our object is really large, like… really really large? We are not going to mess with our beautiful console every time we want to take a look at an object. Example:

randomdata=matrix(0,nrow=200,ncol=20)
randomdata

[insert here a really long matrix that I’m not going to show, because, as a matter of fact, yes, it is really long! But try it in your R session…]

Quite annoying, isn’t it? No problem, we can open our object in a separate window to look at it:

View(randomdata)

Or take a look at the first or last few lines of our data:

head(randomdata)      # First few lines
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]   0    0    0    0    0    0    0    0    0     0     0
[2,]   0    0    0    0    0    0    0    0    0     0     0
[3,]   0    0    0    0    0    0    0    0    0     0     0
[4,]   0    0    0    0    0    0    0    0    0     0     0
[5,]   0    0    0    0    0    0    0    0    0     0     0
[6,]   0    0    0    0    0    0    0    0    0     0     0

     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,]     0     0     0     0     0     0     0     0     0
[2,]     0     0     0     0     0     0     0     0     0
[3,]     0     0     0     0     0     0     0     0     0
[4,]     0     0     0     0     0     0     0     0     0
[5,]     0     0     0     0     0     0     0     0     0
[6,]     0     0     0     0     0     0     0     0     0
 
tail(randomdata)      # Last few lines
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[195,]   0    0    0    0    0    0    0    0    0     0     0
[196,]   0    0    0    0    0    0    0    0    0     0     0
[197,]   0    0    0    0    0    0    0    0    0     0     0
[198,]   0    0    0    0    0    0    0    0    0     0     0
[199,]   0    0    0    0    0    0    0    0    0     0     0
[200,]   0    0    0    0    0    0    0    0    0     0     0
 
       [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[195,]     0     0     0     0     0     0     0     0     0
[196,]     0     0     0     0     0     0     0     0     0
[197,]     0     0     0     0     0     0     0     0     0
[198,]     0     0     0     0     0     0     0     0     0
[199,]     0     0     0     0     0     0     0     0     0
[200,]     0     0     0     0     0     0     0     0     0
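Both functions also take an argument “n” to choose how many lines you get, and “dim()” tells you how big an object is before you print anything. A small sketch, recreating “randomdata” exactly as above:

```r
# Same 200 x 20 matrix of zeros as in the example above
randomdata <- matrix(0, nrow = 200, ncol = 20)

dim(randomdata)             # number of rows and columns
head(randomdata, n = 3)     # only the first 3 lines
tail(randomdata, n = 2)     # only the last 2 lines
head(randomdata[, 1:5], 3)  # first 3 lines of the first 5 columns only
```

Selecting a few columns first keeps the output from wrapping over several blocks in the console.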

… Tadaa! Tadaa? What? You don’t look impressed. Oh, I know, it’s because you think we could do so much more than just looking at our data, and you already know it? Right? Ok then. You can edit an R object with the function “fix()“, which will open your object in a text editor or as a table that you can modify. As you close the newly opened window, your modifications will be saved. Try to make some modifications to the object “bear”.

fix(bear)

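Another option is the function “edit()“, which also opens your object in an editor. Unlike “fix()“, it does not change the original object: it returns the modified copy, so you need to assign the result to a name if you want to keep it.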

As easy as it is to create objects in R, often our data will already be contained in external files (such as Excel files), and will be too big to type in one value at a time this way. Fear no more my child, we can import that data in R, so we can do some proper work with it! But first, let’s see how we can save our existing data.

Let’s start with a simple case: tables, matrices, or data frames. How could we ask R to “write” a “table” as an external file? With a smile, and with the function “write.table()“. We just need to specify as arguments the object we want to write, and the name (and possibly location) of the file we want to create.

write.table(bear,"bear.csv", sep = ",")

Setting the argument “sep” to “,” allows us to create a comma separated file that will easily be read in Excel. For a wrapper function dedicated to CSV files, you can also check “write.csv()“.
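One thing to keep in mind: by default, both functions also write the row names as an extra first column. Here is a minimal sketch with “write.csv()” and “row.names = FALSE”. The matrix is rebuilt from the values shown earlier, the column names are labels chosen for illustration, and the file goes to a temporary location (via “tempfile()”) so that nothing is left behind in your working directory.

```r
# A small matrix standing in for "bear" (values copied from the example above)
bear <- matrix(c(2.00, 2.17, 2.57, 3.40,
                 1.50, 1.98, 2.14, 2.93,
                 2.75, 3.02, 4.44, 4.46),
               nrow = 3, byrow = TRUE)
colnames(bear) <- paste0("col", 1:4)   # illustrative column names

# Write to a temporary file so nothing is left behind in the working directory
csv_file <- tempfile(fileext = ".csv")

# row.names = FALSE drops the extra first column of row labels
write.csv(bear, file = csv_file, row.names = FALSE)

# Look at the raw text that was written
readLines(csv_file)
```

Replace “csv_file” with a name such as "bear.csv" once you want to keep the file.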

You can import tables with the help of the function “read.table()“. With a couple of well-chosen arguments, it’s easier than stealing candy from a baby! (But, why would you do that? Are you such a bad person that you would steal candy from a baby?). You just need to specify the location of the file you want to open. You can also specify whether your file has column headers (“header”), which character or series of characters separates the values (“sep”), and even the type of decimal point (“dec”, useful when you move from France to the US!).

read.table("bear.csv",header=TRUE,sep=",")
  col1 col2 col3 col4
1 2.00 2.17 2.57 3.40
2 1.50 1.98 2.14 2.93
3 2.75 3.02 4.44 4.46

Or even more easily with the wrapper function “read.csv()“.

read.csv("bear.csv")
  col1 col2 col3 col4
1 2.00 2.17 2.57 3.40
2 1.50 1.98 2.14 2.93
3 2.75 3.02 4.44 4.46

As you might have noticed, it just printed the table in the console. To actually include it in our R workspace, we need to assign it to an object (put it in a box, and give it a name).

bear2=read.csv("bear.csv")
bear2
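The separator and decimal-point arguments mentioned earlier matter most with files written under European conventions, where “;” separates the values and “,” marks the decimals. A minimal sketch, with a small made-up file written to a temporary location (the column names and values are invented for the example):

```r
# Made-up file content in the "European" style: ";" between values, "," as decimal mark
euro_file <- tempfile(fileext = ".csv")
writeLines(c("weight;height",
             "2,5;1,75",
             "3,1;1,80"),
           euro_file)

# Tell read.table() about the header, the separator and the decimal mark
euro <- read.table(euro_file, header = TRUE, sep = ";", dec = ",")
euro

# read.csv2() is a wrapper that uses these same settings by default
euro2 <- read.csv2(euro_file)

# Both give numeric columns, so arithmetic works as expected
mean(euro$weight)
```

Without “dec”, values such as "2,5" would not be recognised as numbers, which is usually the first sign that the wrong convention was assumed.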

A more general method that will allow you to save any R object so you can load it later is based on two functions, called… “save()” and “load()” respectively! Genius, isn’t it? It goes like this:

save(bear, color,shirts,file="OurData.RData")

Once again, hard to make simpler! Just give the names of the objects you want to save, and the name (and possibly path) of the file to save them in (with the extension “.RData”).

Or you can save the whole workspace with:

save.image(file = "OurWholeData.RData")

Conversely, we can load those data with the instruction:

load("OurData.RData")
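To see the whole round trip at once, here is a minimal sketch. The two objects are made-up stand-ins for your own data, and the file is written to a temporary location rather than to your working directory.

```r
# Two made-up objects to save (hypothetical stand-ins for your own data)
color  <- c("red", "green", "blue")
shirts <- data.frame(size = c("S", "M", "L"), count = c(4, 10, 6))

# Save them to a temporary .RData file
rdata_file <- tempfile(fileext = ".RData")
save(color, shirts, file = rdata_file)

# Remove them from the workspace, then bring them back
rm(color, shirts)
loaded <- load(rdata_file)   # load() returns the names of the restored objects
loaded
shirts
```

Note that “load()” restores the objects under the names they had when they were saved, so it will silently overwrite objects with the same names in your current workspace.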

Oh, and if you are curious about the objects present in your current workspace, you can have access to a list with:

ls()
[1] "awesomeness" "bear" "bear2" "cafeteria" "classes" "color"  "data"       
[8] "fear" "library" "numfac" "randomdata" "Roger" "shirts"   "students"   
[15] "tmp"   "vec"   "vec2"       

And before saving, you can remove specific objects with the function “rm()“. For a specific object:

rm(bear)

or

rm("bear")  # Gives a warning (object not found) since we already removed this object

For several objects, you can simply group their names in a character vector and pass it to the argument “list”:

rm(list=c("color","shirts"))

Want to start with a clean slate?

rm(list=ls())
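Somewhere between removing one object and removing everything, you can also combine “ls()” and “rm()” to clean up only part of the workspace. A small sketch, assuming (purely for the example) that temporary objects were given names starting with “tmp_”:

```r
# A few made-up objects, some of them temporary
tmp_a   <- 1:10
tmp_b   <- "scratch"
keep_me <- c(2.5, 3.1)

# ls() accepts a pattern (a regular expression) to list only matching names
ls(pattern = "^tmp_")

# Remove only those, and leave everything else alone
rm(list = ls(pattern = "^tmp_"))

exists("tmp_a")     # FALSE: it was removed
exists("keep_me")   # TRUE: still there
```

It is worth running the “ls(pattern = ...)” line on its own first, to check what is about to disappear.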

By the way, by default, R will store and access information in the folder from which the session was started. You want to know where your files are being saved to (and read from)? Ask R to get the working directory for you:

getwd()
[1] "C:/Users/fbled/Documents"

You don’t like it? You can change it!

setwd("C:\\Users\\preludeinR\\Work\\R course")

or

setwd("C:/Users/preludeinR/Work/R course")

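(note either the doubling of the backslashes or the use of regular slashes. It can look weird, but R uses the single backslash inside strings as an escape character for specific instructions, such as “\n” for a new line, so a literal backslash has to be written twice)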

(oh, and don’t be that guy, don’t forget to change the path to the one you want on your computer… Don’t try to save stuff on mine…)
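If you only need to change folder for a moment, here is a small sketch of how to switch and then come back. It uses “tempdir()” only because that folder is sure to exist on any computer, and the file name is hypothetical.

```r
# Remember where we are; setwd() invisibly returns the previous directory
old_wd <- setwd(tempdir())   # tempdir() is used here only as a folder that surely exists

getwd()                      # now the temporary folder

# file.path() builds paths with forward slashes, which R accepts on every system
target <- file.path(getwd(), "bear.csv")   # hypothetical file name
target

# Go back to where we started
setwd(old_wd)
```

Building paths with “file.path()” also spares you from thinking about backslashes at all.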
