L_EARNing: A First R Session

Let’s make a simple data set (in R parlance, a vector) consisting of the numbers
1, 2, and 4, and name it x:

> x <- c(1,2,4)

The standard assignment operator in R is <-. You can also use =, but this
is discouraged, as it does not work in some special situations (for example,
inside a function call, = names an argument instead of assigning to a
variable). Note that there
are no fixed types associated with variables. Here, we’ve assigned a vector to
x, but later we might assign something of a different type to it. We’ll look at
vectors and the other types in Section 1.4.
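
To make the point about types concrete, here is a minimal sketch. The variable name v is hypothetical and chosen only so that x keeps its value for the rest of the session; class() is the base R function that reports an object's class.

```r
# Assign a numeric vector with the standard operator.
v <- c(1, 2, 4)
first_class <- class(v)    # class of the numeric vector

# The same name can later hold something of a different type.
v <- "abc"
second_class <- class(v)   # class of the character string

first_class
second_class
```

No declaration is needed in either case; the name v simply refers to whatever was assigned to it most recently.
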
The c stands for concatenate. Here, we are concatenating the numbers
1, 2, and 4. More precisely, we are concatenating three one-element vectors
that consist of those numbers. This is because any number is also considered
to be a one-element vector.
Now we can also do the following:

> q <- c(x,x,8)

which sets q to (1,2,4,1,2,4,8) (yes, including the duplicates).
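
The flattening behavior of c() can be checked directly. This is only an illustrative sketch using base R functions (length(), identical(), is.vector()); nothing here goes beyond the x and q defined above.

```r
x <- c(1, 2, 4)
q <- c(x, x, 8)

# c() flattens its arguments into a single vector of seven elements.
length(q)

# Concatenating three one-element vectors gives the same result as c(1, 2, 4).
same <- identical(c(1, 2, 4), c(c(1), c(2), c(4)))
same

# A single number is itself a vector of length 1.
is.vector(8)
length(8)
```
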
Now let’s confirm that the data is really in x. To print the vector to the
screen, simply type its name. If you type any variable name (or, more generally,
any expression) while in interactive mode, R will print out the value
of that variable (or expression). Programmers familiar with other languages
such as Python will find this feature familiar. For our example, enter this:

> x
[1] 1 2 4

Yep, sure enough, x consists of the numbers 1, 2, and 4.
Individual elements of a vector are accessed via [ ]. Here’s how we can
print out the third element of x:

> x[3]
[1] 4

As in other languages, the selector (here, 3) is called the index or subscript.
Those familiar with ALGOL-family languages, such as C and C++,
should note that elements of R vectors are indexed starting from 1, not 0.
Subsetting is a very important operation on vectors. Here’s an example:

> x <- c(1,2,4)
> x[2:3]
[1] 2 4

The expression x[2:3] refers to the subvector of x consisting of elements
2 through 3, which are 2 and 4 here.
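
A few more indexing calls may help here. This is a small sketch rather than a full treatment: it shows that 2:3 is itself a vector of indices, and that any vector of indices, such as c(1, 3), can be used the same way.

```r
x <- c(1, 2, 4)

x[1]           # first element: indexing starts at 1
x[length(x)]   # last element

idx <- 2:3     # the colon operator builds the index vector 2, 3
x[idx]         # same as x[2:3]

x[c(1, 3)]     # elements 1 and 3, skipping element 2
```
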
We can easily find the mean and standard deviation of our data set, as
follows:

> mean(x)
[1] 2.333333
> sd(x)
[1] 1.527525

This again demonstrates typing an expression at the prompt in order
to print it. In the first line, our expression is the function call mean(x). The
return value from that call is printed automatically, without requiring a call
to R’s print() function.
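
To see what mean() and sd() are computing, the same values can be worked out by hand. The names n, m, and s below are hypothetical helpers for this sketch; note that sd() uses the sample formula, dividing by n - 1 rather than n.

```r
x <- c(1, 2, 4)
n <- length(x)

# Mean: the sum of the elements divided by how many there are.
m <- sum(x) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
s <- sqrt(sum((x - m)^2) / (n - 1))

all.equal(m, mean(x))
all.equal(s, sd(x))
```
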
If we want to save the computed mean in a variable instead of just printing
it to the screen, we could execute this code:

> y <- mean(x)

Again, let’s confirm that y really does contain the mean of x:

> y
[1] 2.333333

As noted earlier, we use # to write comments, like this:

> y # print out y
[1] 2.333333

Comments are especially valuable for documenting program code, but
they are useful in interactive sessions, too, since R records the command
history (as discussed in Section 1.6). If you save your session and resume it
later, the comments can help you remember what you were doing.
Finally, let’s do something with one of R’s internal data sets (these are
used for demos). You can get a list of these data sets by typing the following:

> data()

One of the data sets is called Nile and contains data on the flow of the
Nile River. Let’s find the mean and standard deviation of this data set:

> mean(Nile)
[1] 919.35
> sd(Nile)
[1] 169.2275
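
Before computing statistics on an unfamiliar data set, it can help to look at what kind of object it is. The calls below are a hedged sketch using standard base R inspection functions; Nile ships with R in the datasets package and is stored as a time series object, not a plain vector.

```r
class(Nile)     # the kind of object Nile is
length(Nile)    # number of observations
range(Nile)     # smallest and largest values
summary(Nile)   # quartiles along with the mean

nile_mean <- mean(Nile)
nile_sd <- sd(Nile)
```

Typing ?Nile at the prompt opens the help page describing where the data came from.
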

We can also plot a histogram of the data:

> hist(Nile)

A window pops up with the histogram in it, as shown in Figure 1-1. This
graph is bare-bones simple, but R has all kinds of optional bells and whistles
for plotting. For instance, you can change the number of bins by specifying
the breaks argument. The call hist(z,breaks=12) would draw a histogram
of the data set z with about 12 bins (R treats a single number as a suggestion
and rounds the breakpoints to convenient values). You can also create nicer
labels, make use of color, and make many other changes to create a more
informative and eye-appealing graph. When you become more familiar with R,
you’ll be able to construct complex, rich color graphics of striking beauty.
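
As one possible illustration of those options, the sketch below redraws the Nile histogram with a bin suggestion, a title, an axis label, and a fill color. The title and label text are made up for this example; breaks, main, xlab, col, and plot are standard arguments of hist().

```r
# Redraw the histogram with a few optional arguments.
hist(Nile,
     breaks = 12,                  # suggested number of bins
     main = "Flow of the Nile",    # plot title (example text)
     xlab = "Flow",                # x-axis label (example text)
     col = "lightblue")            # fill color for the bars

# hist() also returns the binning it used; plot = FALSE skips the drawing.
h <- hist(Nile, breaks = 12, plot = FALSE)
h$breaks           # the breakpoints R actually chose
length(h$counts)   # the resulting number of bins
```

Because a single number passed as breaks is only a suggestion, the bin count reported at the end may not be exactly 12.
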

Well, that’s the end of our first, five-minute introduction to R. Quit R
by calling the q() function (or alternatively by pressing CTRL-D in Linux or
CMD-D on a Mac):

> q()
Save workspace image? [y/n/c]: n

That last prompt asks whether you want to save your variables so that
you can resume work later. If you answer y, then all those objects will be
loaded automatically the next time you run R. This is a very important feature,
especially when working with large or numerous data sets. Answering y
here also saves the session’s command history. We’ll talk more about saving
your workspace and the command history in the section “Startup and Shutdown.”
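
Saving and restoring objects can also be done explicitly, without waiting for the quit prompt. The following is a minimal sketch, assuming a temporary file as the destination (the name f is hypothetical); answering y at the prompt writes to a file named .RData in the working directory instead.

```r
x <- c(1, 2, 4)
y <- mean(x)

# Hypothetical destination for this sketch: a temporary file.
f <- tempfile(fileext = ".RData")
save(x, y, file = f)   # write the two objects to disk

rm(x, y)               # simulate starting a fresh session
load(f)                # restore the saved objects
y
```

The base function save.image() does the same for every object in the workspace at once, and q(save = "no") quits without showing the prompt.
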
