- Linda Stewart
How to Use Sample Data Sets: Step Four in Learning R Programming for Free
I hope you are enjoying the “Learning R Programming for Free” series; the previous segments provide some helpful background.
In this segment, we will introduce sample data sets we can use in our learning.
There are a couple of ways to do this. One way is to build a data set at the console. Another is to play with the data sets that come with R. To see them, type “data()” at the Console prompt; R displays a rather long list of data sets.
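Both approaches can be sketched in a few lines. The data frame “plants” and its column names below are hypothetical, made up purely for illustration; “data.frame()” and “data()” are base R functions.

```r
# Build a small data set directly at the console.
# 'plants', 'name' and 'height_cm' are hypothetical names used for illustration.
plants <- data.frame(
  name      = c("fern", "ivy", "moss"),
  height_cm = c(30, 120, 2)
)
print(plants)

# List the names of the data sets in the 'datasets' package that ships with R.
# data() returns an object whose 'results' matrix has an "Item" column.
dataset_names <- data(package = "datasets")$results[, "Item"]
head(dataset_names)
```

Storing the names in a vector makes it easy to search the list instead of scrolling through it.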
If we want to see the contents of one of the data sets, we type its name at the R prompt.
Let’s look at the “CO2” data set (data frame):
> CO2
   Plant   Type  Treatment conc uptake
1    Qn1 Quebec nonchilled   95   16.0
2    Qn1 Quebec nonchilled  175   30.4
3    Qn1 Quebec nonchilled  250   34.8
4    Qn1 Quebec nonchilled  350   37.2
5    Qn1 Quebec nonchilled  500   35.3
6    Qn1 Quebec nonchilled  675   39.2
7    Qn1 Quebec nonchilled 1000   39.7
8    Qn2 Quebec nonchilled   95   13.6
9    Qn2 Quebec nonchilled  175   27.3
10   Qn2 Quebec nonchilled  250   37.1
11   Qn2 Quebec nonchilled  350   41.8
12   Qn2 Quebec nonchilled  500   40.6
13   Qn2 Quebec nonchilled  675   41.4
14   Qn2 Quebec nonchilled 1000   44.3
15   Qn3 Quebec nonchilled   95   16.2
16   Qn3 Quebec nonchilled  175   32.4
17   Qn3 Quebec nonchilled  250   40.3
18   Qn3 Quebec nonchilled  350   42.1
19   Qn3 Quebec nonchilled  500   42.9
Above are the first 19 of the data set's 84 rows.
This is a good time to introduce the “head()” function, which returns the first part of a vector, matrix, table, data frame or function (the first six rows of a data frame by default); its companion “tail()” returns the last part. “CO2” is a data frame, which can be thought of as a table or two-dimensional array.
> head(CO2)
  Plant   Type  Treatment conc uptake
1   Qn1 Quebec nonchilled   95   16.0
2   Qn1 Quebec nonchilled  175   30.4
3   Qn1 Quebec nonchilled  250   34.8
4   Qn1 Quebec nonchilled  350   37.2
5   Qn1 Quebec nonchilled  500   35.3
6   Qn1 Quebec nonchilled  675   39.2
If we have a large data set, we can use “head()” or “tail()” to have a smaller data set to work with while we are setting up our analysis.
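As a small sketch of that idea, both functions accept an “n” argument that controls how many rows come back. The variable names below are only illustrative.

```r
# Number of rows in the full data set
n_rows <- nrow(CO2)

# First three rows instead of the default six
first_three <- head(CO2, n = 3)

# Last three rows of the data set
last_three <- tail(CO2, n = 3)

print(n_rows)
print(first_three)
print(last_three)
```

Saving the result to a variable, as here, gives us a smaller data frame to experiment with while leaving “CO2” itself untouched.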
Here is how to access the conc value on our first row of data:
> a <- CO2$conc[1]
> a
[1] 95
In general, we refer to data set elements by naming the data frame, then the column, then the position within that column, as in CO2$conc[1].
The $ sign tells the interpreter we are going to access a certain column in the data frame.
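The $ sign is not the only way to reach a value. As a brief sketch, the three expressions below should all return the same element of “CO2”; the variable names are illustrative only.

```r
# Column by name with $, then the first element of that column
by_dollar <- CO2$conc[1]

# Row 1 of the column named "conc", using [row, column] indexing
by_index <- CO2[1, "conc"]

# Double brackets extract the column as a vector, then take element 1
by_bracket <- CO2[["conc"]][1]

print(c(by_dollar, by_index, by_bracket))
```

The $ form is the shortest to type at the console, while the bracket forms are handy when the column name is stored in a variable.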
To see all values in the “conc” column:
> a <- CO2$conc
> a
 [1]   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350
[19]  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95
[37]  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500
[55]  675 1000   95  175  250  350  500  675 1000   95  175  250  350  500  675 1000   95  175
[73]  250  350  500  675 1000   95  175  250  350  500  675 1000
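Once a column is in a variable, it behaves like any other vector. A quick sketch, using illustrative variable names, of how we might summarize the repeating values printed above:

```r
conc_values <- CO2$conc

# One value per row of the data frame
n_values <- length(conc_values)

# The distinct concentration levels, in order of first appearance
distinct_conc <- unique(conc_values)

print(n_values)
print(distinct_conc)
```

Here “unique()” collapses the 84 values down to the seven concentration levels that repeat for every plant.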
In addition to pre-built data sets, R has built-in constants. In a previous segment, we assigned 3.14 to a variable named pi, but pi is actually built in. If we start a fresh session (or clear the workspace) and type “pi”, we see a more precise value that is already available for us to use:
> pi
[1] 3.141593
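The value printed above is rounded to R's default of seven significant digits; the stored constant is more precise. The sketch below also shows what happens when we assign our own “pi”, as we did in the earlier segment. The variable names “masked_value” and “restored_value” are illustrative only.

```r
# Ask for more digits than the default seven
print(pi, digits = 10)

# Assigning to 'pi' creates a workspace variable that hides the built-in one
pi <- 3.14
masked_value <- pi

# Removing our variable makes the built-in constant visible again
rm(pi)
restored_value <- pi

print(masked_value)
print(restored_value)
```

This is why clearing the workspace brings the more precise value back: our own variable was only hiding the built-in constant, not replacing it.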
We’ll wrap up this blog post with a few basic rules for naming variables in R:
Variable names start with a letter (or a period that is not followed by a digit) and cannot contain spaces; use underscores in place of spaces for readability
We can use letters, numbers, periods and underscores when building variable names
We also want to avoid reusing the names of objects that are already defined in R (like “pi” or the “data” function), because that could cause confusion. R's reserved words, such as “if,” “else,” “for,” “function,” “TRUE,” “FALSE” and “NULL,” cannot be used as variable names at all.
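One way to try these rules out is with the base R function “make.names()”, which rewrites any string into a syntactically valid name. The helper “is_valid_name” below is a hypothetical convenience written for this post, not a built-in: it simply checks whether “make.names()” leaves a name unchanged.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# returns it unchanged.
is_valid_name <- function(x) make.names(x) == x

is_valid_name("plant_count")  # TRUE
is_valid_name("plant.count")  # TRUE
is_valid_name("2plants")      # FALSE: starts with a digit
is_valid_name("my plants")    # FALSE: contains a space
is_valid_name("if")           # FALSE: reserved word
is_valid_name("pi")           # TRUE: valid, but it would hide the built-in constant
```

Note that the check only covers syntax; a name such as “pi” passes even though reusing it is best avoided.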
In our next installment, I will discuss how to bring your own data sets into R.
Additional blog posts on more complex R concepts to follow; please contact communications@performancearchitects.com if you have any questions or need further help!