How to Use Your Own Data Sets: Step Five in Learning R Programming for Free
I hope you are enjoying the “Learning R Programming for Free” series; the previous segments provide some helpful background.
In the previous installment, we learned about data frames and how to access elements and columns in a pre-built data set. In this installment, we will explore bringing in our own data sets. We will then explore some of the mathematical operators we would want to use to do calculations on values in our data set.
Some say climate change is the biggest threat of our age, while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.
R has a function for reading character-delimited files, “read.csv”:
mydata <- read.csv(file="c:/mycsvfile.csv", header=TRUE, sep=",")
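Here, “header=TRUE” tells R that the first row of the file holds the column names, and “sep” is the separator used in the data. Both are the defaults for “read.csv” (a header row and a comma separator), so you can leave them out when your file follows that layout.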
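To see those defaults in action, here is a minimal sketch that writes a tiny CSV to a temporary file and reads it back twice: once with the arguments spelled out and once relying on the defaults. The file contents and the names “tmp”, “explicit” and “defaults” are made up for illustration.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("city,temp_c",
             "Boston,15.4",
             "Denver,8.2"), tmp)

# Spell out the header and separator arguments...
explicit <- read.csv(file = tmp, header = TRUE, sep = ",")

# ...or rely on the defaults, which are the same for read.csv
defaults <- read.csv(tmp)

identical(explicit, defaults)  # compare the two data frames
names(defaults)                # column names come from the header row
```

If your file uses a different separator, such as a semicolon or a tab, that is when you would need to set “sep” yourself.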
On my computer, I happen to have a CSV file on climate change from Data.World where I extracted the temperatures for the USA since the 1700s. Let’s load it:
> globallandtemp <- read.csv("C:/GlobalLandTemperaturesByCountryUSA.csv")
Print out a sample with “head” to examine our data:
> head(globallandtemp)
          dt AverageTemperature AverageTemperatureUncertainty       Country
1 1768-09-01             15.420                         2.880 United States
2 1768-10-01              8.162                         3.386 United States
3 1768-11-01              1.591                         3.783 United States
4 1768-12-01             -2.882                         4.979 United States
5 1769-01-01             -3.952                         4.856 United States
6 1769-02-01             -2.684                         3.311 United States
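If you do not have the file handy, here is a small sketch that rebuilds the six sample rows above as a data frame and pokes at it with a few base R functions. The name “sample_temp” is my own; the values are copied from the printout.

```r
# Rebuild the six sample rows shown above as a small data frame
sample_temp <- data.frame(
  dt = c("1768-09-01", "1768-10-01", "1768-11-01",
         "1768-12-01", "1769-01-01", "1769-02-01"),
  AverageTemperature = c(15.420, 8.162, 1.591, -2.882, -3.952, -2.684),
  AverageTemperatureUncertainty = c(2.880, 3.386, 3.783, 4.979, 4.856, 3.311),
  Country = "United States"
)

head(sample_temp, 3)               # first three rows
str(sample_temp)                   # column names and types
nrow(sample_temp)                  # number of rows
sample_temp$AverageTemperature[1]  # first value of a single column
```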
Our temperatures are in Celsius. Let’s convert them to Fahrenheit.
Our formula to make the conversion is:
f = (9/5) * celsius + 32
So, if we want to see “AverageTemperature” for September 1768 in Fahrenheit degrees, we can write:
> f <- (9/5) * globallandtemp$AverageTemperature[1] + 32
> f
[1] 59.756
A bit chilly! So, now we know the basics of adding, multiplying and dividing. Just remember your order of operations and place parentheses so you get the expected results.
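To make the point about parentheses concrete, here is a short sketch using the September 1768 value from the sample. The variable names “celsius”, “right” and “wrong” are just for illustration.

```r
celsius <- 15.42

# Multiplication and division are evaluated before addition,
# so these two lines compute the same thing
right      <- (9/5) * celsius + 32
also_right <- 9/5 * celsius + 32

# Parentheses around the addition change the order, and the answer
wrong <- 9/5 * (celsius + 32)

right
also_right
wrong
```

The first two lines agree; the third adds 32 before multiplying, which is not the conversion formula.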
If we want to apply the formula to the entire data set:
> all_f <- (9/5) * globallandtemp$AverageTemperature + 32
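This works because arithmetic in R is applied to every element of a vector at once. A minimal sketch, using the first four temperatures from the sample and a hypothetical helper I am calling “c_to_f”:

```r
# Hypothetical helper wrapping the conversion formula
c_to_f <- function(celsius) (9/5) * celsius + 32

# The first four sample temperatures from the printout above
celsius <- c(15.420, 8.162, 1.591, -2.882)

# One call converts every element; no loop needed
some_f <- c_to_f(celsius)
some_f

length(some_f) == length(celsius)  # one result per input value
```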
If you happen to print “all_f” to the console, you will see many “NA” values. “NA” (“not available”) is how R marks a value that is missing from the data set; it is not the same thing as “NULL”, and arithmetic on an “NA” gives “NA” back. When we get to table functions in a future blog, I will explain how to handle this (for instance, if we wanted to apply a function to get the mean temperature by year). In a future installment, I will introduce the mutate and select functions to allow the data set to be extended and to trim down the columns in the data set.
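In the meantime, here is a small sketch of how “NA” behaves, using a made-up vector with one missing value. It only uses base R functions (“is.na”, “sum”, “mean”); the name “temps_c” and its contents are assumptions for illustration, not values from the full data set.

```r
# Made-up vector with one missing value
temps_c <- c(15.420, NA, 1.591)

temps_f <- (9/5) * temps_c + 32
temps_f                      # the missing value carries through the arithmetic

is.na(temps_f)               # TRUE where a value is missing
sum(is.na(temps_f))          # how many values are missing
mean(temps_f)                # NA, because one input is missing
mean(temps_f, na.rm = TRUE)  # mean of the non-missing values only

# NA and NULL are different things in R
length(NA)    # a placeholder that still occupies one slot
length(NULL)  # an empty object with no slots
```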
In our next installment, we will discuss data types in R.
Additional blog posts on more complex R concepts to follow; please contact firstname.lastname@example.org if you have any questions or need further help!