
Working with Data Frames: Step Eight in Learning R Programming for Free

I hope you are enjoying the “Learning R Programming for Free” series; here are links to the previous segments (Step One, Step Two, Step Three, Step Four, Step Five, Step Six, Step Seven) to provide some helpful background.


In the previous installment, we learned how to work with vectors and how to use functions to perform operations on them.


In this discussion, we look at the data frame. A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column. Different columns can hold different types of data, such as numeric, factor, or character. Data frames can be built from vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
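To make the definition concrete, here is a small data frame built from a few vectors and a factor. The column names and values are made up for illustration only; the point is that each argument to data.frame() becomes one column, and every column has the same number of entries.

```r
# A small, made-up data frame (hypothetical values, for illustration only).
# Each argument to data.frame() becomes one column; all columns have the same length.
cars_small <- data.frame(
  model  = c("Coupe", "Sedan", "Wagon"),          # character vector
  mpg    = c(24.5, 31.0, 28.2),                   # numeric vector
  manual = c(TRUE, FALSE, FALSE),                 # logical vector
  size   = factor(c("small", "medium", "large"))  # factor
)

str(cars_small)    # lists each column with its name and type
nrow(cars_small)   # number of rows
ncol(cars_small)   # number of columns
```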


Base R includes a built-in data frame named “mtcars.” Let’s look at its contents:

# List the entire data set:

> mtcars

# Test whether “mtcars” is a data frame:

> is.data.frame(mtcars)
[1] TRUE

# List the first few rows (six by default) of “mtcars”:

> head(mtcars)


The first line of the table is called the “header”; it contains the column names. Each line below the header is a “data row.” A data row begins with the name of the row, followed by that row’s value for each column. Each data member of a row is called a “cell,” just as in your favorite spreadsheet tool.
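Each part of the table just described can also be retrieved with a base R function. A quick sketch:

```r
# The header: the column names of mtcars
names(mtcars)
# The row names that start each data row (first three shown)
rownames(mtcars)[1:3]   # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
# Number of data rows and columns
dim(mtcars)             # 32 11
```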


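To retrieve the data in a cell, we give its row and column coordinates, separated by a comma, inside single square brackets: [row, column]. Each coordinate can be a number or a name. To select more than one row or column at a time, we pass a vector of coordinates such as c(1, 3); leaving a coordinate blank selects all rows or all columns.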
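A few indexing examples on “mtcars” show these rules in action:

```r
# A single cell: row 1, column 2 (the cyl value for the Mazda RX4)
mtcars[1, 2]                      # 6
# The same cell, addressed by row and column names
mtcars["Mazda RX4", "cyl"]        # 6
# Several cells at once: rows 1 and 3, columns mpg and hp
mtcars[c(1, 3), c("mpg", "hp")]
# A blank column coordinate returns the whole third row
mtcars[3, ]
```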


Let’s introduce some useful searching techniques. Suppose we want to find the car with the worst fuel mileage:
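One way to write this search (a sketch, not necessarily the exact code originally shown) is to compare every car’s mpg with the smallest mpg in the data set and keep the rows where they match:

```r
# Keep only the rows whose mpg equals the minimum mpg
worst <- mtcars[mtcars$mpg == min(mtcars$mpg), ]
worst
```

Two cars tie at 10.4 mpg, the Cadillac Fleetwood and the Lincoln Continental, so both rows are returned.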


And which car has the best fuel mileage?
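The same pattern, with max() in place of min(), answers this question (again a sketch of one possible approach):

```r
# Keep only the rows whose mpg equals the maximum mpg
best <- mtcars[mtcars$mpg == max(mtcars$mpg), ]
best   # Toyota Corolla, 33.9 mpg
```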


Notice how we used “min” and “max” to filter the search.

Let’s look at some more sophisticated searches using “which”:

Which cars have the worst fuel mileage?
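A sketch of the “which” approach: which() returns the positions where a logical condition is TRUE, and those positions can then pick out row names or whole rows.

```r
# Row numbers of the cars with the lowest mpg
idx <- which(mtcars$mpg == min(mtcars$mpg))
idx                      # 15 16
rownames(mtcars)[idx]    # "Cadillac Fleetwood" "Lincoln Continental"
mtcars[idx, c("mpg", "cyl", "hp")]

# which() accepts any logical condition, e.g. every car under 15 mpg
rownames(mtcars)[which(mtcars$mpg < 15)]
```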


Are we annoyed yet that the car brand/model column has no label? (The brand/model names are really the row names of “mtcars,” not a regular column.) Let’s fix that:

mydf <- cbind(rownames(mtcars), mtcars)   # copy the row names into a new first column
rownames(mydf) <- NULL                    # drop the old row names
colnames(mydf) <- c("brand", "mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")
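As an alternative, cbind() can name the new column as it creates it, which avoids retyping all twelve column names. A sketch that should produce the same columns:

```r
# Name the new first column "brand" directly in the cbind() call
mydf <- cbind(brand = rownames(mtcars), mtcars)
# Drop the car names from the row names; rows are renumbered 1, 2, 3, ...
rownames(mydf) <- NULL
names(mydf)
```

Keeping the original column names from “mtcars” also means the list cannot drift out of sync if a column name is mistyped.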

Now we have a copy of “mtcars” with a labeled “brand” column, stored in the data frame “mydf.”

Let’s look at the Porsche.  Using grep, we can now search our new “brand” column:

mydf[grep("porsche", mydf$brand, ignore.case = TRUE), ]
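grep() returns the row numbers of the matches, while its companion grepl() returns TRUE or FALSE for every row; either result can go inside the brackets. This sketch rebuilds “mydf” first so it runs on its own:

```r
mydf <- cbind(brand = rownames(mtcars), mtcars)
rownames(mydf) <- NULL

grep("porsche", mydf$brand, ignore.case = TRUE)    # 27, the matching row number
grepl("porsche", mydf$brand, ignore.case = TRUE)   # one TRUE/FALSE per row
mydf[grepl("porsche", mydf$brand, ignore.case = TRUE), c("brand", "mpg", "hp")]

# The pattern is a regular expression: "^merc" matches brands that start with "Merc"
mydf[grepl("^merc", mydf$brand, ignore.case = TRUE), "brand"]
```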

MTCARS Data Set Columns:

The positions below refer to the original “mtcars”; in “mydf” each column shifts one place to the right because “brand” is now column 1.

[, 1] mpg   Miles/(US) gallon
[, 2] cyl   Number of cylinders
[, 3] disp  Displacement (cu.in.)
[, 4] hp    Gross horsepower
[, 5] drat  Rear axle ratio
[, 6] wt    Weight (1000 lbs)
[, 7] qsec  1/4 mile time
[, 8] vs    Engine (0 = V-shaped, 1 = straight)
[, 9] am    Transmission (0 = automatic, 1 = manual)
[,10] gear  Number of forward gears
[,11] carb  Number of carburetors
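Coded columns such as “am” are easier to read once they are turned into a labeled factor. A short sketch, working on a copy named mt2 and adding a column called transmission (both names are hypothetical choices made for this example):

```r
# Work on a copy so the built-in mtcars is left unchanged
mt2 <- mtcars
# Map the 0/1 codes to the labels given in the column list
mt2$transmission <- factor(mt2$am, levels = c(0, 1),
                           labels = c("automatic", "manual"))
table(mt2$transmission)   # automatic 19, manual 13
```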

We need a few more high-performance cars in our list, so let’s add a new row using “rbind.” First, we build a one-row data frame, “df2,” with the same column names as “mydf”:

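# vs = NA marks the engine code as unknown; an empty string "" would turn the whole numeric vs column into text when the rows are combined
df2 <- data.frame(brand = "Porsche 911 Turbo S", mpg = 21, cyl = 6, disp = 231, hp = 580, drat = 3.44, wt = 3.528, qsec = 10.5, vs = NA, am = 0, gear = 7, carb = 0)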

This appends “df2” to the end of “mydf” as a new row and stores the result in “mydf3”:

mydf3 <- rbind(mydf,df2)
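To confirm the new row arrived intact, we can count the rows and look at the end of the table. This sketch rebuilds “mydf” and “df2” (using NA for the unknown vs value) so it runs on its own:

```r
mydf <- cbind(brand = rownames(mtcars), mtcars)
rownames(mydf) <- NULL

df2 <- data.frame(brand = "Porsche 911 Turbo S", mpg = 21, cyl = 6, disp = 231,
                  hp = 580, drat = 3.44, wt = 3.528, qsec = 10.5, vs = NA,
                  am = 0, gear = 7, carb = 0)

# rbind() matches the columns by name and appends df2 as row 33
mydf3 <- rbind(mydf, df2)
nrow(mydf3)      # 33
tail(mydf3, 2)   # the last original car plus the new Porsche
```

Because “vs” was given as NA rather than "", the column stays numeric and functions such as mean(mydf3$vs, na.rm = TRUE) still work.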