juice <- 4 + 4
class(juice)
class(4)
class(8.3)
class(4L)
Data classes & data frames
1 Data classes
We will work with several data classes in R. Knowing how each one is defined, and how to determine which data type you are working with, is a key foundation of R. See Table 1.
Let’s start by elaborating on what we’ve already used. Use class() and str() to explore data types.
1.1 Definitions
Use class() and str() to see how x is defined for each row of this table.
Term | Example | Code (x) | Description |
---|---|---|---|
Logical | TRUE, FALSE | x <- TRUE | Two possible values, also called boolean. |
Numeric | 243, 92.1, or 3.459 | x <- 3.14 | Numbers, including those with decimals. Standard input for most statistical tools. |
Integer | 10L, 65L, or 0L | x <- 314L | Numbers without decimals. Must be specified as an integer with the L suffix. |
Character | "seven", "hello world", "FALSE", '63.354', "600" | x <- "150" | A string of text. Single quotes ('') and double quotes ("") are interchangeable in R. |
Complex | 6 + 5i | x <- 6 + 8i | Numbers with an imaginary part, written with the suffix i. |
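As a minimal sketch of how the table can be checked, the code below creates one value per row and asks class() what each one is. The object names (x_logical, x_numeric, and so on) are illustrative assumptions, not names used elsewhere in this lesson.

```r
# One example value per row of the table (object names are illustrative only)
x_logical   <- TRUE
x_numeric   <- 3.14
x_integer   <- 314L
x_character <- "150"
x_complex   <- 6 + 8i

# Collect the class of each value in a named character vector
classes <- c(
  logical   = class(x_logical),
  numeric   = class(x_numeric),
  integer   = class(x_integer),
  character = class(x_character),
  complex   = class(x_complex)
)
print(classes)

# str() prints a compact summary: an abbreviated type followed by the value
str(x_integer)

# Single and double quotes create the same character value
same_string <- identical('150', "150")
```

Note that "150" is a character even though it looks like a number; the quotes decide the class, not the contents.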
The data classes above represent the core basic “data types”. A vector is a sequence of elements that all share the same class.
1.2 Exploring data types and vectors
Use the examples below to see how R interprets different data classes and how values of a class become elements of a vector.
What is “juice”?
What about x?
x <- c(1, 7, 9)
class(x)
apples <- 5
oranges <- 6
fruit <- apples + oranges
How is the below vector different from above?
x <- c("1", "7", "9")
class(x)
Let’s write over x and set it equal to types of fruit.
x <- c("banana", "grapefruit")
class(x)
A character is a string type rather than a numeric value. Let’s experiment with the numeric vs. character data types by combining our character vector of fruit with the earlier defined apples.
x
apples
new_x <- c(x, apples)
What happened to the data class of new_x when we combined character and numeric?
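One way to check what happens is sketched below. It uses hypothetical object names (fruit_names, apple_count, combined) so it can be run on its own. Because a vector can hold only one class, R converts the number to a character when the two are combined.

```r
# Hypothetical stand-ins for the character vector and the numeric value
fruit_names <- c("banana", "grapefruit")
apple_count <- 5

# Combining character and numeric coerces everything to character
combined <- c(fruit_names, apple_count)
class(combined)   # "character"
combined[3]       # "5" (now text, not a number)

# The text can be converted back to a number explicitly
back_to_number <- as.numeric(combined[3])
class(back_to_number)   # "numeric"
```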
Create another vector.
numeric_vector <- c(1, 10, 49, 5)
character_vector <- c("a", "b", "c", "d")
names(numeric_vector) <- character_vector
What does names() do above?
numeric_vector
# Isolate a numeric within the vector
q <- numeric_vector[4]
# Modify elements within a vector
q + numeric_vector
# Evaluate vector based on value
ans <- q > numeric_vector
ans
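The sketch below illustrates the same ideas on a separate, hypothetical vector called scores: names attach labels to elements, square brackets select by position or by name, and a comparison returns a logical vector that can itself be used for selection.

```r
# A hypothetical named numeric vector
scores <- c(1, 10, 49, 5)
names(scores) <- c("a", "b", "c", "d")

# Select the fourth element by position and by name
by_position <- scores[4]
by_name     <- scores["d"]

# Comparisons are element-wise and return one TRUE/FALSE per element
is_smaller <- scores < 10

# A logical vector inside [] keeps only the TRUE positions
small_only <- scores[is_smaller]
print(small_only)
```

Selecting by name is often easier to read than selecting by position, since it does not depend on the order of the elements.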
2 Lists
A list can hold anything, and its elements do NOT need to share a data type (in contrast to the vector, above). Because lists can include different data types and data sets, they can be used to store information, like a dictionary.
list_example <- list(2^4, "cabbage", TRUE, 1+5i)
list_example_2 <- list(2E2, "lettuce", VALUES = 1:25, VALUES_BINARY = FALSE)
list_example_2$VALUES
typeof(list_example_2$VALUES)
typeof(list_example_2$VALUES_BINARY)
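Besides $, list elements can be reached with square brackets. The sketch below, using a hypothetical list named example_list, shows the difference between single brackets (which return a smaller list) and double brackets (which return the element itself).

```r
# A hypothetical list mixing unnamed and named elements
example_list <- list(2E2, "lettuce", VALUES = 1:25, VALUES_BINARY = FALSE)

# Double brackets return the element itself
first_item <- example_list[[1]]
class(first_item)    # "numeric"

# Single brackets return a list containing that element
first_slice <- example_list[1]
class(first_slice)   # "list"

# Named elements can be reached with $ or with [["name"]]
values      <- example_list$VALUES
same_values <- example_list[["VALUES"]]
typeof(values)       # "integer"
```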
3 Matrices
When do you use a matrix over a data frame? It depends on what kind of data you’re working with. If you’re working with different types of information in the same table, you likely need a data frame. But if you’re working with a single data type, a matrix may be better for you. Additionally, matrices tend to be more memory efficient, and a lot of statistical tools/methods require a matrix as input.
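The difference can be seen with a small sketch. The object names (mixed_matrix, mixed_df) are assumptions for illustration. A matrix stores a single type, so mixing numbers and text turns every cell into text, while a data frame keeps a separate type for each column.

```r
# A matrix holds one type: adding characters coerces every cell to character
mixed_matrix <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixed_matrix)   # "character"

# A data frame keeps one type per column
# (stringsAsFactors = FALSE keeps text as character in older R versions too)
mixed_df <- data.frame(id = c(1, 2), label = c("a", "b"),
                       stringsAsFactors = FALSE)
column_classes <- sapply(mixed_df, class)
print(column_classes)
```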
Create a matrix.
matrix(1:9, byrow = FALSE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
q <- c(460, 314)
r <- c(290, 247)
w <- c(309, 165)
c(q, r, w)
[1] 460 314 290 247 309 165
mydataset <- matrix(c(q, r, w), byrow = TRUE, nrow = 3)
mydataset
     [,1] [,2]
[1,]  460  314
[2,]  290  247
[3,]  309  165
region <- c("one", "two")
category <- c("A", "B", "C")
rownames(mydataset) <- category
colnames(mydataset) <- region
rowSums(mydataset)
  A   B   C
774 537 474
totals <- rowSums(mydataset)
cbind(mydataset,totals)
  one two totals
A 460 314    774
B 290 247    537
C 309 165    474
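The row and column names assigned above can also be used for indexing. The sketch below is a minimal illustration that uses a hypothetical object name, sales, and sets the names through the dimnames argument of matrix() instead of rownames() and colnames().

```r
# Same values as the matrix above, with names supplied when it is created
sales <- matrix(c(460, 314, 290, 247, 309, 165),
                byrow = TRUE, nrow = 3,
                dimnames = list(c("A", "B", "C"), c("one", "two")))

# Index a single cell with [row, column], by name
b_two <- sales["B", "two"]
print(b_two)   # 247

# colSums() is the column-wise counterpart of rowSums()
column_totals <- colSums(sales)
print(column_totals)
```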
4 Data frames
Create a data frame and explore the structure. Data frames are the most common way to view datasets that you eventually want to modify and/or make plots from. They can also include different types of data.
greetings <- c("Hey", "Hi", "Howdy", "Hello", "Morning")
n <- c(99, 15, 324, 54, 23)
df <- data.frame(greetings, n)
View the data frame you put together. What types of data are in your data frame?
# print (df)
# df
# View(df)
The first column holds a type of greeting, and the second column holds the number of times that greeting was observed (n).
4.1 Isolating elements of a data frame
We need to use specific R syntax to pull out individual rows and columns of a data frame. Isolate a single column.
We will use the [row, column] syntax.
# df$
# df[]
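As a rough sketch of both approaches, the example below uses a small hypothetical data frame called pets (not the greetings data) so the exercise above is left open. The $ operator pulls out a column by name, and [row, column] selects by position, by name, or by a logical condition.

```r
# A hypothetical data frame used only for illustration
pets <- data.frame(animal = c("cat", "dog", "fish"),
                   count  = c(3, 5, 12),
                   stringsAsFactors = FALSE)

# A single column as a vector, two equivalent ways
counts_dollar  <- pets$count
counts_bracket <- pets[, "count"]

# A whole row (left of the comma), returned as a one-row data frame
second_row <- pets[2, ]

# A single cell: row 2 of the count column
dog_count <- pets[2, "count"]

# Rows can also be chosen with a condition instead of a position
dog_count_by_name <- pets[pets$animal == "dog", "count"]
```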
4.2 Modify the data frame
Let’s use the command rbind() to add on a row. We forgot to add in the greeting “Afternoon”; let’s say this was observed 18 times.
addition <- c("Afternoon", 18)
# ?rbind()
df <- rbind(df, addition) # writing over the original df
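It is worth checking the column classes after adding a row this way. The sketch below uses a hypothetical data frame called pets to show the effect: a row built with c() is a character vector, so binding it turns the numeric column into character, whereas binding a one-row data frame keeps the column numeric. This is an illustration under those assumptions, not the only way to add rows.

```r
# A hypothetical data frame used only for illustration
pets <- data.frame(animal = c("cat", "dog"),
                   count  = c(3, 5),
                   stringsAsFactors = FALSE)

# c("fish", 12) is a character vector, so the count column becomes character
with_vector <- rbind(pets, c("fish", 12))
class(with_vector$count)   # "character"

# Binding a one-row data frame preserves each column's type
new_row <- data.frame(animal = "fish", count = 12, stringsAsFactors = FALSE)
with_df <- rbind(pets, new_row)
class(with_df$count)       # "numeric"
```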
And let’s change the headers so they are more meaningful.
colnames(df)
names(df)
colnames(df)[1:2]
colnames(df)[1:2] <- c("Greeting", "Observed")
4.3 Activity
Isolate the number of times “Howdy” was observed. Set this equal to an R object.
4.4 Activity
Set the number of times a greeting occurred to a numeric.