3 Data Import
3.1 Importing Data in R
Built-in Datasets from Packages
R comes with a variety of built-in datasets, and many packages ship their own datasets that can be loaded directly. Here are some popular ones:
Package | Key Datasets | Load Command |
---|---|---|
datasets | mtcars, iris | Built-in (no command needed) |
ggplot2 | diamonds, mpg | library(tidyverse) |
nycflights13 | flights, weather | install.packages("nycflights13"), then library(nycflights13) |
gapminder | gapminder | library(gapminder) |
Usage Example:
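A minimal sketch that uses only the base datasets package, so it runs without installing anything. The commented lines show the package-based route from the table and assume the tidyverse is installed.

```r
# Datasets from the "datasets" package are available in every R session
head(mtcars, n = 3)  # first 3 rows of the Motor Trend cars data
dim(iris)            # number of rows and columns

# Datasets from add-on packages become available once the package is attached:
# library(tidyverse)   # attaches ggplot2, which provides diamonds and mpg
# head(diamonds)
```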
Downloading External Data
From TidyTuesday
# install.packages("tidytuesdayR")  # run once
library(tidytuesdayR)
# Load 2024 Olympics data
tuesdata <- tt_load('2024-08-06')
olympics <- tuesdata$olympics
Direct from URL
library(tidyverse)
# Hong Kong graduates salary data
data_url <- "https://www.ugcs.gov.hk/datagovhk/Average_Annual_Salaries_FT_Employment(Eng).csv"
hksalary_download <- read_csv(data_url)
hksalary_download
Local File Import
In data analysis projects, importing local files is more common than importing data from the web. Here are some common file types and their uses:
- CSV Files: Simple, human-readable, and widely supported. Ideal for tabular data.
- Excel Files: Used for spreadsheets with multiple sheets or formatting. Imported with readxl or openxlsx.
- SPSS, SAS, Stata Files: Common in social science and survey research. Use specialized R packages such as haven to import.
- RDS Files: Binary format for storing a single R object, preserving its structure and class information.
- RData Files: Binary format for saving multiple R objects in a single file, often used for workspaces.
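The practical difference between the two binary R formats is how many objects a file holds and how they come back. A minimal sketch using base R only; the object names and temporary paths are illustrative.

```r
# Temporary file paths so the example leaves nothing behind in the project
rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

# RDS: one object per file; readRDS() returns it, so we choose the new name
saveRDS(mtcars, rds_path)
cars_copy <- readRDS(rds_path)
identical(cars_copy, mtcars)  # structure and class survive the round trip

# RData: several objects per file; load() restores them under their saved names
n_cars <- nrow(mtcars)
note   <- "saved together"
save(n_cars, note, file = rdata_path)
rm(n_cars, note)
load(rdata_path)  # n_cars and note exist again
```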
For this course, we will focus on CSV files, as they are simple and widely used.
CSV Files with the readr (tidyverse) package
First, we download the CSV file from the web and save it locally as hksalary.csv in the project's data/ subfolder. Then, we import it using the read_csv() function from the readr package.
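One way to download the file once and keep a local copy is sketched below, assuming readr is installed. cache_csv() is a hypothetical helper name, not part of readr; base R's download.file() is an alternative that saves the raw file byte for byte instead of re-writing it.

```r
library(readr)

# Hypothetical helper: read a CSV from a URL (or a path) and save a local copy
cache_csv <- function(source, dest) {
  # Create the destination folder (e.g. "data/") if it does not exist yet
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  df <- read_csv(source, show_col_types = FALSE)
  write_csv(df, dest)  # later sessions can read dest without internet access
  df
}

# Example call (needs internet access):
# data_url <- "https://www.ugcs.gov.hk/datagovhk/Average_Annual_Salaries_FT_Employment(Eng).csv"
# cache_csv(data_url, "data/hksalary.csv")
```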
read_csv() vs. read.csv()
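Note that read_csv() from readr is preferred over read.csv() from base R for its speed and consistency: it is typically faster on large files, returns a tibble, and keeps column names such as Academic Year exactly as they appear in the file, whereas read.csv() converts them to syntactic names such as Academic.Year. In this course, we recommend using read_csv() for CSV files.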
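The column-name difference is easy to see on a tiny file. A minimal sketch, assuming readr is installed; the CSV content is made up for illustration.

```r
library(readr)

# A tiny made-up CSV with a space in one column name, in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Level of Study,Value", "Sub-degree,1", "Undergraduate,2"), path)

base_df  <- read.csv(path)                          # base R
readr_df <- read_csv(path, show_col_types = FALSE)  # readr

names(base_df)    # "Level.of.Study" "Value": spaces replaced by dots
names(readr_df)   # "Level of Study" "Value": kept as in the file
class(readr_df)   # includes "tbl_df", i.e. a tibble
```

With read_csv(), such names are referred to with backticks, for example readr_df$`Level of Study`.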
# Relative path (recommended)
hksalary <- read_csv("data/hksalary.csv")
Rows: 368 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Academic Year, Level of Study, Broad Academic Programme Category
dbl (1): Average Annual Salary (HK$'000)
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hksalary
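The message above suggests specifying the column types. A minimal sketch, assuming readr is installed; read_hksalary() is a hypothetical wrapper name, and the types mirror the chr/dbl specification that read_csv() guessed above. When col_types is supplied, read_csv() does not guess and does not print the column-specification message.

```r
library(readr)

# Hypothetical wrapper: read the salary file with explicit column types
read_hksalary <- function(path = "data/hksalary.csv") {
  read_csv(
    path,
    col_types = cols(
      `Academic Year` = col_character(),
      `Level of Study` = col_character(),
      `Broad Academic Programme Category` = col_character(),
      `Average Annual Salary (HK$'000)` = col_double()
    )
  )
}

# hksalary <- read_hksalary()
```

The same specification can be written compactly as col_types = "cccd" (three character columns, then one double).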
File Path Management
Path Type | Example | When to Use |
---|---|---|
Relative | data/hksalary.csv | Default in projects |
Absolute | C:/Users/.../hksalary.csv | Temporary analysis |
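Use relative paths for portability: they are resolved against the working directory (the project folder when you work in an RStudio project), so the same code runs on any machine that has a copy of the project, without hardcoding directory paths.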
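A minimal base R sketch of building and checking a relative path before reading it; the file name follows the example in the table.

```r
# file.path() joins path pieces with "/" so we do not type separators by hand
csv_path <- file.path("data", "hksalary.csv")
csv_path               # "data/hksalary.csv"

# A relative path is looked up starting from the working directory
getwd()
file.exists(csv_path)  # TRUE only if data/hksalary.csv exists under getwd()
```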
3.2 Data Inspection
After importing data, it’s essential to inspect it to understand its structure and contents. Here are some common functions to help you get started:
First Look Tools
head(): Shows the first few rows of the dataset.
# First 6 rows
head(hksalary)
glimpse(): Provides a concise summary of the dataset’s structure.
# Tidyverse alternative to str()
glimpse(hksalary)
Rows: 368
Columns: 4
$ `Academic Year` <chr> "2009/10", "2009/10", "2009/10", "…
$ `Level of Study` <chr> "Sub-degree", "Sub-degree", "Sub-d…
$ `Broad Academic Programme Category` <chr> "Medicine, Dentistry and Health", …
$ `Average Annual Salary (HK$'000)` <dbl> 292, 125, 125, 139, 163, 122, 155,…
summary(): Displays a statistical summary of the dataset.
summary(hksalary)
Academic Year Level of Study Broad Academic Programme Category
Length:368 Length:368 Length:368
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
Average Annual Salary (HK$'000)
Min. :120.0
1st Qu.:206.5
Median :269.0
Mean :283.8
3rd Qu.:350.5
Max. :714.0
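In the output above, the three character columns only report Length, Class and Mode. Converting a column to a factor makes summary() count each category instead. A minimal base R sketch, illustrated on iris rather than hksalary:

```r
# Character vector: summary() reports only Length, Class and Mode
summary(as.character(iris$Species))

# Factor: summary() reports how many rows fall in each level
summary(iris$Species)
```

For the salary data, the same idea would be summary(factor(hksalary$`Level of Study`)).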
Function | Output Focus | Tidyverse Equivalent |
---|---|---|
head() | Top rows | slice_head() |
str() | Data types & structure | glimpse() |
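The pairs in the table can be compared side by side on a built-in dataset. A minimal sketch, assuming dplyr (part of the tidyverse) is installed; dplyr provides slice_head() and re-exports glimpse().

```r
library(dplyr)

# Base R and dplyr ways to take the first 3 rows
head(mtcars, n = 3)
slice_head(mtcars, n = 3)

# Base R and dplyr ways to inspect column types and structure
str(mtcars)
glimpse(mtcars)
```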
3.3 Practice: Import hksalary.csv Data
Step-by-Step Practice
Setup Workspace
- Create an import-practice project
- Make a data/ subfolder
Store Data
- Download the Hong Kong Graduates Annual Salary Data
- Save it as hksalary.csv in data/
Import Data
library(tidyverse)
hksalary <- read_csv("data/hksalary.csv")
Rows: 368 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Academic Year, Level of Study, Broad Academic Programme Category
dbl (1): Average Annual Salary (HK$'000)
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Initial Inspection
glimpse(hksalary)
summary(hksalary)
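As a final check, we can confirm that the import has the shape read_csv() reported (368 rows, 4 columns). A minimal base R sketch; check_import() is a hypothetical helper name.

```r
# Hypothetical helper: stop early if an import does not have the expected shape
check_import <- function(df, n_rows, n_cols) {
  stopifnot(is.data.frame(df), nrow(df) == n_rows, ncol(df) == n_cols)
  invisible(df)  # return the data unchanged so the call can sit in a pipeline
}

# check_import(hksalary, n_rows = 368, n_cols = 4)
```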
3.4 Key Functions Recap
Task | Function | Example |
---|---|---|
Load package | library() | library(tidyverse) |
Read CSV | read_csv() | read_csv("data/file.csv") |
View structure | glimpse() / str() | glimpse(df) |
Show first rows | head() | head(df, n = 10) |
Statistical summary | summary() | summary(df$salary) |