Week 6 - R Data I/O & Wrangling
Objectives
- Overview of R Data Input and Output
- R Data Wrangling: Transformation using Base R
Topics
- Getting data in is part of just about any analysis!
- R excels at this, with
  - truly broad coverage of file formats
  - as well as 'backends' such as databases
  - or different web-based APIs
- Our focus: reading and writing CSV data
- Mention other formats: JSON, XML, …
- Efficient R-specific storage: RDS
- Mention protobuf, msgpack, feather, fst, …
- Data wrangling topics: data.frame manipulations
  - modifying by adding columns
  - subsetting and summaries
  - conditional operations by group
  - merging (and its relationship to SQL joins)
  - functional programming approaches
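A minimal Base R sketch of the I/O and wrangling topics listed above. It is illustrative only, not the lecture code: the data frames `passengers` and `decks` and their columns (`pclass`, `fare`, `deck`) are hypothetical toy data, not the Titanic data set. The functions used (`write.csv`, `read.csv`, `saveRDS`, `readRDS`, `subset`, `tapply`, `aggregate`, `merge`, `sapply`) are all part of base R.

```r
# Hypothetical toy data (not the Titanic data set)
passengers <- data.frame(
  name   = c("A", "B", "C", "D"),
  pclass = c("1st", "2nd", "1st", "3rd"),
  fare   = c(80, 20, 60, 8)
)

# CSV round trip; row.names = FALSE avoids writing an extra index column
csv_file <- tempfile(fileext = ".csv")
write.csv(passengers, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)  # stringsAsFactors = FALSE is the default since R 4.0.0
str(from_csv)                   # fare comes back as integer, not double: CSV loses types

# RDS: R-specific serialization that round-trips the object exactly
rds_file <- tempfile(fileext = ".rds")
saveRDS(passengers, rds_file)
from_rds <- readRDS(rds_file)
stopifnot(identical(passengers, from_rds))

# Modifying by adding a column
passengers$expensive <- passengers$fare > 50

# Subsetting and summaries
first_class <- subset(passengers, pclass == "1st", select = c(name, fare))
summary(passengers$fare)

# Conditional operation by groups
mean_fare <- tapply(passengers$fare, passengers$pclass, mean)
by_class  <- aggregate(fare ~ pclass, data = passengers, FUN = mean)

# Merging: all.x = TRUE keeps every passenger, similar to a SQL LEFT JOIN
decks  <- data.frame(pclass = c("1st", "2nd"), deck = c("A", "D"))
merged <- merge(passengers, decks, by = "pclass", all.x = TRUE)

# Functional programming: apply a function to each column
col_classes <- sapply(passengers, class)
```

The CSV round trip turns the double `fare` column into an integer column, while the RDS round trip returns an identical object. This is one reason RDS is a convenient format for R-only storage, whereas CSV remains the portable choice for exchanging data with other tools.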
Core Material
- Lecture 11 Slides
- Lecture 12 Slides
- Video: Week 6 of Fa20
- Video 15: Data Input (note that the spoken comment about stringsAsFactors is out of date, since the default changed to FALSE in R 4.0.0; see slides)
- Video 15-1: Data Wrangling
- Chapter 6: Reading Data into R in Lander, R for Everyone, 2017.
- Chapter 5: Data Frames in Matloff, The Art of R Programming, 2011.
- Chapter 6: Factors and Tables in Matloff, The Art of R Programming, 2011.
- Chapter 11: Group Manipulations (Sections 1 and 2) in Lander, R for Everyone, 2017.
Additional Resources
- Titanic data set
- R Data Import/Export Manual
- While R for Data Science is popular, it largely skips the Base R approaches that these lectures cover