Note: I’m a graduate student at the University of Miami working on my capstone, a visualization of the Pictures of the Year International Archives. If you’re curious about my process, here are my posts tagged with capstone. Keep in mind that I’m learning as I go, so I’m all ears for more efficient solutions. Please let me know in the comments!
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. A data wrangler is a person who performs these transformation operations.
This may include further munging, data visualization, data aggregation, training a statistical model, as well as many other potential uses. Data munging as a process typically follows a set of general steps which begin with extracting the data in a raw form from the data source, “munging” the raw data using algorithms (e.g. sorting) or parsing the data into predefined data structures, and finally depositing the resulting content into a data sink for storage and future use.
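Those three steps (extract, munge, deposit) can be sketched in a few lines. This is a minimal Python illustration, not my actual POYi workflow. It assumes a small CSV export as the raw source and uses an in-memory SQLite table as the data sink. The CSV contents, the column names, the `photos` table, and the helper functions are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for the data source (not real POYi data).
RAW_CSV = """year,photographer,category
2003, Jane Smith ,News
1998,John Doe,Sports
2001,Ana Ruiz ,Feature
"""


def extract(raw_text):
    """Extract: read the raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def munge(rows):
    """Munge: parse each row into a typed tuple, then sort by year, then name."""
    records = [
        (int(row["year"]), row["photographer"].strip(), row["category"].strip())
        for row in rows
    ]
    return sorted(records)


def deposit(records, conn):
    """Deposit: write the cleaned records into a SQLite table (the data sink)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos "
        "(year INTEGER, photographer TEXT, category TEXT)"
    )
    conn.executemany("INSERT INTO photos VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract -> munge -> deposit and return the cleaned records."""
    records = munge(extract(raw_text))
    deposit(records, conn)
    return records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    print(conn.execute("SELECT year, photographer FROM photos ORDER BY year").fetchall())
    # prints [(1998, 'John Doe'), (2001, 'Ana Ruiz'), (2003, 'Jane Smith')]
```

Passing a file path to `sqlite3.connect` instead of `":memory:"` would keep the sink around between runs; whether the project needs a database at all is still an open question.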
Now that I have as much of the POYi data as I believe exists, I’ve been learning how to use R to wrangle data and exploring possible ways to create a database. A few people have recommended a database, but I’m not sure I really need one yet.
One great source I’ve discovered is this post by Sharon Musings titled “Great R packages for data import, wrangling and visualization.” So this is where I’ve started. If I run into a roadblock there, I’ll look into a database. Either path is rich with learning, but at least with R I have some familiarity with dplyr, tidyr, magrittr, ggplot2 and others on the list because of my last big data viz project about Anne Sexton.
Here I go!