Goals

In this vignette I will provide an overview of some of the more common strategies that you will use to manipulate and organize your data for subsequent analysis. We will be working with two packages that are part of the tidyverse. The first, tidyr, provides a number of functions for reorganizing variables between long and wide format, as well as for separating out new variables based on the values of other variables. The second, dplyr, is used for manipulating data (selecting, filtering, sorting, etc.) and for transforming values, either through recoding or some other operation.

Data

Let’s take a look at a dataset included in the analyzr package. First, install and load the package, along with the main tidyverse tools.

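devtools::install_github("WFU-TLC/analyzr")  # install analyzr from GitHub (requires the devtools package)
library(tidyverse); library(analyzr)         # load the main tidyverse tools and analyzr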

Let’s take a look at the sdac dataset.

This dataset is in the tidy format. Take a look at the R documentation for this dataset with ?sdac.

Manipulate data frames

There are a few tidyverse verbs that are very commonly used to manipulate data frames.

select() allows you to select a subset of columns
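As a minimal sketch of select(), the example below uses a small made-up data frame rather than sdac itself. The name utterances and its columns (speaker, tag, words) are assumptions for illustration only and may not match the real dataset.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

# Keep only the named columns, in the order given
selected <- select(utterances, speaker, words)

# Drop a column by prefixing its name with a minus sign
no_tag <- select(utterances, -tag)
```

Both calls return a new data frame and leave utterances unchanged.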

arrange() sorts a data frame by one or more columns
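A short sketch of arrange() on a made-up data frame follows. The utterances table and its columns are hypothetical stand-ins, not the actual sdac variables.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

# Sort by one column, ascending by default
by_words <- arrange(utterances, words)

# Sort by speaker, then by word count from largest to smallest within each speaker
by_speaker_desc <- arrange(utterances, speaker, desc(words))
```

Wrapping a column in desc() reverses the sort order for that column only.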

filter() allows you to select rows whose values meet certain conditions

filter() can be combined with comparison and logical operators (such as ==, >, %in%, & and |) and with vector functions (such as is.na() or mean()).
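The sketch below illustrates filter() with a comparison, a set-membership test, and a vector function. It runs on a made-up data frame, and the utterances table and its columns are assumptions for illustration, not the sdac variables.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

# Conditions separated by commas are combined with a logical AND
long_a <- filter(utterances, speaker == "A", words > 8)

# %in% keeps rows whose value appears in a set of candidates
sd_or_qy <- filter(utterances, tag %in% c("sd", "qy"))

# A vector function computed on the column can serve as the threshold
above_mean <- filter(utterances, words > mean(words))
```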

Summarize data

You often want to explore your data by summarizing it. The most basic summary is count(), which returns the number of rows.

You can also add column names to count() to group your count summary.
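Here is a minimal sketch of count() with and without grouping columns. As before, utterances is a made-up table with assumed column names, standing in for a real dataset.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

# With no columns, count() returns a single row holding the total number of rows
total <- count(utterances)

# Naming a column gives one row per distinct value, with the count in column n
by_speaker <- count(utterances, speaker)

# Several columns count each combination; sort = TRUE puts the largest counts first
by_speaker_tag <- count(utterances, speaker, tag, sort = TRUE)
```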

You can also use the group_by() function to explicitly group your data for multiple operations.
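One way to picture this: group once, then reuse the grouped data frame for more than one operation. The sketch below uses a made-up utterances table, and its column names are assumptions for illustration.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

# Group once; the grouping is stored with the data frame
by_speaker <- group_by(utterances, speaker)

# Operation 1: collapse each group to a single summary row
speaker_summary <- summarize(by_speaker,
                             n_utterances = n(),
                             mean_words   = mean(words))

# Operation 2: keep every row, but compute each value relative to its own group
with_share <- ungroup(mutate(by_speaker, share = words / sum(words)))
```

Calling ungroup() at the end avoids carrying the grouping into later steps by accident.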

Using group_by() we can sample data as well.
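A sketch of grouped sampling follows, assuming dplyr 1.0.0 or later, where slice_sample() is available (older code often uses sample_n() for the same purpose). The utterances table is again a made-up stand-in with assumed column names.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

set.seed(1)  # fix the random seed so the sample is reproducible

# Draw two random rows from each speaker's group
sampled <- utterances %>%
  group_by(speaker) %>%
  slice_sample(n = 2) %>%
  ungroup()
```

Because the data frame is grouped, n = 2 applies per group rather than to the table as a whole.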

  • mutate

  • summarize

  • Vector functions

    • n
    • row_number
    • case_when
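The items listed above can be sketched together in one short pipeline: mutate() adds columns, summarize() collapses groups, and n(), row_number() and case_when() are vector functions used inside them. The data and the cut-off values for length_class are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in data; these column names are assumptions, not the sdac columns
utterances <- tibble(
  speaker = c("A", "B", "A", "B", "A", "B"),
  tag     = c("sd", "b", "qy", "ny", "sd", "sd"),
  words   = c(12, 1, 7, 1, 9, 4)
)

annotated <- utterances %>%
  group_by(speaker) %>%
  mutate(
    # row_number() numbers the rows within each speaker's group
    turn = row_number(),
    # case_when() recodes values; the first condition that is TRUE wins
    length_class = case_when(
      words >= 9 ~ "long",
      words >= 4 ~ "medium",
      TRUE       ~ "short"
    )
  ) %>%
  ungroup()

# n() counts the rows in each group inside summarize()
class_summary <- annotated %>%
  group_by(length_class) %>%
  summarize(n_utterances = n(), total_words = sum(words))
```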

Organize data frames

Found at STHDA website

  • gather/ spread

  • separate/ unite

  • Two table verbs
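The three items above can be sketched on small made-up tables. All table and column names here are hypothetical. gather() and spread() still work, although recent versions of tidyr recommend pivot_longer() and pivot_wider() for the same tasks.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per speaker, one column per type of act
wide <- tibble(
  speaker    = c("A", "B"),
  statements = c(2, 1),
  questions  = c(1, 0)
)

# gather(): wide to long, stacking the named columns into key/value pairs
long <- gather(wide, key = "act", value = "n", statements, questions)

# spread(): long to wide, the inverse operation
wide_again <- spread(long, key = act, value = n)

# separate(): split one column into several at a delimiter
ids   <- tibble(utt_id = c("A_1", "B_1", "A_2"))
parts <- separate(ids, utt_id, into = c("speaker", "turn"), sep = "_", convert = TRUE)

# unite(): paste several columns back into one
rejoined <- unite(parts, "utt_id", speaker, turn, sep = "_")

# Two-table verb: left_join() adds columns from a second table by matching on a key
speakers <- tibble(speaker = c("A", "B"), dialect = c("north", "south"))
joined   <- left_join(parts, speakers, by = "speaker")
```

left_join() keeps every row of the first table, so joined has the same number of rows as parts.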