Overview

There will be 8 sessions over the course of this academic year. Each session includes some preparatory work before the session and one or more follow-up tasks after it. The preparation is aimed at getting you up to speed with general concepts and technical skills, and the follow-up is your opportunity to apply the knowledge you have acquired to your particular project. The sessions themselves will cover the conceptual and technical topics that any quantitative research project needs to address, as well as the particular aspects that the learning community needs in order to achieve its project goals.

Note: A space on GitHub has been created for you to post questions and issues that you may have throughout the year. Here is the link to the flc_discussion_board issues tab. All members of the FLC will be able to see and respond there.

Session 1: Getting oriented

Preparation

  • Read Getting started with R and RStudio for instructions on how to download and install R and RStudio. This guide also provides a preliminary pass at the basic concepts required to work with this software.
  • Read ModernDive: An introduction to statistical and data sciences via R (MD) chapter 1 “Introduction” and chapter 2 “Getting Started with Data in R”.

Session goals

In this session we aim to provide conceptual and technical orientation that will guide our learning community throughout the year. To get started, everyone will have the opportunity to discuss their interest in this learning community, and we will begin to flesh out where each member’s research stands at this point: Do you have a research question? Do you have data? What format is it in? How have you, or others, approached this topic in terms of analysis?

On the technical side, we will sketch out how some of the conceptual questions we will cover connect to data management strategies and programming approaches for conducting quantitative research with R. We will also make sure that everyone has R and RStudio up and working, and answer any questions concerning the software, so that the stage is set for working with R in and between subsequent sessions.

Follow up

  • Reflect on the status of your project in more detail. What are your goals? What is your research question? What type of data do you have/ need to address this research question? What format should your data have to facilitate analysis? How do you believe you will explore/ analyze your data? How do you expect to communicate your findings?
  • Complete DataCamp’s Introduction to R, a free online course that introduces you to R programming. (Note: you will have to register for a free account to access this and all other DataCamp courses.)

Session 2: Data organization

Preparation

Session work

In this session we will discuss the ‘tidy’ approach to data organization. We will work with various datasets to highlight how to set up data so that it is conducive to research.
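As a minimal sketch of what ‘tidy’ organization looks like in practice, the example below reshapes a small, made-up table so that each row holds one observation and each column holds one variable. The data, the column names, and the use of tidyr's pivot_longer() are illustrative assumptions, not the datasets we will work with in the session.

```r
library(tidyr)

# Hypothetical 'wide' data: one row per participant, one score column per year
scores_wide <- data.frame(
  participant = c("p1", "p2"),
  score_2019 = c(71, 64),
  score_2020 = c(78, 69)
)

# Reshape so that each row is a single participant-year observation
scores_tidy <- pivot_longer(
  scores_wide,
  cols = starts_with("score_"),  # the columns that hold measurements
  names_to = "year",             # old column names become values of 'year'
  names_prefix = "score_",       # drop the shared prefix from those names
  values_to = "score"            # the cell values go into 'score'
)

print(scores_tidy)
```

In the reshaped table, year is a variable in its own right rather than being encoded in column names, which tends to make grouping, summarizing, and plotting by year more direct.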

We will also discuss how R represents various data types and provide examples of how to summarize data in tables and figures to explore and gain preliminary insight from data. We will take advantage of R Markdown to create reports that interleave prose, code, and results of this code in a human-readable format.

Finally we will step you through the process of connecting, updating, and publishing your R Markdown website to GitHub.

Follow up

  • Brainstorm the ideal structure of your data for conducting your research, taking into account the ‘tidy’ approach to data that we covered in this session. Document your ideas on your personal website and push (upload) the site to GitHub. See the Resources page for a link to a refresher on editing, building, and updating R Markdown websites.

Session 3: Import/ Export/ Convert Data

Preparation

Session work

In this session we will cover the first steps involved in working with data: importing/ exporting data and preliminary data exploration. First, we will discuss typical data formats generated by proprietary software such as SPSS, Excel, and Stata, as well as open data formats such as comma-separated values and other character-delimited data files. Next, we will explore and evaluate the data structure of datasets that do and do not conform to the ‘tidy’ format. Finally, we will demonstrate the first steps in exploring a dataset through visualization.

Follow up

  • Attempt to import the data for your project into R using the rio package. If your data is imported from a proprietary format, convert and save this data to disk in an open data format such as .csv or .tsv.
  • Next, attempt to read the converted data back into R to confirm that the structure of the data has been preserved.
  • Once you have your data on disk in an open format, then explore the data through some preliminary visualizations.
  • Finally, document this work as a reproducible R Markdown document on your website. Include the steps to import and export the data, your visualizations, and any preliminary observations you can provide at this point.
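The import, convert, and read-back steps above might look roughly like the following sketch. It assumes the rio package is installed and uses a small made-up data frame written to a temporary .tsv file as a stand-in for your source file; the object names and file paths are hypothetical. With real project data, the first import() call would point at your own file (for example an SPSS, Excel, or Stata file), since rio chooses a reader based on the file extension.

```r
library(rio)

# Hypothetical stand-in for project data
survey <- data.frame(
  id = 1:3,
  group = c("a", "b", "a"),
  score = c(2.5, 3.0, 4.25)
)

# Create a source file to import (a stand-in for your own data file)
source_path <- file.path(tempdir(), "survey.tsv")
export(survey, source_path)

# Step 1: import the source data
original <- import(source_path)

# Step 2: save it to disk in an open format (.csv)
csv_path <- file.path(tempdir(), "survey.csv")
export(original, csv_path)

# Step 3: read the converted file back in and compare it with the original
reloaded <- import(csv_path)
str(reloaded)
same_structure <- isTRUE(all.equal(original, reloaded))
print(same_structure)
```

Formats such as SPSS can carry extra information (for example value labels) that plain text formats do not store, so with real data it is worth inspecting the output of str() rather than relying on a single comparison.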

Session 4: Wrangle data

Preparation

Reading on data wrangling:

Optional reading (reading and tidying text for analysis):

These two resources should give you an idea of how to import text from single and multiple files (and from compressed data formats, such as .zip), how to create tabular data from text, and how to tokenize text (with the tidytext package). They touch on a number of concepts and coding strategies that we have already covered or will cover in the upcoming session.
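To give a flavor of text tokenization before you read, here is a minimal sketch that turns two made-up sentences into a table with one word per row. The data frame and its column names are hypothetical, and the example assumes the tidytext package is installed; it is not taken from the readings themselves.

```r
library(tidytext)

# Hypothetical documents; in a real project the text would be read from files
docs <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("Tidy data makes analysis easier.", "Text can be tidy too!"),
  stringsAsFactors = FALSE
)

# One row per word; by default unnest_tokens() lowercases the text
# and strips punctuation
tokens <- unnest_tokens(docs, output = word, input = text)

print(tokens)
```

The result is itself a tabular (data frame) structure, so the same tools used for other tabular data can be applied to the words.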

Project work:

Evaluate the current state of your project and be ready to share your thoughts with the FLC.

  • What is the current format/ structure of the data?
  • Is this structure ‘tidy’?
  • Does it contain the relevant variables needed for the upcoming analysis?
  • If not, what is needed to bring the data in line with your analysis expectations?

Session work

As we have discussed, the basic data structure used for analysis in R is a data frame, which is a tabular structure. The source data for some projects will already be in tabular format, while other data (for example, raw text in text-processing projects) may not. In either case, there is almost always some amount of re-organization that needs to take place before the data is ready for statistical modeling. For example, you may want to take information contained in one column and separate it into two columns, you may need to remove columns, you may need to clean up the values in a column or recode them, or you may want to derive new columns based on the values of other columns. These operations constitute the data wrangling/ munging/ curation stage of a project.

In this session we will introduce you to data wrangling in R with the tidyr and dplyr packages. We will provide you an opportunity to discuss the current state of your project data and evaluate what data wrangling is needed to best organize your data to address your research goals.
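The kinds of operations described above (separating a column, removing a column, recoding values, and deriving a new column) can be sketched with tidyr and dplyr as follows. The data frame and all of its column names are made up for illustration, and this is only one of several reasonable ways to write these steps.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data: speaker name and sex are packed into one column
raw <- data.frame(
  speaker = c("anna_F", "ben_M", "carla_F"),
  words = c(120, 95, 143),
  minutes = c(2, 1.5, 2.2),
  notes = c("", "noisy", "")
)

wrangled <- raw %>%
  # separate one column into two
  separate(speaker, into = c("name", "sex"), sep = "_") %>%
  # remove a column that is not needed for analysis
  select(-notes) %>%
  mutate(
    # recode the values in a column
    sex = if_else(sex == "F", "female", "male"),
    # derive a new column from the values of other columns
    words_per_minute = words / minutes
  )

print(wrangled)
```

Chaining the steps with the pipe keeps each operation on its own line, which can make it easier to document and revisit the wrangling decisions later.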

Follow up

Applying wrangling strategies to your data:

  • Move your data structure forward by wrangling your data as necessary.
  • Visualize a primary research question.
    • What trends do you observe?
    • Repeat this process with follow up questions and document your preliminary findings on your website.

Preparing for analysis:

Reading:

Project work (post-reading):

  • Consider the type of analysis that best meets your goals: exploratory analysis, inference/ hypothesis-testing analysis, or predictive analysis.
  • Update your website with your progress: before the break, please present your ideas on the type of analysis you (tentatively) plan to employ. This will help Mike and me tailor our spring sessions.

Session 5: Choose an appropriate model

Preparation

  1. Starting with this session, we will open each meeting with a quick status update from each person. In 3-4 minutes you should be able to address the following points:
    • What have I worked on since our last session?
    • Share at least one block of code (or pseudocode) that you have written since our last session.
    • What is the next question I am hoping to answer?

    This will help us all hold each other accountable and help shape the direction of subsequent sessions.

  2. Update your personal session notes or project page, especially for session 4, with the topics we covered and any code that you have written. This will give you some practice using your website and pushing new code to GitHub. You can use your session notes as a journal to document the items that you would like us to cover or on which you need more support.

See these notes for how to update your website.

Session work

In this session we will again start with our round table and an orientation to all of our projects. At this point we may be able to break up into smaller groups to focus on the tools and methods most appropriate to your research projects (e.g. text analysis methods vs. more traditional quantitative approaches).

  • Round table review of your project status (3-4 min per person)

  • Open discussion and reflection on the projects

  • Group work

  • Reconvene, recap and plan for session 6

Follow up

TBA

Session 7

Preparation

Prepare round table comments

  • Progress you have made since session 6
  • The next research/programming problem you’re hoping to address

Session work

Round table (2-3 min/ person)

  • What work have I completed since the last session?
  • What questions still remain before the next session?
  • What kind of analysis will I be doing?
  • What do I want to deliver at the end of the final session?

Example analysis (together)

  • Exploratory analysis
  • Inference phase
  • Prediction

Personal reflection

  • What have we talked about today?
  • What questions do you still have that were raised today, or that you hope will be raised in a future meeting?
  • What progress do you hope to make in the upcoming sessions?

Write the answers to these questions on your website, then render and push the site.

Follow up

TBD

References

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.