Overview
There will be 8 sessions over the course of this academic year. Each session includes preparatory work before the session and one or more follow-up tasks after it. The preparation is aimed at getting you up to speed with general concepts and technical skills, and the follow-up is your opportunity to apply the knowledge you have acquired to your particular project. The sessions themselves will cover conceptual and technical topics that any quantitative research project needs to address, as well as particular aspects that the learning community needs in order to achieve its project goals.
Note: A space on GitHub has been created for you to post questions and issues that you may have throughout the year. Here is the link to the flc_discussion_board issues tab. All members of the FLC will be able to see and respond there.
In this session we aim to provide conceptual and technical orientation that will guide our learning community throughout the year. To get started, everyone will have the opportunity to discuss their interest in this learning community, and we will begin to flesh out where each member’s research stands at this point: Do you have a research question? Do you have data? What format is it in? How have you, or others, approached this topic in terms of analysis?
On the technical side, we will sketch out how some of the conceptual questions we will cover connect to data management strategies and programming approaches for conducting quantitative research with R. We will also make sure that everyone has R and RStudio up and running, and answer any questions about the software, to set the stage for working with R in and between subsequent sessions.
Fork this repository by pressing the fork button located on the upper right-hand side of the repository page. Alternatively, you can download the entire repository as a zip file and then load it manually into a repository that you create.
In this session we will discuss the ‘tidy’ approach to data organization. We will work with various datasets to highlight how to set up data so that it is conducive to research.
We will also discuss how R represents various data types and provide examples of how to summarize data in tables and figures to explore and gain preliminary insight from data. We will take advantage of R Markdown to create reports that interleave prose, code, and results of this code in a human-readable format.
Finally we will step you through the process of connecting, updating, and publishing your R Markdown website to GitHub.
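As a rough illustration of the ‘tidy’ approach and of summarizing data in tables and figures, here is a minimal sketch. The data frame, its column names (`participant`, `condition_a`, `condition_b`, `rt_ms`), and its values are invented for illustration and are not part of the session materials; the sketch assumes the tidyr, dplyr, and ggplot2 packages are installed.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical "wide" data: one row per participant, one column per condition
wide <- data.frame(
  participant = c("p1", "p2", "p3"),
  condition_a = c(512, 430, 498),
  condition_b = c(601, 455, 540)
)

# Tidy form: one row per observation, one column per variable
tidy <- pivot_longer(
  wide,
  cols = c(condition_a, condition_b),
  names_to = "condition",
  values_to = "rt_ms"
)

# Inspect how R represents each column (character vs. numeric)
str(tidy)

# Summarize in a table: mean value and number of observations per condition
rt_summary <- tidy %>%
  group_by(condition) %>%
  summarize(mean_rt = mean(rt_ms), n = n())
print(rt_summary)

# Summarize in a figure: distribution of values per condition
p <- ggplot(tidy, aes(x = condition, y = rt_ms)) +
  geom_boxplot()
print(p)
```

Placed in a code chunk of an R Markdown document, the same code would show the table and the figure alongside your prose when the document is rendered.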
rio
ggplot2
In this session we will cover the first steps involved in working with data: importing and exporting data, and preliminary data exploration. First, we will discuss typical data formats generated by proprietary software such as SPSS, Excel, and Stata, as well as open data formats such as comma-separated values and other character-delimited data files. Next, we will explore and evaluate the structure of datasets that do and do not conform to the ‘tidy’ format. Finally, we will demonstrate the first steps in exploring a dataset through visualization.
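The import and export steps could look something like the sketch below. It assumes the rio package is installed; the `survey` data frame and the file names are hypothetical stand-ins, and a temporary directory is used so the example does not touch your project files.

```r
library(rio)

# Hypothetical data standing in for a dataset you have already loaded
survey <- data.frame(
  id = 1:3,
  age = c(34, 27, 45),
  group = c("control", "treatment", "control")
)

# Write to an open, comma-separated format; rio picks the writer from the file extension
csv_path <- file.path(tempdir(), "survey.csv")
export(survey, csv_path)

# Read it back; import() is designed to handle formats such as .sav, .xlsx, and .dta
# in the same way (some formats may need additional packages installed)
survey_csv <- import(csv_path)
str(survey_csv)

# convert() reads one file and writes it out in another format in a single step
tsv_path <- file.path(tempdir(), "survey.tsv")
convert(csv_path, tsv_path)
```

For your own data, you would replace the temporary paths with the path to your source file and the location where you want the open-format copy saved.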
rio package. If your data is imported from a proprietary format, convert and save this data to disk in an open data format such as .csv or .tsv.
Reading on data wrangling:
Optional reading (reading and tidying text for analysis):
tidytext online book
These two resources should give you an idea of how to import text from single and multiple files (and from compressed data formats, such as .zip), how to create tabular data from text, and how to tokenize text (with the tidytext package). They draw on a number of concepts and coding strategies that we have covered or will cover in the upcoming session.
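To give a sense of what tokenization into tabular data looks like, here is a small sketch that assumes the dplyr and tidytext packages are installed. The two example documents and the column names `doc_id` and `text` are invented for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical text data: one row per document
docs <- tibble(
  doc_id = c("doc1", "doc2"),
  text = c("Tidy data makes analysis easier.", "Text can be tidy too!")
)

# One token (word) per row; by default unnest_tokens() lowercases
# the text and strips punctuation
tokens <- docs %>%
  unnest_tokens(output = word, input = text)

# Count how often each word occurs across all documents
word_counts <- tokens %>%
  count(word, sort = TRUE)
print(word_counts)
```

The result is an ordinary data frame, so the same tools used for other tabular data can be applied to text from this point on.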
Project work:
Evaluate the current state of your project and be ready to share your thoughts with the FLC.
As we have discussed, the basic data structure used for analysis in R is a data frame. A data frame is a tabular structure. The source data for some projects will already be in tabular format, while other data (for example, raw text for text processing) may not. In either case, there is almost always some amount of reorganization that needs to take place before the data is ready for statistical modeling. For example, you may want to take information contained in one column and separate it into two columns, remove columns, clean up or recode the values in a column, or derive new columns based on the values of other columns. These operations constitute the data wrangling (also called munging or curation) stage of a project.
In this session we will introduce you to data wrangling in R with the tidyr and dplyr packages. We will provide you with an opportunity to discuss the current state of your project data and evaluate what data wrangling is needed to best organize your data to address your research goals.
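The wrangling operations mentioned above (separating a column, removing columns, recoding values, and deriving new columns) might be sketched as follows with tidyr and dplyr. The `raw` data frame and all of its column names and values are hypothetical; your own data will need its own choices at each step.

```r
library(tidyr)
library(dplyr)

# Hypothetical raw data with a combined column and an unneeded column
raw <- data.frame(
  subject_session = c("s01_pre", "s01_post", "s02_pre", "s02_post"),
  grp = c("ctl", "ctl", "trt", "trt"),
  correct = c(12, 15, 10, 18),
  total = c(20, 20, 20, 20),
  notes = c("", "late", "", "")
)

clean <- raw %>%
  # Separate the information in one column into two columns
  separate(subject_session, into = c("subject", "session"), sep = "_") %>%
  # Remove a column that is not needed for analysis
  select(-notes) %>%
  mutate(
    # Recode abbreviated values into readable labels
    grp = if_else(grp == "ctl", "control", "treatment"),
    # Derive a new column from the values of other columns
    accuracy = correct / total
  )

print(clean)
```

Chaining the steps with the pipe keeps each operation on its own line, which makes it easier to add, remove, or reorder steps as your understanding of the data changes.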
Applying wrangling strategies to your data:
Preparing for analysis:
Reading:
Project work (post-reading):
What is the next question I am hoping to answer?
This will help us all hold each other accountable and help shape the direction of subsequent sessions.
See these notes for how to update your website.
In this session we again start with our round table and orientation of all of our projects. At this point we might be able to start to break up into smaller groups to focus on the tools and methods most appropriate to your research projects (e.g. text analysis methods vs more traditional quantitative approaches).
Round table review of your project status (3-4 min per person)
Open discussion and reflection on the projects
Group work
Reconvene, recap and plan for session 6
TBA
Prepare roundtable comments
Round table (2-3 min/person)
- What work have I completed since the last session?
- What questions still remain before the next session?
- What kind of analysis will you be doing?
- What do you want to deliver at the end of the final session?
Example Analysis (together)
- Exploratory analysis
- Inference phase
- Prediction
Personal Reflection
- What have we talked about today?
- What questions remain that were raised today, or that you hope will be raised in a future meeting?
- What progress do you hope to make in the next sessions?
Write the answers to these questions in your website, render and push them.
TBD
Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://doi.org/10.18637/jss.v059.i10.