Merging a small number of datasets

Let’s say you have a bunch of different datasets you need to merge. For example, one from Qualtrics, one from LIWC and one from the Evaluative Lexicon. How would you merge them together?

Step 1: Import your datasets into R and give them different names

It’s ok if one of your files is a .csv and one is a .xlsx. Once theyre imported as dataframes, it’s all the same.

If you’re confused about how to import data and find the filepath, just scroll down a little bit 🙂

# if your files are .csv
dta1 = read.csv("[filepath].csv")
dta2 = read.csv("[filepath].csv")

#if your files are .xlsx
dta1 = read_xlsx("[filepath to the xlsx file on your computer].xlsx")
dta2 = read_xlsx("[filepath to the xlsx file on your computer].xlsx")

Step 2: Merging the two datasets together

Both of these datasets should have a variable that identifies the participants and is called the same thing across the datasets. In this case, it’s “ResponseId”

dta = merge(dta1, dta2, by = "ResponseId")

Recall that if you need to do something more complex with the merge function (like have the matching variable named different things across dataframes or delete rows that don’t have a “match” between datasets), you can always type “??merge” in the console to get more information. The correct package is base::merge.

Merging a large number of datasets

If you need to merge a large number of datasets, it can be heplful to put them all in the same folder and create a function to import them and name them. That looks something like this

# Create a list of all the files in a folder that have .xlsx as their type
files <- list.files('pathway to your files', pattern = "\\\\.xlsx$", full.names = TRUE)

# Loop through files and assign each to a data frame named the same as the original file 
for (file in files) {  
name <- tools::file_path_sans_ext(basename(file))  
assign(name, read_xlsx(file))}