Creates a data-raw folder, moves raw data files to a consistent location, and provides a script that makes it easy to clean and wrangle the data into a format consistent with the manydata universe.

Usage

import_data(
  dataset = NULL,
  database = NULL,
  path = NULL,
  codebook = NULL,
  delete_original = FALSE,
  open = rlang::is_interactive()
)

Arguments

dataset

Intended (short) name of the dataset. That is, the name of the two-dimensional tabular data format. For consistency reasons, this should be a unique name in all capitals. Abbreviations make good dataset names, such as "COW" or "DESTA".

database

Intended name of the database or datacube. That is, the name of the population or phenomenon to which the dataset relates. For consistency reasons, this should be a unique name in small letters. Concepts make good database names, such as "states" or "colonial_relations".

path

Path to raw data file. If left unspecified, a dialogue box is raised to select the file via the system.

codebook

Path to the codebook to be imported into the data-raw folder.

delete_original

Whether the original file is moved (TRUE) or copied (FALSE). By default FALSE.

open

Whether the resulting preparation script will be opened. By default TRUE in interactive sessions, as determined by rlang::is_interactive().

Value

Places the chosen file into a folder hierarchy within the package such as "data-raw/{database}/{dataset}/" and creates a script in the same folder for preparing the data for use in the package. If open = TRUE, the script is also opened.

Details

The function assists with importing existing raw data into our universe of packages. The function does two main things.

First, it moves or copies a chosen file into the "data-raw/" folder of the current package, establishing a folder hierarchy as it goes. It checks whether a folder named after the database already exists under "data-raw/" and, if not, creates one. It then checks whether a folder named after the dataset exists within that database folder and, if not, creates one. Finally, it places the chosen file into that dataset folder. If delete_original = TRUE, the original file is deleted, so the file is moved rather than copied. This can be useful if, for example, the file had just been downloaded to your "Downloads" folder.
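The folder logic described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name place_raw_file() and its root argument are hypothetical, and the demonstration writes to a temporary directory rather than to a real package.

```r
# Hypothetical helper (not part of the package): mimics the described
# "data-raw/{database}/{dataset}/" hierarchy using base R only.
place_raw_file <- function(path, database, dataset,
                           root = "data-raw", delete_original = FALSE) {
  target_dir <- file.path(root, database, dataset)
  # recursive = TRUE creates the database folder too if it is missing
  if (!dir.exists(target_dir)) {
    dir.create(target_dir, recursive = TRUE)
  }
  target <- file.path(target_dir, basename(path))
  copied <- file.copy(path, target, overwrite = FALSE)
  if (!copied) {
    stop("Could not copy '", path, "' to '", target, "'")
  }
  # Deleting the source after a successful copy amounts to a move
  if (delete_original) {
    file.remove(path)
  }
  invisible(target)
}

# Demonstration in a fresh temporary directory
demo_dir <- tempfile("import-demo-")
dir.create(demo_dir)
src <- file.path(demo_dir, "cow_states.csv")
write.csv(data.frame(id = 1:2), src, row.names = FALSE)

out <- place_raw_file(src, database = "states", dataset = "COW",
                      root = file.path(demo_dir, "data-raw"))
file.exists(out)
```

Copying first and deleting only after the copy succeeds means a failed copy never loses the original file.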

Second, the function creates a new script in the dataset-level folder, alongside the raw data file. By default, it also opens this script in RStudio or an equivalent IDE. The purpose of this script is to read the file into R, clean the data and wrangle it into a manydata-consistent format, and then export it for use in the package. Much of the script is pre-populated, either from the information given to import_data() or by inferring what is required from the name or format of the file. Currently supported formats include: .txt, .csv, .xlsx, .xls, .dta, and .RData.
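As an illustration of how a reader might be inferred from the file format, the sketch below maps each supported extension to a reading function. The helper suggest_reader() is hypothetical, and the readers it names are assumptions chosen for the example; the script that import_data() generates may use different functions.

```r
# Hypothetical lookup (not part of the package): returns the name of a
# function that could read a file with the given extension.
suggest_reader <- function(path) {
  # Lower-case the extension so that ".RData" and ".rdata" match alike
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    txt = "utils::read.delim",
    csv = "utils::read.csv",
    xlsx = ,                      # falls through to the xls entry
    xls = "readxl::read_excel",
    dta = "haven::read_dta",
    rdata = "base::load",
    stop("Unsupported file format: .", ext)
  )
}

suggest_reader("COW.csv")
suggest_reader("DESTA.RData")
```

Returning the function name as a string keeps the sketch free of package dependencies; a generated script would instead contain the corresponding call.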

Examples

if (FALSE) {
import_data(dataset = "COW", database = "states")
}