Establishes a new data bundle project, used to prepare outputs from standardized datasets.
Usage
dcf_add_bundle(
  name,
  project_dir = ".",
  source_files = NULL,
  open_after = interactive(),
  use_git = TRUE,
  use_workflow = FALSE
)
Arguments
- name
Name of the bundle
- project_dir
Path to the Data Collection Framework project.
- source_files
A list or character vector, with names as paths to standard files from source projects (relative to the project's data directory), and distribution file names as entries. This associates input files with output files, allowing calculation of a source state and inheritance of metadata from the source files.
- open_after
Logical; if FALSE, will not open the project.
- use_git
Logical; if TRUE, will initialize a git repository.
- use_workflow
Logical; if TRUE, will add a GitHub Actions workflow.
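As an illustration of the source_files argument, the sketch below builds a named character vector in which names are input paths and entries are output file names. The project, file, and bundle names are hypothetical, and the call itself is guarded so that it only runs when dcf_add_bundle is available; it also assumes the working directory is a Data Collection Framework project.

```r
# Hypothetical mapping: names are standard files in source projects
# (relative to the project's data directory); entries are distribution file names.
source_files <- c(
  "source_a/standard/data.csv.gz" = "combined_a.csv.gz",
  "source_b/standard/data.csv.gz" = "combined_b.csv.gz"
)

# Guarded call: skipped when the function is not loaded.
# Assumes the working directory is a Data Collection Framework project.
if (exists("dcf_add_bundle")) {
  dcf_add_bundle(
    "bundle_a", # hypothetical bundle name
    project_dir = ".",
    source_files = source_files,
    open_after = FALSE
  )
}
```

The same mapping could be passed as a list, since the argument accepts either form.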
Project Definition
The process.json file defines the project with some initial attributes:
- type
Always "bundle", to define this as a bundle project.
- name
Name of the project.
- scripts
List of script definitions.
- source_files
A character array of paths to other files used within the scripts, relative to the overall project's data directory.
- standard_state
State of the source_files: A list with keys as the file paths, relative to the overall project root, and values as the MD5 hash of those files.
- dist_state
State of the dist directory: A list with keys as the file paths, relative to the overall project root, and values as the MD5 hash of those files.
- checked
Timestamp when the project was last checked with dcf_check.
- check_results
Results of the last check.
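To make the shape of the state entries concrete, the following sketch computes a list keyed by relative file path with MD5 hashes as values, using tools::md5sum from base R. The directory layout is hypothetical and created in a temporary directory; this shows one way such a state could be derived, and is not necessarily how the package computes it.

```r
# Hypothetical layout in a temporary directory, standing in for the project root.
root <- file.path(tempdir(), "dcf_state_example")
dist_dir <- file.path(root, "bundle_a", "dist")
dir.create(dist_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,value", "a,1"), file.path(dist_dir, "out.csv"))

# Paths relative to the root become the keys; MD5 hashes are the values.
rel_paths <- list.files(root, recursive = TRUE)
hashes <- tools::md5sum(file.path(root, rel_paths))
dist_state <- as.list(setNames(unname(hashes), rel_paths))
str(dist_state)
```

Comparing a freshly computed list like this against the stored one is a simple way to detect that files have changed.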
Each scripts entry points to a script to be run, with one default:
- path
Path to the script, relative to this project's root.
- last_run
Timestamp of the last processing.
- run_time
How long the script took to run last, in milliseconds.
- last_status
Status of the last run; a list with entries for success (logical) and log (output of the script).
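The fields of a scripts entry can be pictured with a small base R sketch that runs a script, times it, and records the outcome. The script content and the timestamp format are assumptions made for illustration; the package's own runner may record these differently.

```r
# Hypothetical script written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines('cat("done\\n")', script)

started <- Sys.time()
log <- character()
success <- tryCatch(
  {
    # Capture what the script prints to standard output.
    log <- capture.output(source(script, local = new.env()))
    TRUE
  },
  error = function(e) {
    # On failure, keep the error message as the log.
    log <<- conditionMessage(e)
    FALSE
  }
)

entry <- list(
  path = basename(script),
  last_run = format(started, "%Y-%m-%d %H:%M:%S"),
  # Elapsed time converted from seconds to milliseconds.
  run_time = as.numeric(difftime(Sys.time(), started, units = "secs")) * 1000,
  last_status = list(success = success, log = log)
)
str(entry)
```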
Project Files
Within a bundle project, there are two files to edit:
- build.R: This is the primary script, which is automatically rerun. It should read data from the standard directory of source projects, and write to its own dist directory.
- measure_info.json: This should list all non-ID variable names in the data files within dist. These will inherit the standard measure info if found in the source projects referred to in source_files. If the dist name is different but should still inherit standard measure info, a source_id entry with the original measure ID will be used to identify the original measure info. See dcf_measure_info.
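A build.R script following this pattern might look like the sketch below. All directory, file, and column names are hypothetical, and the layout is created in a temporary directory so the example runs on its own; a real script would instead read from the existing standard directories of its source projects.

```r
# Hypothetical layout, created in a temporary directory so the sketch runs anywhere.
data_dir <- file.path(tempdir(), "dcf_build_example")
standard_dir <- file.path(data_dir, "source_a", "standard")
dist_dir <- file.path(data_dir, "bundle_a", "dist")
dir.create(standard_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(dist_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(geography = c("01", "02"), time = 2020, cases = c(5, 7)),
  file.path(standard_dir, "data.csv"),
  row.names = FALSE
)

# The part that would live in build.R: read standardized input,
# derive an output variable, and write the result to dist.
standard <- read.csv(
  file.path(standard_dir, "data.csv"),
  colClasses = c(geography = "character") # keep leading zeros in IDs
)
standard$cases_doubled <- standard$cases * 2
write.csv(standard, file.path(dist_dir, "combined.csv"), row.names = FALSE)
```

Assuming geography and time are the ID columns here, measure_info.json for this sketch would list cases and cases_doubled.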