Establishes a new data source project, used to collect and prepare data from a new source.
Usage
dcf_add_source(
name,
project_dir = ".",
open_after = interactive(),
use_git = TRUE,
use_workflow = FALSE
)
Project Definition
The process.json file defines the project with some initial attributes:
type: Always source, to define this as a source project.
name: Name of the project.
scripts: List of script definitions.
checked: When the project was last checked with dcf_check.
check_results: Results of the last check.
standalone: Logical; TRUE if the source project does not exist within a broader collection project.
standard_state: State of the standard directory: a list with names as the file paths, relative to the overall project root, and values as the MD5 hash of those files.
raw_state: State of the raw directory, if set within a script.
vintages: A list with names as names of files found in the standard directory, and values as dates (of arbitrary format). This is a way to provide a date separate from the files' dates (e.g., if you have some other source for when the data were actually collected), which will be included in the named file's datapackage.json.
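To make the standard_state format concrete, the sketch below builds a list of the described shape with base R's tools::md5sum(): names are file paths relative to the overall project root, and values are MD5 hashes. The helper compute_standard_state() is hypothetical and not part of dcf; dcf may compute or store these hashes differently.

```r
# Hypothetical helper (not a dcf function): build a standard_state-like list
# for a source project located at source_dir within project_root.
compute_standard_state <- function(project_root, source_dir) {
  standard_dir <- file.path(project_root, source_dir, "standard")
  # Paths relative to standard/, e.g. "data.csv"
  rel <- list.files(standard_dir, recursive = TRUE)
  # tools::md5sum() returns one hash per file (as a named vector; names dropped)
  hashes <- unname(tools::md5sum(file.path(standard_dir, rel)))
  # Name each hash by its path relative to the overall project root
  as.list(stats::setNames(hashes, file.path(source_dir, "standard", rel)))
}

# Usage: a throwaway project with one standardized file
root <- file.path(tempdir(), "state_demo")
std <- file.path(root, "data", "source_name", "standard")
dir.create(std, recursive = TRUE, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(std, "data.csv"))
state <- compute_standard_state(root, file.path("data", "source_name"))
names(state)  # "data/source_name/standard/data.csv"
```

Comparing a freshly computed list like this against the stored one is one way to tell whether any standard file has changed since it was last recorded.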
Each scripts entry points to a script to be run, with one default:
path: Path to the script, relative to this project's root.
manual: Logical; if TRUE, will only run the script from dcf_process (not dcf_build).
frequency: How often to rerun the project, in days. This is checked against the last run timestamp when processed; it is a way to skip processing, but can only be as frequent as the overall process is run.
last_run: Timestamp of the last processing.
run_time: How long the script took to run last, in milliseconds.
last_status: Status of the last run; a list with entries for success (logical) and log (output of the script).
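The interaction of manual and frequency can be illustrated with a small sketch. script_is_due() is a hypothetical helper, not a dcf function. It assumes last_run has already been parsed into a POSIXct value (or is NULL for a script that has never run) and treats a missing frequency as "always due"; dcf's actual checks may differ.

```r
# Hypothetical helper (not a dcf function): decide whether a script entry
# should run. from_build = TRUE stands in for being called via dcf_build().
script_is_due <- function(entry, from_build = FALSE, now = Sys.time()) {
  # Manual scripts are only run from dcf_process(), never from dcf_build()
  if (isTRUE(entry$manual) && from_build) return(FALSE)
  # Never run before, or no frequency set (an assumption): run it
  if (is.null(entry$last_run) || is.null(entry$frequency)) return(TRUE)
  # Otherwise, run only if at least `frequency` days have passed
  days_since <- as.numeric(difftime(now, entry$last_run, units = "days"))
  days_since >= entry$frequency
}

now <- as.POSIXct("2024-01-10 12:00:00", tz = "UTC")
entry <- list(
  path = "ingest.R", manual = FALSE, frequency = 7,
  last_run = as.POSIXct("2024-01-05 12:00:00", tz = "UTC")
)
script_is_due(entry, now = now)  # FALSE: only 5 days since the last run
entry$last_run <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
script_is_due(entry, now = now)  # TRUE: 9 days since the last run
```

As the frequency description notes, a due script still only runs when the overall process runs, so a frequency of 1 on a weekly process effectively means weekly.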
See the script standards for examples of using this within a sub-project script.
Project Files
Within a source project, there are two files to edit:
ingest.R: This is the primary script, which is automatically rerun. It should store raw data and resources in raw/ where possible, then use what's in raw/ to produce standard-format files in standard/. This file is sourced from its location during processing, so any system paths must be relative to itself.
measure_info.json: This is where you can record information about the variables included in the standardized data files. See dcf_measure_info.
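A minimal ingest.R following this raw/ to standard/ pattern might look like the sketch below. It writes the script into a temporary directory and runs it with base R's source(..., chdir = TRUE), assuming processing behaves similarly by running the script from its own location. The file names raw/values.csv and standard/data.csv.gz and the placeholder data are hypothetical; a real script would download from its source and apply its own standardization.

```r
# A throwaway directory standing in for data/source_name
source_dir <- file.path(tempdir(), "ingest_demo")
dir.create(file.path(source_dir, "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(source_dir, "standard"), showWarnings = FALSE)

# Hypothetical ingest.R contents: every path is relative to the script itself
ingest_code <- '
raw_file <- "raw/values.csv"
if (!file.exists(raw_file)) {
  # stand-in for a download; keep the raw copy so later runs can reuse it
  write.csv(data.frame(id = 1:3, value = c(10, 20, 30)), raw_file,
    row.names = FALSE)
}
raw <- read.csv(raw_file)
# placeholder standardization: rename a column, then write a gzipped CSV
names(raw)[names(raw) == "value"] <- "measure"
write.csv(raw, gzfile("standard/data.csv.gz"), row.names = FALSE)
'
writeLines(ingest_code, file.path(source_dir, "ingest.R"))

# chdir = TRUE resolves relative paths against the script's directory
source(file.path(source_dir, "ingest.R"), chdir = TRUE)
list.files(source_dir, recursive = TRUE)
```

Because the script only uses relative paths, it keeps working wherever the project is checked out.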
Examples
project_dir <- paste0(tempdir(), "/temp_project")
dcf_init("temp_project", dirname(project_dir))
dcf_add_source("source_name", project_dir)
list.files(paste0(project_dir, "/data/source_name"))
#> [1] "README.md" "ingest.R" "measure_info.json"
#> [4] "process.json" "project.Rproj" "raw"
#> [7] "standard"