Title: | Fetch Data from Various Data Sources |
---|---|
Description: | Contains functions to fetch data from various data sources. The user first creates a catalog of objects from a data source, then fetches data from the catalog. The package provides an easy way to access data from many different types of sources. |
Authors: | David Bosak [aut, cre], Kevin Kramer [ctb], Archytas Clinical Solutions [cph] |
Maintainer: | David Bosak <[email protected]> |
License: | CC0 |
Version: | 0.1.5 |
Built: | 2024-11-09 03:52:33 UTC |
Source: | https://github.com/dbosak01/fetch |
The catalog function returns a data catalog for a data source. A data catalog is like a collection of data dictionaries for all the datasets in the data source. The catalog allows you to examine the datasets in the data source without yet loading anything into memory. Once you decide which data items you want to load, use the fetch function to load them into memory.
catalog(source, engine, pattern = NULL, where = NULL, import_specs = NULL)
source |
The source for the data. This parameter is required. Normally the source is passed as a full or relative path. |
engine |
The data engine to use for this data source. This parameter
is required. The available data engines are listed in the engines enumeration. |
pattern |
A pattern to use when loading data items from the data source.
The pattern can be a name or a vector of names. Names also accept wildcards.
The supplied pattern will be used to filter which data items are loaded into
the catalog. For example, the pattern |
where |
A where expression to use when fetching
the data. This expression will apply to all fetch operations on this catalog.
The where expression should be defined with the Base R expression() function. |
import_specs |
The import specs to use for any fetch operation on
this catalog. The import spec can be used to control the data types
on the incoming columns. You can create separate import specs for each
dataset, or one import spec to use for all datasets.
See the import_spec function for additional information. |
The loaded data catalog, as class "dcat". The catalog will be a list of data dictionaries. Each data dictionary is a tibble.
The fetch
function to retrieve data from the catalog,
and the import_spec
function to create import specifications.
# Get data directory
pkg <- system.file("extdata", package = "fetch")
# Create catalog
ct <- catalog(pkg, engines$csv)
# Example 1: Catalog all rows
# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
#   data item 'ADAE': 56 cols 150 rows
#   data item 'ADEX': 17 cols 348 rows
#   data item 'ADPR': 37 cols 552 rows
#   data item 'ADPSGA': 42 cols 695 rows
#   data item 'ADSL': 56 cols 87 rows
#   data item 'ADVS': 37 cols 3617 rows
# View catalog item
ct$ADEX
# data item 'ADEX': 17 cols 348 rows
# - Engine: csv
# - Size: 70.7 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name Column   Class     Label Format NAs MaxChar
#  1 ADEX STUDYID  character <NA>  NA       0       3
#  2 ADEX USUBJID  character <NA>  NA       0      10
#  3 ADEX SUBJID   character <NA>  NA       0       3
#  4 ADEX SITEID   character <NA>  NA       0       2
#  5 ADEX TRTP     character <NA>  NA       8       5
#  6 ADEX TRTPN    numeric   <NA>  NA       8       1
#  7 ADEX TRTA     character <NA>  NA       8       5
#  8 ADEX TRTAN    numeric   <NA>  NA       8       1
#  9 ADEX RANDFL   character <NA>  NA       0       1
# 10 ADEX SAFFL    character <NA>  NA       0       1
# 11 ADEX MITTFL   character <NA>  NA       0       1
# 12 ADEX PPROTFL  character <NA>  NA       0       1
# 13 ADEX PARAM    character <NA>  NA       0      45
# 14 ADEX PARAMCD  character <NA>  NA       0       8
# 15 ADEX PARAMN   numeric   <NA>  NA       0       1
# 16 ADEX AVAL     numeric   <NA>  NA      16       4
# 17 ADEX AVALCAT1 character <NA>  NA      87      10
# Example 2: Catalog with where expression
ct <- catalog(pkg, engines$csv, where = expression(SUBJID == '049'))
# View catalog item - Now only 4 rows
ct$ADEX
# data item 'ADEX': 17 cols 4 rows
# - Where: SUBJID == "049"
# - Engine: csv
# - Size: 4.5 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name Column   Class     Label Format NAs MaxChar
#  1 ADEX STUDYID  character <NA>  NA       0       3
#  2 ADEX USUBJID  character <NA>  NA       0      10
#  3 ADEX SUBJID   character <NA>  NA       0       3
#  4 ADEX SITEID   character <NA>  NA       0       2
#  5 ADEX TRTP     character <NA>  NA       0       5
#  6 ADEX TRTPN    numeric   <NA>  NA       0       1
#  7 ADEX TRTA     character <NA>  NA       0       5
#  8 ADEX TRTAN    numeric   <NA>  NA       0       1
#  9 ADEX RANDFL   character <NA>  NA       0       1
# 10 ADEX SAFFL    character <NA>  NA       0       1
# 11 ADEX MITTFL   character <NA>  NA       0       1
# 12 ADEX PPROTFL  character <NA>  NA       0       1
# 13 ADEX PARAM    character <NA>  NA       0      45
# 14 ADEX PARAMCD  character <NA>  NA       0       8
# 15 ADEX PARAMN   numeric   <NA>  NA       0       1
# 16 ADEX AVAL     numeric   <NA>  NA       0       4
# 17 ADEX AVALCAT1 character <NA>  NA       1      10
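The pattern and where parameters can be combined on one catalog call. The following is a minimal sketch, assuming the fetch package and its sample CSV files are installed (the requireNamespace() guard skips the code otherwise). It loads only two named items with a vector pattern, then applies the same where expression used in Example 2 to both items. The object names ct_sub and ct_049 are illustrative only.

```r
# Sketch only: assumes the fetch package and its sample data are installed.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)
  pkg <- system.file("extdata", package = "fetch")

  # Load only two named items into the catalog instead of all six
  ct_sub <- catalog(pkg, engines$csv, pattern = c("ADEX", "ADSL"))
  print(names(ct_sub))

  # The where expression applies to every item in this catalog,
  # so each data dictionary describes only the rows for subject 049
  ct_049 <- catalog(pkg, engines$csv, pattern = c("ADEX", "ADSL"),
                    where = expression(SUBJID == "049"))
  print(ct_049)
}
```

Because nothing is loaded into memory until fetch is called, narrowing the catalog this way mainly reduces what you have to scan when exploring a large data source.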
The engines enumeration contains all possible options
for the "engine" parameter of the catalog
function. Use
this enumeration to specify what kind of data you would like to load.
Options are: csv, dbf, rda, rds, rdata, sas7bdat, xls, xlsx, and xpt.
engines
An object of class etype
of length 9.
The engine parameter string.
The engines
enumeration is used on the catalog
function. See that function documentation for additional details.
# Get data directory
pkg <- system.file("extdata", package = "fetch")
# Create catalog
ct <- catalog(pkg, engines$csv)
# Example 1: Catalog all rows
# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
#   data item 'ADAE': 56 cols 150 rows
#   data item 'ADEX': 17 cols 348 rows
#   data item 'ADPR': 37 cols 552 rows
#   data item 'ADPSGA': 42 cols 695 rows
#   data item 'ADSL': 56 cols 87 rows
#   data item 'ADVS': 37 cols 3617 rows
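Since the engine names match common file extensions, one way to use the enumeration is to pick the engine from a file name. The sketch below assumes the fetch package is installed and that the members of engines are named exactly as listed above (csv, dbf, rda, and so on). The helper engine_for_file() is hypothetical and is not part of fetch.

```r
# Sketch only: assumes fetch is installed; engine_for_file() is hypothetical.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  # Show the available engine names (documented as 9 options)
  print(names(engines))

  # Hypothetical helper: choose the engine whose name matches the extension
  engine_for_file <- function(path) {
    ext <- tolower(tools::file_ext(path))
    if (!ext %in% names(engines)) {
      stop("No fetch engine matches extension '", ext, "'")
    }
    engines[[ext]]
  }

  print(engine_for_file("data/ADEX.csv"))       # same value as engines$csv
  print(engine_for_file("data/adsl.sas7bdat"))  # same value as engines$sas7bdat
}
```

Lowercasing the extension lets a file such as "study.RData" map to the rdata engine.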
A function to create the import specifications for a
particular data file. This information can be used on the
catalog
or fetch
functions to correctly assign
the data types for
columns on imported data. The import specifications are defined as
name/value pairs, where the name is the column name and the value is the
data type indicator. Available data type indicators are
'guess', 'logical', 'character', 'integer', 'numeric',
'date', 'datetime', and 'time'.
Also note that multiple import specifications
can be combined into a collection, and assigned to an entire catalog.
See the specs
function
for an example of using a specs collection.
import_spec(..., na = NULL, trim_ws = NULL)
... |
Named pairs of column names and column data types, separated by commas. Available types are: 'guess', 'logical', 'character', 'integer', 'numeric', 'date', 'datetime', and 'time'. The date/time data types accept an optional input format. To supply the input format, append it after the data type following an equals sign, e.g.: 'date=%d%b%Y' or 'datetime=%d-%m-%Y %H:%M:%S'. Default is NULL, meaning no column types are specified, and the function should make its best guess for each column. |
na |
A vector of values to be treated as NA. For example, the
vector |
trim_ws |
Whether or not to trim white space from the input data values.
The default is NULL, meaning the value of the |
The import specification object. The class of the object will be "import_spec".
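A single import spec can mix plain type indicators, a date/time type with an input format, and the na and trim_ws settings. The following is a minimal sketch, assuming the fetch package is installed. The column names (including ADTM) and the extra "." NA marker are hypothetical values chosen for illustration.

```r
# Sketch only: assumes fetch is installed; column names are hypothetical.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  spc <- import_spec(USUBJID = "character",                  # keep IDs as text
                     AVAL    = "numeric",
                     ADTM    = "datetime=%d-%m-%Y %H:%M:%S",  # explicit input format
                     na      = c("", "NA", "."),              # also treat "." as NA
                     trim_ws = TRUE)

  print(class(spc))
}
```

Any column not named in the spec is still typed by the engine's best guess, so a spec only needs to list the columns you want to control.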
Below are some common date formatting codes. For a complete list,
see the documentation for the strptime
function:
%d = day as a number
%a = abbreviated weekday
%A = unabbreviated weekday
%m = month number
%b = abbreviated month name
%B = unabbreviated month name
%y = 2-digit year
%Y = 4-digit year
%H = hour
%M = minute
%S = second
%p = AM/PM indicator
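The format codes above follow Base R's strptime conventions, so you can test a format string in Base R before putting it in an import spec. This sketch needs only Base R. It switches LC_TIME to the "C" locale because %b and %B depend on the locale's month names, then restores the original setting. Note that, per the strptime documentation, %p only takes effect together with %I (12-hour clock), which is not in the list above.

```r
# Base R only: try out format strings like the "date=%d%b%Y" used in import specs
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # month names for %b/%B are locale-dependent

# A SAS-style date such as "18SEP2020" matches %d%b%Y
d <- as.Date("18SEP2020", format = "%d%b%Y")
print(d)  # "2020-09-18"

# A day-month-year datetime matches %d-%m-%Y %H:%M:%S
dt <- as.POSIXct("18-09-2020 14:30:22", format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
print(format(dt, "%Y-%m-%d %H:%M:%S"))

# %p is honored only with the 12-hour %I code
t12 <- strptime("02:30 PM", format = "%I:%M %p", tz = "UTC")
print(format(t12, "%H:%M"))  # "14:30"

invisible(Sys.setlocale("LC_TIME", old_locale))
```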
fetch
to retrieve data, and
specs
for creating a collection of import specs.
Other specs:
print.specs(), read.specs(), specs(), write.specs()
# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
# Create import spec
spc <- import_spec(TRTSDT = "date=%d%b%Y", TRTEDT = "date=%d%b%Y")
# Create catalog without filter
ct <- catalog(pkg, engines$csv, import_specs = spc)
# Get dictionary for ADVS with Import Spec
d <- ct$ADVS
# Observe data types for TRTSDT and TRTEDT are now Dates
d[d$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADVS': 37 cols 3617 rows
# - Engine: csv
# - Size: 1.1 Mb
# - Last Modified: 2020-09-18 14:30:22
#    Name Column   Class     Label Format NAs MaxChar
# 16 ADVS TRTSDT   Date      <NA>  NA      54      10
# 17 ADVS TRTEDT   Date      <NA>  NA     119      10
A class-specific instance of the print
function for
a data catalog. The function prints the catalog in a summary manner.
Use the verbose = TRUE
option to print the catalog as a list.
## S3 method for class 'dcat'
print(x, ..., verbose = FALSE)
x |
The catalog to print. |
... |
Any follow-on parameters. |
verbose |
Whether or not to print the catalog in verbose style. By default, the parameter is FALSE, meaning to print in summary style. |
The object, invisibly.
# Get data directory
pkg <- system.file("extdata", package = "fetch")
# Create catalog
ct <- catalog(pkg, engines$csv)
# View catalog
print(ct)
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
#   data item 'ADAE': 56 cols 150 rows
#   data item 'ADEX': 17 cols 348 rows
#   data item 'ADPR': 37 cols 552 rows
#   data item 'ADPSGA': 42 cols 695 rows
#   data item 'ADSL': 56 cols 87 rows
#   data item 'ADVS': 37 cols 3617 rows
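The verbose parameter switches between the two print styles, and because the method returns the object invisibly, the result can be captured without printing it twice. A minimal sketch, assuming the fetch package and its sample CSV files are installed:

```r
# Sketch only: assumes the fetch package and its sample data are installed.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)
  pkg <- system.file("extdata", package = "fetch")
  ct <- catalog(pkg, engines$csv)

  # Summary style (verbose = FALSE is the default for a catalog)
  print(ct)

  # Verbose style prints the catalog as a list; the catalog is returned invisibly
  res <- print(ct, verbose = TRUE)
}
```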
A class-specific instance of the print
function for
data catalog items. The verbose parameter controls whether the info is printed in a summary manner or as a list.
## S3 method for class 'dinfo'
print(x, ..., verbose = TRUE)
x |
The data catalog item to print. |
... |
Any follow-on parameters. |
verbose |
Whether or not to print the info in verbose style. The default is TRUE, as shown in the usage above. Verbose style includes a full data dictionary and printing of all attributes. Set the parameter to FALSE to print in summary style. |
The data catalog object, invisibly.
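To compare the two styles for a single data item, you can pass verbose explicitly and capture the printed text. This is a minimal sketch, assuming the fetch package and its sample CSV files are installed; the variable names are illustrative, and the final comparison only checks that verbose output is at least as long as the summary.

```r
# Sketch only: assumes the fetch package and its sample data are installed.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)
  pkg <- system.file("extdata", package = "fetch")
  ct <- catalog(pkg, engines$csv)
  item <- ct$ADEX

  # Capture both print styles as character vectors of output lines
  summary_out <- capture.output(print(item, verbose = FALSE))
  verbose_out <- capture.output(print(item, verbose = TRUE))

  cat(summary_out, sep = "\n")
  cat("Verbose style printed", length(verbose_out), "lines\n")
}
```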
A function to print the import specification collection.
## S3 method for class 'specs'
print(x, ..., verbose = FALSE)
x |
The specifications to print. |
... |
Any follow-on parameters to the print function. |
verbose |
Whether or not to print the specifications in verbose style. By default, the parameter is FALSE, meaning to print in summary style. |
The specification object, invisibly.
Other specs:
import_spec(), read.specs(), specs(), write.specs()
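A short sketch of printing a specs collection in both styles, assuming the fetch package is installed. The collection below reuses the column names from the specs example; the variable names spcs and res are illustrative.

```r
# Sketch only: assumes the fetch package is installed.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  spcs <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y"),
                ADVS = import_spec(TRTSDT = "character"))

  # Summary style (verbose = FALSE is the default)
  print(spcs)

  # Verbose style; the specs object is returned invisibly
  res <- print(spcs, verbose = TRUE)
}
```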
A function to read import specifications from the file system.
The function accepts a full or relative path to the spec file, and returns
the specs as an object. If the file_path
parameter is passed
as a directory name, the function will search for a file with a '.specs'
extension and read it.
read.specs(file_path = getwd())
file_path |
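The full or relative path to the specs file or its directory. Default is
the current working directory. If the path is a directory name, the function will search it for a file with a '.specs' extension. |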
The specifications object.
Other specs:
import_spec(), print.specs(), specs(), write.specs()
A function to create a collection of import specifications for a
data source. These specs can be used on the
catalog
function to correctly assign the data types uniquely for
different imported data files. The spec collection is a set of import_spec
objects identified by name/value pairs. The name corresponds to the name of
the input dataset, without file extension. The value is the import_spec
object to use for that dataset. In this way, you may define different
specs for each dataset in your catalog.
The import engines will guess at the data types for any columns that are not explicitly defined in the import specifications. The import spec syntax is the same for all data engines.
Note that the na
and trim_ws
parameters on the specs
function will be applied globally to all files in the data source.
These global settings can be overridden on the import_spec
for any particular data file.
Also note that the specs
collection is defined as an object
so it can be stored and reused.
See the write.specs
and read.specs
functions
for additional information on saving and restoring specs.
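The global na and trim_ws settings on specs, and their per-file override on import_spec, can be sketched as follows. This assumes the fetch package and its sample CSV files are installed; the "." NA marker and the choice to turn off trimming for ADAE are illustrative values only.

```r
# Sketch only: assumes the fetch package and its sample data are installed.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)
  pkg <- system.file("extdata", package = "fetch")

  spcs <- specs(
    # ADAE overrides the global trim_ws setting
    ADAE = import_spec(TRTSDT = "date=%d%b%Y", trim_ws = FALSE),
    # ADVS inherits the global na and trim_ws settings
    ADVS = import_spec(TRTSDT = "character"),
    na = c("", "NA", "."),  # global NA markers for every file
    trim_ws = TRUE          # global whitespace trimming
  )

  ct <- catalog(pkg, engines$csv, import_specs = spcs)
  d <- ct$ADAE
  print(d[d$Column == "TRTSDT", ])
}
```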
specs(..., na = c("", "NA"), trim_ws = TRUE)
... |
Named input specs. The name should correspond to the file name,
without the file extension.
The spec is defined as an import_spec object. |
na |
A vector of values to be treated as NA. For example, the
vector |
trim_ws |
Whether or not to trim white space from the input data values.
Valid values are TRUE and FALSE. Default is TRUE. The value of the
trim_ws parameter on an individual import_spec overrides this global setting for that data file. |
The import spec collection. The class of the object is "specs".
catalog
to create a data catalog,
fetch
for retrieving data, and
import_spec
for additional information on defining an
import spec.
Other specs:
import_spec(), print.specs(), read.specs(), write.specs()
# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
# Create import spec
spc <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y", TRTEDT = "date=%d%b%Y"),
             ADVS = import_spec(TRTSDT = "character", TRTEDT = "character"))
# Create catalog with specs collection
ct <- catalog(pkg, engines$csv, import_specs = spc)
# Get dictionary for ADAE with Import Spec
d1 <- ct$ADAE
# Observe data types for TRTSDT and TRTEDT are Dates
d1[d1$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADAE': 56 cols 150 rows
# - Engine: csv
# - Size: 155 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name Column   Class     Label Format NAs MaxChar
# 13 ADAE TRTSDT   Date      <NA>  NA       1      10
# 14 ADAE TRTEDT   Date      <NA>  NA       4      10
# Get dictionary for ADVS with Import Spec
d2 <- ct$ADVS
# Observe data types for TRTSDT and TRTEDT are character
d2[d2$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADVS': 37 cols 3617 rows
# - Engine: csv
# - Size: 1.1 Mb
# - Last Modified: 2020-09-18 14:30:22
#    Name Column   Class     Label Format NAs MaxChar
# 16 ADVS TRTSDT   character <NA>  NA      54       9
# 17 ADVS TRTEDT   character <NA>  NA     119       9
A function to write import specifications to the file system. The function accepts a specifications object and a full or relative path. The function returns the full file path. This function is useful so that you can define import specifications once, and reuse them in multiple programs or across multiple teams.
write.specs(x, dir_path = getwd(), file_name = NULL)
x |
A specifications object of class 'specs'. |
dir_path |
A full or relative path to save the specs. Default is the current working directory. |
file_name |
The file name to save the specs to, without a file extension. The '.specs' file extension will be added automatically. If no file name is supplied, the function will use the variable name as the file name. |
The full file path.
Other specs:
import_spec(), print.specs(), read.specs(), specs()
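Writing specs once and reading them back in another program can be sketched as a round trip through a temporary directory. This assumes the fetch package is installed; the directory and the file name "adam_specs" are hypothetical. The sketch reads the file back both by its full path and by its directory, which works here because the new directory holds a single '.specs' file.

```r
# Sketch only: assumes fetch is installed; file and directory names are hypothetical.
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  spcs <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y",
                                   TRTEDT = "date=%d%b%Y"))

  # Use a fresh temporary directory so the example never overwrites a real file
  out_dir <- tempfile("specs_demo")
  dir.create(out_dir)

  # write.specs() returns the full path; the '.specs' extension is added for us
  pth <- write.specs(spcs, dir_path = out_dir, file_name = "adam_specs")
  print(pth)

  spcs_by_path <- read.specs(pth)      # read by full file path
  spcs_by_dir  <- read.specs(out_dir)  # or by directory containing one .specs file
}
```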