--- title: "ards" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{ards} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The **ards** package creates Analysis Results Datasets (ARDS). ARDS are used to store the results of an analysis in a tabular form, so they can be examined and manipulated by downstream processes. An ARDS dataset is created with the following steps: 1. Initialize the ARDS 2. Add data to the ARDS 3. Extract the completed ARDS The above three steps are performed with the following functions: * `init_ards()`: A function to initialize the ARDS dataset. This function is typically called at the beginning of a program. * `add_ards()`: A function to add data to the ARDS. This function is typically called along the way, as you create analysis data. * `get_ards()`: A function to extract the ARDS dataset. This function is typically called at the end of the program. The above steps result in an ARDS data frame. Once this data frame is extracted, you may save it to disk, or insert it into a database, as desired. ## ARDS Structure The data structure produced by the **ards** package is structure recommended by [CDISC](https://www.cdisc.org/). This structure puts all analysis values into a single column. Therefore, there is one row per analysis value. Descriptive information, such as the the name of the analysis variable and the by groups, are stored in other columns. Here is the data dictionary for the ARDS dataset: ## How to Use **ards** To see how the **ards** functions work, let us first perform a very simple analysis on the **mtcars** sample data frame. Examine the following example: ```{r eval=FALSE, echo=TRUE} library(dplyr) library(ards) # Initialize the ARDS # - These values will be repeated on all rows in the ARDS dataset init_ards(studyid = "MTCARS", tableid = "01", adsns = "mtcars", population = "all cars", time = "1973") # Perform analysis on MPG # - Using cylinders as a by group analdf <- mtcars |> select(cyl, mpg) |> group_by(cyl) |> summarize(n = n(), mean = mean(mpg), std = sd(mpg), min = min(mpg), max = max(mpg)) # View analysis data analdf # cyl n mean std min max # # 1 4 11 26.7 4.51 21.4 33.9 # 2 6 7 19.7 1.45 17.8 21.4 # 3 8 14 15.1 2.56 10.4 19.2 # Add analysis data to ARDS # - These values will be unique for each row in the ARDS dataset add_ards(analdf, statvars = c("n", "mean", "std", "min", "max"), anal_var = "mpg", trtvar = "cyl") # Get the ARDS # - Remove by-variables to make the ARDS dataset easier to read ards <- get_ards() |> select(-starts_with("by")) # Uncomment to view ards # View(ards) ``` Here is an image of the ARDS dataset created above: ## More Realistic Example As can be seen above, the functions in the **ards** package are easy to use. They can be integrated into a standard Table, Listing, and Figure (TLF) program, or a data preparation program. In many cases, the program does not need to be restructured to accommodate the **ards** functions. Let us look at a more realistic program. In this demonstration, we will create the analysis for a Demographics table. Note the following about this program: * The input data is a subset of 10 subjects and some relevant variables to make the program easier to read and understand. * The `init_ards()` function is called once at the top of the program. The values passed will be the same for all rows of the ARDS. * The `add_ards()` function identifies variables in the analysis dataset that you want to add to the ARDS. The variable values will be extracted from the analysis data and transformed into the ARDS structure. * The `add_ards()` function can be placed in the middle of a data pipeline, and will not interfere with your analysis. * The `add_ards()` function is called immediately after the calculations, before any formatting or transformations. The function is called at this point so that the ARDS will contains the original numeric values with full precision. * When the analysis is complete, the `get_ards()` is called to retrieve the ARDS dataset. The ARDS dataset is a standard R data frame that can be saved, combined with other analysis results, or passed to another program for additional processing. ```{r eval=FALSE, echo=TRUE} library(dplyr) library(tibble) library(tidyr) library(ards) # Create input data adsl <- read.table(header = TRUE, text = ' STUDYID DOMAIN USUBJID SUBJID SITEID BRTHDTC AGE AGEU SEX RACE ETHNIC ARMCD ARM ABC DM ABC-01-049 49 1 11/12/1966 39 YEARS M "WHITE" "NOT HISPANIC OR LATINO" 4 "ARM D" ABC DM ABC-01-050 50 1 12/19/1958 47 YEARS M "WHITE" "NOT HISPANIC OR LATINO" 2 "ARM B" ABC DM ABC-01-051 51 1 5/2/1972 34 YEARS M "WHITE" "NOT HISPANIC OR LATINO" 1 "ARM A" ABC DM ABC-01-052 52 1 6/27/1961 45 YEARS F "WHITE" "UNKNOWN" 3 "ARM C" ABC DM ABC-01-053 53 1 4/7/1980 26 YEARS F "WHITE" "NOT HISPANIC OR LATINO" 2 "ARM B" ABC DM ABC-01-054 54 1 9/13/1962 44 YEARS M "WHITE" "NOT HISPANIC OR LATINO" 4 "ARM D" ABC DM ABC-01-055 55 1 6/11/1959 47 YEARS F "BLACK OR AFRICAN AMERICAN" "UNKNOWN" 3 "ARM C" ABC DM ABC-01-056 56 1 5/2/1975 31 YEARS M "WHITE" "NOT HISPANIC OR LATINO" 1 "ARM A" ABC DM ABC-01-113 113 1 2/8/1932 74 YEARS M "WHITE" "UNKNOWN" 4 "ARM D"') # Initalize ARDS init_ards(studyid = "ABC", tableid = "01", adsns = "adsl", population = "safety population", time = "SCREENING", where = "saffl = TRUE", reset = TRUE) # Perform analysis on AGE variable agedf <- adsl |> select(AGE, ARM) |> group_by(ARM) |> summarize(n = n(), mean = mean(AGE), std = sd(AGE), median = median(AGE), min = min(AGE), max = max(AGE)) |> mutate(analvar = "AGE") |> ungroup() |> add_ards(statvars = c("n", "mean", "std", "median", "min", "max"), statdesc = c("N", "Mean", "Std", "Median", "Min", "Max"), anal_var = "AGE", trtvar = "ARM") |> transmute(analvar, ARM, n = sprintf("%d", n), mean_sd = sprintf("%.1f (%.2f)", mean, std), median = sprintf("%.1f", median), min_max = sprintf("%.1f-%.1f", min, max)) |> pivot_longer(c(n, mean_sd, median, min_max), names_to = "label", values_to = "stats") |> pivot_wider(names_from = ARM, values_from = c(stats)) |> transmute(analvar, label = c("N", "Mean (Std)", "Median", "Min-Max"), trtA = `ARM A`, trtB = `ARM B`, trtC = `ARM C`, trtD = `ARM D`) # View analysis results agedf # # A tibble: 4 × 6 # analvar label trtA trtB trtC trtD # # 1 AGE N 2 2 2 3 # 2 AGE Mean (Std) 32.5 (2.12) 36.5 (14.85) 46.0 (1.41) 52.3 (18.93) # 3 AGE Median 32.5 36.5 46.0 44.0 # 4 AGE Min-Max 31.0-34.0 26.0-47.0 45.0-47.0 39.0-74.0 # Get population counts trt_pop <- count(adsl, ARM) |> deframe() trt_pop # ARM A ARM B ARM C ARM D # 2 2 2 3 # Perform analysis on SEX variable sexdf <- adsl |> mutate(denom = trt_pop[paste0(adsl$ARM)]) |> group_by(SEX, ARM, denom) |> summarize(cnt = n()) |> transmute(SEX, ARM, cnt, analvar = "SEX", label = SEX, pct = cnt / denom * 100) |> ungroup() |> add_ards(statvars = c("cnt", "pct"), statdesc = "label", anal_var = "SEX", trtvar = "ARM") |> pivot_wider(names_from = ARM, values_from = c(cnt, pct)) |> transmute(analvar, label, trtA = sprintf("%1d (%3.0f%%)", `cnt_ARM A`, `pct_ARM A`), trtB = sprintf("%1d (%3.0f%%)", `cnt_ARM B`, `pct_ARM B`), trtC = sprintf("%1d (%3.0f%%)", `cnt_ARM C`, `pct_ARM C`), trtD = sprintf("%1d (%3.0f%%)", `cnt_ARM D`, `pct_ARM D`)) # View analysis results sexdf # analvar label trtA trtB trtC trtD # # 1 SEX F NA ( NA%) 1 ( 50%) 2 (100%) NA ( NA%) # 2 SEX M 2 (100%) 1 ( 50%) NA ( NA%) 3 (100%) # Perform analysis on RACE racedf <- adsl |> mutate(denom = trt_pop[paste0(adsl$ARM)]) |> group_by(RACE, ARM, denom) |> summarize(cnt = n()) |> transmute(RACE, ARM, cnt, analvar = "RACE", label = RACE, pct = cnt / denom * 100) |> ungroup() |> add_ards(statvars = c("cnt", "pct"), statdesc = "label", anal_var = "RACE", trtvar = "ARM") |> pivot_wider(names_from = ARM, values_from = c(cnt, pct)) |> transmute(analvar, label, trtA = sprintf("%1d (%3.0f%%)", `cnt_ARM A`, `pct_ARM A`), trtB = sprintf("%1d (%3.0f%%)", `cnt_ARM B`, `pct_ARM B`), trtC = sprintf("%1d (%3.0f%%)", `cnt_ARM C`, `pct_ARM C`), trtD = sprintf("%1d (%3.0f%%)", `cnt_ARM D`, `pct_ARM D`)) # View analysis results racedf # analvar label trtA trtB trtC trtD # # 1 RACE BLACK OR AFRICAN AMERICAN NA ( NA%) NA ( NA%) 1 ( 50%) NA ( NA%) # 2 RACE WHITE 2 (100%) 2 (100%) 1 ( 50%) 3 (100%) # Combine all analysis into final data frame final <- bind_rows(agedf, sexdf, racedf) # View final data frame # - This data frame would normally used for reporting final # # A tibble: 8 × 6 # analvar label trtA trtB trtC trtD # # 1 AGE N 2 2 2 3 # 2 AGE Mean (Std) 32.5 (2.12) 36.5 (14.85) 46.0 (1.41) 52.3 (18.93) # 3 AGE Median 32.5 36.5 46.0 44.0 # 4 AGE Min-Max 31.0-34.0 26.0-47.0 45.0-47.0 39.0-74.0 # 5 SEX F NA ( NA%) 1 ( 50%) 2 (100%) NA ( NA%) # 6 SEX M 2 (100%) 1 ( 50%) NA ( NA%) 3 (100%) # 7 RACE BLACK OR AFRICAN AMERICAN NA ( NA%) NA ( NA%) 1 ( 50%) NA ( NA%) # 8 RACE WHITE 2 (100%) 2 (100%) 1 ( 50%) 3 (100%) # Extract ARDS # - Remove by-variables to improve readability ards <- get_ards() |> select(-starts_with("by")) # Uncomment to view ARDS # View(ards) ``` Below is the ARDS dataset created by the above code sample: