ards

The ards package creates Analysis Results Datasets (ARDS). ARDS are used to store the results of an analysis in a tabular form, so they can be examined and manipulated by downstream processes.

An ARDS dataset is created with the following steps:

  1. Initialize the ARDS
  2. Add data to the ARDS
  3. Extract the completed ARDS

The above three steps are performed with the following functions:

  • init_ards(): A function to initialize the ARDS dataset. This function is typically called at the beginning of a program.
  • add_ards(): A function to add data to the ARDS. This function is typically called along the way, as you create analysis data.
  • get_ards(): A function to extract the ARDS dataset. This function is typically called at the end of the program.

The above steps result in an ARDS data frame. Once this data frame is extracted, you may save it to disk, or insert it into a database, as desired.

ARDS Structure

The data structure produced by the ards package is structure recommended by CDISC. This structure puts all analysis values into a single column. Therefore, there is one row per analysis value. Descriptive information, such as the the name of the analysis variable and the by groups, are stored in other columns.

Here is the data dictionary for the ARDS dataset:

How to Use ards

To see how the ards functions work, let us first perform a very simple analysis on the mtcars sample data frame. Examine the following example:

library(dplyr)
library(ards)

# Initialize the ARDS
# - These values will be repeated on all rows in the ARDS dataset
init_ards(studyid = "MTCARS",
          tableid = "01", adsns = "mtcars",
          population = "all cars",
          time = "1973")

# Perform analysis on MPG
# - Using cylinders as a by group
analdf <- mtcars |> 
  select(cyl, mpg) |> 
  group_by(cyl) |> 
  summarize(n = n(),
            mean = mean(mpg),
            std = sd(mpg),
            min = min(mpg),
            max = max(mpg))

# View analysis data
analdf
#     cyl     n  mean   std   min   max
#   <dbl> <int> <dbl> <dbl> <dbl> <dbl>
# 1     4    11  26.7  4.51  21.4  33.9
# 2     6     7  19.7  1.45  17.8  21.4
# 3     8    14  15.1  2.56  10.4  19.2

# Add analysis data to ARDS
# - These values will be unique for each row in the ARDS dataset
add_ards(analdf, 
         statvars = c("n", "mean", "std", "min", "max"),
         anal_var = "mpg", trtvar = "cyl")
         
         
# Get the ARDS
# - Remove by-variables to make the ARDS dataset easier to read
ards <- get_ards() |> select(-starts_with("by"))

# Uncomment to view ards
# View(ards)

Here is an image of the ARDS dataset created above:

More Realistic Example

As can be seen above, the functions in the ards package are easy to use. They can be integrated into a standard Table, Listing, and Figure (TLF) program, or a data preparation program. In many cases, the program does not need to be restructured to accommodate the ards functions.

Let us look at a more realistic program. In this demonstration, we will create the analysis for a Demographics table. Note the following about this program:

  • The input data is a subset of 10 subjects and some relevant variables to make the program easier to read and understand.
  • The init_ards() function is called once at the top of the program. The values passed will be the same for all rows of the ARDS.
  • The add_ards() function identifies variables in the analysis dataset that you want to add to the ARDS. The variable values will be extracted from the analysis data and transformed into the ARDS structure.
  • The add_ards() function can be placed in the middle of a data pipeline, and will not interfere with your analysis.
  • The add_ards() function is called immediately after the calculations, before any formatting or transformations. The function is called at this point so that the ARDS will contains the original numeric values with full precision.
  • When the analysis is complete, the get_ards() is called to retrieve the ARDS dataset. The ARDS dataset is a standard R data frame that can be saved, combined with other analysis results, or passed to another program for additional processing.
library(dplyr)
library(tibble)
library(tidyr)
library(ards)

# Create input data
adsl <- read.table(header = TRUE, text = '
STUDYID DOMAIN  USUBJID SUBJID  SITEID  BRTHDTC AGE AGEU    SEX RACE    ETHNIC  ARMCD   ARM
ABC DM  ABC-01-049  49  1   11/12/1966  39  YEARS   M   "WHITE" "NOT HISPANIC OR LATINO"    4   "ARM D"
ABC DM  ABC-01-050  50  1   12/19/1958  47  YEARS   M   "WHITE" "NOT HISPANIC OR LATINO"    2   "ARM B"
ABC DM  ABC-01-051  51  1   5/2/1972    34  YEARS   M   "WHITE" "NOT HISPANIC OR LATINO"    1   "ARM A"
ABC DM  ABC-01-052  52  1   6/27/1961   45  YEARS   F   "WHITE" "UNKNOWN"   3   "ARM C"
ABC DM  ABC-01-053  53  1   4/7/1980    26  YEARS   F   "WHITE" "NOT HISPANIC OR LATINO"    2   "ARM B"
ABC DM  ABC-01-054  54  1   9/13/1962   44  YEARS   M   "WHITE" "NOT HISPANIC OR LATINO"    4   "ARM D"
ABC DM  ABC-01-055  55  1   6/11/1959   47  YEARS   F   "BLACK OR AFRICAN AMERICAN" "UNKNOWN"   3   "ARM C"
ABC DM  ABC-01-056  56  1   5/2/1975    31  YEARS   M   "WHITE" "NOT HISPANIC OR LATINO"    1   "ARM A"
ABC DM  ABC-01-113  113 1   2/8/1932    74  YEARS   M   "WHITE" "UNKNOWN"   4   "ARM D"')

# Initalize ARDS
init_ards(studyid = "ABC",
          tableid = "01", adsns = "adsl",
          population = "safety population",
          time = "SCREENING", where = "saffl = TRUE", reset = TRUE)

# Perform analysis on AGE variable
agedf <- adsl |>
  select(AGE, ARM) |>
  group_by(ARM) |>
  summarize(n = n(),
            mean = mean(AGE),
            std = sd(AGE),
            median = median(AGE),
            min = min(AGE),
            max = max(AGE)) |>
  mutate(analvar = "AGE") |>
  ungroup() |>
  add_ards(statvars = c("n", "mean", "std", "median", "min", "max"),
           statdesc = c("N", "Mean", "Std", "Median", "Min", "Max"),
           anal_var = "AGE", trtvar = "ARM") |>
  transmute(analvar, ARM,
            n = sprintf("%d", n),
            mean_sd = sprintf("%.1f (%.2f)", mean, std),
            median = sprintf("%.1f", median),
            min_max = sprintf("%.1f-%.1f", min, max)) |>
  pivot_longer(c(n, mean_sd, median, min_max),
               names_to = "label", values_to = "stats") |>
  pivot_wider(names_from = ARM,
              values_from = c(stats)) |>
  transmute(analvar, label = c("N", "Mean (Std)", "Median", "Min-Max"),
           trtA = `ARM A`, trtB = `ARM B`, trtC = `ARM C`, trtD = `ARM D`)

# View analysis results
agedf
#   # A tibble: 4 × 6
#   analvar label      trtA        trtB         trtC        trtD        
#   <chr>   <chr>      <chr>       <chr>        <chr>       <chr>       
# 1 AGE     N          2           2            2           3           
# 2 AGE     Mean (Std) 32.5 (2.12) 36.5 (14.85) 46.0 (1.41) 52.3 (18.93)
# 3 AGE     Median     32.5        36.5         46.0        44.0        
# 4 AGE     Min-Max    31.0-34.0   26.0-47.0    45.0-47.0   39.0-74.0  

# Get population counts
trt_pop <- count(adsl, ARM) |> deframe()

trt_pop
# ARM A ARM B ARM C ARM D 
#   2     2     2     3 

# Perform analysis on SEX variable
sexdf <- adsl |>
  mutate(denom = trt_pop[paste0(adsl$ARM)]) |>
  group_by(SEX, ARM, denom) |>
  summarize(cnt = n()) |>
  transmute(SEX, ARM, cnt, analvar = "SEX", label = SEX,  pct =  cnt / denom * 100) |>
  ungroup() |>
  add_ards(statvars = c("cnt", "pct"), statdesc = "label",
           anal_var = "SEX", trtvar = "ARM") |>
  pivot_wider(names_from = ARM,
              values_from = c(cnt, pct)) |>
  transmute(analvar, label,
            trtA = sprintf("%1d (%3.0f%%)", `cnt_ARM A`, `pct_ARM A`),
            trtB = sprintf("%1d (%3.0f%%)", `cnt_ARM B`, `pct_ARM B`),
            trtC = sprintf("%1d (%3.0f%%)", `cnt_ARM C`, `pct_ARM C`),
            trtD = sprintf("%1d (%3.0f%%)", `cnt_ARM D`, `pct_ARM D`))

# View analysis results
sexdf
#   analvar label trtA      trtB     trtC      trtD     
#   <chr>   <chr> <chr>     <chr>    <chr>     <chr>    
# 1 SEX     F     NA ( NA%) 1 ( 50%) 2 (100%)  NA ( NA%)
# 2 SEX     M     2 (100%)  1 ( 50%) NA ( NA%) 3 (100%) 

# Perform analysis on RACE
racedf <- adsl |>
  mutate(denom = trt_pop[paste0(adsl$ARM)]) |>
  group_by(RACE, ARM, denom) |>
  summarize(cnt = n()) |>
  transmute(RACE, ARM, cnt, analvar = "RACE", label = RACE,  pct =  cnt / denom * 100) |>
  ungroup() |>
  add_ards(statvars = c("cnt", "pct"), statdesc = "label",
           anal_var = "RACE", trtvar = "ARM") |>
  pivot_wider(names_from = ARM,
              values_from = c(cnt, pct)) |>
  transmute(analvar, label,
            trtA = sprintf("%1d (%3.0f%%)", `cnt_ARM A`, `pct_ARM A`),
            trtB = sprintf("%1d (%3.0f%%)", `cnt_ARM B`, `pct_ARM B`),
            trtC = sprintf("%1d (%3.0f%%)", `cnt_ARM C`, `pct_ARM C`),
            trtD = sprintf("%1d (%3.0f%%)", `cnt_ARM D`, `pct_ARM D`))

# View analysis results
racedf
#   analvar label                     trtA      trtB      trtC     trtD     
#   <chr>   <chr>                     <chr>     <chr>     <chr>    <chr>    
# 1 RACE    BLACK OR AFRICAN AMERICAN NA ( NA%) NA ( NA%) 1 ( 50%) NA ( NA%)
# 2 RACE    WHITE                     2 (100%)  2 (100%)  1 ( 50%) 3 (100%) 

# Combine all analysis into final data frame
final <- bind_rows(agedf, sexdf, racedf)

# View final data frame
# - This data frame would normally used for reporting
final
# # A tibble: 8 × 6
#   analvar label                     trtA        trtB         trtC        trtD        
#   <chr>   <chr>                     <chr>       <chr>        <chr>       <chr>       
# 1 AGE     N                         2           2            2           3           
# 2 AGE     Mean (Std)                32.5 (2.12) 36.5 (14.85) 46.0 (1.41) 52.3 (18.93)
# 3 AGE     Median                    32.5        36.5         46.0        44.0        
# 4 AGE     Min-Max                   31.0-34.0   26.0-47.0    45.0-47.0   39.0-74.0   
# 5 SEX     F                         NA ( NA%)   1 ( 50%)     2 (100%)    NA ( NA%)   
# 6 SEX     M                         2 (100%)    1 ( 50%)     NA ( NA%)   3 (100%)    
# 7 RACE    BLACK OR AFRICAN AMERICAN NA ( NA%)   NA ( NA%)    1 ( 50%)    NA ( NA%)   
# 8 RACE    WHITE                     2 (100%)    2 (100%)     1 ( 50%)    3 (100%)  


# Extract ARDS
# - Remove by-variables to improve readability
ards <- get_ards() |> select(-starts_with("by"))
                             
# Uncomment to view ARDS                             
# View(ards)

Below is the ARDS dataset created by the above code sample: