Format Apply Function

How to use fapply()

The fapply() function applies a format to a vector, factor, or list. This function may be used independently of the fdata() function. Here is an example:

v1 <- c("A", "B", "C", "B")
v1

# [1] "A" "B" "C" "B"

fmt1 <- value(condition(x == "A", "Label A"),
              condition(x == "B", "Label B"),
              condition(TRUE, "Other"))

fapply(v1, fmt1)

# [1] "Label A" "Label B" "Other"   "Label B"

One advantage of using fapply() is that your original data is not altered: the formatted values are returned as a new vector, which you can assign to a new object. If your original data changes, reapply the format so the formatted values stay consistent with it.

What kinds of formats are available

Data can be formatted with several different types of objects:

  • A formatting string
  • A named vector
  • A user-defined format
  • A vectorized function
  • A formatting list

You can use the type of formatting object that is most suitable for your data and situation. Each type of formatting object has its own strengths and weaknesses.

Formatting strings

The formatting functions accept formatting strings such as those used by the base R format() and sprintf() functions. If the vector contains dates or datetimes, fapply() uses the format() codes. For all other data types, fapply() uses the sprintf() codes. Here is an example:

v1 <- c(1.367, 8.356, 4.583, 2.873)

fapply(v1, "%.1f%%")

# [1] "1.4%" "8.4%" "4.6%" "2.9%"
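The rule above (date and datetime values use format() codes, everything else uses sprintf() codes) can be sketched in base R. The helper format_with_string() below is hypothetical and is not part of fmtr; it only illustrates the dispatch, assuming the vector is either a Date/POSIXt object or something sprintf() accepts.

```r
# Hypothetical helper, not part of fmtr: illustrates choosing between
# format() codes and sprintf() codes based on the class of the data.
format_with_string <- function(x, fmt) {
  if (inherits(x, c("Date", "POSIXt"))) {
    # Dates and datetimes: format() codes such as "%Y-%m-%d"
    format(x, format = fmt)
  } else {
    # Everything else: sprintf() codes such as "%.1f"
    sprintf(fmt, x)
  }
}

format_with_string(c(1.367, 8.356, 4.583, 2.873), "%.1f%%")
# [1] "1.4%" "8.4%" "4.6%" "2.9%"

format_with_string(as.Date(c("2020-06-19", "2020-04-24")), "%Y/%m")
# [1] "2020/06" "2020/04"
```

Note the doubled %% in the numeric format: in sprintf() it produces a literal percent sign, which is where the "%" in the output comes from.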

Named vectors

Data may be formatted using a named vector as a lookup. Simply ensure that the names on the formatting vector correspond to the values in the data vector.
The advantage of using a named vector for formatting is its simplicity. The disadvantage is that it only works with character values. Here is an example of formatting using a named vector:

v1 <- c("A", "B", "C", "B")

fmt1 <- c(A = "Label A", B = "Label B", C = "Label C")

fapply(v1, fmt1)

# [1] "Label A" "Label B" "Label C" "Label B"
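A named-vector lookup corresponds to ordinary name-based indexing in base R. The sketch below uses only base R and is meant to illustrate the idea, not fmtr's internals. It also shows what happens to a value with no matching name, and why this approach suits character values: a numeric index selects by position, not by name.

```r
v1 <- c("A", "B", "C", "B")
fmt1 <- c(A = "Label A", B = "Label B", C = "Label C")

# Character values index the vector by name; unname() drops the names
# that `[` copies from fmt1 onto the result.
unname(fmt1[v1])
# [1] "Label A" "Label B" "Label C" "Label B"

# A value with no matching name comes back as NA
unname(fmt1[c("A", "Z")])
# [1] "Label A" NA

# A number is treated as a position, not a name: this returns the
# second element ("Label B"), not a label for the value 2
unname(fmt1[2])
# [1] "Label B"
```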

User-defined formats

The fmtr package provides custom functions for creating user-defined formats, in a manner that is similar to a SAS® user-defined format. These functions are value() and condition(). The value() function accepts one or more conditions. The condition() function accepts an expression/label pair. A user-defined format has the advantage of a clear and flexible syntax. It is excellent for categorizing data. Here is an example of a user-defined format:

v1 <- c("A", "B", NA, "C")

fmt2 <- value(condition(is.na(x), "Missing"),
              condition(x == "A", "Label A"),
              condition(x == "B", "Label B"),
              condition(TRUE, "Other"))
              
fapply(v1, fmt2)

# [1] "Label A" "Label B" "Missing" "Other"
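The example puts is.na(x) first and the catch-all condition(TRUE, ...) last, which suggests that conditions are checked in order and the first match wins. Assuming that behavior, the logic can be sketched in base R as below. apply_rules() and the rule lists are hypothetical names used for illustration only; they are not fmtr functions, and fmtr's value() and condition() may differ in details.

```r
# Hypothetical sketch, not fmtr's implementation: each rule pairs a
# test function with a label, and the first rule that is TRUE wins.
apply_rules <- function(x, rules) {
  out  <- rep(NA_character_, length(x))
  done <- rep(FALSE, length(x))          # values already labeled
  for (rule in rules) {
    res <- rule$test(x)
    res <- !is.na(res) & res             # treat NA comparisons as no match
    hit <- !done & res                   # only label values not yet matched
    out[hit] <- rule$label
    done <- done | hit
  }
  out
}

rules <- list(
  list(test = function(x) is.na(x), label = "Missing"),
  list(test = function(x) x == "A", label = "Label A"),
  list(test = function(x) x == "B", label = "Label B"),
  list(test = function(x) TRUE,     label = "Other")
)

apply_rules(c("A", "B", NA, "C"), rules)
# [1] "Label A" "Label B" "Missing" "Other"
```

Under this first-match assumption, moving the TRUE rule to the top of the list would label every value "Other", which is why the catch-all belongs at the end.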

A user-defined format can also format values conditionally, by using a formatting string as the label. The following example formats a numeric value to two decimal places unless it is missing or falls outside a specified range.

v2 <- c(18.3987, 15.45852, 8.9835, 11.246246, 25.3858, NA)

fmt3 <- value(condition(is.na(x), "Missing"),
              condition(x < 10, "Low"),
              condition(x > 20, "High"),
              condition(TRUE, "%.2f"))
              
fapply(v2, fmt3)

# [1] "18.40"   "15.46"   "Low"     "11.25"   "High"    "Missing"
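For comparison, the same result can be produced in plain base R by formatting every value with sprintf() and then overwriting the out-of-range and missing values. This is only an equivalent for this particular example, not how fmtr evaluates fmt3; the user-defined format expresses the same rules more compactly.

```r
v2 <- c(18.3987, 15.45852, 8.9835, 11.246246, 25.3858, NA)

# Start with the default: two decimal places ("NA" for missing values)
out <- sprintf("%.2f", v2)

# Overwrite the special cases; !is.na() keeps NA out of the comparisons
out[!is.na(v2) & v2 < 10] <- "Low"
out[!is.na(v2) & v2 > 20] <- "High"
out[is.na(v2)] <- "Missing"

out
# [1] "18.40"   "15.46"   "Low"     "11.25"   "High"    "Missing"
```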

Vectorized functions

Vectorized functions are the most powerful way to format data. A vectorized function can be written from scratch or can wrap a function from an existing package. Its advantage is that it can perform nearly any kind of formatting; its drawback is that it can be more complicated to write. Here is an example of formatting with a user-defined, vectorized function:

v1 <- c("A", "B", NA, "C")

fmt2 <- Vectorize(function(x) {
    
    if (is.na(x)) 
      ret <- "Missing"
    else if (x %in% c("A", "B"))
      ret <- paste("Label", x)
    else 
      ret <- "Other"
    
    return(ret)
    
  })
  
fapply(v1, fmt2)

# "Label A" "Label B"   "Missing" "Other" 
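Vectorize() works by calling the function once per element, and when the input is a character vector it also attaches the input values as names on the result. An alternative, sketched below, is to write the function with operations that are already vectorized, such as ifelse() and %in%, so it handles the whole vector in one call. fmt_vec is a hypothetical name; assuming fapply() accepts any function that returns one formatted value per input value, it could be used in place of fmt2.

```r
# Hypothetical alternative to fmt2 built from vectorized base R operations
fmt_vec <- function(x) {
  ifelse(is.na(x), "Missing",
         # %in% returns FALSE (not NA) for missing values
         ifelse(x %in% c("A", "B"), paste("Label", x), "Other"))
}

fmt_vec(c("A", "B", NA, "C"))
# [1] "Label A" "Label B" "Missing" "Other"
```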

A formatting list

Sometimes data needs to be formatted differently for each row. This situation is difficult to handle in many languages, but the fmtr package makes it easy in R with a formatting list.

A formatting list is a list that contains one or more of the four types of formatting objects described above. It is defined with the flist() function. A formatting list can be applied in two different ways: in order, or with a lookup.

By default, the list is applied in order: the first format in the list is applied to the first item in the vector, the second format to the second item, and so on. If the list has fewer items than the vector has values, the list is recycled.
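The in-order rule amounts to picking format number ((i - 1) %% n) + 1 for element i, where n is the number of formats in the list. The hypothetical apply_in_order() below sketches that recycling in base R, assuming each format is either a sprintf() string or a function; it is an illustration, not the code behind flist().

```r
# Hypothetical sketch, not fmtr's flist(): apply formats element by
# element, recycling the list of formats when it is shorter than the data.
apply_in_order <- function(values, formats) {
  vapply(seq_along(values), function(i) {
    f <- formats[[(i - 1) %% length(formats) + 1]]
    if (is.function(f)) f(values[[i]]) else sprintf(f, values[[i]])
  }, character(1))
}

# Two formats, four values: formats 1, 2, 1, 2 are used
apply_in_order(list(1.258, 2.8865, 3.5, 4.25), list("%.1f", "%.2f"))
# [1] "1.3"  "2.89" "3.5"  "4.25"
```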

For the lookup method, the formatting object is specified by a lookup vector. The lookup vector should contain names associated with the elements in the formatting list. The lookup vector should also contain the same number of items as the data vector. For each item in the data vector, fmtr will look up the appropriate format from the formatting list, and apply that format to the corresponding data value.

The following is an example of a lookup-style formatting list:

# Set up data
v1 <- c("num", "char", "date", "char", "date", "num")
v2 <- list(1.258, "H", as.Date("2020-06-19"),
           "L", as.Date("2020-04-24"), 2.8865)

df <- data.frame(type = v1, values = I(v2))
df

#    type     values
# 1   num      1.258
# 2  char          H
# 3  date 2020-06-19
# 4  char          L
# 5  date 2020-04-24
# 6   num     2.8865

# Set up formatting list
lst <- flist(type = "row", lookup = v1,
             num = "%.1f",
             char = value(condition(x == "H", "High"),
                          condition(x == "L", "Low"),
                          condition(TRUE, "NA")),
             date = "%y-%m")

# Assign formatting list to values column
attr(df$values, "format") <- lst


# Apply formatting
fdata(df)

#   type values
# 1  num    1.3
# 2 char   High
# 3 date  20-06
# 4 char    Low
# 5 date  20-04
# 6  num    2.9
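To make the per-row lookup concrete, here is a base R sketch of what happens for each row of the example above: the lookup value picks an element of the format list by name, and that format is applied to the value in the same position. apply_by_lookup() is a hypothetical helper, not an fmtr function, and it only handles sprintf() strings, format() strings for dates, and functions.

```r
# Hypothetical sketch, not fmtr's implementation
apply_by_lookup <- function(values, lookup, formats) {
  # One lookup entry is needed per data value
  stopifnot(length(values) == length(lookup))
  vapply(seq_along(values), function(i) {
    f <- formats[[lookup[i]]]   # select the format by name
    v <- values[[i]]
    if (is.function(f)) {
      f(v)
    } else if (inherits(v, "Date")) {
      format(v, format = f)     # date codes
    } else {
      sprintf(f, v)             # sprintf() codes
    }
  }, character(1))
}

types <- c("num", "char", "date", "char", "date", "num")
vals <- list(1.258, "H", as.Date("2020-06-19"),
             "L", as.Date("2020-04-24"), 2.8865)

formats <- list(
  num  = "%.1f",
  char = function(x) if (x == "H") "High" else if (x == "L") "Low" else "NA",
  date = "%y-%m"
)

apply_by_lookup(vals, types, formats)
# [1] "1.3"   "High"  "20-06" "Low"   "20-04" "2.9"
```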
