Consolidate datacube into a single dataset

This function consolidates the datasets in a many* package datacube into a single dataset, combining their rows, columns, and observations. Separate arguments control which rows and which columns are retained, and how conflicting observations across datasets are resolved, which gives users considerable flexibility in how they combine data. For example, users may wish to keep only units that appear in every dataset but include variables coded in any dataset, or keep units that appear in any dataset but only those variables that appear in every dataset. Even then there may be conflicts, because the value observed for a given unit and variable may differ from dataset to dataset. The resolve argument offers several methods for choosing how such conflicts are resolved.
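
The difference between "any" and "every" can be pictured with plain set operations. The base R sketch below uses two small hypothetical data frames (d1 and d2, invented for illustration) and shows the idea only. It is not the package's implementation.

```r
# Two hypothetical datasets sharing an "ID" key (illustrative data only)
d1 <- data.frame(ID = c("A", "B", "C"), Begin = c(1, 2, 3),
                 stringsAsFactors = FALSE)
d2 <- data.frame(ID = c("B", "C", "D"), Begin = c(2, 4, 5), End = c(6, 7, 8),
                 stringsAsFactors = FALSE)
cube <- list(d1 = d1, d2 = d2)

# rows = "any": units that appear in at least one dataset (union)
ids_any <- Reduce(union, lapply(cube, function(d) d$ID))
# rows = "every": units that appear in all datasets (intersection)
ids_every <- Reduce(intersect, lapply(cube, function(d) d$ID))

# cols = "any" / cols = "every": the same logic applied to column names
cols_any <- Reduce(union, lapply(cube, names))
cols_every <- Reduce(intersect, lapply(cube, names))
```

Here unit "B" appears in both datasets with the same Begin value, whereas unit "C" has Begin values of 3 and 4, which is the kind of conflict the resolve argument addresses.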

Usage

consolidate(
  datacube,
  rows = "any",
  cols = "any",
  resolve = "coalesce",
  key = "manyID"
)

Arguments

datacube: A datacube from one of the many* packages.
rows: Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets.
cols: Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets.
resolve: How conflicts between observations are resolved. By default "coalesce"; other options are "min", "max", "mean", "median", and "random". "coalesce" takes the first non-NA value, "min" the smallest value, "max" the largest value, "mean" the average value, "median" the median value, and "random" a randomly chosen value. To resolve different variables differently, pass a named vector pairing each variable name with its method (e.g. resolve = c(var1 = "min", var2 = "max")). In this case, only the named variables are resolved and returned.
key: An ID column to collapse by. By default "manyID". Multiple key variables can be given as a character vector (e.g. key = c("key1", "key2")); each must be present in every dataset in the datacube. Equivalent key columns with different names across datasets can be matched by declaring the pairing (e.g. key = c("key1" = "key2")). Observations with a missing value in a key variable are removed.
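
As a rough illustration of the resolve options, the base R sketch below applies each documented method to a vector of conflicting values. The helper resolve_sketch() and the data are hypothetical, and the sketch assumes missing values are ignored before resolving. The package's own handling may differ.

```r
# Conflicting values for one unit-variable pair across three hypothetical datasets
values <- c(NA, 270, 275)

# Hypothetical helper mirroring the documented options; not the package's code
resolve_sketch <- function(x, method = "coalesce") {
  x <- x[!is.na(x)]                 # assumption: missing values are ignored
  if (length(x) == 0) return(NA)
  switch(method,
    coalesce = x[1],                # first non-NA value
    min      = min(x),
    max      = max(x),
    mean     = mean(x),
    median   = median(x),
    random   = x[sample.int(length(x), 1)],  # one value chosen at random
    stop("unknown method: ", method)
  )
}

first_value <- resolve_sketch(values, "coalesce")
largest_value <- resolve_sketch(values, "max")

# A named vector pairs each variable with its own method
methods <- c(Begin = "min", End = "max")
conflicts <- list(Begin = c(270, 271), End = c(275, NA))
resolved <- mapply(resolve_sketch, conflicts[names(methods)], methods)
```

Only the variables named in methods are looked up in conflicts, which mirrors the documented behaviour that only named variables are resolved and returned.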

Value

A single tibble/data frame.

Details

Text variables are dropped for more efficient consolidation.

Examples

# \donttest{
consolidate(datacube = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 138 × 15
#>    ID       Begin End   FullName Birth Death CityBirth ProvinceBirth Rise  Cause
#>    <chr>    <chr> <chr> <chr>    <chr> <chr> <chr>     <chr>         <chr> <chr>
#>  1 Aemilian 0253… 0253… CAESAR … 0207… 0253… NA        Africa        Appo… Assa…
#>  2 Allectus 0293  0297  ?        ?     297   NA        NA            NA    NA   
#>  3 Anastas… 0491  0518  Flavius… 430   518   NA        NA            NA    NA   
#>  4 Anthemi… 0467  0472  Procopi… 420   472   NA        NA            NA    NA   
#>  5 Antonin… 0138  0161  Titus A… 86    161   NA        NA            NA    NA   
#>  6 Antoniu… 0138… 0161… CAESAR … 0086… 0161… Lanuvium  Italia        Birt… Natu…
#>  7 Arcadius 0395  0408  Flavius… 377   408   NA        NA            NA    NA   
#>  8 Augustus -002… 0014… IMPERAT… 0062… 0014… Rome      Italia        Birt… Assa…
#>  9 Aulus V… 0069… 0069… NA       NA    NA    NA        NA            NA    NA   
#> 10 Aurelian 0270… 0275… CAESAR … 0214… 0275… Sirmium   Pannonia      Appo… Assa…
#> # ℹ 128 more rows
#> # ℹ 5 more variables: Killer <chr>, Dynasty <chr>, Era <chr>, Notes <chr>,
#> #   Verif <chr>
# Keep only units and variables found in every dataset, with the UNRV dataset favoured
consolidate(datacube = favour(emperors, "UNRV"), rows = "every",
            cols = "every", resolve = "coalesce", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin End  
#>    <chr>          <chr> <chr>
#>  1 Aemilian       0253  0253 
#>  2 Augustus       -0027 -0014
#>  3 Aurelian       0270  0275 
#>  4 Balbinus       0238  0238 
#>  5 Caracalla      0211  0217 
#>  6 Carinus        0283  0285 
#>  7 Carus          0282  0283 
#>  8 Claudius       0041  0054 
#>  9 Commodus       0180  0192 
#> 10 Constantine II 0337  0340 
#> # ℹ 31 more rows
# Keep units from any dataset but only variables found in every dataset, taking the smallest value
consolidate(datacube = emperors, rows = "any", cols = "every",
            resolve = "min", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  3 failed to parse.
#> Warning:  3 failed to parse.
#> Warning:  86 failed to parse.
#> Warning:  86 failed to parse.
#> Warning:  71 failed to parse.
#> Warning:  71 failed to parse.
#> Warning:  87 failed to parse.
#> Warning:  87 failed to parse.
#> Warning:  73 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 138 × 3
#>    ID              Begin      End       
#>    <chr>           <chr>      <chr>     
#>  1 Aemilian        253-08-15  253-10-15 
#>  2 Allectus        NA         NA        
#>  3 Anastasius      NA         NA        
#>  4 Anthemius       NA         NA        
#>  5 Antoninus Pius  NA         NA        
#>  6 Antonius Pius   138-07-10  161-03-07 
#>  7 Arcadius        NA         NA        
#>  8 Augustus        -26-01-16  14-08-19  
#>  9 Aulus Vitellius 1969-07-01 1969-12-01
#> 10 Aurelian        270-09-15  275-09-15 
#> # ℹ 128 more rows
# Keep only units found in every dataset but variables from any dataset, taking the largest value
consolidate(datacube = emperors, rows = "every", cols = "any",
            resolve = "max", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  2 failed to parse.
#> Warning:  2 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 15
#>    ID         Begin      End        FullName Birth Death CityBirth ProvinceBirth
#>    <chr>      <date>     <date>     <chr>    <chr> <chr> <chr>     <chr>        
#>  1 Aemilian   0253-08-15 0253-10-15 "Marcus… 207?  253   NA        Africa       
#>  2 Augustus   -026-01-16 2014-12-31 "IMPERA… 63 BC 14    Rome      Italia       
#>  3 Aurelian   0270-09-15 0275-09-15 "Lucius… 214   275   Sirmium   Pannonia     
#>  4 Balbinus   0238-04-22 0238-07-29 "Decimu… 170?  238   NA        Unknown      
#>  5 Caracalla  -Inf       0217-04-08 "born L… 188   217   Lugdunum  Gallia Lugdu…
#>  6 Carinus    0283-08-01 0285-08-01 "Marcus… ?     285   NA        Unknown      
#>  7 Carus      0282-10-01 0283-08-01 "Marcus… 230?  283   Narbo     Gallia Narbo…
#>  8 Claudius   2041-12-31 2054-12-31 "Tiberi… 10 BC 41    Lugdunum  Gallia Lugdu…
#>  9 Commodus   -Inf       0192-12-31 "Marcus… 161   192   Lanuvium  Italia       
#> 10 Constanti… 0337-05-22 0340-01-01 "Flaviu… 317   340   Arelate   Gallia Narbo…
#> # ℹ 31 more rows
#> # ℹ 7 more variables: Rise <chr>, Cause <chr>, Killer <chr>, Dynasty <chr>,
#> #   Era <chr>, Notes <chr>, Verif <chr>
# Resolve conflicts with the median value
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "median", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  2 failed to parse.
#> Warning:  2 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <date>     <date>    
#>  1 Aemilian       0253-08-15 0253-10-15
#>  2 Augustus       -026-01-16 1014-07-26
#>  3 Aurelian       0270-09-15 0275-09-15
#>  4 Balbinus       0238-04-22 0238-07-29
#>  5 Caracalla      NA         0217-04-08
#>  6 Carinus        0283-08-01 0285-08-01
#>  7 Carus          0282-10-01 0283-08-01
#>  8 Claudius       2041-07-02 2054-07-02
#>  9 Commodus       NA         0192-12-31
#> 10 Constantine II 0337-05-22 0340-01-01
#> # ℹ 31 more rows
# Resolve conflicts with the average value
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "mean", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  2 failed to parse.
#> Warning:  2 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin       End       
#>    <chr>          <chr>       <date>    
#>  1 Aemilian       253-08-15   0253-10-15
#>  2 Augustus       -1361-01-06 0005-03-18
#>  3 Aurelian       270-09-15   0275-09-15
#>  4 Balbinus       238-04-22   0238-07-29
#>  5 Caracalla      NA          0217-04-08
#>  6 Carinus        283-08-01   0285-08-01
#>  7 Carus          282-10-01   0283-08-01
#>  8 Claudius       1374-09-09  1387-12-05
#>  9 Commodus       NA          0192-12-31
#> 10 Constantine II 337-05-22   0340-01-01
#> # ℹ 31 more rows
# Resolve conflicts with a randomly chosen value
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "random", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  2 failed to parse.
#> Warning:  2 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin      End       
#>    <chr>          <chr>      <chr>     
#>  1 Aemilian       0253-08-15 NA        
#>  2 Augustus       NA         NA        
#>  3 Aurelian       NA         0275-09-15
#>  4 Balbinus       0238-04-22 NA        
#>  5 Caracalla      NA         NA        
#>  6 Carinus        NA         NA        
#>  7 Carus          NA         0283-08-01
#>  8 Claudius       0041-01-25 2054-09-25
#>  9 Commodus       NA         NA        
#> 10 Constantine II 0337-05-22 0340-01-01
#> # ℹ 31 more rows
# Resolve each variable differently: smallest Begin, largest End
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = c(Begin = "min", End = "max"), key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning:  2 failed to parse.
#> Warning:  2 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  30 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  29 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  31 failed to parse.
#> Warning:  30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Begin     End       
#>    <chr>          <chr>     <date>    
#>  1 Aemilian       253-08-15 0253-10-15
#>  2 Augustus       -26-01-16 2014-12-31
#>  3 Aurelian       270-09-15 0275-09-15
#>  4 Balbinus       238-04-22 0238-07-29
#>  5 Caracalla      NA        0217-04-08
#>  6 Carinus        283-08-01 0285-08-01
#>  7 Carus          282-10-01 0283-08-01
#>  8 Claudius       41-01-25  2054-12-31
#>  9 Commodus       NA        0192-12-31
#> 10 Constantine II 337-05-22 0340-01-01
#> # ℹ 31 more rows
# Collapse by two key variables; only the keys and the named variables are returned
consolidate(datacube = emperors, rows = "any", cols = "any",
            resolve = c(Death = "max", Cause = "coalesce"),
            key = c("ID", "Begin"))
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 202 × 4
#>    ID             Begin       Cause          Death      
#>    <chr>          <chr>       <chr>          <chr>      
#>  1 Aemilian       0253        NA             253        
#>  2 Aemilian       0253-08-15~ Assassination  0253-10-15~
#>  3 Allectus       0293        NA             297        
#>  4 Anastasius     0491        NA             518        
#>  5 Anthemius      0467        NA             472        
#>  6 Antoninus Pius 0138        NA             161        
#>  7 Antonius Pius  0138-07-10  Natural Causes 0161-03-07 
#>  8 Arcadius       0383        NA             NA         
#>  9 Arcadius       0395        NA             408        
#> 10 Augustus       -0026-01-16 Assassination  0014-08-19 
#> # ℹ 192 more rows
# }