This function consolidates a set of datasets in a 'many* package' datacube into a single dataset with some combination of the rows, columns, and observations of the datasets in the datacube. The function includes separate arguments for the rows and columns, as well as for how to resolve conflicts for observations across datasets. This provides users with considerable flexibility in how they combine data. For example, users may wish to stick to units that appear in every dataset but include variables coded in any dataset, or units that appear in any dataset but only those variables that appear in every dataset. Even then there may be conflicts, as the actual unit-variable observations may differ from dataset to dataset. We offer a number of resolve methods that enable users to choose how conflicts between observations are resolved.
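The distinction between "any" and "every" can be pictured as a union versus an intersection of the units (or variables) found across the datasets. The following is a minimal base R sketch of that idea, not the package's own implementation; the toy data frames `a` and `b` and their contents are invented for illustration.

```r
# Two hypothetical parent datasets sharing an ID column (invented data)
a <- data.frame(ID = c("x", "y", "z"), Begin = c(1, 2, 3),
                Notes = c("n1", "n2", "n3"), stringsAsFactors = FALSE)
b <- data.frame(ID = c("y", "z", "w"), Begin = c(2, 3, 4),
                End = c(5, 6, 7), stringsAsFactors = FALSE)
datasets <- list(a, b)

# "any": units that appear in at least one dataset (union)
rows_any <- Reduce(union, lapply(datasets, function(d) d$ID))
# "every": units that appear in all datasets (intersection)
rows_every <- Reduce(intersect, lapply(datasets, function(d) d$ID))

# The same logic applied to variables (column names)
cols_any <- Reduce(union, lapply(datasets, names))
cols_every <- Reduce(intersect, lapply(datasets, names))
```

Here `rows_every` keeps only the units shared by both data frames, while `cols_any` keeps every variable coded in either one, which corresponds to the first combination described above.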
Arguments
- datacube
A datacube from one of the many* packages.
- rows
Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets.
- cols
Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets.
- resolve
How should conflicts between observations be resolved? By default "coalesce", but other options include: "min", "max", "mean", "median", and "random". "coalesce" takes the first non-NA value. "max" takes the largest value. "min" takes the smallest value. "mean" takes the average value. "median" takes the median value. "random" takes a random value. For different variables to be resolved differently, you can specify the variables' names alongside how each is to be resolved in a list (e.g. resolve = c(var1 = "min", var2 = "max")). In this case, only the variables named will be resolved and returned.
- key
An ID column to collapse by. By default "manyID". Users can also specify multiple key variables in a list. For multiple key variables, the key variables must be present in all the datasets in the datacube (e.g. key = c("key1", "key2")). For equivalent key columns with different names across datasets, matching is possible if keys are declared (e.g. key = c("key1" = "key2")). Missing observations in the key variable are removed.
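To make the resolve options concrete, the sketch below applies each method to a set of conflicting values for a single unit-variable pair. It is a base R illustration only: the helper `resolve_one()` and the toy values are hypothetical, and dropping missing values before every method is an assumption rather than a description of how the package handles them internally.

```r
# Hypothetical conflicting values for one observation,
# one value per parent dataset (NA where a dataset has no value)
values <- c(NA, 1990, 1992, 1991)

# Illustrative helper, not part of any package
resolve_one <- function(x, how = "coalesce") {
  x_obs <- x[!is.na(x)]                 # assumption: ignore missing values
  if (length(x_obs) == 0) return(NA)
  switch(how,
    coalesce = x_obs[1],                # first non-NA value
    min      = min(x_obs),
    max      = max(x_obs),
    mean     = mean(x_obs),
    median   = median(x_obs),
    random   = x_obs[sample.int(length(x_obs), 1)],
    stop("Unknown resolve method: ", how)
  )
}

# Resolving different variables differently with a named vector
conflicts <- list(Begin = c(1990, NA, 1991), End = c(2000, 2001, NA))
how <- c(Begin = "min", End = "max")
resolved <- mapply(resolve_one, conflicts[names(how)], how)
```

Only the variables named in `how` are passed to the helper, mirroring the note above that only the named variables are resolved and returned.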
Examples
# \donttest{
consolidate(datacube = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 138 × 15
#> ID Begin End FullName Birth Death CityBirth ProvinceBirth Rise Cause
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Aemilian 0253… 0253… CAESAR … 0207… 0253… NA Africa Appo… Assa…
#> 2 Allectus 0293 0297 ? ? 297 NA NA NA NA
#> 3 Anastas… 0491 0518 Flavius… 430 518 NA NA NA NA
#> 4 Anthemi… 0467 0472 Procopi… 420 472 NA NA NA NA
#> 5 Antonin… 0138 0161 Titus A… 86 161 NA NA NA NA
#> 6 Antoniu… 0138… 0161… CAESAR … 0086… 0161… Lanuvium Italia Birt… Natu…
#> 7 Arcadius 0395 0408 Flavius… 377 408 NA NA NA NA
#> 8 Augustus -002… 0014… IMPERAT… 0062… 0014… Rome Italia Birt… Assa…
#> 9 Aulus V… 0069… 0069… NA NA NA NA NA NA NA
#> 10 Aurelian 0270… 0275… CAESAR … 0214… 0275… Sirmium Pannonia Appo… Assa…
#> # ℹ 128 more rows
#> # ℹ 5 more variables: Killer <chr>, Dynasty <chr>, Era <chr>, Notes <chr>,
#> # Verif <chr>
consolidate(datacube = favour(emperors, "UNRV"), rows = "every",
            cols = "every", resolve = "coalesce", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#> ID Begin End
#> <chr> <chr> <chr>
#> 1 Aemilian 0253 0253
#> 2 Augustus -0027 -0014
#> 3 Aurelian 0270 0275
#> 4 Balbinus 0238 0238
#> 5 Caracalla 0211 0217
#> 6 Carinus 0283 0285
#> 7 Carus 0282 0283
#> 8 Claudius 0041 0054
#> 9 Commodus 0180 0192
#> 10 Constantine II 0337 0340
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "any", cols = "every",
            resolve = "min", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 3 failed to parse.
#> Warning: 3 failed to parse.
#> Warning: 86 failed to parse.
#> Warning: 86 failed to parse.
#> Warning: 71 failed to parse.
#> Warning: 71 failed to parse.
#> Warning: 87 failed to parse.
#> Warning: 87 failed to parse.
#> Warning: 73 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 138 × 3
#> ID Begin End
#> <chr> <chr> <chr>
#> 1 Aemilian 253-08-15 253-10-15
#> 2 Allectus NA NA
#> 3 Anastasius NA NA
#> 4 Anthemius NA NA
#> 5 Antoninus Pius NA NA
#> 6 Antonius Pius 138-07-10 161-03-07
#> 7 Arcadius NA NA
#> 8 Augustus -26-01-16 14-08-19
#> 9 Aulus Vitellius 1969-07-01 1969-12-01
#> 10 Aurelian 270-09-15 275-09-15
#> # ℹ 128 more rows
consolidate(datacube = emperors, rows = "every", cols = "any",
            resolve = "max", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 15
#> ID Begin End FullName Birth Death CityBirth ProvinceBirth
#> <chr> <date> <date> <chr> <chr> <chr> <chr> <chr>
#> 1 Aemilian 0253-08-15 0253-10-15 "Marcus… 207? 253 NA Africa
#> 2 Augustus -026-01-16 2014-12-31 "IMPERA… 63 BC 14 Rome Italia
#> 3 Aurelian 0270-09-15 0275-09-15 "Lucius… 214 275 Sirmium Pannonia
#> 4 Balbinus 0238-04-22 0238-07-29 "Decimu… 170? 238 NA Unknown
#> 5 Caracalla -Inf 0217-04-08 "born L… 188 217 Lugdunum Gallia Lugdu…
#> 6 Carinus 0283-08-01 0285-08-01 "Marcus… ? 285 NA Unknown
#> 7 Carus 0282-10-01 0283-08-01 "Marcus… 230? 283 Narbo Gallia Narbo…
#> 8 Claudius 2041-12-31 2054-12-31 "Tiberi… 10 BC 41 Lugdunum Gallia Lugdu…
#> 9 Commodus -Inf 0192-12-31 "Marcus… 161 192 Lanuvium Italia
#> 10 Constanti… 0337-05-22 0340-01-01 "Flaviu… 317 340 Arelate Gallia Narbo…
#> # ℹ 31 more rows
#> # ℹ 7 more variables: Rise <chr>, Cause <chr>, Killer <chr>, Dynasty <chr>,
#> # Era <chr>, Notes <chr>, Verif <chr>
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "median", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#> ID Begin End
#> <chr> <date> <date>
#> 1 Aemilian 0253-08-15 0253-10-15
#> 2 Augustus -026-01-16 1014-07-26
#> 3 Aurelian 0270-09-15 0275-09-15
#> 4 Balbinus 0238-04-22 0238-07-29
#> 5 Caracalla NA 0217-04-08
#> 6 Carinus 0283-08-01 0285-08-01
#> 7 Carus 0282-10-01 0283-08-01
#> 8 Claudius 2041-07-02 2054-07-02
#> 9 Commodus NA 0192-12-31
#> 10 Constantine II 0337-05-22 0340-01-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "mean", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#> ID Begin End
#> <chr> <chr> <date>
#> 1 Aemilian 253-08-15 0253-10-15
#> 2 Augustus -1361-01-06 0005-03-18
#> 3 Aurelian 270-09-15 0275-09-15
#> 4 Balbinus 238-04-22 0238-07-29
#> 5 Caracalla NA 0217-04-08
#> 6 Carinus 283-08-01 0285-08-01
#> 7 Carus 282-10-01 0283-08-01
#> 8 Claudius 1374-09-09 1387-12-05
#> 9 Commodus NA 0192-12-31
#> 10 Constantine II 337-05-22 0340-01-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = "random", key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#> ID Begin End
#> <chr> <chr> <chr>
#> 1 Aemilian 0253-08-15 NA
#> 2 Augustus NA NA
#> 3 Aurelian NA 0275-09-15
#> 4 Balbinus 0238-04-22 NA
#> 5 Caracalla NA NA
#> 6 Carinus NA NA
#> 7 Carus NA 0283-08-01
#> 8 Claudius 0041-01-25 2054-09-25
#> 9 Commodus NA NA
#> 10 Constantine II 0337-05-22 0340-01-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "every", cols = "every",
            resolve = c(Begin = "min", End = "max"), key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> ℹ Resolving conflicts...
#> Warning: 2 failed to parse.
#> Warning: 2 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 30 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 29 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 31 failed to parse.
#> Warning: 30 failed to parse.
#> ℹ Coalescing compatible rows...
#> # A tibble: 41 × 3
#> ID Begin End
#> <chr> <chr> <date>
#> 1 Aemilian 253-08-15 0253-10-15
#> 2 Augustus -26-01-16 2014-12-31
#> 3 Aurelian 270-09-15 0275-09-15
#> 4 Balbinus 238-04-22 0238-07-29
#> 5 Caracalla NA 0217-04-08
#> 6 Carinus 283-08-01 0285-08-01
#> 7 Carus 282-10-01 0283-08-01
#> 8 Claudius 41-01-25 2054-12-31
#> 9 Commodus NA 0192-12-31
#> 10 Constantine II 337-05-22 0340-01-01
#> # ℹ 31 more rows
consolidate(datacube = emperors, rows = "any", cols = "any",
            resolve = c(Death = "max", Cause = "coalesce"),
            key = c("ID", "Begin"))
#> ℹ Resolving conflicts...
#> ℹ Coalescing compatible rows...
#> # A tibble: 202 × 4
#> ID Begin Cause Death
#> <chr> <chr> <chr> <chr>
#> 1 Aemilian 0253 NA 253
#> 2 Aemilian 0253-08-15~ Assassination 0253-10-15~
#> 3 Allectus 0293 NA 297
#> 4 Anastasius 0491 NA 518
#> 5 Anthemius 0467 NA 472
#> 6 Antoninus Pius 0138 NA 161
#> 7 Antonius Pius 0138-07-10 Natural Causes 0161-03-07
#> 8 Arcadius 0383 NA NA
#> 9 Arcadius 0395 NA 408
#> 10 Augustus -0026-01-16 Assassination 0014-08-19
#> # ℹ 192 more rows
# }
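The `key` argument allows equivalent key columns with different names to be matched, and drops observations whose key is missing. A rough base R analogue of those two behaviours is sketched below using `merge()`; the data frames and the column names `key1` and `key2` are invented for illustration, and this is not how the function itself is implemented.

```r
# Hypothetical datasets whose ID columns have different names
a <- data.frame(key1 = c("A", "B", NA), Begin = c(1, 2, 3),
                stringsAsFactors = FALSE)
b <- data.frame(key2 = c("B", "A", "C"), End = c(20, 10, 30),
                stringsAsFactors = FALSE)

# Remove observations with a missing key before matching
a <- a[!is.na(a$key1), ]

# Match key1 in `a` to key2 in `b`, keeping units from either dataset;
# the joined key column takes the name used in `a`
joined <- merge(a, b, by.x = "key1", by.y = "key2", all = TRUE)
```

Setting `all = FALSE` instead would keep only the units present in both data frames, analogous to rows = "every".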