Skip to contents

This function consolidates a set of datasets in a 'many* package' database into a single dataset with some combination of the rows, columns, and observations of the datasets in the database. The function includes separate arguments for the rows and columns, as well as for how to resolve conflicts for observations across datasets. This provides users with considerable flexibility in how they combine data. For example, users may wish to stick to units that appear in every dataset but include variables coded in any dataset, or units that appear in any dataset but only those variables that appear in every dataset. Even then there may be conflicts, as the actual unit-variable observations may differ from dataset to dataset. We offer a number of resolve methods that enable users to choose how conflicts between observations are resolved.

Usage

consolidate(
  database,
  rows = "any",
  cols = "any",
  resolve = "coalesce",
  key = "manyID"
)

Arguments

database

A database object from one of the many packages

rows

Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets.

cols

Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets.

resolve

How should conflicts between observations be resolved? By default "coalesce", but other options include: "min", "max", "mean", "median", and "random". "coalesce" takes the first non-NA value. "max" takes the largest value. "min" takes the smallest value. "mean" takes the average value. "median" takes the median value. "random" takes a random value. For different variables to be resolved differently, you can specify the variables' names alongside how each is to be resolved in a list (e.g. resolve = c(var1 = "min", var2 = "max")). In this case, only the variables named will be resolved and returned.

key

An ID column to collapse by. By default "many_ID". Users can also specify multiple key variables in a list. For multiple key variables, the key variables must be present in all the datasets in the database (e.g. key = c("key1", "key2")). For equivalent key columns with different names across datasets, matching is possible if keys are declared (e.g. key = c("key1" = "key2")). Missing observations in the key variable are removed.

Value

A single tibble/data frame.

Details

Text variables are dropped for more efficient consolidation.

Examples

# \donttest{
consolidate(database = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 138 × 15
#>    ID           CityB…¹ Provi…² Rise  Cause Killer Era   Notes Verif Birth Death
#>    <chr>        <chr>   <chr>   <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr>
#>  1 Aemilian     NA      Africa  Appo… Assa… Other… Prin… birt… NA    0207… 0253…
#>  2 Allectus     NA      NA      NA    NA    NA     NA    NA    NA    ?     297  
#>  3 Anastasius   NA      NA      NA    NA    NA     NA    NA    NA    430   518  
#>  4 Anthemius    NA      NA      NA    NA    NA     NA    NA    NA    420   472  
#>  5 Antoninus P… NA      NA      NA    NA    NA     NA    NA    NA    86    161  
#>  6 Antonius Pi… Lanuvi… Italia  Birt… Natu… Disea… Prin… NA    NA    0086… 0161…
#>  7 Arcadius     NA      NA      NA    NA    NA     NA    NA    NA    377   408  
#>  8 Augustus     Rome    Italia  Birt… Assa… Wife   Prin… birt… Redd… 0062… 0014…
#>  9 Aulus Vitel… NA      NA      NA    NA    NA     NA    NA    NA    NA    NA   
#> 10 Aurelian     Sirmium Pannon… Appo… Assa… Praet… Prin… NA    NA    0214… 0275…
#> # … with 128 more rows, 4 more variables: FullName <chr>, Dynasty <chr>,
#> #   Beg <mdate>, End <mdate>, and abbreviated variable names ¹​CityBirth,
#> #   ²​ProvinceBirth
consolidate(database = favour(emperors, "UNRV"), rows = "every",
cols = "every", resolve = "coalesce", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Beg     End    
#>    <chr>          <mdate> <mdate>
#>  1 Aemilian       0253    0253   
#>  2 Augustus       -0027   -0014  
#>  3 Aurelian       0270    0275   
#>  4 Balbinus       0238    0238   
#>  5 Caracalla      0211    0217   
#>  6 Carinus        0283    0285   
#>  7 Carus          0282    0283   
#>  8 Claudius       0041    0054   
#>  9 Commodus       0180    0192   
#> 10 Constantine II 0337    0340   
#> # … with 31 more rows
consolidate(database = emperors, rows = "any", cols = "every",
resolve = "min", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 138 × 3
#>    ID              Beg        End       
#>    <chr>           <chr>      <chr>     
#>  1 Aemilian        0253-01-01 0253-01-01
#>  2 Allectus        0293       0297      
#>  3 Anastasius      0491       0518      
#>  4 Anthemius       0467       0472      
#>  5 Antoninus Pius  0138       0161      
#>  6 Antonius Pius   0138-07-10 0161-03-07
#>  7 Arcadius        0395       0408      
#>  8 Augustus        -026-01-16 -014-01-01
#>  9 Aulus Vitellius 0069-07    0069-12   
#> 10 Aurelian        0270-01-01 0275-01-01
#> # … with 128 more rows
consolidate(database = emperors, rows = "every", cols = "any",
resolve = "max", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 15
#>    ID           CityB…¹ Provi…² Rise  Cause Killer Era   Notes Verif Birth Death
#>    <chr>        <chr>   <chr>   <chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr>
#>  1 Aemilian     NA      Africa  Appo… Assa… Other… Prin… birt… NA    207?  253  
#>  2 Augustus     Rome    Italia  Birt… Assa… Wife   Prin… birt… Redd… 63 BC 14   
#>  3 Aurelian     Sirmium Pannon… Appo… Assa… Praet… Prin… NA    NA    214   275  
#>  4 Balbinus     NA      Unknown Appo… Assa… Praet… Prin… birt… NA    170?  238  
#>  5 Caracalla    Lugdun… Gallia… Birt… Assa… Other… Prin… reig… NA    188   217  
#>  6 Carinus      NA      Unknown Birt… Died… Oppos… Prin… deat… NA    ?     285  
#>  7 Carus        Narbo   Gallia… Seiz… Natu… Light… Prin… birt… NA    230?  283  
#>  8 Claudius     Lugdun… Gallia… Birt… Assa… Wife   Prin… birt… Redd… 10 BC 41   
#>  9 Commodus     Lanuvi… Italia  Birt… Assa… Praet… Prin… reig… NA    161   192  
#> 10 Constantine… Arelate Gallia… Birt… Exec… Other… Domi… birt… NA    317   340  
#> # … with 31 more rows, 4 more variables: FullName <chr>, Dynasty <chr>,
#> #   Beg <chr>, End <chr>, and abbreviated variable names ¹​CityBirth,
#> #   ²​ProvinceBirth
consolidate(database = emperors, rows = "every", cols = "every",
resolve = "median", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Beg        End       
#>    <chr>          <chr>      <chr>     
#>  1 Aemilian       0253-12-31 0253-12-31
#>  2 Augustus       -027-12-31 0014-08-19
#>  3 Aurelian       0270-12-31 0275-12-31
#>  4 Balbinus       0238-04-22 0238-07-29
#>  5 Caracalla      0198-12-31 0217-12-31
#>  6 Carinus        0283-12-31 0285-12-31
#>  7 Carus          0282-12-31 0283-12-31
#>  8 Claudius       0041-12-31 0054-12-31
#>  9 Commodus       0177-12-31 0192-12-31
#> 10 Constantine II 0337-12-31 0340-12-31
#> # … with 31 more rows
consolidate(database = emperors, rows = "every", cols = "every",
resolve = "mean", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Beg         End        
#>    <chr>          <chr>       <chr>      
#>  1 Aemilian       0253-08-15~ 0253-10-15~
#>  2 Augustus       -0026-01-16 0014-08-19 
#>  3 Aurelian       0270-09-15  0275-09-15 
#>  4 Balbinus       0238-04-22  0238-07-29 
#>  5 Caracalla      0198        0217-04-08 
#>  6 Carinus        0283-08-01~ 0285-08-01~
#>  7 Carus          0282-10-01~ 0283-08-01~
#>  8 Claudius       0041-01-25  0054-10-13 
#>  9 Commodus       0177        0192-12-31 
#> 10 Constantine II 0337-05-22  0340-01-01 
#> # … with 31 more rows
consolidate(database = emperors, rows = "every", cols = "every",
resolve = "random", key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Beg        End       
#>    <chr>          <chr>      <chr>     
#>  1 Aemilian       0253-12-31 0253-12-31
#>  2 Augustus       -031-12-31 0014-12-31
#>  3 Aurelian       0270-12-31 0275-09-15
#>  4 Balbinus       0238-04-22 0238-07-29
#>  5 Caracalla      0198-12-31 0217-12-31
#>  6 Carinus        0283-12-31 0285-12-31
#>  7 Carus          0282-12-31 0283-08-01
#>  8 Claudius       0041-01-25 0054-10-13
#>  9 Commodus       0177-12-31 0192-12-31
#> 10 Constantine II 0337-12-31 0340-12-31
#> # … with 31 more rows
consolidate(database = emperors, rows = "every", cols = "every",
resolve = c(Beg = "min", End = "max"), key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 41 × 3
#>    ID             Beg        End       
#>    <chr>          <chr>      <chr>     
#>  1 Aemilian       0253-01-01 0253-12-31
#>  2 Augustus       -026-01-16 0014-12-31
#>  3 Aurelian       0270-01-01 0275-12-31
#>  4 Balbinus       0238-01-01 0238-12-31
#>  5 Caracalla      0198-01-01 0217-12-31
#>  6 Carinus        0283-01-01 0285-12-31
#>  7 Carus          0282-01-01 0283-12-31
#>  8 Claudius       0041-01-01 0054-12-31
#>  9 Commodus       0177-01-01 0192-12-31
#> 10 Constantine II 0337-01-01 0340-12-31
#> # … with 31 more rows
consolidate(database = emperors, rows = "any", cols = "any",
resolve = c(Death = "max", Cause = "coalesce"),
key = c("ID", "Beg"))
#>  Resolving conflicts...
#>  Coalescing compatible rows...
#> # A tibble: 202 × 4
#>    ID             Beg         Cause          Death      
#>    <chr>          <mdate>     <chr>          <chr>      
#>  1 Aemilian       0253        NA             253        
#>  2 Aemilian       0253-08-15~ Assassination  0253-10-15~
#>  3 Allectus       0293        NA             297        
#>  4 Anastasius     0491        NA             518        
#>  5 Anthemius      0467        NA             472        
#>  6 Antoninus Pius 0138        NA             161        
#>  7 Antonius Pius  0138-07-10  Natural Causes 0161-03-07 
#>  8 Arcadius       0383        NA             NA         
#>  9 Arcadius       0395        NA             408        
#> 10 Augustus       -0026-01-16 Assassination  0014-08-19 
#> # … with 192 more rows
# }