Skip to contents

Compare categories in 'many' datacubes

Usage

compare_categories(
  datacube,
  dataset = "all",
  key = "manyID",
  variable = "all",
  category = "all"
)

Arguments

datacube

A datacube from one of the many packages.

dataset

A dataset in a datacube from one of the many packages. By default "all". That is, all datasets in the datacube are used. To select two or more datasets, please declare them as a vector.

key

A variable key to join datasets. 'manyID' by default.

variable

Would you like to focus on one, or more, specific variables present in one or more datasets in the 'many' datacube? By default "all". For multiple variables, please declare variable names as a vector.

category

Would you like to focus on one specific code category? By default "all" are returned. Other options include "confirmed", "unique", "missing", "conflict", or "majority". For multiple variables, please declare categories as a vector.

Details

Confirmed values are the same in all datasets in datacube. Unique values appear once in datasets in datacube. Missing values are missing in all datasets in datacube. Conflict values are different in the same number of datasets in datacube. Majority values have the same value in multiple, but not all, datasets in datacube.

Examples

# \donttest{
compare_categories(emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in datacube.
#> # A tibble: 139 × 37
#>    ID        `wikipedia$Begin` `UNRV$Begin` `britannica$Begin` `Begin (3)`
#>    <chr>     <mdate>           <mdate>      <mdate>            <chr>      
#>  1 Augustus  -0026-01-16       -0027        -0031              conflict   
#>  2 Tiberius  0014-09-18        -0014        0014               conflict   
#>  3 Caligula  0037-03-18        NA           0037               conflict   
#>  4 Claudius  0041-01-25        0041         0041               majority   
#>  5 Nero      0054-10-13        0054         0054               majority   
#>  6 Galba     0068-06-08        0068         0068               majority   
#>  7 Otho      0069-01-15        0069         0069-01            conflict   
#>  8 Vitellius 0069-04-17        0069         NA                 conflict   
#>  9 Vespasian 0069-12-21        0069         0069               majority   
#> 10 Titus     0079-06-24        0079         0079               majority   
#> # ℹ 129 more rows
#> # ℹ 32 more variables: `wikipedia$End` <mdate>, `UNRV$End` <mdate>,
#> #   `britannica$End` <mdate>, `End (3)` <chr>, `wikipedia$FullName` <chr>,
#> #   `UNRV$FullName` <chr>, `FullName (2)` <chr>, `wikipedia$Birth` <chr>,
#> #   `UNRV$Birth` <chr>, `Birth (2)` <chr>, `wikipedia$Death` <chr>,
#> #   `UNRV$Death` <chr>, `Death (2)` <chr>, `wikipedia$CityBirth` <chr>,
#> #   `CityBirth (1)` <chr>, `wikipedia$ProvinceBirth` <chr>, …
compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
key = "ID", variable = c("Beg", "End"), category = c("conflict", "unique"))
#> There were 49 matched observations by ID variable across datasets in datacube.
#> # A tibble: 119 × 4
#>    ID        `wikipedia$End` `UNRV$End` `End (2)`
#>    <chr>     <mdate>         <mdate>    <chr>    
#>  1 Augustus  0014-08-19      -0014      conflict 
#>  2 Tiberius  0037-03-16      0037       conflict 
#>  3 Caligula  0041-01-24      NA         unique   
#>  4 Claudius  0054-10-13      0054       conflict 
#>  5 Nero      0068-06-09      0068       conflict 
#>  6 Galba     0069-01-15      0069       conflict 
#>  7 Otho      0069-04-16      0069       conflict 
#>  8 Vitellius 0069-12-20      0069       conflict 
#>  9 Vespasian 0079-06-24      0079       conflict 
#> 10 Titus     0081-09-13      0081       conflict 
#> # ℹ 109 more rows
plot(compare_categories(emperors, key = "ID"))
#> There were 116 matched observations by ID variable across datasets in datacube.

plot(compare_categories(datacube = emperors, dataset = c("wikipedia", "UNRV"),
key = "ID", variable = c("Beg", "End"), category = c("conflict", "unique")))
#> There were 49 matched observations by ID variable across datasets in datacube.

# }