Database profiling functions that return the confirmed, unique, missing, conflicting, or majority values for all (non-ID) variables across the datasets in a 'many' package database.

Usage

db_plot(database, key = "manyID", variable = "all", category = "all")

db_comp(database, key = "manyID", variable = "all", category = "all")

Arguments

database

A many database.

key

A variable key to join datasets by, "manyID" by default.

variable

One or more specific variables to focus on. By default "all". For multiple variables, declare the variable names as a character vector.

category

One or more specific code categories to focus on. By default "all" are returned. Other options include "confirmed", "unique", "missing", "conflicting", or "majority". For multiple categories, declare the categories as a character vector.

Value

A plot (db_plot()) or a tibble (db_comp()) with the profile of the variables across all datasets in a "many" database. When multiple categories are requested for multiple variables, the functions return every row in which at least one of the selected variables is coded as one of the selected categories.
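The row-selection rule for multiple categories can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the small `comparison` data frame is a hand-built stand-in for the category columns of a comparison tibble (its three rows copy the codes shown for Augustus, Tiberius, and Claudius in the examples), and the names `comparison`, `category_cols`, `selected`, and `keep` are hypothetical.

```r
# Hand-built stand-in for the category columns of a comparison tibble.
comparison <- data.frame(
  ID  = c("Augustus", "Tiberius", "Claudius"),
  Beg = c("conflict", "conflict", "majority"),
  End = c("conflict", "majority", "majority"),
  stringsAsFactors = FALSE
)

category_cols <- c("Beg", "End")          # selected variables
selected      <- c("conflict", "unique")  # selected categories

# Keep a row if ANY selected variable is coded as ANY selected category.
keep <- Reduce(
  `|`,
  lapply(comparison[category_cols], function(col) col %in% selected)
)

comparison[keep, ]
```

Tiberius is kept even though his End value is coded "majority", because his Beg value is coded "conflict"; Claudius is dropped because neither variable matches.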

Details

Confirmed values are the same in all datasets in the database. Unique values appear in only one dataset in the database. Missing values are missing in all datasets in the database. Conflicting values differ across datasets, with each competing value appearing in the same number of datasets. Majority values are shared by multiple, but not all, datasets in the database.
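One possible reading of these definitions, for a single observation and a single variable, is sketched below in base R. The helper `classify_values()` is hypothetical and is not part of the package; it assumes values are compared as character strings, and that a value held by only one dataset while the others are missing counts as unique. The package's own rules may differ in edge cases.

```r
# Hypothetical helper (not the package's code): classify the values that one
# variable takes for one observation across the datasets in a database.
classify_values <- function(values) {
  values   <- as.character(values)
  observed <- values[!is.na(values)]

  if (length(observed) == 0) return("missing")  # missing in every dataset
  if (length(observed) == 1) return("unique")   # present in one dataset only

  # Frequency of each distinct value, most frequent first.
  counts <- sort(table(observed), decreasing = TRUE)

  if (length(counts) == 1) {
    # One distinct value: confirmed only if every dataset reports it.
    if (length(observed) == length(values)) return("confirmed")
    return("majority")
  }

  # Several distinct values: majority if one strictly leads, otherwise a tie.
  if (counts[[1]] > counts[[2]]) return("majority")
  "conflict"
}

classify_values(c("-0026-01-16", "-0027", "-0031"))  # "conflict"
classify_values(c("0041-01-25", "0041", "0041"))     # "majority"
classify_values(c("0037-03-18", NA, "0037"))         # "conflict"
```

The three calls use the Beg values shown for Augustus, Claudius, and Caligula in the examples, and the sketch reproduces the codes reported for them there.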

db_plot() plots the database profile.

db_comp() creates a tibble comparing the variables in a database.

Examples

# \donttest{
db_plot(database = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.

db_plot(database = emperors, key = "ID", variable = c("Beg", "End"))
#> There were 116 matched observations by ID variable across datasets in database.

db_plot(database = emperors, key = "ID", variable = c("Beg", "End"),
        category = c("conflict", "unique"))
#> There were 116 matched observations by ID variable across datasets in database.

# }
# \donttest{
db_comp(database = emperors, key = "ID")
#> There were 116 matched observations by ID variable across datasets in database.
#> # A tibble: 139 × 37
#>    ID    wikip…¹ UNRV$…² brita…³ Beg (…⁴ wikip…⁵ UNRV$…⁶ brita…⁷ End (…⁸ wikip…⁹
#>    <chr> <mdate> <mdate> <mdate> <chr>   <mdate> <mdate> <mdate> <chr>   <chr>  
#>  1 Augu… -0026-… -0027   -0031   confli… 0014-0… -0014   0014    confli… IMPERA…
#>  2 Tibe… 0014-0… -0014   0014    confli… 0037-0… 0037    0037    majori… TIBERI…
#>  3 Cali… 0037-0… NA      0037    confli… 0041-0… NA      0041    confli… GAIVS …
#>  4 Clau… 0041-0… 0041    0041    majori… 0054-1… 0054    0054    majori… TIBERI…
#>  5 Nero  0054-1… 0054    0054    majori… 0068-0… 0068    0068    majori… NERO C…
#>  6 Galba 0068-0… 0068    0068    majori… 0069-0… 0069    0069    majori… SERVIV…
#>  7 Otho  0069-0… 0069    0069-01 confli… 0069-0… 0069    0069-04 confli… MARCVS…
#>  8 Vite… 0069-0… 0069    NA      confli… 0069-1… 0069    NA      confli… AVLVS …
#>  9 Vesp… 0069-1… 0069    0069    majori… 0079-0… 0079    0079    majori… TITVS …
#> 10 Titus 0079-0… 0079    0079    majori… 0081-0… 0081    0081    majori… TITVS …
#> # … with 129 more rows, 27 more variables: `UNRV$FullName` <chr>,
#> #   `FullName (2)` <chr>, `wikipedia$Birth` <chr>, `UNRV$Birth` <chr>,
#> #   `Birth (2)` <chr>, `wikipedia$Death` <chr>, `UNRV$Death` <chr>,
#> #   `Death (2)` <chr>, `wikipedia$CityBirth` <chr>, `CityBirth (1)` <chr>,
#> #   `wikipedia$ProvinceBirth` <chr>, `ProvinceBirth (1)` <chr>,
#> #   `wikipedia$Rise` <chr>, `Rise (1)` <chr>, `wikipedia$Cause` <chr>,
#> #   `Cause (1)` <chr>, `wikipedia$Killer` <chr>, `Killer (1)` <chr>, …
db_comp(database = emperors, key = "ID", variable = "Beg")
#> There were 116 matched observations by ID variable across datasets in database.
#> # A tibble: 139 × 5
#>    ID        `wikipedia$Beg` `UNRV$Beg` `britannica$Beg` `Beg (3)`
#>    <chr>     <mdate>         <mdate>    <mdate>          <chr>    
#>  1 Augustus  -0026-01-16     -0027      -0031            conflict 
#>  2 Tiberius  0014-09-18      -0014      0014             conflict 
#>  3 Caligula  0037-03-18      NA         0037             conflict 
#>  4 Claudius  0041-01-25      0041       0041             majority 
#>  5 Nero      0054-10-13      0054       0054             majority 
#>  6 Galba     0068-06-08      0068       0068             majority 
#>  7 Otho      0069-01-15      0069       0069-01          conflict 
#>  8 Vitellius 0069-04-17      0069       NA               conflict 
#>  9 Vespasian 0069-12-21      0069       0069             majority 
#> 10 Titus     0079-06-24      0079       0079             majority 
#> # … with 129 more rows
db_comp(database = emperors, key = "ID", variable = c("Beg", "End"),
        category = "conflict")
#> There were 116 matched observations by ID variable across datasets in database.
#> # A tibble: 26 × 9
#>    ID            wikip…¹ UNRV$…² brita…³ Beg (…⁴ wikip…⁵ UNRV$…⁶ brita…⁷ End (…⁸
#>    <chr>         <mdate> <mdate> <mdate> <chr>   <mdate> <mdate> <mdate> <chr>  
#>  1 Augustus      -0026-… -0027   -0031   confli… 0014-0… -0014   0014    confli…
#>  2 Tiberius      0014-0… -0014   0014    confli… 0037-0… 0037    0037    majori…
#>  3 Caligula      0037-0… NA      0037    confli… 0041-0… NA      0041    confli…
#>  4 Otho          0069-0… 0069    0069-01 confli… 0069-0… 0069    0069-04 confli…
#>  5 Vitellius     0069-0… 0069    NA      confli… 0069-1… 0069    NA      confli…
#>  6 Commodus      0177  … 0180    0177    confli… 0192-1… 0192    0192    majori…
#>  7 Pertinax      0193-0… 0193    NA      confli… 0193-0… 0193    NA      confli…
#>  8 Didius Julia… 0193-0… 0193    NA      confli… 0193-0… 0193    NA      confli…
#>  9 Caracalla     0198  … 0211    0198    confli… 0217-0… 0217    0217    majori…
#> 10 Geta          0209  … 0211    NA      confli… 0211-1… 0211    NA      confli…
#> # … with 16 more rows, and abbreviated variable names ¹​`wikipedia$Beg`,
#> #   ²​`UNRV$Beg`, ³​`britannica$Beg`, ⁴​`Beg (3)`, ⁵​`wikipedia$End`, ⁶​`UNRV$End`,
#> #   ⁷​`britannica$End`, ⁸​`End (3)`
db_comp(database = emperors, key = "ID", variable = c("Beg", "End"),
        category = c("conflict", "unique"))
#> There were 116 matched observations by ID variable across datasets in database.
#> # A tibble: 91 × 9
#>    ID            wikip…¹ UNRV$…² brita…³ Beg (…⁴ wikip…⁵ UNRV$…⁶ brita…⁷ End (…⁸
#>    <chr>         <mdate> <mdate> <mdate> <chr>   <mdate> <mdate> <mdate> <chr>  
#>  1 Augustus      -0026-… -0027   -0031   confli… 0014-0… -0014   0014    confli…
#>  2 Tiberius      0014-0… -0014   0014    confli… 0037-0… 0037    0037    majori…
#>  3 Caligula      0037-0… NA      0037    confli… 0041-0… NA      0041    confli…
#>  4 Otho          0069-0… 0069    0069-01 confli… 0069-0… 0069    0069-04 confli…
#>  5 Vitellius     0069-0… 0069    NA      confli… 0069-1… 0069    NA      confli…
#>  6 Antonius Pius 0138-0… NA      NA      unique  0161-0… NA      NA      unique 
#>  7 Commodus      0177  … 0180    0177    confli… 0192-1… 0192    0192    majori…
#>  8 Pertinax      0193-0… 0193    NA      confli… 0193-0… 0193    NA      confli…
#>  9 Didius Julia… 0193-0… 0193    NA      confli… 0193-0… 0193    NA      confli…
#> 10 Septimus Sev… 0193-0… NA      NA      unique  0211-0… NA      NA      unique 
#> # … with 81 more rows, and abbreviated variable names ¹​`wikipedia$Beg`,
#> #   ²​`UNRV$Beg`, ³​`britannica$Beg`, ⁴​`Beg (3)`, ⁵​`wikipedia$End`, ⁶​`UNRV$End`,
#> #   ⁷​`britannica$End`, ⁸​`End (3)`
# }