Standardises the words in a character variable of titles to improve readability, facilitate string matching, and enable more accurate comparisons of variables across different datasets.

standardise_titles(title)

standardize_titles(title)

standardise_caps(title)

standardise_numbers(title)

Arguments

title

A string or a character vector of titles.

Value

A capitalised, trimmed and standardised string or character vector.

Details

The function capitalises the words in the strings passed to it and trims excess whitespace from the start, middle and end of each string. It removes ambiguous punctuation and symbols, converts all strings to ASCII character encoding, and transforms written numbers in ordinal form into numerical form.
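The whitespace step described above can be sketched in base R. This is a minimal illustration only: `squish_whitespace()` is a hypothetical helper, not a function exported by the package, and the package's own implementation may differ.

```r
# Hypothetical helper for illustration; not the package's own code.
squish_whitespace <- function(title) {
  # Drop leading and trailing whitespace
  title <- trimws(title)
  # Collapse any run of internal whitespace into a single space
  gsub("[[:space:]]+", " ", title)
}

squish_whitespace(c("  A   treaty ", "Concerning\tthings"))
```

Because both `trimws()` and `gsub()` are vectorised, the sketch accepts either a single string or a character vector, matching the title argument.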

standardise_caps()

This function standardises the capitalisation of words in treaty titles. It capitalises the first letter of each word and keeps the remaining letters in lowercase. Unlike stringi or stringr solutions, however, it retains abbreviations and acronyms in uppercase.
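One way to picture this behaviour is the base R sketch below. It assumes that any word of two or more characters written entirely in uppercase is an acronym. Both that rule and the helper name `title_case_keep_acronyms()` are assumptions made for illustration, not the package's actual logic.

```r
# Hypothetical helper for illustration; not the package's own code.
title_case_keep_acronyms <- function(title) {
  # Split each title into words on single spaces
  words <- strsplit(title, " ", fixed = TRUE)
  vapply(words, function(w) {
    # Assumption: two or more uppercase letters and nothing else is an acronym
    is_acronym <- grepl("^[[:upper:]]{2,}$", w)
    # Uppercase the first letter and lowercase the rest of each word
    capped <- paste0(toupper(substring(w, 1, 1)), tolower(substring(w, 2)))
    # Keep acronyms untouched, use the capitalised form otherwise
    paste(ifelse(is_acronym, w, capped), collapse = " ")
  }, character(1))
}

title_case_keep_acronyms("a treaty between the UN and france")
```

A rule this simple cannot tell an acronym from a title typed entirely in capitals, so a real implementation would need a more careful test.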

standardise_numbers()

This function is used to standardise numbers in treaty titles to improve readability and facilitate string matching. It replaces written numbers with their numerical equivalents.
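The replacement of written numbers can be sketched with a small lookup table and whole-word matching. The table `number_words` and the helper `words_to_digits()` are hypothetical names used only for this illustration. The sketch covers a handful of cardinal words and is not the package's implementation.

```r
# Hypothetical lookup table; a real one would cover far more words.
number_words <- c(one = "1", two = "2", three = "3", four = "4", five = "5")

# Hypothetical helper for illustration; not the package's own code.
words_to_digits <- function(title) {
  for (w in names(number_words)) {
    # \\b restricts the match to whole words, so "Network" and
    # "Someone" are not altered by the entries for "two" and "one"
    pattern <- paste0("\\b", w, "\\b")
    title <- gsub(pattern, number_words[[w]], title,
                  ignore.case = TRUE, perl = TRUE)
  }
  title
}

words_to_digits("Protocol Two to the Network Convention")
```

Matching on word boundaries is the key design choice here: a plain substring replacement would corrupt any word that happens to contain a number word.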

Examples

e <- standardise_titles("A treaty concerning things")
#>  Standardised capitalisation in titles
#>  Standardised numbers in titles
#>  Standardised titles
e == "A Treaty Concerning Things"
#> [1] TRUE