GGO states codebook

Release 1.0

This document provides a brief overview of the coding rationale for key variables in the list of episodes of independent states and state-like entities in the international system provided in manystates::states$GGO.

Note that this dataset was constructed as a complement to datasets such as the Gleditsch and Ward Revised List of Independent States (manystates::states$GW) and Butcher and Griffiths’ International System(s) Dataset (manystates::states$ISD). As such, it is complete in neither observations nor variables, yet it offers greater specificity and some additional entries compared with those datasets.

Work on this dataset was supported by the Swiss National Science Foundation (SNSF) Grant Number 188976: “Power and Networks and the Rate of Change in Institutional Complexes” (PANARCHIC).

Please direct all comments and suggestions to:

James Hollway

International Relations/Political Science Department

Graduate Institute of International and Development Studies

Geneva, Switzerland

james.hollway@graduateinstitute.ch

States

StateName, StateNameAlt

This is the name or names of the state or state-like entity. Because the dataset includes entities, or episodes of entities, that predate the advent of the modern interstate system, some entries would not meet a modern definition of a state; we include them nonetheless for the sake of comprehensiveness. Where a state has alternative or longer forms of its name, or names in other languages, these are included in the StateNameAlt variable. The shorter or more common name is preferred for the StateName variable, so long as it is unambiguous.

stateID

This is the three-letter code associated with the state or state-like entity. These codes are based on, and consistent with, the ISO 3166-1 alpha-3 list; however, additional codes have been added to cover historical and other states that the ISO list does not include. Where possible, we use the Correlates of War three-letter codes for this purpose, or those used in the GW or ISD datasets. In some cases, however, we must create new codes; in such situations we aim to use recognisable, unique codes built from the significant consonants or vowels of the state’s name.

Note that we endeavour to reuse existing codes where possible for state episodes that are substantially similar in territory and that inherit some of the international legal obligations, rights, and recognitions of their predecessor states. For this reason, a series of episodes is associated with “RUS”, for example, ranging from the Russian Empire, through the USSR, to the Russian Federation. However, where a state is not considered the legal successor (for example, Serbia is not considered the legal successor of Yugoslavia), we use different stateID codes (in this case “SRB” and “YUG”). In cases of dissolution (see below), the old stateID code should cease, whereas in cases of secession, the old stateID code should continue for the rump state.
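The stateID conventions can be checked mechanically when working with an export of the data. The following is a minimal Python sketch, assuming every stateID is exactly three uppercase ASCII letters (as in ISO 3166-1 alpha-3) and that episodes arrive as (stateID, StateName) pairs; the function names are hypothetical and not part of manystates, and the rows use only names mentioned in this codebook, without dates.

```python
import re
from collections import defaultdict

# Assumption: a stateID is exactly three uppercase ASCII letters.
_STATE_ID = re.compile(r"[A-Z]{3}")


def is_valid_state_id(code: str) -> bool:
    """Return True if code has the shape of a three-letter stateID."""
    return _STATE_ID.fullmatch(code) is not None


def episodes_by_state(rows):
    """Group (stateID, StateName) pairs into one series per stateID.

    Episodes that share a code (e.g. the successive "RUS" episodes) end up
    in one list, in input order; non-successor states keep separate keys.
    """
    series = defaultdict(list)
    for state_id, name in rows:
        if not is_valid_state_id(state_id):
            raise ValueError(f"malformed stateID: {state_id!r}")
        series[state_id].append(name)
    return dict(series)


# Illustrative rows only (hypothetical input, not an extract of the data).
rows = [
    ("RUS", "Russian Empire"),
    ("RUS", "USSR"),
    ("RUS", "Russian Federation"),
    ("YUG", "Yugoslavia"),
    ("SRB", "Serbia"),
]
print(episodes_by_state(rows)["RUS"])
```

Because Serbia is not treated as Yugoslavia's legal successor, "SRB" and "YUG" remain separate keys, whereas the three Russian episodes form a single series.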

Dates

Begin, End

These are the dates on which an episode of state independence is deemed to have begun or ended. Dates are coded using the messydates system, which implements ISO’s extended date/time format. As such, some dates are entered only as a year, or are annotated with a question mark if the source is uncertain. For more details, see the messydates documentation.

States that are currently independent have the end date 9999-12-31. This distinguishes them from missing data, which is always coded NA.
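To illustrate how these conventions interact, here is a minimal Python sketch that separates the three cases: a (possibly partial or uncertain) date, the 9999-12-31 sentinel for ongoing independence, and NA for missing data. The regular expression covers only a simplified subset of the extended date/time format (year, year-month, or full date, optionally followed by "?"); real messydates values can be richer, and the function names are hypothetical.

```python
import re
from typing import Optional

# Sentinel used in this codebook for states that are still independent.
ONGOING = "9999-12-31"

# Simplified subset of the extended date/time format (an assumption):
# YYYY, YYYY-MM or YYYY-MM-DD, optionally followed by "?" for uncertainty.
_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?(\?)?$")


def parse_messy(value: Optional[str]) -> Optional[dict]:
    """Parse a date string into its components; return None for NA."""
    if value is None or value == "NA":
        return None  # missing data, never confused with ongoing states
    m = _DATE.match(value)
    if m is None:
        raise ValueError(f"unsupported date string: {value!r}")
    year, month, day, question = m.groups()
    return {
        "year": int(year),
        "month": int(month) if month else None,
        "day": int(day) if day else None,
        "uncertain": question is not None,
    }


def is_current(end: Optional[str]) -> bool:
    """True only for the explicit 'still independent' sentinel, never for NA."""
    return end == ONGOING
```

Keeping the sentinel and NA distinct means a simple equality test identifies currently independent states without treating gaps in the data as ongoing episodes.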

Basis

The Basis variable records how the episode of state independence began. We adopt many of the categories offered in the ISD dataset but add some further categories to improve specificity:

Consolidation: state created over territory where no unified state previously existed, often uniting smaller local polities into a single entity
Decolonisation: state born from decolonisation of an empire or colonial metropole, including the conclusion of a protectorate or trusteeship arrangement
Dissolution: state born as a fragment of a larger state that broke apart and ceased to exist (e.g. Austro-Hungarian Empire)
Liberation: state restored after a period of non-existence, for example following occupation or annexation (e.g. Belgium after WWII occupation)
Secession: state secedes or breaks away from larger state or empire that continues to exist
Transformation: state continues in substance but changes its constitutional form, title, or status without foreign conquest or voluntary unification (e.g. Tsardom of Russia 1721 to Russian Empire)
Unification: state born from the voluntary merging of several (typically equally sized) states that previously existed (e.g. UAE in 1971)
Other: for unusual or unclear cases; to be used sparingly with an explanation or elaboration required in the comments

Where the code is followed by a ? annotation, this indicates uncertainty about the coding.

Grounds

The Grounds variable records how the episode of state independence ended. We use the categories offered in the ISD dataset:

Annexation: state taken over by conquest or other foreign takeover (e.g. Aceh in 1874 by the Netherlands)
Colonisation: state subjected to imperial, non-contiguous colonisation, or made a protectorate or vassal (e.g. Mewar 1818 under British protection)
Unification: state ceases through process of voluntary unification or incorporation (e.g. Croatia 1102 into Hungary)
Dissolution: state ceases through dissolution of the state into several smaller states (e.g. Gran Colombia 1830)
Occupation: state ceases through occupation by outside powers (e.g. Albania by Italy in 1939)
Partition: state ceases through partition by outside powers or scission (e.g. Poland 1795)
Revolution: state ceases through internal revolution or coup (e.g. Russian Empire 1917)
Transformation: state continues in substance but changes its constitutional form, title, or status without foreign conquest or voluntary unification (e.g. Tsardom of Russia 1721 to Russian Empire)
Other: for unusual or unclear cases; to be used sparingly with an explanation or elaboration required in the comments

Where the code is followed by a ? annotation, this indicates uncertainty about the coding.
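Both Basis and Grounds share the same conventions: a fixed vocabulary, an optional trailing "?" for uncertainty, and a required explanation in the comments whenever "Other" is used. A minimal Python sketch of a validator for these rules follows. The category sets are copied from the lists above; the function names are hypothetical, and we assume the "?" is attached directly to the end of the code.

```python
from typing import Optional, Tuple

# Category vocabularies as listed in this codebook.
BASIS = {
    "Consolidation", "Decolonisation", "Dissolution", "Liberation",
    "Secession", "Transformation", "Unification", "Other",
}
GROUNDS = {
    "Annexation", "Colonisation", "Unification", "Dissolution",
    "Occupation", "Partition", "Revolution", "Transformation", "Other",
}


def parse_code(value: str, vocabulary: set) -> Tuple[str, bool]:
    """Split a Basis/Grounds code into (category, uncertain).

    A trailing "?" marks an uncertain coding, e.g. "Secession?".
    """
    value = value.strip()
    uncertain = value.endswith("?")
    category = value[:-1].rstrip() if uncertain else value
    if category not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return category, uncertain


def check_other(category: str, comments: Optional[str]) -> None:
    """Raise if 'Other' is used without an explanation in the comments."""
    if category == "Other" and not (comments and comments.strip()):
        raise ValueError("'Other' coding needs an explanation in Comments")
```

Passing the vocabulary explicitly catches codes that are valid in one variable but not the other, such as "Liberation", which is a Basis category but not a Grounds category.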

Places

Capital, CapitalAlt

This is the name of the capital city. This is usually straightforward; however, where a state has a second capital city, that city appears in the CapitalAlt variable.

Latitude, Longitude

Latitude and longitude are given in decimal degrees. Where possible, we code the location of the capital city; where this is not possible, we attempt to identify the latitude and longitude of the barycentre of the territory.
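The codebook does not specify how the barycentre is computed. One possible approach is sketched below in Python: the area-weighted centroid of the territory's outline via the shoelace formula, used only when no capital coordinates are available. This assumes the territory is a single simple polygon of (longitude, latitude) vertices and treats degrees as planar coordinates. That is a reasonable approximation for small territories, but not for very large ones or ones crossing the antimeridian. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def polygon_centroid(vertices: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Degrees are treated as planar x/y coordinates, so this only
    approximates the true barycentre on the sphere.
    """
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)


def episode_location(capital: Optional[Point],
                     territory: Sequence[Point]) -> Point:
    """Prefer the capital's coordinates; fall back to the territory centroid."""
    if capital is not None:
        return capital
    return polygon_centroid(territory)
```

The signed area makes the result independent of vertex order: clockwise and counter-clockwise outlines give the same centroid.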

Region

We code the region more specifically than some other datasets do. The region is coded descriptively, as a character string, which makes it possible to search with a regular expression: for example, “America” matches “Northern America”, “Southern America”, “Central America”, and “Caribbean America”. Note that we use the adjectival form, e.g. “Southern Africa”, to distinguish the region from the country “South Africa”. We use “Central” to describe areas in the middle of a continent, where applicable. The data include the following regions:

Northern America
Southern America
Central America
Caribbean America
Northern Europe
Eastern Europe
Southeastern Europe
Southern Europe
Western Europe
Central Europe
Eastern Asia
Southeastern Asia
Southern Asia
Western Asia
Central Asia
Northern Africa
Eastern Africa
Southern Africa
Western Africa
Central Africa
Oceania
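The regular-expression search described above can be illustrated with a short Python sketch over the region list. Here the regions are held in a plain list and the helper name is hypothetical; in practice the same pattern would be applied to the Region column of the data.

```python
import re
from typing import List

# The regions listed in this codebook.
REGIONS = [
    "Northern America", "Southern America", "Central America",
    "Caribbean America", "Northern Europe", "Eastern Europe",
    "Southeastern Europe", "Southern Europe", "Western Europe",
    "Central Europe", "Eastern Asia", "Southeastern Asia", "Southern Asia",
    "Western Asia", "Central Asia", "Northern Africa", "Eastern Africa",
    "Southern Africa", "Western Africa", "Central Africa", "Oceania",
]


def regions_matching(pattern: str) -> List[str]:
    """Return the regions whose name contains a match for pattern."""
    return [r for r in REGIONS if re.search(pattern, r)]


print(regions_matching("America"))    # the four American regions
print(regions_matching(r"^Central"))  # the middle of each continent
```

Because regions use the adjectival form, searching for the country name "South Africa" matches no region, while "Southern Africa" matches exactly one.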

Coder, Comments, Source

The Coder variable is a comma-separated vector of the surnames of those who have added or verified data for each entry/observation. Where special conditions arise, the Comments variable offers a free-text field for explanations or for recording how the coding has changed from version to version. The Source variable should contain only links or bibliographic information for the sources used to add or verify information.
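Because Coder is stored as a single comma-separated string, it usually needs splitting before analysis, for example to find every entry a given person has verified. A minimal Python sketch follows, assuming surnames never contain commas themselves; the function name and the second surname in the example are hypothetical.

```python
from typing import List, Optional


def parse_coders(value: Optional[str]) -> List[str]:
    """Split a comma-separated Coder string into a list of surnames."""
    if value is None or value.strip() in ("", "NA"):
        return []
    # Strip surrounding whitespace and drop empty pieces from stray commas.
    return [name.strip() for name in value.split(",") if name.strip()]


# "Doe" is a placeholder surname for illustration only.
print(parse_coders("Hollway, Doe"))
```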

James Hollway

2025-09-19