Title: FAQ: Why doesn't Data Buffet use provider geo codes?
Author: Phillip Thorne
Question

For mnemonics, why does Data Buffet use its own system of geo codes, instead of the official codes issued by each national provider?

Answer

The challenge

Data Buffet republishes data from hundreds of providers over a span of decades, so directly reusing their identifiers invites conflicts and collisions:

  • For the same area, different sources may use different codes.
  • For the same code, different sources may have different meanings.
  • A single source may change its system over time, using different codes for the same area.
  • A single source may retain a code despite changes to an area (its composition or boundaries).

Moreover, we aim to achieve these goals with our geo codes:

  • Uniformity among areas at the same geo level
  • Distinction between areas of different types
  • A visible relationship between subnational areas and their nation
  • Isolation of different vintages of a taxonomy
  • Limited impact from coding changes that are a "distinction without a difference"
  • Amenability to wild card expressions, so that related areas can be selected as a group

At the national level

Take the United Kingdom as an example: Eurostat uses the code "UK", while the IMF has used both "112" and "GBR". We elected to build on the ISO 3166 alpha-3 standard, using the fixed prefix "I" followed by the alpha-3 code "GBR", hence "IGBR".
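The United Kingdom case can be sketched as a small crosswalk in Python. The names `PROVIDER_TO_GEO` and `to_geo_code` are hypothetical, chosen for illustration only; they are not part of any Data Buffet API. Keying on the (provider, code) pair rather than on the code alone is what would keep identical codes from different sources apart.

```python
# Hypothetical crosswalk: (provider, provider's own code) -> Data Buffet geo code.
# The three entries are the United Kingdom examples given in the text.
PROVIDER_TO_GEO = {
    ("Eurostat", "UK"): "IGBR",
    ("IMF", "112"): "IGBR",
    ("IMF", "GBR"): "IGBR",
}


def to_geo_code(provider: str, code: str) -> str:
    """Return the geo code for a provider-specific code.

    Raises KeyError if the (provider, code) pair is not in the crosswalk.
    """
    return PROVIDER_TO_GEO[(provider, code)]
```

All three provider codes resolve to the same geo code, so a series from either source can be retrieved under one identifier.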

The fixed prefix means that all national data can be retrieved with the wild card ".I^^^". It also precludes collisions with our 1990-delineation geo codes for U.S. metro areas, which consist of three letters: for example, IARE ("United Arab Emirates") vs. ARE ("Arecibo, Puerto Rico").
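A minimal Python sketch of why the prefix prevents collisions. The function names are hypothetical, and the regular expressions are an analogue for illustration; they do not reproduce Data Buffet's own wild card syntax. The sketch assumes only what the text states: national codes are "I" plus three letters, and 1990-delineation metro codes are three letters.

```python
import re

# Assumed shapes, taken from the text: "I" + ISO 3166 alpha-3 for nations,
# exactly three letters for 1990-delineation U.S. metro areas.
NATIONAL_PATTERN = re.compile(r"I[A-Z]{3}")
METRO_1990_PATTERN = re.compile(r"[A-Z]{3}")


def national_geo_code(alpha3: str) -> str:
    """Build a national geo code from an ISO 3166 alpha-3 code."""
    code = alpha3.upper()
    if not METRO_1990_PATTERN.fullmatch(code):
        raise ValueError(f"not a three-letter code: {alpha3!r}")
    return "I" + code


def geo_level(geo_code: str) -> str:
    """Classify a geo code by shape alone (illustrative, not exhaustive)."""
    if NATIONAL_PATTERN.fullmatch(geo_code):
        return "national"
    if METRO_1990_PATTERN.fullmatch(geo_code):
        return "metro_1990"
    return "other"
```

Because the two patterns require different lengths, no string can match both: `geo_level("IARE")` is classified as national and `geo_level("ARE")` as a 1990 metro area.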

Note that we assign "national" geo codes to areas when statistically convenient, even if they are not sovereign (e.g., Hong Kong SAR or Puerto Rico) or their status is disputed (Taiwan ROC).

At the metro level, for the U.S.

For metro areas (more generally, "core-based statistical areas" or CBSAs), the maintainer is the U.S. Office of Management and Budget (OMB). Each CBSA is a composite of contiguous counties or county-equivalent areas, selected according to the results of the decennial census and the American Community Survey. From one delineation to the next, a CBSA's numeric code may change, or the code may be retained even though the composition changes. Either circumstance prompts us to define a new geo code. For example:

Census  Bulletin  Code   Name        Components                             Geo code
1990              0120   Albany, GA  Dougherty, Lee                         ALN
2000              10500  Albany, GA  Baker, Dougherty, Lee, Terrell, Worth  MALN
2010    18-03     10500  Albany, GA  Baker, Dougherty, Lee, Terrell, Worth  IUSA_MALN
2010    18-04     10500  Albany, GA  Dougherty, Lee, Terrell, Worth         IUSA_MABY
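The two triggers named above (a changed numeric code, or a changed composition) can be sketched in Python using the Albany, GA rows. The class `Delineation` and function `needs_new_geo_code` are hypothetical names for illustration, not Data Buffet's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delineation:
    """One row of the table: an OMB numeric code plus component counties."""
    label: str
    omb_code: str
    components: frozenset


def needs_new_geo_code(previous: Delineation, current: Delineation) -> bool:
    """True if the numeric code or the composition differs between the two."""
    return (previous.omb_code != current.omb_code
            or previous.components != current.components)


# Rows from the Albany, GA example.
ALBANY_1990 = Delineation("1990", "0120", frozenset({"Dougherty", "Lee"}))
ALBANY_2000 = Delineation(
    "2000", "10500",
    frozenset({"Baker", "Dougherty", "Lee", "Terrell", "Worth"}))
ALBANY_18_03 = Delineation(
    "2010 / 18-03", "10500",
    frozenset({"Baker", "Dougherty", "Lee", "Terrell", "Worth"}))
ALBANY_18_04 = Delineation(
    "2010 / 18-04", "10500",
    frozenset({"Dougherty", "Lee", "Terrell", "Worth"}))
```

Under this sketch, the 1990 to 2000 step qualifies because the numeric code changed, and the 18-03 to 18-04 step qualifies because Baker was dropped while the code 10500 was retained. The 2000 to 18-03 step returns False; the table nonetheless shows MALN becoming IUSA_MALN, an added prefix that falls outside the two triggers and is not modeled here.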

Individual sources do not adopt the new delineations in lockstep. As a transitional measure, if a source reports using delineation "A", we may construct a supplemental dataset under delineation "B". This is possible only with distinct geo codes.

At the subnational level, for Europe

Under the Eurostat NUTS standard, an area may be terminated, merged, renamed, or created; in these cases, we assign a new geo code.

There are also cases where the boundaries do not change (same physical territory) but the area is nonetheless assigned a new identifier ("code change" or "recoded"); for this case, Data Buffet policy is to leave our geo code unchanged. Here are four examples in two countries:

NUTS vintages  NUTS level  Mutation     Code 1  Code 2  Name                             Geo code
2003 to 2006   3           Code change  DEE21   DEE02   Halle (Saale), Kreisfreie Stadt  IDEU_15002
2003 to 2006   3           Code change  DEE31   DEE03   Magdeburg, Kreisfreie Stadt      IDEU_15003
2003 to 2006   3           Code change  DEE3B   DEE04   Altmarkkreis Salzwedel           IDEU_15081
2013 to 2016   3           Code change  UKM21   UKM71   Angus and Dundee City            IGBR_ADU
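The "code change" policy can be sketched in Python as a lookup in which both the old and the new NUTS code resolve to the same geo code. The names `NUTS_TO_GEO` and `geo_code_for` are hypothetical, and the sketch assumes that "Code 1" belongs to the earlier vintage and "Code 2" to the later one.

```python
# Hypothetical lookup: (NUTS vintage, NUTS code) -> Data Buffet geo code.
# Entries are the four recoded areas from the table; each area appears twice,
# once per vintage, with an unchanged geo code.
NUTS_TO_GEO = {
    (2003, "DEE21"): "IDEU_15002",
    (2006, "DEE02"): "IDEU_15002",
    (2003, "DEE31"): "IDEU_15003",
    (2006, "DEE03"): "IDEU_15003",
    (2003, "DEE3B"): "IDEU_15081",
    (2006, "DEE04"): "IDEU_15081",
    (2013, "UKM21"): "IGBR_ADU",
    (2016, "UKM71"): "IGBR_ADU",
}


def geo_code_for(vintage: int, nuts_code: str) -> str:
    """Return the geo code for a NUTS code as used in a given vintage.

    Raises KeyError if the (vintage, code) pair is not in the lookup.
    """
    return NUTS_TO_GEO[(vintage, nuts_code)]
```

A series for Halle (Saale) keeps one identifier across the recode: `geo_code_for(2003, "DEE21")` and `geo_code_for(2006, "DEE02")` return the same value. Including the vintage in the key is a design choice that would also keep apart a code reused with a different meaning in a later vintage.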

Updates

  • Nov 2017 - Initial version
  • Jun 2019 - Considerations re: OMB codes for U.S. metro areas