Introduction

The UNESCO World Heritage List is a collection of 1,248 sites recognized for their cultural, historical, or natural significance. For each site, the dataset includes its name and a short description in several languages, a long description, the justification for inclusion, the date of inclusion, and 19 other variables.

What’s in the Data?

The data was compiled from UNESCO’s official World Heritage nominations and from records of the votes taken at the World Heritage Committee’s annual sessions. The dataset was created by the United Nations Educational, Scientific and Cultural Organization (UNESCO). Its information is drawn from the World Heritage List, which is made up of sites inscribed by the World Heritage Committee under UNESCO’s World Heritage Convention. Information about each site comes from its nomination file, which is prepared and submitted by the State Party in which the site is located. The creation of the dataset was funded by the World Heritage Fund, whose budget the World Heritage Committee allocates to various purposes, such as collecting and publishing data. The data omits detailed information on the voting process and internal voting debates, on effects on tourism, and on questions of inequality, bias, or power. These gaps create a risk that the data will be misinterpreted or misused.

Why it Matters

The dataset can illustrate global patterns in how heritage is recognized and valued across regions and over time. By examining variables such as date of inclusion, justification criteria, and geographic location, we can trace shifts in UNESCO’s priorities and politics. At a deeper level, the UNESCO World Heritage dataset is not a neutral catalog of important places; it documents a political and bureaucratic process of recognition. Each site represents the successful navigation of an institutional system involving nomination dossiers, expert evaluations, and approval by the World Heritage Committee. The dataset therefore illuminates not only “heritage” but also which states have the administrative capacity, expertise, and resources to translate local sites into UNESCO-legible data. This helps explain why wealthier or more institutionally powerful countries are often overrepresented relative to regions with equally rich cultural histories.
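To make the kind of trend analysis described above more concrete, here is a minimal Python sketch that counts inscriptions per region per decade. It assumes each record is a dictionary holding a region and an inscription year; the field names `region` and `date_inscribed` and the function name are hypothetical placeholders, not confirmed column names, so they should be adjusted to whatever copy of the dataset is being used.

```python
from collections import Counter
from typing import Iterable, Mapping


def inscriptions_by_region_and_decade(
    records: Iterable[Mapping[str, object]],
    region_key: str = "region",        # hypothetical field name
    year_key: str = "date_inscribed",  # hypothetical field name
) -> Counter:
    """Count inscribed sites per (region, decade) pair.

    Assumes the year is stored as an integer or an integer string
    such as "1978". Rows missing either field are skipped.
    """
    counts: Counter = Counter()
    for record in records:
        region = record.get(region_key)
        year = record.get(year_key)
        if region is None or year is None:
            continue  # skip incomplete rows rather than guessing values
        decade = (int(year) // 10) * 10  # e.g. 1978 -> 1970
        counts[(str(region), decade)] += 1
    return counts
```

Rows read with `csv.DictReader` from a CSV export could be passed straight in. Raw counts only show where inscriptions cluster over time; judging whether a region is over- or underrepresented would also require a comparison baseline chosen by the analyst.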

Limitations

The dataset cannot fully capture the lived experiences and social impacts of “World Heritage” designation on surrounding communities. It also lacks site-level details about funding or tourism, and it excludes sites that were nominated but rejected. Beyond a short justification for each inscription, it does not explain how decisions were made, and it omits the controversy, dissent, and conflict surrounding many inscriptions. There is no structured data about local opposition, displacement, environmental harm, or political disputes tied to World Heritage designation. Nor does it situate sites placed on the “List of World Heritage in Danger” within a broader historical narrative. These absences make UNESCO recognition appear consensual and celebratory, masking the uneven consequences of inscription.

Dataset Ontologies

Every row in the UNESCO World Heritage dataset represents one World Heritage property/site that UNESCO recognizes as having “Outstanding Universal Value.” In other words, it treats heritage as something that can be named, bounded, classified, and counted, with core descriptors like location, inscription year, and selection criteria attached to each site.

That design choice defines its ontology: it frames heritage as discrete objects rather than living, evolving relationships; as globally legible through a single institutional vocabulary (e.g., UNESCO’s criteria); and as authoritatively validated by inscription. Because inclusion depends on meeting at least one of UNESCO’s criteria, the dataset inherits UNESCO’s definition of what “counts” as heritage and what “counts” as valuable.

The dataset itself does not explain what UNESCO’s criteria mean or how they are applied; instead, the criteria are implied by the fact that only sites UNESCO has officially inscribed appear as records (1,248 of them).

Finally, because the descriptions are available in only a few languages, the dataset shapes what is knowable. Translating sites into a small set of global languages increases accessibility, but it can flatten local meanings and replace community-specific terms with UNESCO’s standardized vocabulary. As a result, the dataset makes some forms of knowledge easy to analyze (global patterns) while leaving out others (local names, Indigenous framings, and everyday cultural significance).

Links to the datasets: