This page provides access to, and statistics about, class-specific subsets of the Schema.org data contained in the October 2023 version of the Web Data Commons Microdata and JSON-LD corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (like product data, event data, job postings,
or data describing local businesses), we have created class-specific subsets from the complete, merged Microdata and JSON-LD corpora for a
selection of Schema.org classes.
Each subset contains all instances of a specific class in either format, as well as all other data found on
the webpages containing these instances. For example, a page containing data about a product might also contain
reviews and offers for this product; a page containing data about an event might also contain data about the
location of the event and the persons involved in the event.
The data is represented in N-Quads format, meaning that the fourth
element of each quad contains the URL of the webpage from which the data was extracted.
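As an illustration of the quad layout described above, the following minimal sketch splits one N-Quads line into its four elements and returns the fourth one, the page URL. It is not the Web Data Commons tooling: the regular expression is a simplified approximation of the N-Quads grammar, and the names `parse_quad` and `page_url` as well as the example line are hypothetical.

```python
import re

# One RDF term: an IRI in <...>, a blank node (_:label), or a quoted literal
# with an optional datatype or language tag. This is a simplification of the
# full N-Quads grammar and is meant for illustration only.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
QUAD_RE = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s*\.\s*$'
)


def parse_quad(line):
    """Return (subject, predicate, object, graph) or None if the line does not match."""
    match = QUAD_RE.match(line)
    return match.groups() if match else None


def page_url(line):
    """Return the URL in the fourth element of a quad, without angle brackets."""
    quad = parse_quad(line)
    if quad is None or not quad[3].startswith("<"):
        return None
    return quad[3][1:-1]


# Hypothetical example line; the URL is made up for illustration.
example = '_:b0 <http://schema.org/name> "Lava \\"Cake\\""@en <https://www.example.com/recipes/1> .'
print(page_url(example))
```

For production use, a full RDF parser is likely a safer choice than a hand-written pattern, because real-world quads can contain escapes this sketch does not cover.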
To make the class-specific data easier to download and use, we provide the Schema.org subsets in chunks. Each
chunk groups quads by pay-level domain (PLD): all quads of one PLD, e.g. yummly.com,
are stored in the same chunk file. Additionally, we provide lookup files containing the mappings between PLDs
and their corresponding chunks, as well as CSV files with PLD-specific statistics.
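The PLD-to-chunk mapping can be used to download only the chunk that holds a given domain. The sketch below shows the idea under stated assumptions: the exact layout of the lookup files is not described on this page, so the column names `pld` and `file_lookup`, the chunk file names, and the function `load_pld_lookup` are all hypothetical and need to be adapted to the actual files.

```python
import csv
import io


def load_pld_lookup(csv_text, pld_column="pld", file_column="file_lookup"):
    """Build a dict mapping each PLD to the name of the chunk file that contains it.

    The column names are assumptions; check the header of the real lookup file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[pld_column]: row[file_column] for row in reader}


# Made-up lookup content standing in for a downloaded lookup file.
sample_lookup = (
    "pld,file_lookup\n"
    "yummly.com,part_12.gz\n"
    "example.org,part_3.gz\n"
)

lookup = load_pld_lookup(sample_lookup)
print(lookup["yummly.com"])
```

With a real lookup file, the text would be read from disk first and passed to the same function.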
Please note that:
Schema.org Subset | General Stats | Related Classes | Size (# Files) | Download (Sample) | PLD-to-File Lookup / PLD-Specific Stats |
---|---|---|---|---|---|
AdministrativeArea | Quads: 93,509,047; URLs: 475,278; Hosts: 3,567 | http://schema.org/City (1,856,683); http://schema.org/ListItem (1,485,028); http://schema.org/ImageObject (1,298,188); http://schema.org/AdministrativeArea (1,107,448); http://schema.org/PostalAddress (916,224) | 1.19 GB (8) | AdministrativeArea (sample) | lookup_file / pld_stats_file |
Airport | Quads: 34,301,832; URLs: 140,425; Hosts: 711 | http://schema.org/Airport (2,523,676); http://schema.org/GeoCoordinates (1,271,769); http://schema.org/Flight (884,658); http://schema.org/Airline (826,467); http://schema.org/Offer (620,257) | 319.61 MB (3) | Airport (sample) | lookup_file / pld_stats_file |
Answer | Quads: 1,980,845,491; URLs: 18,003,812; Hosts: 381,423 | http://schema.org/Answer (79,065,507); http://schema.org/Question (69,348,253); http://schema.org/ListItem (40,860,417); http://schema.org/ImageObject (25,098,071); https://schema.org/Answer (23,475,828) | 38.03 GB (158) | Answer (sample) | lookup_file / pld_stats_file |
Book | Quads: 349,074,007; URLs: 5,155,249; Hosts: 26,153 | http://schema.org/Book (14,176,649); http://schema.org/Country (10,281,356); http://schema.org/Person (9,223,300); http://schema.org/Offer (7,223,358); http://schema.org/ListItem (4,037,512) | 5.9 GB (28) | Book (sample) | lookup_file / pld_stats_file |
City | Quads: 280,236,099; URLs: 1,514,800; Hosts: 14,222 | http://schema.org/City (7,187,495); http://schema.org/PostalAddress (3,950,680); http://schema.org/OpeningHoursSpecification (3,778,969); http://schema.org/ListItem (3,575,442); http://schema.org/Person (3,474,824) | 3.04 GB (23) | City (sample) | lookup_file / pld_stats_file |
CollegeOrUniversity | Quads: 139,230,188; URLs: 1,152,370; Hosts: 3,287 | http://schema.org/CollegeOrUniversity (4,789,189); http://schema.org/ListItem (3,521,763); http://schema.org/ImageObject (3,365,727); http://schema.org/Person (2,952,316); http://schema.org/PostalAddress (2,259,152) | 1.63 GB (12) | CollegeOrUniversity (sample) | lookup_file / pld_stats_file |
Continent | Quads: 2,219,972; URLs: 32,599; Hosts: 69 | http://schema.org/City (234,284); http://schema.org/AdministrativeArea (139,110); http://schema.org/Country (37,910); http://schema.org/Continent (36,968); http://schema.org/GeoCoordinates (29,135) | 23.19 MB (1) | Continent (sample) | lookup_file / pld_stats_file |
Country | Quads: 678,417,918; URLs: 5,559,842; Hosts: 27,500 | http://schema.org/Country (39,943,326); http://schema.org/ListItem (16,171,738); http://schema.org/Organization (9,427,215); http://schema.org/Offer (9,129,260); http://schema.org/PostalAddress (8,574,888) | 9.29 GB (54) | Country (sample) | lookup_file / pld_stats_file |
CreativeWork | Quads: 3,190,496,257; URLs: 68,554,518; Hosts: 1,182,156 | https://schema.org/CreativeWork (123,695,386); https://schema.org/SiteNavigationElement (82,235,811); https://schema.org/Person (71,552,642); https://schema.org/WPHeader (48,950,793); https://schema.org/WPFooter (47,191,030) | 128.59 GB (255) | CreativeWork (sample) | lookup_file / pld_stats_file |
Dataset | Quads: 88,498,570; URLs: 839,031; Hosts: 1,958 | http://schema.org/PropertyValue (6,202,068); http://schema.org/DataDownload (2,724,251); http://schema.org/Dataset (1,352,684); http://schema.org/Organization (1,319,413); http://schema.org/Person (860,605) | 1.08 GB (7) | Dataset (sample) | lookup_file / pld_stats_file |
EducationalOrganization | Quads: 94,950,615; URLs: 1,278,468; Hosts: 10,287 | http://schema.org/EducationalOrganization (2,179,457); http://schema.org/ListItem (1,844,469); http://schema.org/ImageObject (1,358,517); http://schema.org/PostalAddress (1,352,028); http://schema.org/Person (890,191) | 1.47 GB (8) | EducationalOrganization (sample) | lookup_file / pld_stats_file |
Event | Quads: 2,532,122,969; URLs: 20,974,763; Hosts: 402,077 | http://schema.org/Event (88,374,986); http://schema.org/Place (64,567,498); http://schema.org/PostalAddress (51,701,441); http://schema.org/Person (30,113,455); http://schema.org/ListItem (28,414,290) | 32.25 GB (202) | Event (sample) | lookup_file / pld_stats_file |
FAQPage | Quads: 1,749,714,956; URLs: 14,411,496; Hosts: 354,617 | http://schema.org/Question (66,030,524); http://schema.org/Answer (65,823,505); http://schema.org/ListItem (38,336,764); http://schema.org/ImageObject (26,199,305); https://schema.org/Answer (16,941,782) | 31.72 GB (140) | FAQPage (sample) | lookup_file / pld_stats_file |
GeoCoordinates | Quads: 4,186,967,156; URLs: 33,179,627; Hosts: 509,401 | http://schema.org/ListItem (114,912,609); http://schema.org/PostalAddress (69,328,670); http://schema.org/GeoCoordinates (63,593,552); http://schema.org/OpeningHoursSpecification (41,284,705); http://schema.org/Offer (36,532,761) | 52.94 GB (334) | GeoCoordinates (sample) | lookup_file / pld_stats_file |
GovernmentOrganization | Quads: 31,048,065; URLs: 497,116; Hosts: 1,687 | http://schema.org/ListItem (1,436,216); http://schema.org/GovernmentOrganization (604,170); http://schema.org/ImageObject (465,098); https://schema.org/ImageObject (317,409); http://schema.org/PostalAddress (311,063) | 468.15 MB (3) | GovernmentOrganization (sample) | lookup_file / pld_stats_file |
Hospital | Quads: 30,710,791; URLs: 288,052; Hosts: 1,935 | http://schema.org/PostalAddress (828,913); http://schema.org/GeoCoordinates (691,957); http://schema.org/Hospital (557,245); http://schema.org/GeoCircle (550,398); http://schema.org/ListItem (472,076) | 388.31 MB (3) | Hospital (sample) | lookup_file / pld_stats_file |
Hotel | Quads: 388,224,871; URLs: 2,726,279; Hosts: 25,507 | http://schema.org/ImageObject (15,257,350); http://schema.org/Hotel (8,078,597); http://schema.org/PostalAddress (7,075,683); http://schema.org/Rating (5,379,135); http://schema.org/LocationFeatureSpecification (5,272,388) | 5.23 GB (31) | Hotel (sample) | lookup_file / pld_stats_file |
JobPosting | Quads: 189,836,812; URLs: 4,056,084; Hosts: 61,024 | http://schema.org/Place (4,968,710); http://schema.org/Organization (4,890,857); http://schema.org/PostalAddress (4,883,844); http://schema.org/JobPosting (4,741,019); http://schema.org/ListItem (2,985,374) | 7.57 GB (16) | JobPosting (sample) | lookup_file / pld_stats_file |
LakeBodyOfWater | Quads: 176,871; URLs: 4,422; Hosts: 135 | http://schema.org/LakeBodyOfWater (5,018); http://schema.org/PostalAddress (3,796); http://schema.org/GeoCoordinates (1,018); http://schema.org/City (952); http://schema.org/PropertyValue (745) | 4.95 MB (1) | LakeBodyOfWater (sample) | lookup_file / pld_stats_file |
LandmarksOrHistoricalBuildings | Quads: 2,848,557; URLs: 34,917; Hosts: 405 | http://schema.org/PropertyValue (74,748); http://schema.org/ImageObject (74,725); http://schema.org/LandmarksOrHistoricalBuildings (73,368); http://schema.org/PostalAddress (53,626); http://schema.org/CreativeWork (42,448) | 79.02 MB (1) | LandmarksOrHistoricalBuildings (sample) | lookup_file / pld_stats_file |
Language | Quads: 720,590,542; URLs: 5,684,120; Hosts: 12,880 | http://schema.org/Person (31,686,159); http://schema.org/Comment (25,150,067); http://schema.org/ListItem (12,223,370); http://schema.org/Language (11,339,333); http://schema.org/InteractionCounter (9,290,581) | 12.84 GB (57) | Language (sample) | lookup_file / pld_stats_file |
Library | Quads: 8,270,159; URLs: 197,901; Hosts: 818 | http://schema.org/Library (214,924); http://schema.org/PostalAddress (95,456); http://schema.org/Place (93,798); http://schema.org/ListItem (86,384); http://schema.org/OpeningHoursSpecification (78,516) | 128.91 MB (1) | Library (sample) | lookup_file / pld_stats_file |
LocalBusiness | Quads: 2,979,247,943; URLs: 36,711,236; Hosts: 1,354,750 | http://schema.org/ListItem (107,056,900); http://schema.org/LocalBusiness (55,520,207); http://schema.org/PostalAddress (51,173,588); http://schema.org/ImageObject (24,221,665); http://schema.org/OpeningHoursSpecification (21,480,235) | 38.61 GB (238) | LocalBusiness (sample) | lookup_file / pld_stats_file |
Mountain | Quads: 244,167; URLs: 12,064; Hosts: 63 | http://schema.org/Mountain (16,723); http://schema.org/GeoCoordinates (16,704); http://schema.org/propertyValue (7,540); http://schema.org/Place (2,887); https://schema.org/ListItem (1,436) | 5.74 MB (1) | Mountain (sample) | lookup_file / pld_stats_file |
Movie | Quads: 162,588,730; URLs: 2,003,583; Hosts: 7,641 | http://schema.org/Person (12,242,939); http://schema.org/Movie (4,451,037); http://schema.org/ListItem (2,464,915); http://schema.org/AggregateRating (1,348,493); http://schema.org/ImageObject (962,320) | 2.38 GB (13) | Movie (sample) | lookup_file / pld_stats_file |
Museum | Quads: 5,539,798; URLs: 92,048; Hosts: 675 | http://schema.org/Museum (110,728); http://schema.org/Event (93,667); http://schema.org/ListItem (93,259); http://schema.org/PostalAddress (85,710); http://schema.org/OpeningHoursSpecification (67,635) | 85.08 MB (1) | Museum (sample) | lookup_file / pld_stats_file |
MusicAlbum | Quads: 112,398,520; URLs: 819,666; Hosts: 18,779 | http://schema.org/Country (8,123,813); http://schema.org/MusicRecording (4,520,767); http://schema.org/MusicAlbum (2,815,229); http://schema.org/Offer (2,622,574); http://schema.org/EntryPoint (1,338,109) | 1.03 GB (9) | MusicAlbum (sample) | lookup_file / pld_stats_file |
MusicRecording | Quads: 173,478,132; URLs: 1,459,187; Hosts: 27,876 | http://schema.org/Country (14,033,776); http://schema.org/MusicRecording (10,037,791); http://schema.org/Offer (2,842,197); http://schema.org/MusicGroup (1,873,068); http://schema.org/MusicAlbum (1,758,193) | 1.63 GB (14) | MusicRecording (sample) | lookup_file / pld_stats_file |
Organization | Quads: 52,360,387,820; URLs: 824,557,426; Hosts: 6,764,349 | http://schema.org/ListItem (1,475,935,021); http://schema.org/ImageObject (1,183,253,284); http://schema.org/Organization (1,062,852,659); http://schema.org/BreadcrumbList (529,826,463); http://schema.org/WebPage (514,302,028) | 847.31 GB (4168) | Organization (sample) | lookup_file / pld_stats_file |
Painting | Quads: 12,180,582; URLs: 116,101; Hosts: 640 | http://schema.org/Person (2,294,142); http://schema.org/Painting (481,498); http://schema.org/Offer (307,589); http://schema.org/ListItem (219,232); http://schema.org/Property (110,457) | 123.99 MB (1) | Painting (sample) | lookup_file / pld_stats_file |
Park | Quads: 1,654,003; URLs: 13,506; Hosts: 324 | http://schema.org/Organization (55,799); http://schema.org/PostalAddress (30,844); http://schema.org/OpeningHoursSpecification (17,788); http://schema.org/ListItem (16,248); http://schema.org/Park (14,366) | 21.53 MB (1) | Park (sample) | lookup_file / pld_stats_file |
Person | Quads: 34,171,228,713; URLs: 465,715,308; Hosts: 5,117,767 | http://schema.org/ImageObject (864,419,612); http://schema.org/Person (778,026,765); http://schema.org/ListItem (764,540,830); http://schema.org/WebPage (420,788,422); http://schema.org/Organization (383,671,490) | 645.02 GB (2723) | Person (sample) | lookup_file / pld_stats_file |
Place | Quads: 4,649,072,412; URLs: 37,241,661; Hosts: 535,530 | http://schema.org/ListItem (116,975,200); http://schema.org/Place (109,578,157); http://schema.org/PostalAddress (91,318,816); http://schema.org/Event (71,585,370); http://schema.org/Person (47,511,716) | 63.13 GB (371) | Place (sample) | lookup_file / pld_stats_file |
Product | Quads: 23,849,625,142; URLs: 347,477,842; Hosts: 2,897,121 | http://schema.org/Offer (826,534,359); http://schema.org/ListItem (636,352,840); http://schema.org/Product (595,117,400); http://schema.org/Organization (322,255,895); http://schema.org/ImageObject (191,985,760) | 349.61 GB (1899) | Product (sample) | lookup_file / pld_stats_file |
QAPage | Quads: 186,659,370; URLs: 3,303,871; Hosts: 10,746 | http://schema.org/Person (12,196,036); http://schema.org/Answer (7,259,119); https://schema.org/Answer (2,582,837); http://schema.org/Question (2,441,450); http://schema.org/QAPage (2,384,815) | 4.69 GB (15) | QAPage (sample) | lookup_file / pld_stats_file |
Question | Quads: 2,013,523,983; URLs: 18,820,629; Hosts: 383,439 | http://schema.org/Answer (78,080,422); http://schema.org/Question (71,036,212); http://schema.org/ListItem (41,071,086); http://schema.org/ImageObject (26,505,287); https://schema.org/Answer (23,080,355) | 38.59 GB (161) | Question (sample) | lookup_file / pld_stats_file |
RadioStation | Quads: 17,376,036; URLs: 321,302; Hosts: 1,138 | http://schema.org/ListItem (601,939); http://schema.org/RadioStation (374,079); http://schema.org/ImageObject (291,891); http://schema.org/NewsArticle (270,600); http://schema.org/Organization (169,207) | 308.8 MB (2) | RadioStation (sample) | lookup_file / pld_stats_file |
Recipe | Quads: 502,684,939; URLs: 4,489,240; Hosts: 42,727 | http://schema.org/HowToStep (16,492,994); http://schema.org/ListItem (8,525,059); http://schema.org/ImageObject (8,000,669); http://schema.org/Person (6,387,072); http://schema.org/Recipe (5,683,640) | 8.07 GB (40) | Recipe (sample) | lookup_file / pld_stats_file |
Restaurant | Quads: 223,449,936; URLs: 1,717,316; Hosts: 64,072 | http://schema.org/Offer (6,574,202); http://schema.org/MenuItem (5,599,769); http://schema.org/Restaurant (4,862,139); http://schema.org/ListItem (4,136,057); http://schema.org/PostalAddress (3,805,739) | 2.5 GB (18) | Restaurant (sample) | lookup_file / pld_stats_file |
RiverBodyOfWater | Quads: 94,489; URLs: 2,224; Hosts: 16 | http://schema.org/ImageObject (3,379); http://schema.org/ListItem (2,875); http://schema.org/RiverBodyOfWater (2,316); http://schema.org/Organization (1,991); http://schema.org/PropertyValue (1,913) | 3.45 MB (1) | RiverBodyOfWater (sample) | lookup_file / pld_stats_file |
School | Quads: 14,495,397; URLs: 292,758; Hosts: 2,030 | http://schema.org/ListItem (400,913); http://schema.org/School (381,043); http://schema.org/PostalAddress (234,219); http://schema.org/ImageObject (165,397); http://schema.org/GeoCoordinates (124,684) | 221.82 MB (2) | School (sample) | lookup_file / pld_stats_file |
ShoppingCenter | Quads: 12,849,735; URLs: 152,010; Hosts: 1,285 | http://schema.org/PostalAddress (241,647); http://schema.org/Organization (238,002); http://schema.org/Offer (231,301); http://schema.org/ShoppingCenter (214,191); http://schema.org/ListItem (167,885) | 176.61 MB (2) | ShoppingCenter (sample) | lookup_file / pld_stats_file |
SkiResort | Quads: 1,494,935; URLs: 29,605; Hosts: 266 | http://schema.org/ListItem (48,318); http://schema.org/SkiResort (40,543); http://schema.org/Person (34,224); http://schema.org/Review (33,478); http://schema.org/PostalAddress (25,515) | 30.61 MB (1) | SkiResort (sample) | lookup_file / pld_stats_file |
SportsEvent | Quads: 165,906,925; URLs: 1,010,368; Hosts: 7,207 | http://schema.org/SportsEvent (7,345,109); http://schema.org/SportsTeam (6,804,715); http://schema.org/Place (6,640,892); http://schema.org/PostalAddress (5,556,554); http://schema.org/Organization (2,138,486) | 1.53 GB (14) | SportsEvent (sample) | lookup_file / pld_stats_file |
SportsTeam | Quads: 126,009,626; URLs: 781,371; Hosts: 3,853 | http://schema.org/SportsTeam (7,898,303); http://schema.org/SportsEvent (3,144,937); http://schema.org/Place (2,738,920); http://schema.org/Organization (2,010,041); http://schema.org/PostalAddress (1,864,192) | 1.05 GB (11) | SportsTeam (sample) | lookup_file / pld_stats_file |
StadiumOrArena | Quads: 28,984,463; URLs: 90,159; Hosts: 258 | http://schema.org/Organization (1,223,801); http://schema.org/ImageObject (954,725); http://schema.org/SportsTeam (923,758); http://schema.org/SportsEvent (453,481); http://schema.org/BlogPosting (446,060) | 234.01 MB (3) | StadiumOrArena (sample) | lookup_file / pld_stats_file |
TelevisionStation | Quads: 1,792,637; URLs: 19,196; Hosts: 94 | http://schema.org/ListItem (50,738); http://schema.org/CreativeWorkSeries (34,057); http://schema.org/TelevisionStation (34,023); http://schema.org/SiteNavigationElement (32,902); http://schema.org/ImageObject (32,397) | 29.12 MB (1) | TelevisionStation (sample) | lookup_file / pld_stats_file |
TVEpisode | Quads: 62,300,689; URLs: 305,104; Hosts: 1,048 | http://schema.org/Country (3,827,991); https://schema.org/TVEpisode (3,653,982); http://schema.org/TVEpisode (1,556,216); http://schema.org/ListItem (602,831); http://schema.org/Person (409,220) | 633.05 MB (5) | TVEpisode (sample) | lookup_file / pld_stats_file |
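
To sanity-check a downloaded chunk against the General Stats column, counts of the same kind can be recomputed locally. The sketch below rests on an assumed reading of that column that this page does not spell out: "Quads" as the number of quad lines, "URLs" as the number of distinct page URLs in the fourth element, and "Hosts" as the number of distinct host names among those URLs. The function name `subset_stats` and the sample lines are hypothetical, and every line is assumed to end with a graph URL in angle brackets.

```python
from urllib.parse import urlsplit


def subset_stats(quad_lines):
    """Count quads, distinct page URLs, and distinct hosts in N-Quads lines."""
    quads = 0
    urls = set()
    for line in quad_lines:
        body = line.strip()
        if not body.endswith("."):
            continue  # skip blank or malformed lines
        body = body[:-1].rstrip()
        start = body.rfind("<")
        if start == -1 or not body.endswith(">"):
            continue  # the last element is not a URL in angle brackets
        quads += 1
        urls.add(body[start + 1:-1])
    hosts = {urlsplit(url).hostname for url in urls}
    hosts.discard(None)
    return {"quads": quads, "urls": len(urls), "hosts": len(hosts)}


# Made-up quads: three pages spread over two hosts.
sample_quads = [
    '_:a <http://schema.org/name> "Lake One" <https://example.com/lakes/1> .',
    '_:a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/LakeBodyOfWater> <https://example.com/lakes/1> .',
    '_:b <http://schema.org/name> "Lake Two" <https://example.com/lakes/2> .',
    '_:c <http://schema.org/name> "Lake Three" <https://lakes.example.org/3> .',
]

stats = subset_stats(sample_quads)
print(stats)
```

Because the function accepts any iterable of lines, a decompressed chunk file could be streamed through it without loading the whole file into memory.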
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
We provide the extracted data for download in a variation of the N-Quads format. For users who prefer other formats, we provide code for converting the download files into CSV and JSON, which are supported by a wide range of spreadsheet applications, relational databases, and data mining frameworks such as the Python data analysis library pandas. Further details on how to convert the download files to other formats can be found on the main page.
The Jupyter notebooks used to create the Schema.org subsets from the Microdata and JSON-LD corpora can be checked out from our Git repository.
The December 2023 extraction was performed with version 1.5 of the extractor. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.