This page provides access to, and statistics about, class-specific subsets of the Schema.org data contained in the October 2022 version of the Web Data Commons Microdata and JSON-LD corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (like product data, event data, job postings,
or data describing local businesses), we have created class-specific subsets out of the complete and merged Microdata and JSON-LD corpora for a
selection of schema.org classes.
The subsets contain all instances of a specific class in either format, as well as all other data found on
the webpages containing these instances. For example, a page containing data about a product might also contain
reviews and offers for this product; a page containing data about an event might also contain data about the
location of the event and the persons involved in the event.
The data is represented in N-Quads format, meaning that the fourth
element of each quad contains the URL of the webpage from which the data was extracted.
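As a minimal sketch of how the fourth element can be read from a quad, the following Python snippet parses one line with a regular expression and returns the page URL. The function names `parse_quad` and `page_url` and the sample line are illustrative assumptions, not part of the Web Data Commons tooling, and the pattern covers only well-formed lines; anything it cannot match is returned as `None`.

```python
import re

# Simplified N-Quads pattern (an assumption for illustration; it does not
# cover every corner of the N-Quads grammar):
#   subject:   IRI or blank node
#   predicate: IRI
#   object:    IRI, blank node, or literal with optional language tag / datatype
#   graph:     IRI (here: the URL of the page the data was extracted from)
QUAD_RE = re.compile(
    r'^\s*(?P<subject><[^>]*>|_:\S+)\s+'
    r'(?P<predicate><[^>]*>)\s+'
    r'(?P<object><[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s+'
    r'(?P<graph><[^>]*>)\s*\.\s*$'
)


def parse_quad(line):
    """Return (subject, predicate, object, graph) or None if the line does not match."""
    match = QUAD_RE.match(line)
    if match is None:
        return None
    return (
        match.group("subject"),
        match.group("predicate"),
        match.group("object"),
        match.group("graph"),
    )


def page_url(line):
    """Return the fourth element of the quad without its angle brackets, or None."""
    quad = parse_quad(line)
    if quad is None:
        return None
    return quad[3][1:-1]


if __name__ == "__main__":
    # Made-up example line, not taken from the corpus.
    sample = '_:n1 <http://schema.org/name> "Pasta"@en <https://example.com/recipe/1> .'
    print(page_url(sample))
```

Returning `None` for unmatched lines lets a caller skip or log malformed input instead of aborting a long scan over a chunk file.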
To make the class-specific data easier to download and access, we provide the Schema.org subsets in chunks. Each
chunk contains the quads of specific pay-level domains (PLDs), i.e. all quads of one PLD, e.g. yummly.com,
are stored in the same chunk file. Additionally, we provide lookup files containing the mappings between PLDs
and their corresponding chunks, as well as CSV files with PLD-specific statistics.
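The following Python sketch shows one way a PLD-to-chunk lookup could be used to find the chunk file for a given PLD. It assumes the lookup file is a CSV with a header row; the column names `pld` and `file_lookup`, the function name `load_pld_lookup`, and the sample rows are hypothetical and should be checked against the actual lookup file before use.

```python
import csv
import io


def load_pld_lookup(fileobj, pld_col="pld", chunk_col="file_lookup"):
    """Read a CSV lookup file into a dict mapping PLD -> chunk file name.

    The column names are assumptions; pass the real ones if they differ.
    """
    reader = csv.DictReader(fileobj)
    return {row[pld_col]: row[chunk_col] for row in reader}


if __name__ == "__main__":
    # Made-up sample content standing in for a downloaded lookup file.
    sample = "pld,file_lookup\nyummly.com,part_3.gz\nexample.org,part_0.gz\n"
    lookup = load_pld_lookup(io.StringIO(sample))
    # Look up which chunk holds all quads of one PLD.
    print(lookup.get("yummly.com"))
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`.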
Please note that:
Schema.org Subset | General Stats | Related Classes | Size (# Files) | Download (Sample) | PLD to File Look-up / PLD-Specific Stats |
---|---|---|---|---|---|
AdministrativeArea | Quads: 77,960,962 URLs: 381,624 Hosts: 2,695 | http://schema.org/City (2,154,382) http://schema.org/ImageObject (1,260,633) http://schema.org/AdministrativeArea (932,550) http://schema.org/Person (819,487) http://schema.org/ListItem (810,411) | 923.8 MB (1) | AdministrativeArea (sample) | lookup_file pld_stats_file |
Airport | Quads: 54,033,331 URLs: 160,852 Hosts: 538 | http://schema.org/Airport (4,569,542) http://schema.org/GeoCoordinates (2,461,017) http://schema.org/Flight (1,593,849) http://schema.org/Airline (1,473,676) http://schema.org/Offer (1,244,616) | 415.83 MB (1) | Airport (sample) | lookup_file pld_stats_file |
Answer | Quads: 1,485,993,805 URLs: 13,279,199 Hosts: 251,442 | http://schema.org/Answer (53,728,963) http://schema.org/Question (45,624,470) http://schema.org/ImageObject (30,813,788) http://schema.org/ListItem (30,093,835) http://schema.org/Person (18,278,578) | 26.69 GB (15) | Answer (sample) | lookup_file pld_stats_file |
Book | Quads: 308,456,812 URLs: 4,999,213 Hosts: 21,623 | http://schema.org/Book (13,434,918) http://schema.org/Country (7,685,986) http://schema.org/Offer (7,381,043) http://schema.org/Person (6,792,930) http://schema.org/ListItem (3,755,829) | 7.67 GB (4) | Book (sample) | lookup_file pld_stats_file |
City | Quads: 213,841,120 URLs: 1,152,893 Hosts: 11,786 | http://schema.org/City (5,832,337) http://schema.org/ImageObject (3,890,159) http://schema.org/Person (3,482,668) http://schema.org/ListItem (3,132,617) http://schema.org/PostalAddress (2,928,984) | 2.25 GB (3) | City (sample) | lookup_file pld_stats_file |
CollegeOrUniversity | Quads: 154,890,333 URLs: 1,326,291 Hosts: 3,544 | http://schema.org/CollegeOrUniversity (5,643,657) http://schema.org/ListItem (4,582,716) http://schema.org/ImageObject (4,072,323) http://schema.org/Person (3,630,069) http://schema.org/PostalAddress (2,812,621) | 2.15 GB (2) | CollegeOrUniversity (sample) | lookup_file pld_stats_file |
Continent | Quads: 1,807,060 URLs: 8,584 Hosts: 57 | http://schema.org/City (243,141) http://schema.org/AdministrativeArea (122,577) http://schema.org/Country (11,022) http://schema.org/Continent (10,846) http://schema.org/ListItem (7,788) | 16.38 MB (1) | Continent (sample) | lookup_file pld_stats_file |
Country | Quads: 625,627,034 URLs: 4,840,011 Hosts: 24,422 | http://schema.org/Country (43,110,397) http://schema.org/ListItem (15,718,968) http://schema.org/Organization (8,272,171) http://schema.org/Offer (8,244,657) http://schema.org/PostalAddress (6,901,736) | 7.83 GB (7) | Country (sample) | lookup_file pld_stats_file |
CreativeWork | Quads: 2,909,313,285 URLs: 57,905,383 Hosts: 1,060,682 | https://schema.org/CreativeWork (97,847,794) https://schema.org/Person (69,897,980) https://schema.org/SiteNavigationElement (60,989,946) https://schema.org/WPHeader (39,355,522) https://schema.org/WPFooter (38,637,170) | 112.09 GB (30) | CreativeWork (sample) | lookup_file pld_stats_file |
Dataset | Quads: 50,243,885 URLs: 930,859 Hosts: 1,676 | http://schema.org/DataDownload (2,447,952) http://schema.org/Dataset (1,320,329) http://schema.org/Organization (951,404) http://schema.org/PropertyValue (800,666) http://schema.org/Person (449,880) | 782.07 MB (1) | Dataset (sample) | lookup_file pld_stats_file |
EducationalOrganization | Quads: 96,140,998 URLs: 1,146,674 Hosts: 9,080 | http://schema.org/EducationalOrganization (2,312,278) http://schema.org/ListItem (2,090,170) http://schema.org/PostalAddress (1,825,786) http://schema.org/ImageObject (1,036,325) http://schema.org/GeoCoordinates (752,916) | 1.5 GB (1) | EducationalOrganization (sample) | lookup_file pld_stats_file |
Event | Quads: 1,732,974,389 URLs: 16,038,172 Hosts: 313,782 | http://schema.org/Event (68,785,848) http://schema.org/Place (48,427,726) http://schema.org/PostalAddress (38,748,060) http://schema.org/Person (21,337,534) http://schema.org/ListItem (15,879,353) | 22.93 GB (18) | Event (sample) | lookup_file pld_stats_file |
FAQPage | Quads: 1,283,468,270 URLs: 10,111,531 Hosts: 230,283 | http://schema.org/Question (42,934,439) http://schema.org/Answer (42,802,142) http://schema.org/ImageObject (31,156,198) http://schema.org/ListItem (27,405,514) https://schema.org/Question (13,404,725) | 21.57 GB (13) | FAQPage (sample) | lookup_file pld_stats_file |
GeoCoordinates | Quads: 3,699,629,956 URLs: 29,887,844 Hosts: 461,695 | http://schema.org/ListItem (98,973,753) http://schema.org/PostalAddress (64,876,448) http://schema.org/GeoCoordinates (60,789,730) http://schema.org/ImageObject (37,038,861) http://schema.org/OpeningHoursSpecification (34,280,120) | 45.45 GB (37) | GeoCoordinates (sample) | lookup_file pld_stats_file |
GovernmentOrganization | Quads: 15,661,042 URLs: 348,175 Hosts: 1,398 | http://schema.org/GovernmentOrganization (467,886) http://schema.org/ImageObject (374,525) http://schema.org/ListItem (335,640) http://schema.org/PostalAddress (251,555) http://schema.org/Organization (181,023) | 303.11 MB (1) | GovernmentOrganization (sample) | lookup_file pld_stats_file |
Hospital | Quads: 23,904,433 URLs: 272,304 Hosts: 2,058 | http://schema.org/PostalAddress (682,782) http://schema.org/Hospital (511,631) http://schema.org/ListItem (364,906) http://schema.org/Physician (271,490) http://schema.org/Review (263,414) | 315.43 MB (1) | Hospital (sample) | lookup_file pld_stats_file |
Hotel | Quads: 323,859,754 URLs: 2,636,528 Hosts: 24,065 | http://schema.org/Hotel (9,579,097) http://schema.org/PostalAddress (8,664,346) http://schema.org/Rating (8,582,433) http://schema.org/ImageObject (5,499,477) http://schema.org/ListItem (5,076,113) | 4.56 GB (4) | Hotel (sample) | lookup_file pld_stats_file |
JobPosting | Quads: 182,882,855 URLs: 4,126,373 Hosts: 50,466 | http://schema.org/Place (5,538,410) http://schema.org/PostalAddress (5,396,582) http://schema.org/Organization (5,085,787) http://schema.org/JobPosting (4,977,840) http://schema.org/ListItem (2,628,233) | 7.01 GB (2) | JobPosting (sample) | lookup_file pld_stats_file |
LakeBodyOfWater | Quads: 54,795 URLs: 1,719 Hosts: 110 | http://schema.org/LakeBodyOfWater (1,680) http://schema.org/PropertyValue (1,396) http://schema.org/ImageObject (1,336) http://schema.org/GeoCoordinates (1,243) http://schema.org/PostalAddress (820) | 2.5 MB (1) | LakeBodyOfWater (sample) | lookup_file pld_stats_file |
LandmarksOrHistoricalBuildings | Quads: 1,985,725 URLs: 23,513 Hosts: 348 | http://schema.org/LandmarksOrHistoricalBuildings (129,909) http://schema.org/PostalAddress (56,802) http://schema.org/PropertyValue (34,550) http://schema.org/ImageObject (31,991) http://schema.org/Organization (27,292) | 32.47 MB (1) | LandmarksOrHistoricalBuildings (sample) | lookup_file pld_stats_file |
Language | Quads: 701,081,445 URLs: 5,783,390 Hosts: 10,513 | http://schema.org/Person (30,919,803) http://schema.org/Comment (24,954,078) http://schema.org/ListItem (12,274,809) http://schema.org/Language (9,191,675) http://schema.org/InteractionCounter (8,765,796) | 12.56 GB (7) | Language (sample) | lookup_file pld_stats_file |
Library | Quads: 6,597,774 URLs: 199,425 Hosts: 714 | http://schema.org/Library (215,364) http://schema.org/OpeningHoursSpecification (205,783) http://schema.org/PostalAddress (94,133) http://schema.org/ListItem (78,297) http://schema.org/Place (62,714) | 109.4 MB (1) | Library (sample) | lookup_file pld_stats_file |
LocalBusiness | Quads: 2,671,216,988 URLs: 37,243,804 Hosts: 1,197,063 | http://schema.org/ListItem (89,958,346) http://schema.org/LocalBusiness (55,898,742) http://schema.org/PostalAddress (51,249,632) http://schema.org/ImageObject (24,054,225) http://schema.org/OpeningHoursSpecification (19,182,321) | 34.85 GB (27) | LocalBusiness (sample) | lookup_file pld_stats_file |
Mountain | Quads: 300,982 URLs: 15,293 Hosts: 56 | http://schema.org/propertyValue (23,340) http://schema.org/Mountain (16,503) http://schema.org/GeoCoordinates (16,380) http://schema.org/ImageObject (2,323) http://schema.org/Place (838) | 5.63 MB (1) | Mountain (sample) | lookup_file pld_stats_file |
Movie | Quads: 232,715,373 URLs: 2,265,947 Hosts: 7,801 | http://schema.org/Person (10,501,940) http://schema.org/Movie (6,832,078) https://schema.org/Person (3,423,482) http://schema.org/ImageObject (2,789,665) http://schema.org/VideoObject (2,435,667) | 3.39 GB (3) | Movie (sample) | lookup_file pld_stats_file |
Museum | Quads: 6,137,731 URLs: 102,146 Hosts: 610 | http://schema.org/OpeningHoursSpecification (223,564) http://schema.org/Museum (113,752) http://schema.org/PostalAddress (91,909) http://schema.org/Event (68,844) http://schema.org/ListItem (62,544) | 83.71 MB (1) | Museum (sample) | lookup_file pld_stats_file |
MusicAlbum | Quads: 116,521,141 URLs: 901,157 Hosts: 16,829 | http://schema.org/MusicRecording (7,489,652) http://schema.org/Country (6,478,141) http://schema.org/Offer (2,588,062) http://schema.org/MusicAlbum (2,378,182) http://schema.org/AudioObject (2,209,240) | 1.07 GB (2) | MusicAlbum (sample) | lookup_file pld_stats_file |
MusicRecording | Quads: 194,556,455 URLs: 1,683,808 Hosts: 24,513 | http://schema.org/Country (16,492,792) http://schema.org/MusicRecording (13,576,880) http://schema.org/Offer (2,866,310) http://schema.org/AudioObject (2,533,762) http://schema.org/MusicGroup (2,344,164) | 1.8 GB (2) | MusicRecording (sample) | lookup_file pld_stats_file |
Organization | Quads: 40,108,866,817 URLs: 637,002,088 Hosts: 5,915,483 | http://schema.org/ListItem (1,085,683,835) http://schema.org/ImageObject (933,908,146) http://schema.org/Organization (840,543,926) http://schema.org/WebPage (429,047,446) http://schema.org/Person (408,926,073) | 679.48 GB (401) | Organization (sample) | lookup_file pld_stats_file |
Painting | Quads: 15,219,466 URLs: 137,718 Hosts: 525 | http://schema.org/Person (2,870,457) http://schema.org/Painting (619,613) http://schema.org/Offer (398,543) http://schema.org/ListItem (309,616) http://schema.org/Property (179,754) | 136.25 MB (1) | Painting (sample) | lookup_file pld_stats_file |
Park | Quads: 1,105,870 URLs: 13,473 Hosts: 280 | http://schema.org/PostalAddress (31,540) http://schema.org/OpeningHoursSpecification (29,531) http://schema.org/ListItem (16,167) http://schema.org/Park (15,215) http://schema.org/GeoCoordinates (13,676) | 16.83 MB (1) | Park (sample) | lookup_file pld_stats_file |
Person | Quads: 28,389,014,403 URLs: 396,876,413 Hosts: 4,342,475 | http://schema.org/ImageObject (707,265,548) http://schema.org/Person (685,309,482) http://schema.org/ListItem (587,832,845) http://schema.org/WebPage (356,489,547) http://schema.org/Organization (326,175,879) | 559.6 GB (284) | Person (sample) | lookup_file pld_stats_file |
Place | Quads: 3,732,184,081 URLs: 32,121,226 Hosts: 430,758 | http://schema.org/ListItem (94,624,361) http://schema.org/Place (93,816,638) http://schema.org/PostalAddress (78,329,151) http://schema.org/Event (52,594,074) http://schema.org/Person (38,433,126) | 51.62 GB (38) | Place (sample) | lookup_file pld_stats_file |
Product | Quads: 17,883,521,101 URLs: 256,873,876 Hosts: 2,551,754 | http://schema.org/Offer (568,611,575) http://schema.org/Product (502,424,891) http://schema.org/ListItem (500,361,108) http://schema.org/Organization (218,515,422) http://schema.org/ImageObject (142,962,903) | 257.16 GB (179) | Product (sample) | lookup_file pld_stats_file |
QAPage | Quads: 179,194,962 URLs: 3,174,379 Hosts: 11,172 | http://schema.org/Person (10,610,649) http://schema.org/Answer (6,971,763) http://schema.org/Question (2,617,916) http://schema.org/QAPage (2,434,624) http://schema.org/ListItem (2,264,667) | 4.25 GB (2) | QAPage (sample) | lookup_file pld_stats_file |
Question | Quads: 1,520,018,091 URLs: 14,257,595 Hosts: 254,857 | http://schema.org/Answer (52,752,628) http://schema.org/Question (47,219,749) http://schema.org/ImageObject (31,489,782) http://schema.org/ListItem (30,231,894) http://schema.org/Person (18,552,705) | 27.47 GB (16) | Question (sample) | lookup_file pld_stats_file |
RadioStation | Quads: 19,207,934 URLs: 393,145 Hosts: 967 | http://schema.org/ListItem (847,579) http://schema.org/RadioStation (449,935) http://schema.org/ImageObject (308,248) http://schema.org/NewsArticle (264,461) http://schema.org/BreadcrumbList (199,135) | 301.1 MB (1) | RadioStation (sample) | lookup_file pld_stats_file |
Recipe | Quads: 449,214,017 URLs: 4,090,428 Hosts: 40,702 | http://schema.org/HowToStep (13,702,473) http://schema.org/ImageObject (7,240,806) http://schema.org/Person (6,709,755) http://schema.org/ListItem (6,635,160) http://schema.org/Recipe (5,290,014) | 7.33 GB (5) | Recipe (sample) | lookup_file pld_stats_file |
Restaurant | Quads: 216,942,320 URLs: 1,599,457 Hosts: 63,121 | http://schema.org/Offer (6,963,370) http://schema.org/MenuItem (6,426,675) http://schema.org/Restaurant (4,675,743) http://schema.org/ListItem (3,907,235) http://schema.org/PostalAddress (3,561,463) | 2.35 GB (3) | Restaurant (sample) | lookup_file pld_stats_file |
RiverBodyOfWater | Quads: 167,629 URLs: 3,221 Hosts: 19 | http://schema.org/ImageObject (6,369) http://schema.org/ListItem (6,367) http://schema.org/Organization (4,276) http://schema.org/RiverBodyOfWater (3,239) http://schema.org/GeoCoordinates (2,726) | 4.06 MB (1) | RiverBodyOfWater (sample) | lookup_file pld_stats_file |
School | Quads: 12,547,288 URLs: 235,736 Hosts: 1,699 | http://schema.org/School (395,869) http://schema.org/ListItem (339,734) http://schema.org/PostalAddress (271,185) http://schema.org/ImageObject (140,270) http://schema.org/WebPage (122,865) | 182.74 MB (1) | School (sample) | lookup_file pld_stats_file |
ShoppingCenter | Quads: 10,680,910 URLs: 136,690 Hosts: 1,199 | http://schema.org/PostalAddress (207,036) http://schema.org/ShoppingCenter (202,312) http://schema.org/Offer (173,447) http://schema.org/Organization (165,687) http://schema.org/ListItem (148,766) | 135.23 MB (1) | ShoppingCenter (sample) | lookup_file pld_stats_file |
SkiResort | Quads: 1,475,134 URLs: 34,829 Hosts: 250 | http://schema.org/ListItem (50,178) http://schema.org/SkiResort (39,260) http://schema.org/PostalAddress (28,966) http://schema.org/Person (23,906) http://schema.org/Review (23,384) | 26.25 MB (1) | SkiResort (sample) | lookup_file pld_stats_file |
SportsEvent | Quads: 140,098,380 URLs: 905,879 Hosts: 6,743 | http://schema.org/SportsEvent (6,683,759) http://schema.org/Place (6,215,841) http://schema.org/SportsTeam (5,822,261) http://schema.org/PostalAddress (4,944,824) http://schema.org/Organization (1,993,490) | 1.18 GB (2) | SportsEvent (sample) | lookup_file pld_stats_file |
SportsTeam | Quads: 99,150,611 URLs: 734,718 Hosts: 3,818 | http://schema.org/SportsTeam (6,879,209) http://schema.org/SportsEvent (2,591,617) http://schema.org/Place (2,420,968) http://schema.org/Organization (1,837,178) http://schema.org/PostalAddress (1,788,906) | 876.31 MB (1) | SportsTeam (sample) | lookup_file pld_stats_file |
StadiumOrArena | Quads: 27,517,383 URLs: 78,382 Hosts: 253 | http://schema.org/Organization (1,118,034) http://schema.org/SportsTeam (997,719) http://schema.org/ImageObject (848,949) http://schema.org/BlogPosting (417,543) http://schema.org/SportsEvent (397,598) | 207.27 MB (1) | StadiumOrArena (sample) | lookup_file pld_stats_file |
TelevisionStation | Quads: 1,637,886 URLs: 20,631 Hosts: 91 | http://schema.org/ListItem (51,424) http://schema.org/ImageObject (31,052) http://schema.org/TelevisionStation (25,578) http://schema.org/Organization (23,518) http://schema.org/PostalAddress (22,709) | 24.68 MB (1) | TelevisionStation (sample) | lookup_file pld_stats_file |
TVEpisode | Quads: 68,894,417 URLs: 394,881 Hosts: 1,204 | http://schema.org/Country (5,010,019) https://schema.org/TVEpisode (3,390,748) http://schema.org/TVEpisode (1,626,080) http://schema.org/Person (866,730) http://schema.org/ListItem (408,453) | 703.18 MB (1) | TVEpisode (sample) | lookup_file pld_stats_file |
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
We provide the extracted data for download using a variation of the N-Quads format. For users who prefer other formats, we provide code for converting the download files into CSV and JSON formats, which are supported by a wide range of spreadsheet applications, relational databases, and data mining frameworks such as the Python data analysis library pandas. Further details on how to convert the download files to other formats can be found on the main page.
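To illustrate what such a conversion involves, the sketch below turns N-Quads lines into CSV rows using only the Python standard library. It is not the conversion code provided by the project; the function names `split_quad` and `quads_to_csv` and the column headers are assumptions made for this example. It relies on plain whitespace splitting, so it assumes that the subject, predicate, and graph elements contain no whitespace, and it skips lines that do not fit that shape.

```python
import csv
import io


def split_quad(line):
    """Split one N-Quads line into (subject, predicate, object, graph), or None.

    The graph is taken from the right and subject/predicate from the left,
    so a literal object containing spaces stays in one piece.
    """
    body = line.strip()
    if not body.endswith("."):
        return None
    body = body[:-1].rstrip()
    try:
        rest, graph = body.rsplit(None, 1)
        subject, predicate, obj = rest.split(None, 2)
    except ValueError:
        # Too few elements on the line.
        return None
    return subject, predicate, obj, graph


def quads_to_csv(lines, out):
    """Write parseable quads from `lines` as CSV rows to `out`; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["subject", "predicate", "object", "graph"])
    written = 0
    for line in lines:
        quad = split_quad(line)
        if quad is not None:
            writer.writerow(quad)
            written += 1
    return written


if __name__ == "__main__":
    # Made-up example line, not taken from the corpus.
    sample = ['_:n1 <http://schema.org/name> "Pasta al dente"@en <https://example.com/recipe/1> .']
    buffer = io.StringIO()
    quads_to_csv(sample, buffer)
    print(buffer.getvalue())
```

The resulting CSV has one row per quad, which a spreadsheet application or `pandas.read_csv` can then load.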
The Jupyter notebooks used to create the Schema.org subsets from the Microdata and JSON-LD corpora can be checked out from our Git repository.
The December 2022 extraction was done with version 1.5 of the extractor. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.