This page provides access to class-specific subsets of the Schema.org data contained in the November 2017 version of the Web Data Commons Microdata corpus, together with statistics about these subsets. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (like product data, event data, or address data), we have created class-specific subsets out of the complete Microdata corpus for a selection of Schema.org classes. Each subset contains all instances of a specific class as well as all other data found on the webpages containing these instances. For example, a page containing data about a product might also contain reviews and offers for this product; a page containing data about an event might also contain data about the location of the event and the persons involved in it. The data is represented in N-Quads format, in which the fourth element of each quad contains the URL of the webpage from which the data was extracted.
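To illustrate the quad layout described above, here is a minimal Python sketch that splits one N-Quads line into its four elements and returns the source webpage URL. The regular expression is a simplification that covers IRIs, blank nodes, and plain, language-tagged, or typed literals; it is not a complete N-Quads parser, and the helper names `parse_quad` and `source_url` are hypothetical, not part of the Web Data Commons framework.

```python
import re

# Simplified term pattern: an IRI, a blank node, or a literal with an optional
# language tag or datatype. This is a sketch, not a complete N-Quads grammar.
_IRI = r'<[^<>\s]*>'
_TERM = (
    r'(' + _IRI
    + r'|_:\S+'
    + r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^' + _IRI + r')?)'
)
_QUAD = re.compile(
    r'^\s*' + _TERM + r'\s+' + _TERM + r'\s+' + _TERM
    + r'\s+(' + _IRI + r')\s*\.\s*$'
)


def parse_quad(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = _QUAD.match(line)
    return match.groups() if match else None


def source_url(line):
    """Return the fourth element (the webpage URL) without angle brackets."""
    quad = parse_quad(line)
    return quad[3][1:-1] if quad else None
```

For real workloads, an established RDF library is likely a better choice than a hand-written pattern; the sketch only shows where the webpage URL sits in each line.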
Please note that
Class Name | Total Number of Quads, URLs, and Hosts | Top Classes (Entity Count) | Total File Size | Quad File |
---|---|---|---|---|
http://schema.org/AdministrativeArea | Quads: 15,915,728 URLs: 206,143 Hosts: 301 | http://schema.org/City (822,377)http://schema.org/AdministrativeArea (501,176)http://schema.org/CityHall (162,397)http://schema.org/ListItem (161,751)http://schema.org/Service (128,219) | 266.3 MB | schema_AdministrativeArea.gz (sample) |
http://schema.org/Airport | Quads: 11,889,956 URLs: 164,664 Hosts: 132 | http://schema.org/Airport (2,846,502)http://schema.org/Place (97,194)http://schema.org/Flight (65,685)http://schema.org/GeoCoordinates (32,079)http://schema.org/ListItem (22,230) | 161.2 MB | schema_Airport.gz (sample) |
http://schema.org/Book | Quads: 355,749,038 URLs: 8,612,144 Hosts: 7,402 | http://schema.org/Book (21,437,601)http://schema.org/Offer (9,480,071)http://schema.org/Person (9,217,519)http://schema.org/ListItem (4,224,099)http://schema.org/AggregateRating (2,064,487) | 8.3 GB | schema_Book.gz (sample) |
http://schema.org/City | Quads: 134,336,690 URLs: 626,881 Hosts: 850 | http://schema.org/GeoCoordinates (5,486,532)http://schema.org/PostalAddress (5,413,186)http://schema.org/LocalBusiness (4,915,558)http://schema.org/Person (4,504,999)http://schema.org/City (3,784,668) | 1.9 GB | schema_City.gz (sample) |
http://schema.org/CollegeOrUniversity | Quads: 197,217,532 URLs: 746,745 Hosts: 664 | http://schema.org/Organization (11,372,563)http://schema.org/Person (9,662,369)http://schema.org/CollegeOrUniversity (8,759,942)http://schema.org/PostalAddress (5,582,453)http://schema.org/GeoCoordinates (5,349,424) | 2.9 GB | schema_CollegeOrUniversity.gz (sample) |
http://schema.org/Continent | Quads: 5,781,203 URLs: 73,939 Hosts: 17 | http://schema.org/City (731,339)http://schema.org/AdministrativeArea (303,342)http://schema.org/Place (99,039)http://schema.org/GeoCoordinates (78,425)http://schema.org/Country (76,246) | 71.1 MB | schema_Continent.gz (sample) |
http://schema.org/Country | Quads: 91,785,020 URLs: 469,394 Hosts: 804 | http://schema.org/Country (4,445,475)http://schema.org/Person (3,258,729)http://schema.org/PostalAddress (2,920,998)http://schema.org/Rating (2,537,527)http://schema.org/Review (2,526,032) | 1.6 GB | schema_Country.gz (sample) |
http://schema.org/CreativeWork | Quads: 616,442,965 URLs: 11,851,424 Hosts: 171,817 | http://schema.org/CreativeWork (24,486,210)http://schema.org/Person (20,397,070)http://schema.org/Comment (9,554,245)http://schema.org/TelevisionChannel (6,818,740)http://schema.org/SiteNavigationElement (6,400,561) | 27.2 GB | schema_CreativeWork.gz (sample) |
http://schema.org/EducationalOrganization | Quads: 9,385,781 URLs: 214,844 Hosts: 3,142 | http://schema.org/EducationalOrganization (371,542)http://schema.org/Person (284,633)http://schema.org/PostalAddress (267,969)http://schema.org/Place (135,195)http://schema.org/ListItem (79,756) | 172.7 MB | schema_EducationalOrganization.gz (sample) |
http://schema.org/Event | Quads: 263,504,427 URLs: 4,565,851 Hosts: 65,114 | http://schema.org/Event (21,359,059)http://schema.org/Place (12,927,463)http://schema.org/PostalAddress (7,759,732)http://schema.org/Offer (1,374,495)http://schema.org/Person (1,122,117) | 5.6 GB | schema_Event.gz (sample) |
http://schema.org/GeoCoordinates | Quads: 858,240,131 URLs: 8,799,691 Hosts: 50,871 | http://schema.org/GeoCoordinates (34,893,781)http://schema.org/PostalAddress (32,573,487)http://schema.org/LocalBusiness (14,230,960)http://schema.org/Place (12,920,066)http://schema.org/Offer (11,308,063) | 14 GB | schema_GeoCoordinates.gz (sample) |
http://schema.org/GovernmentOrganization | Quads: 2,849,188 URLs: 91,640 Hosts: 331 | http://schema.org/GovernmentOrganization (136,583)http://schema.org/ListItem (82,208)http://schema.org/PostalAddress (43,621)http://schema.org/WebPage (18,349)http://schema.org/Event (14,767) | 64.1 MB | schema_GovernmentOrganization.gz (sample) |
http://schema.org/Hospital | Quads: 3,277,313 URLs: 90,644 Hosts: 361 | http://schema.org/PostalAddress (165,337)http://schema.org/Hospital (149,765)http://schema.org/ListItem (63,968)http://schema.org/Place (58,126)http://schema.org/GeoCoordinates (48,088) | 57.9 MB | schema_Hospital.gz (sample) |
http://schema.org/Hotel | Quads: 161,254,476 URLs: 5,594,793 Hosts: 7,494 | http://schema.org/Hotel (10,302,877)http://schema.org/PostalAddress (4,919,871)http://schema.org/ListItem (4,756,155)http://schema.org/AggregateRating (3,811,134)http://schema.org/ImageObject (1,392,820) | 3.5 GB | schema_Hotel.gz (sample) |
http://schema.org/JobPosting | Quads: 266,933,002 URLs: 2,295,403 Hosts: 7,023 | http://schema.org/JobPosting (23,597,716)http://schema.org/Place (16,792,907)http://schema.org/PostalAddress (12,561,300)http://schema.org/Organization (5,982,465)http://schema.org/ListItem (1,903,144) | 6.2 GB | schema_JobPosting.gz (sample) |
http://schema.org/LakeBodyOfWater | Quads: 90,108 URLs: 633 Hosts: 21 | http://schema.org/GeoCoordinates (3,674)http://schema.org/PostalAddress (3,648)http://schema.org/LakeBodyOfWater (1,697)http://schema.org/PropertyValue (1,514)http://schema.org/City (1,269) | 1.5 MB | schema_LakeBodyOfWater.gz (sample) |
http://schema.org/LandmarksOrHistoricalBuildings | Quads: 2,208,810 URLs: 115,230 Hosts: 194 | http://schema.org/LandmarksOrHistoricalBuildings (149,824)http://schema.org/PostalAddress (128,601)http://schema.org/GeoCoordinates (32,760)http://schema.org/ImageObject (30,982)http://schema.org/Offer (18,081) | 45.3 MB | schema_LandmarksOrHistoricalBuildings.gz (sample) |
http://schema.org/Language | Quads: 7,327,523 URLs: 95,606 Hosts: 454 | http://schema.org/SiteNavigationElement (194,315)http://schema.org/Language (124,807)http://schema.org/PostalAddress (110,926)http://schema.org/GeoCoordinates (107,459)http://schema.org/Organization (69,782) | 158.7 MB | schema_Language.gz (sample) |
http://schema.org/Library | Quads: 794,835 URLs: 16,471 Hosts: 188 | http://schema.org/Library (31,396)http://schema.org/PostalAddress (27,365)http://schema.org/Organization (16,693)http://schema.org/Offer (10,098)http://schema.org/SiteNavigationElement (8,449) | 12.9 MB | schema_Library.gz (sample) |
http://schema.org/LocalBusiness | Quads: 1,144,571,235 URLs: 20,364,380 Hosts: 230,844 | http://schema.org/LocalBusiness (66,826,984)http://schema.org/PostalAddress (52,820,628)http://schema.org/Person (32,456,801)http://schema.org/GeoCoordinates (13,414,986)http://schema.org/ListItem (11,676,565) | 19.9 GB | schema_LocalBusiness.gz (sample) |
http://schema.org/Mountain | Quads: 536,575 URLs: 15,901 Hosts: 38 | http://schema.org/Mountain (31,219)http://schema.org/GeoCoordinates (19,691)http://schema.org/PostalAddress (4,716)http://schema.org/Review (3,321)http://schema.org/City (575) | 8.5 MB | schema_Mountain.gz (sample) |
http://schema.org/Movie | Quads: 164,589,867 URLs: 3,719,152 Hosts: 6,010 | http://schema.org/Person (12,496,472)http://schema.org/Movie (8,499,513)http://schema.org/AggregateRating (2,600,319)http://schema.org/ImageObject (898,326)http://schema.org/Organization (716,639) | 3.6 GB | schema_Movie.gz (sample) |
http://schema.org/Museum | Quads: 2,057,104 URLs: 33,132 Hosts: 234 | http://schema.org/GeoCoordinates (61,202)http://schema.org/Review (57,210)http://schema.org/Museum (54,193)http://schema.org/PostalAddress (48,488)http://schema.org/AggregateRating (26,501) | 43.4 MB | schema_Museum.gz (sample) |
http://schema.org/MusicAlbum | Quads: 94,295,777 URLs: 1,194,310 Hosts: 7,992 | http://schema.org/MusicRecording (8,843,652)http://schema.org/Country (3,074,299)http://schema.org/MusicAlbum (2,768,809)http://schema.org/ListItem (1,579,226)http://schema.org/MusicGroup (1,434,431) | 1.4 GB | schema_MusicAlbum.gz (sample) |
http://schema.org/MusicRecording | Quads: 145,978,352 URLs: 2,175,324 Hosts: 4,749 | http://schema.org/MusicRecording (16,436,479)http://schema.org/Country (3,731,689)http://schema.org/MusicGroup (2,191,117)http://schema.org/ListItem (1,840,130)http://schema.org/MusicAlbum (1,592,186) | 2.3 GB | schema_MusicRecording.gz (sample) |
http://schema.org/Organization | Quads: 839,872,521 URLs: 68,187,331 Hosts: 352,669 | http://schema.org/Organization (148,575,835)http://schema.org/Offer (53,476,589)http://schema.org/Product (48,834,225)http://schema.org/PostalAddress (48,200,710)http://schema.org/ListItem (38,635,887) | 87.1 GB | schema_Organization.gz (sample) |
http://schema.org/Painting | Quads: 2,656,918 URLs: 70,899 Hosts: 210 | http://schema.org/Painting (282,229)http://schema.org/Person (109,642)http://schema.org/Offer (33,257)http://schema.org/Comment (28,458)http://schema.org/UserComments (18,940) | 67.6 MB | schema_Painting.gz (sample) |
http://schema.org/Park | Quads: 491,176 URLs: 5,350 Hosts: 103 | http://schema.org/GeoCoordinates (29,150)http://schema.org/PostalAddress (15,272)http://schema.org/Park (14,867)http://schema.org/Museum (5,481)http://schema.org/City (3,890) | 9 MB | schema_Park.gz (sample) |
http://schema.org/Person | Quads: 1,494,835,164 URLs: 41,801,740 Hosts: 289,232 | http://schema.org/Person (205,659,159)http://schema.org/ImageObject (30,957,223)http://schema.org/Comment (27,154,959)http://schema.org/Organization (25,225,064)http://schema.org/Article (20,419,037) | 84.2 GB | schema_Person.gz (sample) |
http://schema.org/Place | Quads: 888,169,505 URLs: 10,411,701 Hosts: 78,010 | http://schema.org/Place (52,139,212)http://schema.org/PostalAddress (37,829,439)http://schema.org/JobPosting (16,693,597)http://schema.org/Event (13,393,794)http://schema.org/Offer (12,674,341) | 18 GB | schema_Place.gz (sample) |
http://schema.org/Product | Quads: 6,321,909,578 URLs: 112,695,547 Hosts: 581,482 | http://schema.org/Product (444,760,713)http://schema.org/Offer (365,577,281)http://schema.org/AggregateRating (46,793,199)http://schema.org/Organization (32,839,969)http://schema.org/Review (23,361,605) | 135 GB | schema_Product.gz schemaMD_Product_chunks.list (sample) |
http://schema.org/RadioStation | Quads: 1,314,016 URLs: 96,153 Hosts: 138 | http://schema.org/RadioStation (109,761)http://schema.org/PostalAddress (34,133)http://schema.org/ImageObject (13,622)http://schema.org/MusicVideoObject (13,552)http://schema.org/VideoObject (13,528) | 29 MB | schema_RadioStation.gz (sample) |
http://schema.org/Recipe | Quads: 108,908,798 URLs: 2,757,523 Hosts: 25,111 | http://schema.org/Recipe (4,415,586)http://schema.org/Person (1,141,445)http://schema.org/ListItem (1,126,462)http://schema.org/AggregateRating (1,091,320)http://schema.org/NutritionInformation (503,663) | 3.4 GB | schema_Recipe.gz (sample) |
http://schema.org/Restaurant | Quads: 82,228,482 URLs: 677,878 Hosts: 11,979 | http://schema.org/Review (4,294,997)http://schema.org/Rating (4,137,198)http://schema.org/Person (3,888,557)http://schema.org/Restaurant (1,676,161)http://schema.org/Product (1,150,037) | 1.7 GB | schema_Restaurant.gz (sample) |
http://schema.org/RiverBodyOfWater | Quads: 71,740 URLs: 763 Hosts: 13 | http://schema.org/GeoCoordinates (3,301)http://schema.org/PostalAddress (3,150)http://schema.org/RiverBodyOfWater (1,513)http://schema.org/NewsArticle (998)http://schema.org/City (546) | 1.1 MB | schema_RiverBodyOfWater.gz (sample) |
http://schema.org/School | Quads: 2,601,454 URLs: 78,274 Hosts: 365 | http://schema.org/School (201,657)http://schema.org/PostalAddress (66,513)http://schema.org/Review (52,890)http://schema.org/Rating (51,050)http://schema.org/Person (43,465) | 65.3 MB | schema_School.gz (sample) |
http://schema.org/ShoppingCenter | Quads: 3,136,544 URLs: 30,079 Hosts: 151 | http://schema.org/PostalAddress (157,508)http://schema.org/ShoppingCenter (120,387)http://schema.org/Product (112,094)http://schema.org/Offer (111,799)http://schema.org/ClothingStore (64,668) | 49.6 MB | schema_ShoppingCenter.gz (sample) |
http://schema.org/SkiResort | Quads: 347,463 URLs: 35,869 Hosts: 54 | http://schema.org/SkiResort (37,568)http://schema.org/AggregateRating (23,652)http://schema.org/Review (5,215)http://schema.org/Person (3,613)http://schema.org/Rating (1,618) | 8.5 MB | schema_SkiResort.gz (sample) |
http://schema.org/SportsEvent | Quads: 43,627,478 URLs: 400,719 Hosts: 1,571 | http://schema.org/SportsEvent (2,651,794)http://schema.org/Place (1,884,042)http://schema.org/SportsTeam (1,506,391)http://schema.org/PostalAddress (1,160,975)http://schema.org/SiteNavigationElement (353,692) | 660.4 MB | schema_SportsEvent.gz (sample) |
http://schema.org/SportsTeam | Quads: 25,217,786 URLs: 306,609 Hosts: 1,085 | http://schema.org/SportsTeam (2,078,054)http://schema.org/Person (818,624)http://schema.org/SportsEvent (813,244)http://schema.org/Place (587,103)http://schema.org/SportsMatchCompetitor (397,568) | 396.6 MB | schema_SportsTeam.gz (sample) |
http://schema.org/StadiumOrArena | Quads: 2,530,400 URLs: 21,937 Hosts: 96 | http://schema.org/Person (240,227)http://schema.org/StadiumOrArena (64,721)http://schema.org/PostalAddress (62,983)http://schema.org/SportsTeam (45,687)http://schema.org/SportsEvent (30,982) | 38 MB | schema_StadiumOrArena.gz (sample) |
http://schema.org/TVEpisode | Quads: 49,090,967 URLs: 756,921 Hosts: 508 | http://schema.org/TVEpisode (3,812,335)http://schema.org/Person (1,365,567)http://schema.org/TVSeries (817,184)http://schema.org/SiteNavigationElement (776,574)http://schema.org/TVSeason (470,736) | 889.9 MB | schema_TVEpisode.gz (sample) |
http://schema.org/TelevisionStation | Quads: 679,578 URLs: 26,765 Hosts: 31 | http://schema.org/TelevisionStation (37,016)http://schema.org/PostalAddress (20,412)http://schema.org/Article (19,783)http://schema.org/AggregateRating (19,308)http://schema.org/GeoCoordinates (1,031) | 9.6 MB | schema_TelevisionStation.gz (sample) |
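Statistics similar to the table columns above can be approximated from a downloaded subset file. The following Python sketch counts quads, distinct webpage URLs, distinct hosts, and the most frequent classes. It rests on several assumptions that may differ from how the published numbers were computed: entities are counted as `rdf:type` quads per class, hosts are taken as the network location of the webpage URL, and the file is a gzip-compressed stream with one quad per line. The function names `subset_stats` and `stats_from_gzip` are hypothetical.

```python
import gzip
import re
from collections import Counter
from urllib.parse import urlparse

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# The graph label (webpage URL) is the last IRI before the terminating " ."
_GRAPH = re.compile(r"(<[^<>\s]*>)\s*\.\s*$")


def subset_stats(lines, top_n=5):
    """Approximate the table statistics from an iterable of N-Quads lines."""
    quads = 0
    urls = set()
    hosts = set()
    class_counts = Counter()
    for line in lines:
        match = _GRAPH.search(line)
        if match is None:
            continue  # skip blank or malformed lines
        quads += 1
        page_url = match.group(1)[1:-1]
        urls.add(page_url)
        hosts.add(urlparse(page_url).netloc)
        # Subject and predicate never contain whitespace, so split them off first.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == RDF_TYPE:
            # Remove the trailing graph label to isolate the class IRI.
            class_iri = _GRAPH.sub("", parts[2]).strip()
            class_counts[class_iri[1:-1]] += 1
    return {
        "quads": quads,
        "urls": len(urls),
        "hosts": len(hosts),
        "top_classes": class_counts.most_common(top_n),
    }


def stats_from_gzip(path):
    """Stream a gzip-compressed N-Quads file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return subset_stats(handle)
```

For example, `stats_from_gzip("schema_Library.gz")` would process a locally downloaded copy of the Library subset. The sets of URLs and hosts are held in memory, so for the largest subsets an approximate or disk-backed counter would probably be needed.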
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
We analyzed the adoption of important properties for the classes schema.org/Product, schema.org/JobPosting, schema.org/Hotel, and schema.org/LocalBusiness over a period of three years (2015-2017). In general, we observe that a growing number of websites use structured data to describe content in these four domains. You can find the detailed statistics in the schema.org_SubsetsAnalysis Excel file (33 KB).
The source code can be checked out from our GitHub repository. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.