This page provides access to, and statistics about, class-specific subsets of the Schema.org data contained in the November 2015 version of the Web Data Commons Microdata corpus. The datasets are part of the Web Data Commons Schema.org Data Set Series.
As many users are only interested in specific types of Schema.org data (such as product data, event data, or address data), we have created class-specific subsets of the complete Microdata corpus for a selection of Schema.org classes. Each subset contains all instances of a specific class as well as all other data found on the webpages containing these instances. For example, a page containing data about a product might also contain reviews and offers for this product; a page containing data about an event might also contain data about the location of the event and the persons involved in it. The data is represented in N-Quads format, in which the fourth element of each quad contains the URL of the webpage from which the data was extracted.
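To illustrate how the fourth element ties each statement back to its source page, here is a minimal Python sketch that splits N-Quads lines and groups the statements by page URL. The function names (`parse_quad`, `quads_by_page`, `read_subset`) are hypothetical and not part of the Web Data Commons framework, and the splitting logic assumes well-formed lines that always carry a graph IRI; a full N-Quads parser handles more edge cases.

```python
import gzip
from collections import defaultdict


def parse_quad(line):
    """Split one N-Quads line into (subject, predicate, object, page_url).

    Returns None for blank lines, comments, and lines that do not look like
    a quad with an IRI in the fourth position.
    """
    body = line.strip()
    if not body or body.startswith("#") or not body.endswith("."):
        return None
    body = body[:-1].rstrip()  # drop the terminating "."
    head_and_graph = body.rsplit(None, 1)  # the last token is the graph label
    if len(head_and_graph) != 2:
        return None
    head, graph = head_and_graph
    if not (graph.startswith("<") and graph.endswith(">")):
        return None
    parts = head.split(None, 2)  # subject, predicate, rest of line = object
    if len(parts) != 3:
        return None
    subject, predicate, obj = parts
    return subject, predicate, obj, graph[1:-1]


def quads_by_page(lines):
    """Group (subject, predicate, object) statements by source page URL."""
    pages = defaultdict(list)
    for line in lines:
        quad = parse_quad(line)
        if quad is not None:
            subject, predicate, obj, page_url = quad
            pages[page_url].append((subject, predicate, obj))
    return dict(pages)


def read_subset(path):
    """Yield text lines from a gzip-compressed N-Quads file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle
```

For a small file, such as one of the samples, `quads_by_page(read_subset(path))` returns a dictionary keyed by page URL. The larger subsets will not fit in memory this way and are better processed line by line.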
Please note that
Class Name | Total Number of Quads, URLs, and Hosts | Top Classes (Entity Count) | Total File Size | Quad File |
---|---|---|---|---|
http://schema.org/AdministrativeArea | Quads: 4,849,338; URLs: 91,914; Hosts: 130 | http://schema.org/City (445,468); http://schema.org/AdministrativeArea (209,305); http://schema.org/GeoCoordinates (91,632); http://schema.org/Country (80,834); http://schema.org/Continent (79,277) | 70.8 MB | schema_AdministrativeArea.gz (sample) |
http://schema.org/Airport | Quads: 113,014,885; URLs: 1,371,521; Hosts: 70 | http://schema.org/Airport (26,176,317); http://schema.org/Thing (1,152,832); http://schema.org/WebPage (384,308); http://schema.org/PostalAddress (18,003); http://schema.org/GeoCoordinates (9,483) | 1.8 GB | schema_Airport.gz (sample) |
http://schema.org/Book | Quads: 160,179,686; URLs: 3,763,154; Hosts: 2,324 | http://schema.org/Book (8,836,201); http://schema.org/Person (7,786,582); http://schema.org/Offer (3,499,192); http://schema.org/ScholarlyArticle (2,126,138); http://schema.org/Review (1,022,432) | 3.1 GB | schema_Book.gz (sample) |
http://schema.org/City | Quads: 22,111,248; URLs: 287,383; Hosts: 294 | http://schema.org/City (784,593); http://schema.org/GeoCoordinates (633,325); http://schema.org/PostalAddress (460,601); http://schema.org/Person (356,327); http://schema.org/Offer (328,818) | 392 MB | schema_City.gz (sample) |
http://schema.org/CollegeOrUniversity | Quads: 19,073,398; URLs: 499,640; Hosts: 348 | http://schema.org/CollegeOrUniversity (1,327,158); http://schema.org/Person (904,617); http://schema.org/CreativeWork (855,099); http://schema.org/PostalAddress (337,986); http://schema.org/AggregateRating (184,770) | 400 MB | schema_CollegeOrUniversity.gz (sample) |
http://schema.org/Continent | Quads: 3,669,937; URLs: 81,720; Hosts: 9 | http://schema.org/City (442,529); http://schema.org/AdministrativeArea (138,841); http://schema.org/GeoCoordinates (86,847); http://schema.org/Continent (82,720); http://schema.org/Country (81,813) | 47.6 MB | schema_Continent.gz (sample) |
http://schema.org/Country | Quads: 120,125,213; URLs: 641,031; Hosts: 289 | http://schema.org/MusicRecording (6,403,764); http://schema.org/LodgingBusinessAmenity (1,928,390); http://schema.org/Person (1,888,208); http://schema.org/UserComments (1,864,924); http://schema.org/Country (1,643,078) | 2 GB | schema_Country.gz (sample) |
http://schema.org/CreativeWork | Quads: 295,800,594; URLs: 6,454,246; Hosts: 44,339 | http://schema.org/CreativeWork (16,901,641); http://schema.org/Person (10,249,776); http://schema.org/Comment (5,465,208); http://schema.org/Organization (3,582,214); http://schema.org/WebPage (2,816,816) | 10.3 GB | schema_CreativeWork.gz (sample) |
http://schema.org/EducationalOrganization | Quads: 5,209,884; URLs: 143,572; Hosts: 1,224 | http://schema.org/EducationalOrganization (292,877); http://schema.org/PostalAddress (220,587); http://schema.org/MedicalScholarlyArticle (113,590); http://schema.org/GeoCoordinates (89,911); http://schema.org/EducationEvent (89,870) | 104.6 MB | schema_EducationalOrganization.gz (sample) |
http://schema.org/Event | Quads: 240,250,191; URLs: 1,574,622; Hosts: 12,429 | http://schema.org/Event (13,184,936); http://schema.org/Place (9,504,931); http://schema.org/PostalAddress (8,177,965); http://schema.org/GeoCoordinates (3,735,707); http://schema.org/AggregateOffer (3,332,782) | 4.2 GB | schema_Event.gz (sample) |
http://schema.org/GeoCoordinates | Quads: 637,232,864; URLs: 5,235,522; Hosts: 17,365 | http://schema.org/GeoCoordinates (25,900,663); http://schema.org/PostalAddress (25,194,078); http://schema.org/LocalBusiness (12,660,143); http://schema.org/AggregateRating (10,030,610); http://schema.org/Place (5,839,006) | 10.7 GB | schema_GeoCoordinates.gz (sample) |
http://schema.org/GovernmentOrganization | Quads: 1,049,453; URLs: 36,199; Hosts: 161 | http://schema.org/GovernmentOrganization (69,413); http://schema.org/PostalAddress (39,190); http://schema.org/Article (7,555); http://schema.org/Event (5,206); http://schema.org/NewsArticle (5,145) | 21.6 MB | schema_GovernmentOrganization.gz (sample) |
http://schema.org/Hospital | Quads: 10,857,143; URLs: 406,687; Hosts: 223 | http://schema.org/PostalAddress (625,422); http://schema.org/Hospital (514,304); http://schema.org/Physician (269,801); http://schema.org/MedicalSpecialty (143,512); http://schema.org/GeoCoordinates (126,877) | 203.1 MB | schema_Hospital.gz (sample) |
http://schema.org/Hotel | Quads: 291,506,752; URLs: 4,040,460; Hosts: 5,362 | http://schema.org/Hotel (23,297,263); http://schema.org/LandmarksOrHistoricalBuildings (15,568,363); http://schema.org/PostalAddress (3,413,459); http://schema.org/Review (3,140,368); http://schema.org/AggregateRating (2,999,410) | 5.7 GB | schema_Hotel.gz (sample) |
http://schema.org/JobPosting | Quads: 271,062,391; URLs: 2,045,084; Hosts: 3,656 | http://schema.org/JobPosting (25,507,180); http://schema.org/Place (19,149,557); http://schema.org/Organization (13,656,356); http://schema.org/Postaladdress (5,803,830); http://schema.org/PostalAddress (4,259,531) | 5.3 GB | schema_JobPosting.gz (sample) |
http://schema.org/LakeBodyOfWater | Quads: 210,129; URLs: 1,371; Hosts: 15 | http://schema.org/PostalAddress (10,379); http://schema.org/GeoCoordinates (10,320); http://schema.org/LakeBodyOfWater (3,135); http://schema.org/City (1,835); http://schema.org/Park (1,046) | 3.3 MB | schema_LakeBodyOfWater.gz (sample) |
http://schema.org/LandmarksOrHistoricalBuildings | Quads: 112,059,617; URLs: 769,388; Hosts: 84 | http://schema.org/LandmarksOrHistoricalBuildings (15,593,862); http://schema.org/Hotel (11,889,717); http://schema.org/Review (600,591); http://schema.org/Offer (458,892); http://schema.org/Organization (42,770) | 1.9 GB | schema_LandmarksOrHistoricalBuildings.gz (sample) |
http://schema.org/Language | Quads: 536,574; URLs: 3,517; Hosts: 162 | http://schema.org/SiteNavigationElement (17,739); http://schema.org/Language (6,831); http://schema.org/PostalAddress (5,425); http://schema.org/Organization (4,079); http://schema.org/WPFooter (3,713) | 13.5 MB | schema_Language.gz (sample) |
http://schema.org/Library | Quads: 1,289,802; URLs: 33,804; Hosts: 45 | http://schema.org/CreativeWork (57,293); http://schema.org/Library (42,633); http://schema.org/PostalAddress (39,664); http://schema.org/GeoCoordinates (25,722); http://schema.org/Place (24,707) | 20.2 MB | schema_Library.gz (sample) |
http://schema.org/LocalBusiness | Quads: 569,754,144; URLs: 6,280,198; Hosts: 77,659 | http://schema.org/LocalBusiness (31,690,304); http://schema.org/PostalAddress (25,683,431); http://schema.org/GeoCoordinates (12,859,248); http://schema.org/AggregateRating (9,752,425); http://schema.org/Product (6,397,651) | 9 GB | schema_LocalBusiness.gz (sample) |
http://schema.org/Mountain | Quads: 301,954; URLs: 2,138; Hosts: 12 | http://schema.org/GeoCoordinates (12,375); http://schema.org/Mountain (12,127); http://schema.org/PostalAddress (11,982); http://schema.org/Review (2,611); http://schema.org/City (1,569) | 4.7 MB | schema_Mountain.gz (sample) |
http://schema.org/Movie | Quads: 109,148,410; URLs: 1,412,757; Hosts: 3,395 | http://schema.org/Person (9,069,362); http://schema.org/Movie (5,647,480); http://schema.org/AggregateRating (1,056,819); http://schema.org/CreativeWork (648,533); http://schema.org/ImageGallery (594,508) | 2.4 GB | schema_Movie.gz (sample) |
http://schema.org/Museum | Quads: 2,544,434; URLs: 23,669; Hosts: 69 | http://schema.org/Painting (390,837); http://schema.org/Event (94,761); http://schema.org/PostalAddress (33,595); http://schema.org/Museum (29,096); http://schema.org/GeoCoordinates (26,620) | 44.1 MB | schema_Museum.gz (sample) |
http://schema.org/MusicAlbum | Quads: 251,633,850; URLs: 879,573; Hosts: 409 | http://schema.org/MusicRecording (22,619,712); http://schema.org/MusicAlbum (13,062,133); http://schema.org/Offer (8,586,854); http://schema.org/AudioObject (8,519,865); http://schema.org/Person (2,056,857) | 3.8 GB | schema_MusicAlbum.gz (sample) |
http://schema.org/MusicRecording | Quads: 318,158,175; URLs: 1,871,921; Hosts: 2,138 | http://schema.org/MusicRecording (31,348,530); http://schema.org/MusicAlbum (11,898,247); http://schema.org/AudioObject (8,750,393); http://schema.org/Offer (8,676,133); http://schema.org/Person (3,213,938) | 4.8 GB | schema_MusicRecording.gz (sample) |
http://schema.org/Organization | Quads: 2,681,017,265; URLs: 41,853,100; Hosts: 79,102 | http://schema.org/Organization (110,247,692); http://schema.org/Product (58,567,430); http://schema.org/TVSeries (50,436,187); http://schema.org/Offer (35,571,153); http://schema.org/AggregateRating (27,035,780) | 56.7 GB | schema_Organization.gz (sample) |
http://schema.org/Painting | Quads: 1,425,159; URLs: 11,980; Hosts: 69 | http://schema.org/Painting (400,189); http://schema.org/Person (12,955); http://schema.org/Comment (9,856); http://schema.org/Museum (4,311); http://schema.org/PostalAddress (4,127) | 29 MB | schema_Painting.gz (sample) |
http://schema.org/Park | Quads: 548,686; URLs: 3,890; Hosts: 39 | http://schema.org/PostalAddress (27,155); http://schema.org/GeoCoordinates (26,145); http://schema.org/Park (9,746); http://schema.org/City (3,943); http://schema.org/TouristAttraction (2,313) | 8.9 MB | schema_Park.gz (sample) |
http://schema.org/Person | Quads: 2,021,449,102; URLs: 25,637,330; Hosts: 74,427 | http://schema.org/Person (168,363,779); http://schema.org/UserComments (25,500,181); http://schema.org/Comment (21,193,122); http://schema.org/ImageObject (18,999,858); http://schema.org/Article (14,896,021) | 62 GB | schema_Person.gz (sample) |
http://schema.org/Place | Quads: 663,039,048; URLs: 5,590,863; Hosts: 22,738 | http://schema.org/Place (41,960,508); http://schema.org/JobPosting (18,783,162); http://schema.org/PostalAddress (18,598,576); http://schema.org/Organization (13,370,924); http://schema.org/Event (9,502,789) | 12.6 GB | schema_Place.gz (sample) |
http://schema.org/Product | Quads: 3,775,412,920; URLs: 47,888,512; Hosts: 108,387 | http://schema.org/Product (252,233,316); http://schema.org/Offer (193,846,906); http://schema.org/AggregateRating (59,608,310); http://schema.org/Review (30,653,561); http://schema.org/Rating (27,421,509) | 65.5 GB | schema_Product.gz (sample) |
http://schema.org/RadioStation | Quads: 1,065,412; URLs: 71,308; Hosts: 82 | http://schema.org/RadioStation (94,181); http://schema.org/PostalAddress (83,928); http://schema.org/Review (20,966); http://schema.org/Rating (20,910); http://schema.org/AggregateRating (12,973) | 20.3 MB | schema_RadioStation.gz (sample) |
http://schema.org/Recipe | Quads: 75,222,033; URLs: 1,589,075; Hosts: 8,944 | http://schema.org/Recipe (2,347,678); http://schema.org/AggregateRating (1,537,937); http://schema.org/Person (1,325,948); http://schema.org/NutritionInformation (883,679); http://schema.org/Comment (576,791) | 2.1 GB | schema_Recipe.gz (sample) |
http://schema.org/Restaurant | Quads: 20,157,626; URLs: 294,134; Hosts: 3,831 | http://schema.org/PostalAddress (857,827); http://schema.org/Restaurant (851,035); http://schema.org/LocalBusiness (299,868); http://schema.org/Review (267,789); http://schema.org/AggregateRating (245,252) | 383.3 MB | schema_Restaurant.gz (sample) |
http://schema.org/RiverBodyOfWater | Quads: 161,839; URLs: 1,311; Hosts: 9 | http://schema.org/PostalAddress (7,893); http://schema.org/GeoCoordinates (7,835); http://schema.org/RiverBodyOfWater (3,063); http://schema.org/City (1,004); http://schema.org/LakeBodyOfWater (589) | 2.6 MB | schema_RiverBodyOfWater.gz (sample) |
http://schema.org/School | Quads: 16,427,157; URLs: 318,668; Hosts: 200 | http://schema.org/PostalAddress (1,381,159); http://schema.org/School (1,237,255); http://schema.org/WebSite (157,554); http://schema.org/SearchAction (157,552); http://schema.org/Review (83,336) | 247.9 MB | schema_School.gz (sample) |
http://schema.org/ShoppingCenter | Quads: 594,623; URLs: 4,863; Hosts: 82 | http://schema.org/PostalAddress (27,239); http://schema.org/ShoppingCenter (25,679); http://schema.org/ClothingStore (12,270); http://schema.org/GeoCoordinates (6,338); http://schema.org/Restaurant (6,243) | 9.3 MB | schema_ShoppingCenter.gz (sample) |
http://schema.org/SkiResort | Quads: 78,737; URLs: 4,414; Hosts: 25 | http://schema.org/SkiResort (4,972); http://schema.org/PostalAddress (2,128); http://schema.org/GeoCoordinates (2,110); http://schema.org/AggregateRating (1,151); http://schema.org/Review (748) | 2.2 MB | schema_SkiResort.gz (sample) |
http://schema.org/SportsEvent | Quads: 25,762,126; URLs: 94,534; Hosts: 410 | http://schema.org/SportsEvent (1,302,541); http://schema.org/PostalAddress (1,046,485); http://schema.org/EventVenue (570,556); http://schema.org/SportsTeam/Soccer (312,853); http://schema.org/SportsAthlete/Soccer (312,853) | 363.7 MB | schema_SportsEvent.gz (sample) |
http://schema.org/SportsTeam | Quads: 9,624,458; URLs: 158,256; Hosts: 197 | http://schema.org/Article (527,658); http://schema.org/SportsTeam (465,610); http://schema.org/Person (434,668); http://schema.org/SportsMatchCompetitor (205,434); http://schema.org/SiteNavigationElement (198,154) | 201.6 MB | schema_SportsTeam.gz (sample) |
http://schema.org/StadiumOrArena | Quads: 12,330,509; URLs: 11,457; Hosts: 43 | http://schema.org/PostalAddress (913,351); http://schema.org/SportsEvent (685,605); http://schema.org/EventVenue (626,629); http://schema.org/StadiumOrArena (295,348); http://schema.org/MusicEvent (159,533) | 165.5 MB | schema_StadiumOrArena.gz (sample) |
http://schema.org/TelevisionStation | Quads: 58,691; URLs: 1,955; Hosts: 19 | http://schema.org/TelevisionStation (8,998); http://schema.org/PostalAddress (637); http://schema.org/Review (489); http://schema.org/Rating (482); http://schema.org/Event (326) | 1.4 MB | schema_TelevisionStation.gz (sample) |
http://schema.org/TVEpisode | Quads: 44,044,409; URLs: 472,303; Hosts: 244 | http://schema.org/TVEpisode (3,936,841); http://schema.org/Person (1,754,279); http://schema.org/TVSeries (605,323); http://schema.org/AggregateRating (304,805); http://schema.org/SiteNavigationElement (261,452) | 832 MB | schema_TVEpisode.gz (sample) |
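
Statistics of the kind shown in the table (quads, URLs, hosts, and top classes) could be recomputed for a downloaded subset roughly as follows. This is only a sketch under stated assumptions, not the procedure used to produce the numbers above: the function name `subset_statistics` is hypothetical, "host" is taken to be the hostname of the page URL, and every `rdf:type` quad is counted as one entity.

```python
from collections import Counter
from urllib.parse import urlsplit

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def subset_statistics(lines, top_n=5):
    """Compute quad, URL, and host counts plus the most frequent classes.

    Assumptions (not confirmed by this page): a host is the hostname of the
    page URL, and each rdf:type quad stands for one entity.
    """
    quad_count = 0
    urls = set()
    hosts = set()
    class_counts = Counter()

    for line in lines:
        body = line.strip()
        if not body.endswith("."):
            continue  # skip blank lines, comments, and truncated lines
        tokens = body[:-1].split()
        if len(tokens) < 4:
            continue  # not a complete quad
        predicate = tokens[1]
        page_url = tokens[-1].strip("<>")  # fourth element: source page

        quad_count += 1
        urls.add(page_url)
        hostname = urlsplit(page_url).hostname
        if hostname:
            hosts.add(hostname)
        if predicate == RDF_TYPE:
            # For type statements the object is a single IRI token.
            class_counts[tokens[2].strip("<>")] += 1

    return {
        "quads": quad_count,
        "urls": len(urls),
        "hosts": len(hosts),
        "top_classes": class_counts.most_common(top_n),
    }
```

The sets of URLs and hosts are held in memory, so this approach suits the sample files and the smaller subsets; the multi-gigabyte subsets would need a streaming or external-sort strategy instead.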
If you are interested in a particular class or set of classes that is not listed above, please contact the Web Data Commons team via the mailing list or our Google Group.
The source code can be checked out from our GitHub repository. For more information about the framework and a detailed description of how to run your own extraction, visit the framework page.
Please send questions and feedback to the Web Data Commons mailing list or post them in our Web Data Commons Google Group.