Rich snippets are all the rage these days. Ever since Google started
enhancing their search results with these extra tidbits of information,
everyone is rushing to update their web sites with the metadata to
enable them. So what is the benefit of having a “rich” search result for
your site? Good question. Other than giving the search engine user a
little bit of extra detail, I suppose there’s also a subtle
psychological factor that kicks in. Someone might be more inclined to
click on a search engine result that has a 5 star rating and a friendly
face than one that doesn’t. Plus, they’re just plain cool. Who doesn’t
want to add bling to their search results? But this only scratches the
surface. There’s much, much more to them than that.
Instant information aggregation: It’s only a matter of semantics
Rich Snippets, as Google calls them, are actually semantic markup. The
idea of marking up a document with meta information for the
benefit of machines is not new. Semantic markup is as old as
information technology itself. For example, a Word document contains
metadata about its author, and a digital photo contains metadata about
the camera it was taken with. You might, for instance, store your
digital snapshots in a photo archiving program which uses this semantic
data to filter your photos by date taken, lens type, flash used, etc.
So, in essence, metadata is data about data.
It should be clear, then, how this “data about data” can be extremely
useful to search engines. It gives a search engine the ability to
derive semantic meaning from a document’s meta
information rather than having to rely purely on the abstract,
human-understandable concepts within the text of the document. Searches can
become less about keywords in text documents and more about
relationships between semantic data types.
To illustrate this point further, consider the following search: Find
all restaurants with a 3.5 star or better rating on the Las Vegas strip
that specialize in Italian OR Mexican cuisine AND are open after 11 PM
on Sunday nights AND do NOT require reservations. On the
semantic web, rather than a list of links to restaurant web sites that
may or may not match your given criteria, you might get a list of
“restaurant result objects” that DO match exactly
that criteria and never even have to visit the restaurant’s web site.
This is where the real power of semantic data lies: instant information
aggregation.
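To make the restaurant search concrete, here is a minimal Python sketch of how it might look once each restaurant is a structured record rather than a page of prose. The `Restaurant` class, its field names, and the sample data are all hypothetical and only illustrate the idea; no real search engine API is implied.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    # Hypothetical fields, loosely modelled on what semantic markup could expose.
    name: str
    rating: float
    cuisine: str
    area: str
    sunday_close_hour: int  # 24-hour clock; values past 24 mean after midnight
    requires_reservation: bool


def matches(r: Restaurant) -> bool:
    """Apply the criteria from the example search to one structured record."""
    return (
        r.rating >= 3.5
        and r.area == "Las Vegas Strip"
        and r.cuisine in {"Italian", "Mexican"}
        and r.sunday_close_hour > 23
        and not r.requires_reservation
    )


# Invented sample data; each record fails at most one criterion.
restaurants = [
    Restaurant("Trattoria Uno", 4.2, "Italian", "Las Vegas Strip", 24, False),
    Restaurant("Casa Dos", 3.0, "Mexican", "Las Vegas Strip", 25, False),
    Restaurant("Sushi Tres", 4.8, "Japanese", "Las Vegas Strip", 24, False),
    Restaurant("Osteria Quattro", 4.5, "Italian", "Las Vegas Strip", 22, False),
    Restaurant("Cantina Cinco", 3.9, "Mexican", "Las Vegas Strip", 26, True),
]

results = [r.name for r in restaurants if matches(r)]
print(results)
```

With this sample data only “Trattoria Uno” passes every test. The point is that each criterion becomes a comparison on a typed field, not a guess about keywords in free text.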
This “semantic web” is also not a new idea. In fact, Tim Berners-Lee
himself envisioned the World Wide Web as a kind of “Semantic Network
Model” and even the earliest HTML specifications included the concept of
meta tags, which you are undoubtedly familiar with. Later iterations,
such as XHTML, took this idea a step further. Most notable is the RDFa
specification, which has been around for quite some time.
These later metadata specifications brought with them the concept of a hierarchical
type system. Within these type systems each data type
(i.e., an abstract representation of a real world object) might have any
number of subtypes. So, you might have a base level
abstraction such as “Thing” which has a derived type such as a
“Business” and then a further derived type such as a “Store”, and
further still a more specific type of store such as “Book Store.”
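A hierarchy like this can be sketched in a few lines of Python as a table that maps each type to its parent. The type names below follow the example in the text and are not meant to be the exact identifiers used by any particular vocabulary; `ancestors` and `is_a` are hypothetical helper names.

```python
# Each type maps to its parent; None marks the root of the hierarchy.
# Names follow the example in the text, not any official vocabulary.
PARENT = {
    "Thing": None,
    "Business": "Thing",
    "Store": "Business",
    "BookStore": "Store",
    "Person": "Thing",
}


def ancestors(type_name: str) -> list[str]:
    """Return the chain from type_name up to the root, most specific first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def is_a(type_name: str, base: str) -> bool:
    """True if type_name is base itself or derives from it."""
    return base in ancestors(type_name)


print(" > ".join(reversed(ancestors("BookStore"))))
print(is_a("BookStore", "Business"), is_a("Person", "Store"))
```

A consumer that only understands “Store” can still make sense of a “BookStore” item, because the more specific type derives from the more general one.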
The latest and greatest HTML5 introduces yet another form of semantic
web data called Microdata. The types used in this article are
documented at http://schema.org.
Whether or not Microdata will supplant the earlier semantic markups
seems unclear at this point. However, the really interesting thing about
Microdata is that the major search engines have collaborated on the
schema.org vocabulary for it and are beginning to standardize on it.
For example, Google currently recommends its use for its rich snippets.
Adding microdata to your HTML does require a little legwork, but its
advantages are likely worth the effort in the long term as it appears
search engines are starting to make use of it for some really
interesting stuff such as Rich Snippets. I have a feeling we’ll be
seeing other uses for it as well as HTML5 starts to really take hold.
Marking up your pages with Microdata
Adding Microdata to your HTML is a relatively painless process, but you
do need to consider the semantic relationships between the data on
your page. Essentially, each element you want to specify as an “item”
has an itemscope (i.e., where the item begins and ends)
and a number of descriptive properties, which may either
be primitive types such as strings, dates, or numbers, or
other “items” which in turn have their own itemscope and properties.
Consider the following example of a “Book.” A Book has an author
property, which is a Person item. In addition, the Book has a publisher
property, which is an Organization item. Thus, we have a single item that
has relationships with two other data types.
<div class="listItem" itemprop="itemListElement" itemscope itemtype="http://schema.org/Book">
  <img itemprop="image" src="footfall.jpg" />
  <div class="details">
    <h3 itemprop="name">Footfall</h3>
    <div class="arating" itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
      <span itemprop="ratingValue">3.9</span> stars based on
      <span itemprop="reviewCount">25</span> reviews
    </div>
    <p itemprop="description">
      They first appear as a series of dots on astronomical plates,
      heading from Saturn directly toward Earth...
    </p>
    <div class="authInfo">
      <span itemprop="author" itemscope itemtype="http://schema.org/Person">
        By <span itemprop="name">Larry Niven</span>
      </span>
      and
      <span itemprop="author" itemscope itemtype="http://schema.org/Person">
        <span itemprop="name">Jerry Pournelle</span>
      </span>
    </div>
    <div class="pubInfo" itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <strong>Published by:</strong>
      <span itemprop="name">Del Rey Books</span>
    </div>
    <div class="clr"></div>
  </div>
  <div class="clr"></div>
</div>
What type you actually ascribe to your items is, of course, dependent on
your content. The type hierarchy on schema.org is fairly extensive, but
it also allows you to define your own types, if you need finer-grained
data types than what the current spec provides.
As for the rest of the nitty-gritty details, the documentation at schema.org
is pretty clear, so I won’t rehash it here.
A Microdata Parser
To help with marking up your web pages with HTML5 microdata, and for
discovering microdata embedded on other websites, I have developed a schema
tool which can parse a page and give you a graphical overview of the
data types embedded within the HTML. You can see the relationships
between the semantic types on your page, as well as all their
properties. Clicking on an object in the diagram will display the full
list of properties and the HTML it parsed.
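For readers curious about what such a parser has to do, here is a rough Python sketch built on the standard library’s `html.parser`. It is not the tool described above, only an illustration of the basic idea: track which itemscope is currently open and attach each itemprop to it. The `MicrodataParser` class name and the nested-dict output shape are assumptions of this sketch, and it ignores parts of the Microdata spec such as `itemref` and `itemid`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value comes from an attribute instead of their text.
VALUE_ATTR = {"img": "src", "a": "href", "link": "href", "meta": "content"}


class MicrodataParser(HTMLParser):
    """Collect microdata items as nested dicts: {"type": ..., "properties": {...}}."""

    def __init__(self):
        super().__init__()
        self.items = []      # top-level items found in the document
        self._items = []     # stack of items whose itemscope is still open
        self._elements = []  # open elements: [tag, item, owner, text_prop, text]

    @staticmethod
    def _add(item, prop, value):
        # A property may repeat (two authors, say), so values are kept in lists.
        item["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        owner = self._items[-1] if self._items else None
        item = None
        text_prop = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                self._add(owner, prop, item)  # nested item, e.g. an author
            else:
                self.items.append(item)       # top-level item
        elif prop and owner is not None:
            if tag in VALUE_ATTR:
                self._add(owner, prop, attrs.get(VALUE_ATTR[tag]))
            else:
                text_prop = prop              # value is the element's text
        if tag in VOID_TAGS:
            return
        if item is not None:
            self._items.append(item)
        self._elements.append([tag, item, owner, text_prop, []])

    def handle_data(self, data):
        # Feed text to every open element that is collecting a property value.
        for element in self._elements:
            if element[3] is not None:
                element[4].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(e[0] == tag for e in self._elements):
            return
        while self._elements:
            name, item, owner, text_prop, text = self._elements.pop()
            if text_prop is not None:
                self._add(owner, text_prop, " ".join("".join(text).split()))
            if item is not None:
                self._items.pop()             # this element's itemscope ends
            if name == tag:
                break


html = """
<div itemscope itemtype="http://schema.org/Book">
  <img itemprop="image" src="footfall.jpg" />
  <h3 itemprop="name">Footfall</h3>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    By <span itemprop="name">Larry Niven</span>
  </span>
  <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
    <span itemprop="name">Del Rey Books</span>
  </div>
</div>
"""

parser = MicrodataParser()
parser.feed(html)
parser.close()
book = parser.items[0]
print(book["type"])
print(sorted(book["properties"]))
print(book["properties"]["author"][0]["properties"]["name"])
```

Keeping every property value in a list is a deliberate choice in this sketch: it lets a Book with two authors be represented without special cases.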
This simple web page is marked up with
View it with the microdata parser here:
You can see each Microdata item embedded within the HTML.
Meta Utopia or Meta Baloney?
It seems unclear at this point whether the idea of a truly “semantic
web” will ever be fully realized. The idea of being able to search not
just for keywords in text documents, but for many different related
types of data matching a vast array of criteria holds great potential.
However, the skeptic inside me sees a few problems.
Not all abstract concepts can easily be described within a simple
schema. For example, what is “art”?
The data has to be abundant. Creating metadata and
classifications is extra work with no real immediate benefit. There’s
currently no real “killer application” that rewards people for adding
semantic markup to their pages, so the vast majority won’t see any
need to do it.
The data has to be reliable. Spammers will inevitably exploit
it, thereby making the data useless. I already see pages ascribing
types to data that doesn’t fit the schema. For the semantic web to
function the data must be reliable.
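On the reliability point, a consumer of this data could at least run a basic sanity check before trusting an item. The Python sketch below flags properties that a declared type is not expected to carry. The `EXPECTED` table is a small hand-picked subset chosen for illustration, not the full schema.org definitions, and `unexpected_properties` is a hypothetical helper name.

```python
# Hand-picked, deliberately incomplete property lists for illustration only;
# the full definitions live on schema.org.
EXPECTED = {
    "http://schema.org/Book": {"name", "author", "publisher", "description",
                               "image", "aggregateRating"},
    "http://schema.org/Person": {"name"},
}


def unexpected_properties(item: dict) -> set[str]:
    """Return property names that the declared type is not expected to carry."""
    allowed = EXPECTED.get(item["type"])
    if allowed is None:
        return set()  # unknown type: nothing to compare against
    return set(item["properties"]) - allowed


# An invented item that claims to be a Person but carries a rating.
suspicious = {
    "type": "http://schema.org/Person",
    "properties": {"name": ["Cheap Watches"], "ratingValue": ["5"]},
}
print(unexpected_properties(suspicious))
```

A check like this catches only the crudest mismatches. It says nothing about whether the values themselves are truthful, which is the harder part of the problem.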
Ultimately, only time will tell if the semantic web evolves as its
proponents envision. In the meantime, it is definitely worth
experimenting with microdata, if only for the rich snippet goodness.