Drupal 7 and the Semantic Web connection

The Semantic Web is the idea that you can add machine-readable meaning to your website, making it easier for computers, as well as humans, to understand what is going on with your content. If you look at this page, you will find it easy to tell which part of it is the blog content, where my name is, and so on. For a machine this is not so easy: it sees something like “bla bla bla TITLE bla bla bla” and is often limited to finding content by keywords. The Semantic Web is the subject of major research efforts and much debate over how far it can go, but the good news for Drupal site administrators and the readers of their websites is that Drupal 7 will offer support for some of this technology out of the box.

I have to admit I am no expert on the Semantic Web and am just starting to get my head around it. How far it can succeed is hotly contested, with some arguing that trying to classify the information in a website, and expecting authors to stop and mark up the sections that carry such meaning, is unrealistic. Fortunately, it doesn't seem to be an all-or-nothing situation, and travelling a little way down the Semantic Web path could have some major advantages. Back in February 2009 a roadmap was drawn up to get some of this technology into Drupal core. The other day I got curious and downloaded a development copy of Drupal 7 to see some of these ideas in action.

In October 2008 Dries Buytaert, the founder of the Drupal project, first raised the idea of incorporating some Semantic Web technologies into Drupal. He also explained a curious problem that many Drupal sites have (as, I am sure, do many sites powered by other content management systems). If you look at the actual database for a Drupal site, you will see that the data in it is highly structured: there is no mistaking which data relates to an author, which is a tag and which is content, for example. When you add modules to enhance functionality, these also store their data in a highly structured fashion. This could be incredibly useful to search engines. The problem is that the way pages are rendered strips this structure away: the web page becomes a filter between the database and the web, discarding information on the way out.

The solution was to incorporate metadata describing this structure into the web page itself, using RDFa. RDFa adds attributes to the markup around key parts of the text on a page to describe what information they contain. For example, the name of the author of a blog post is marked up so that it is obvious to a computer that it is reading an author's name, not the name of someone mentioned in the post or just a random collection of keywords. The visiting computer does not have to guess what this information means either: the markup tells it what each piece of information is and what it relates to. In this post, for example, “Liam Green-Hughes” is a name, and its relationship to this blog post is “author”. Drupal uses FOAF (Friend of a Friend) to describe the property of “name” and SIOC to describe the concept of “author” in this context.
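To make this a bit more concrete, here is a minimal Python sketch that renders the same hypothetical blog post record twice: once as plain HTML, where only a human can tell which part is the author, and once with RDFa attributes. The terms used (dc:title, sioc:has_creator, sioc:UserAccount, foaf:name) come from the Dublin Core, SIOC and FOAF vocabularies, but the record, the "/user/1" author URI and the exact markup are assumptions for illustration, not the precise output Drupal 7 generates.

```python
from html import escape

# Hypothetical record standing in for a blog post row in the site's database.
# The author URI "/user/1" is made up for illustration.
post = {
    "title": "Drupal 7 and the Semantic Web connection",
    "author": "Liam Green-Hughes",
    "author_uri": "/user/1",
}

# Namespace declarations for the RDFa 1.0 prefixes used in the markup below.
NAMESPACES = (
    'xmlns:dc="http://purl.org/dc/terms/" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/" '
    'xmlns:sioc="http://rdfs.org/sioc/ns#"'
)


def render_plain(record):
    """Render the post without annotations: the structure is lost in the markup."""
    return (
        "<div>\n"
        f"  <h2>{escape(record['title'])}</h2>\n"
        f"  <span>{escape(record['author'])}</span>\n"
        "</div>"
    )


def render_rdfa(record):
    """Render the same post with RDFa attributes saying what each value is."""
    return (
        f'<div {NAMESPACES} typeof="sioc:Item foaf:Document">\n'
        f'  <h2 property="dc:title">{escape(record["title"])}</h2>\n'
        '  <span rel="sioc:has_creator">\n'
        f'    <span about="{escape(record["author_uri"])}" typeof="sioc:UserAccount"'
        f' property="foaf:name">{escape(record["author"])}</span>\n'
        "  </span>\n"
        "</div>"
    )


print(render_plain(post))
print(render_rdfa(post))
```

From the second version an RDFa processor should be able to work out that the document has a title, that its creator is the account at "/user/1", and that this account's name is “Liam Green-Hughes”; the first version says none of this.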

SIOC (Semantically-Interlinked Online Communities, pronounced “shock”) is a really interesting technology. Its aim is to provide a way to describe social websites such as blogs and forums. For example, it can expose data saying that a page is not just random HTML but a forum containing posts by particular authors. A program could use this information to construct an alternative view of the forum. Its longer-term aim is to bridge the gap between these different social sites and use their data in new and interesting ways. For example, I write for this site and occasionally for olnet.org, and maybe I will write for other sites in the future. When you see a name, it is currently difficult to work out whether it belongs to the same person on all of these sites (having an unusual name is very useful in these circumstances!). SIOC may be able to come to the rescue here by marking up the name with extra data identifying it as the same person, thus creating a machine-readable link between the sites.
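As a rough illustration of how such a link could be followed, the Python sketch below treats the data two sites publish as simple (subject, predicate, object) triples. Each site describes a user account, and both accounts point at the same person via sioc:account_of, a real SIOC property linking an account to its owner. The site addresses and the person URI are hypothetical, and a real application would use a proper RDF library rather than plain tuples.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) as two different sites
# might publish them. All URIs here are made up for illustration.
site_a = [
    ("http://site-a.example/user/7", "rdf:type", "sioc:UserAccount"),
    ("http://site-a.example/user/7", "sioc:account_of", "http://person.example/#me"),
]
site_b = [
    ("http://site-b.example/users/42", "rdf:type", "sioc:UserAccount"),
    ("http://site-b.example/users/42", "sioc:account_of", "http://person.example/#me"),
]


def accounts_by_person(*graphs):
    """Group user accounts from any number of sites by the person who owns them."""
    people = defaultdict(set)
    for graph in graphs:
        for subject, predicate, obj in graph:
            if predicate == "sioc:account_of":
                people[obj].add(subject)
    return dict(people)


for person, accounts in accounts_by_person(site_a, site_b).items():
    print(person, "->", sorted(accounts))
```

The point is that the two accounts are matched because both sites say they belong to the same person, not because the displayed names happen to look alike.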

Having Drupal support these technologies out of the box is really important, as it means this data can be offered without per-site development effort. It will become easier to justify providing semantic data, since it will not mean diverting developer effort away from other parts of a website project, something that can be hard to do on a cost- or time-constrained project. With more data available, it will be possible to build applications that use semantic data, such as Yahoo's SearchMonkey. As applications develop and semantic data comes into more demand, this sort of technology could rise in importance for future projects, moving from a “nice to have” feature to “desirable” and maybe even “essential”.

With all this in mind, I gave Drupal 7 a go. First I installed Semantic Radar, an extension for Firefox that shows indicators in your status bar when semantic data is available. Clicking on one of these indicators takes you to a page where you can explore the data. After setting up Drupal 7, I added a bit of test content and immediately noticed that “RDFa” had appeared in my status bar, indicating the presence of this data. When exploring it, I could see that the author, title, tags and content had been extracted from the document, all with no extra effort from me, not even installing extra modules (which looks like it will be easier in Drupal 7 anyway)!
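To give a feel for what a tool like Semantic Radar has to do when it pulls the author, title and content out of a page, here is a deliberately simplified Python sketch using the standard library's HTMLParser. It only collects the text of elements carrying an RDFa property attribute; it ignores about, rel, typeof, content and namespace prefixes, assumes every non-void element is explicitly closed, and is not how Semantic Radar itself works. The sample markup is hypothetical.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag; they must not go on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements that carry an RDFa @property.

    Deliberately simplified: it ignores @about, @rel, @typeof, @content and
    namespace prefixes, so it is NOT a conforming RDFa processor.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: its @property or None
        self.buffers = []  # text fragments for each open @property element
        self.found = []    # completed (property, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        prop = dict(attrs).get("property")
        self.stack.append(prop)
        if prop:
            self.buffers.append([])

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        prop = self.stack.pop()
        if prop:
            text = "".join(self.buffers.pop()).strip()
            self.found.append((prop, text))

    def handle_data(self, data):
        # Text belongs to every @property element that is currently open.
        for buffer in self.buffers:
            buffer.append(data)


# Hypothetical markup, loosely modelled on an RDFa-annotated blog post.
page = """
<div typeof="sioc:Item foaf:Document">
  <h2 property="dc:title">Hello Drupal 7</h2>
  <span rel="sioc:has_creator">
    <span about="/user/1" typeof="sioc:UserAccount" property="foaf:name">admin</span>
  </span>
  <div property="content:encoded"><p>First <em>test</em> post.</p></div>
</div>
"""

extractor = PropertyExtractor()
extractor.feed(page)
extractor.close()
for prop, text in extractor.found:
    print(f"{prop}: {text}")
```

Run on the sample page, this should report the title, the author's name and the post body as separately labelled values, which is roughly the kind of view Semantic Radar presents.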

A major change in Drupal 7 is the introduction of fields. If you have used databases, you will be familiar with the idea: you can set up content types made up of different elements. For example, a product content type might have a stock number, a manufacturer and a price. This provides quite a lot of structure for the information in your website. At the moment this structure does not yet seem to be separated out in the semantic data, but judging by the original roadmap, this looks like the long-term intention. If that happens, Drupal will become a very powerful publishing platform for the Semantic Web!
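Here is a small Python sketch of what separating out field data could look like: each field of the product content type above is mapped to an RDF property, and a node's values become (subject, predicate, object) triples. The field names, the "/node/5" URI and the "ex:" vocabulary are all hypothetical; a real site would choose terms from an established vocabulary, and this is not Drupal's own mapping mechanism.

```python
# Hypothetical mapping from the fields of a "product" content type to RDF
# property names. The "ex:" vocabulary is made up for illustration.
FIELD_MAPPING = {
    "field_stock_number": "ex:stockNumber",
    "field_manufacturer": "ex:manufacturer",
    "field_price": "ex:price",
}


def node_to_triples(node_uri, fields, mapping=FIELD_MAPPING):
    """Turn a node's field values into (subject, predicate, object) triples.

    Fields without a mapping are skipped, so they never reach the semantic output.
    """
    return [
        (node_uri, mapping[name], value)
        for name, value in fields.items()
        if name in mapping
    ]


# Hypothetical product node; the internal note has no mapping and is left out.
product = {
    "field_stock_number": "SKU-1234",
    "field_manufacturer": "Acme Ltd",
    "field_price": "9.99",
    "field_internal_note": "reorder in March",
}

for triple in node_to_triples("/node/5", product):
    print(triple)
```

Keeping the mapping as data, separate from the rendering code, means a site builder could in principle decide per content type which fields are published and under which terms.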

I have only scratched the surface of the vast subject of the Semantic Web here. Drupal 7 is shaping up to be a very exciting release and I look forward to working with it. If you want to know more about RDFa and what it can do (which is far more than I have even hinted at here) I would recommend you have a look at the excellent RDFa Primer from the World Wide Web Consortium available online at: http://www.w3.org/TR/xhtml-rdfa-primer/ and the RDFa Wiki at: http://rdfa.info/wiki/RDFa_Wiki. For more information on SIOC please see: http://sioc-project.org/.

Comments

Re: Drupal 7 and the Semantic Web connection

good stuff - i've been tinkering too! What do you think to the new admin interface? I like some bits, but some of it is a bit strange...

I can't wait for item 2 in the sprint roadmap here http://groups.drupal.org/node/19419 - that's when it becomes even more useful imho
