An approach to consuming Linked Data with PHP

Extracting data from the web to use in our computer programs has always been a challenge. Many developers will be familiar with techniques such as web scraping, where you try to parse a human-readable web page and extract data from it, and might dream of more reliable ways to query different sources for data in a standardised way. Linked Data is a proposed answer to this problem that seems to be gaining some momentum, with data being exposed in this format by organisations such as the British Government and my own employer, The Open University. So how do we query these resources and get the data into our PHP scripts?

Here I don't want to go into SPARQL or the technologies and discussions around Linked Data itself; there are plenty of resources out there for that (if you have a favourite, feel free to share it in the comments). A good way of getting started with the OU's Linked Data is to read my colleague Tony Hirst's post "data.open.ac.uk Arrives, With Linked Data Goodness", where he discusses this initiative, gives some example code and experiments with queries using a web-based tool. This post follows on from it: I will take a query from Tony's post and get the results into PHP.

After lots of searching the web and some mixed success using a library for PHP, I found this post by John Wright: SPARQL Query In Code: REST, PHP And JSON [TUTORIAL]. I was very pleased to see it: finally, a simple way to get hold of the results from a Linked Data source, and a technique that could easily be adapted to work in other languages. In his example he queries DBpedia and gets the results back in JSON format. Here I want to query the OU's data source, and we'll be getting the results back in XML; the precise format is detailed in the W3C document "SPARQL Query Results XML Format". So now our task can be split into three pieces:

  • Build the SPARQL query and turn it into a REST URL.
  • Send it to the OU's SPARQL endpoint.
  • Parse the resulting XML and turn it into an array of results.

For this example I'm going to use Tony's SPARQL query to get a list of podcasts related to an OU course, T209. The query is below:

SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10

We can use the URL http://data.open.ac.uk/query as our endpoint, and we must pass our query through a parameter named, helpfully, "query". Here is the bit of PHP to do that:

$query =
"SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10
";

$requestURL = 'http://data.open.ac.uk/query?query='.urlencode($query);
$response = request($requestURL);
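
If the same query needs to run for other courses, the URL could be assembled by a small helper. The sketch below is one possible approach rather than part of the original script: buildPodcastQueryUrl() is a hypothetical name, and the rule that a course code contains only letters and digits is an assumption made for illustration. It uses PHP's http_build_query(), which by default URL-encodes the parameter value the same way urlencode() does.

```php
<?php
// Hypothetical helper: builds the request URL for the podcast query of one course.
function buildPodcastQueryUrl($courseCode, $endpoint = 'http://data.open.ac.uk/query')
{
    // Assumption: course codes contain only letters and digits (e.g. "T209").
    // Rejecting anything else keeps stray characters out of the SPARQL text.
    if (!preg_match('/^[A-Za-z0-9]+$/', $courseCode)) {
        throw new InvalidArgumentException('Unexpected course code: ' . $courseCode);
    }

    // Course URIs in the query above use a lower-case code
    $courseUri = 'http://data.open.ac.uk/course/' . strtolower($courseCode);

    $query = "SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <$courseUri>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10";

    // http_build_query() produces "query=<url-encoded value>"
    return $endpoint . '?' . http_build_query(array('query' => $query));
}

$requestURL = buildPodcastQueryUrl('T209');
```

The resulting $requestURL can be passed to request() exactly as in the snippet above.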

You will see that I have used a function called request() to get the data back. This is the same function that appears in John Wright's code. If you were using a framework such as Zend Framework, you could replace it with the equivalent function from your framework, or even write your own (the full source of this example is at the end of this post). By the way, my PHP example here is not meant to be production-quality code: you would need to handle errors such as the endpoint not being available or timing out.
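
One way to cover those failure cases is sketched below. It is a variation on request(), not John Wright's original: the name requestWithChecks(), the timeout values and the choice to throw exceptions are all assumptions made for illustration. The Accept header asks for the SPARQL XML results media type; whether the OU endpoint pays any attention to it is not something this sketch verifies.

```php
<?php
// Hypothetical variant of request() with timeouts and basic error checks.
function requestWithChecks($url, $timeoutSeconds = 10)
{
    if (!function_exists('curl_init')) {
        throw new RuntimeException('cURL is not installed');
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // return the response instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // give up if connecting takes more than 5 seconds (arbitrary value)
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    // give up if the whole request takes longer than $timeoutSeconds
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    // ask for the SPARQL XML results format (the endpoint may ignore this)
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/sparql-results+xml'));

    $response = curl_exec($ch);
    $error = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec() returns false on network errors and timeouts
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . $error);
    }
    // treat anything outside the 2xx range as a failure
    if ($status < 200 || $status >= 300) {
        throw new RuntimeException('Endpoint returned HTTP status ' . $status);
    }

    return $response;
}
```

A call to request($requestURL) would then become requestWithChecks($requestURL) wrapped in a try/catch block, so the script can log the problem or show a friendly message instead of carrying on with an empty response.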

Next up is parsing the XML data. Here I am going to use PHP's wonderful SimpleXML extension. When we get the result back there will be a <results> element in the XML containing multiple child <result> elements, an example of which is:

<result>
  <binding name='title'>
     <literal q:qname='xsd:string' datatype='http://www.w3.org/2001/XMLSchema#string'>Downloading your soul</literal>
  </binding>
  <binding name='description'>
     <literal q:qname='xsd:string' datatype='http://www.w3.org/2001/XMLSchema#string'>Endless memory: Ian Pearson of BT Technologies describes the possibility of downloading the entire contents of your mind onto a computer.</literal>
  </binding>
</result>

Once we have loaded the response and got to the right place in the XML document, we can loop through each <result> element and copy the values into our new array. We will use the "name" attribute of the <binding> element as the array key and the contents of the <literal> element as the value:

// container for our data
$data = array();
// load the response into SimpleXML and get the <results> element
$xml = simplexml_load_string($response);
$results = $xml->results;
foreach ($results->result as $result) {
    $line = array();
    foreach ($result->binding as $binding) {
        $line[(string) $binding["name"]] = (string) $binding->literal;
    }
    $data[] = $line;
}

I've cut a few corners here and just forced the key and value into strings using a cast; otherwise these variables would be SimpleXML objects, which cannot be used as array keys. You could add extra code here to read the "q:qname" attribute of the <literal> element and do a more appropriate conversion. The data from the query should now be in the $data variable. You can dump this to the screen with:

print_r($data);
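
The more careful conversion mentioned above could look something like the sketch below. It runs against a small hand-written sample document rather than the live endpoint, so the "duration" binding and its value are invented purely for illustration, and castLiteral() is a hypothetical helper name. It reads the standard datatype attribute rather than q:qname, and only maps a handful of XSD types; everything else stays a string.

```php
<?php
// Hypothetical helper: convert a <literal> element using its datatype attribute.
function castLiteral(SimpleXMLElement $literal)
{
    $value = (string) $literal;
    $xsd = 'http://www.w3.org/2001/XMLSchema#';

    switch ((string) $literal['datatype']) {
        case $xsd . 'integer':
        case $xsd . 'int':
            return (int) $value;
        case $xsd . 'decimal':
        case $xsd . 'double':
        case $xsd . 'float':
            return (float) $value;
        case $xsd . 'boolean':
            return $value === 'true' || $value === '1';
        default:
            // strings, dates and anything unrecognised are left as strings
            return $value;
    }
}

// Made-up sample in the SPARQL Query Results XML Format (not real endpoint output)
$response = <<<'XML'
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="title"/><variable name="duration"/></head>
  <results>
    <result>
      <binding name="title">
        <literal datatype="http://www.w3.org/2001/XMLSchema#string">Downloading your soul</literal>
      </binding>
      <binding name="duration">
        <literal datatype="http://www.w3.org/2001/XMLSchema#integer">95</literal>
      </binding>
    </result>
  </results>
</sparql>
XML;

$data = array();
$xml = simplexml_load_string($response);
if ($xml === false) {
    die('Could not parse the response as XML');
}
foreach ($xml->results->result as $result) {
    $line = array();
    foreach ($result->binding as $binding) {
        $line[(string) $binding['name']] = castLiteral($binding->literal);
    }
    $data[] = $line;
}

// var_dump() shows the PHP type of each value, unlike print_r()
var_dump($data);
```

The check on the return value of simplexml_load_string() is there because it returns false when the response is not well-formed XML, for example if the endpoint sends back an error page.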

This technique is a rough and ready way to consume Linked Data with PHP, but hopefully it is a useful starting point for using these rich data sources in our own web scripts. As it does not rely on any third-party libraries, the same ideas could also be used with other languages without the need to import code, which could keep the size of programs down. I'm still learning about Linked Data, so I am sure there are lots and lots of ways to improve this script and the approach. Any thoughts are very welcome!

<?php
function request($url) {
    // is curl installed?
    if (!function_exists('curl_init')) {
        die('CURL is not installed!');
    }

    // get curl handle
    $ch = curl_init();

    // set request url
    curl_setopt($ch, CURLOPT_URL, $url);

    // return response, don't print/echo
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);

    curl_close($ch);

    return $response;
}

$query =
"SELECT distinct ?title ?description WHERE {
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> <http://data.open.ac.uk/course/t209>.
?x <http://purl.org/dc/terms/title> ?title.
?x <http://www.w3.org/TR/2010/WD-mediaont-10-20100608/description> ?description } LIMIT 10
";

$requestURL = 'http://data.open.ac.uk/query?query='.urlencode($query);

$response = request($requestURL);

// container for our data
$data = array();
// initialise SimpleXML object and load it with data
$xml = simplexml_load_string($response);
// get the <results> element
$results = $xml->results;
// loop through <result> elements and extract values
foreach ($results->result as $result) {
    $line = array();
    foreach ($result->binding as $binding) {
        // could pick up xsd data type for right cast
        $line[(string) $binding["name"]] = (string) $binding->literal;
    }
    $data[] = $line;
}
print_r($data);
?>

Comments

Thanks for this Liam - very useful for me personally and great to see the first 'use' of data.open.ac.uk in the wild :)

Just to note that there is now a bit more information about data.open.ac.uk - what data are up there at the moment, what ontologies are being used, patterns for URIs etc - this is all in a blog post at http://lucero-project.info/lb/2010/10/first-version-of-data-open-ac-uk/

Hi Liam, I'm glad you found the tutorial useful! I like this approach because as you said it doesn't require depending on any libraries and it should work with any language/format. I like your sub header "life is too short for bad technology", so true!
Take care

Here is a slight modification to the script to allow URIs to be picked up as well as string literals. Change the final nested foreach loop to:

foreach ($result->binding as $binding) {
    if (isset($binding->uri)) {
        $line[(string) $binding["name"]] = (string) $binding->uri;
    } else {
        // could pick up xsd data type for right cast
        $line[(string) $binding["name"]] = (string) $binding->literal;
    }
}
