Building a search engine for PlanetOU using Google CSE

The idea of PlanetOU was always to represent a community around an institution by aggregating blogs written by people connected to it (the idea is explained in full in my earlier post "What planet are you from? PlanetOU of course!"). It uses the core Aggregator module in Drupal to pull in content from about fifty different websites, producing a constant stream of the latest blog posts from OU staff bloggers. This approach works well, but a key point to note is that when feed items are imported in this way Drupal does not create new nodes (the items are not native content to the software that runs this site), so they are not visible to the built-in search engine. These blogs are also spread across many different domains (e.g. only a few people on the TwitterLeague for OU people use the OU's blogging facilities), so they are not indexed by any institutional search engine. Fortunately there is now a solution to this challenge: a Google Custom Search Engine (Google CSE).

A Google CSE lets you set up a search engine, which can be embedded into other pages or used through its own homepage, that only returns results from a specific list of sites that you supply. This is a very handy way to restrict a search to sites likely to produce good results, and it is really easy for an end user. If you want to search the wider web you can just change an option at the top of the page. Using this technology you can easily build a search engine around a particular community, interest or idea without going to the expense of running your own search engine. Very handy stuff. On this occasion I built a search facility that covers the PlanetOU blogs, but I could just as easily have built one for British ice hockey.

PlanetOU Search Home (screenshot)

Setting the engine up did not really require any advanced developer knowledge. Once you click on the big "Create a Custom Search Engine" button you are prompted for a few basic details, including the sites you want to include. You can choose not to restrict the search to these sites but just to emphasise them; on this occasion I kept the default behaviour of searching only the specified websites. To get the initial list I ran a database query to extract the web addresses of the blogs currently being syndicated. The other details to supply are the name and a few details for the search homepage. Getting the search engine running takes only a few minutes and costs nothing. In fact you can even make money from advertising revenue!
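The database query step could look something like the minimal sketch below. It assumes a Drupal 6 style `aggregator_feed` table whose `link` column holds each blog's homepage; that schema detail, the helper names `site_pattern`, `syndicated_sites` and `demo_connection`, and the example rows are all hypothetical stand-ins, using an in-memory SQLite database rather than the real site database. Each homepage is turned into a `host/path/*` pattern of the kind you can paste into the list of sites to search.

```python
import sqlite3
from urllib.parse import urlsplit


def site_pattern(link: str) -> str:
    """Turn a blog homepage URL into a 'host/path/*' site pattern."""
    parts = urlsplit(link)
    path = parts.path.rstrip("/")
    return f"{parts.netloc}{path}/*"


def syndicated_sites(conn: sqlite3.Connection) -> list[str]:
    # Assumption: a Drupal-6-style aggregator_feed table, 'link' = site homepage
    rows = conn.execute(
        "SELECT link FROM aggregator_feed WHERE link <> ''"
    ).fetchall()
    # A set removes duplicates (several feeds from one site); sort for a stable list
    return sorted({site_pattern(link) for (link,) in rows})


def demo_connection() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the site database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aggregator_feed "
        "(fid INTEGER PRIMARY KEY, title TEXT, url TEXT, link TEXT)"
    )
    conn.executemany(
        "INSERT INTO aggregator_feed (title, url, link) VALUES (?, ?, ?)",
        [
            ("Blog A", "http://blog-a.example.org/feed", "http://blog-a.example.org/"),
            ("Blog A comments", "http://blog-a.example.org/comments/feed", "http://blog-a.example.org"),
            ("Blog B", "http://example.com/users/jane/rss", "http://example.com/users/jane"),
            ("No homepage", "http://c.example.net/feed", ""),
        ],
    )
    return conn


if __name__ == "__main__":
    for pattern in syndicated_sites(demo_connection()):
        print(pattern)
```

Using a path-based pattern rather than just the host keeps a blog hosted under a shared domain from pulling the whole domain into the search.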

There are many tweaks and options that can be applied to the search engine after you set it up, but I have found it works pretty well with the basic setup. One thing I haven't worked out yet is how to keep the list of sites updated automatically from a feed (preferably an OPML feed), but this is something I will look at when I get the chance. You can see the results for yourself by visiting the PlanetOU Search Homepage at: http://www.google.com/coop/cse?cx=008781539411680276935:unbkxp_ku84. The only thing I really didn't like about Google CSE was the URL it created for the search homepage; a more human-friendly URL would be much nicer.
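As a starting point for the OPML idea, here is a rough sketch that reads an OPML blogroll and builds the same kind of site patterns from each `outline` element's `htmlUrl` attribute. The sample OPML and the `opml_site_patterns` name are hypothetical, and actually loading the resulting list into the CSE is not shown; that step would need checking against Google's own documentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical sample blogroll; a real one would be fetched from the planet site
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Example blogroll</title></head>
  <body>
    <outline text="Blog A" type="rss" xmlUrl="http://blog-a.example.org/feed" htmlUrl="http://blog-a.example.org/"/>
    <outline text="Group">
      <outline text="Blog B" type="rss" xmlUrl="http://example.com/users/jane/rss" htmlUrl="http://example.com/users/jane"/>
    </outline>
  </body>
</opml>"""


def opml_site_patterns(opml_text: str) -> list[str]:
    """Collect 'host/path/*' patterns from every outline that has an htmlUrl."""
    root = ET.fromstring(opml_text)
    patterns = set()
    # iter() also walks nested outlines, so feeds inside groups are included
    for outline in root.iter("outline"):
        html_url = outline.get("htmlUrl")
        if not html_url:
            continue  # folder outlines and feeds without a homepage are skipped
        parts = urlsplit(html_url)
        patterns.add(f"{parts.netloc}{parts.path.rstrip('/')}/*")
    return sorted(patterns)


if __name__ == "__main__":
    for pattern in opml_site_patterns(SAMPLE_OPML):
        print(pattern)
```

Comparing this output with the list already held by the search engine would show which sites need adding or removing after each change to the planet.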

Being able to create a search page around a group of diverse sites is a great idea, whether for a community or for a shared interest. In the past this functionality would have been difficult to obtain, but now it is quicker to set up than it is for me to explain the idea to someone else, or to explain that this is, of course, entirely unofficial and unsupported!
