Linking a podcast site into MythStream using OPML (the OU MythStream script revisited)

Back in December 2008 I wrote a small Perl script to let you enjoy podcasts from the Open University in MythStream, an add-on for MythTV that lets you watch streaming video content through MythTV. The OU's podcasts site has a number of RSS feeds that relate to the various subject areas the podcasts cover and to a number of containing sections like OU Life and OU Research. At the time the script was written there was no easy way to autodiscover all of these feeds and tie them together, so I wrote a bit of code that worked this out from the menu rendered on the right-hand side of the page. This sort of screen-scraping technique is great as a short-term way to get the data we need, but the problem is that it uses output intended for a human to read rather than a machine to process, so it can easily break if the layout of the page changes. To solve this problem I've been working with Chris Valentine of the Knowledge Media Institute at the OU, who has kindly provided a better way to extract this information (many thanks Chris!).

To save having to do screen scraping, Chris has put together an OPML file that describes the contents of that menu. As this is based on XML it can be parsed by our script and the information easily extracted for use in our output for MythStream. OPML is a content syndication format that is designed to represent an outline, or hierarchy, of RSS feeds. This is a perfect way to represent the menu structure of the podcasts site and to make sure that the same structure can be picked up by our script and shown by MythStream, so that the experience is kept consistent across both the podcasts site and your TV with MythTV: you choose the same menu items to get to the podcast you want. As this feed is specifically designed to be used by other programs, it can be encoded in a way that makes it much easier and more reliable to parse, so a change in the layout or design of the podcasts site won't cause a problem in our application.
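To make the idea concrete, here is a minimal sketch of walking an OPML outline and keeping the menu hierarchy alongside each feed URL. It is written in Python rather than the Perl of the attached script, purely so that it is self-contained. The sample document, its section names and the example.com URLs are all made up for illustration; the real OU feed will differ, and `walk_outlines` and `feeds_from_opml` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML sample: section names and URLs are invented for this sketch.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Example podcasts</title></head>
  <body>
    <outline text="Subjects">
      <outline text="Science" type="rss" xmlUrl="http://example.com/science.xml"/>
      <outline text="History" type="rss" xmlUrl="http://example.com/history.xml"/>
    </outline>
    <outline text="Research" type="rss" xmlUrl="http://example.com/research.xml"/>
  </body>
</opml>
"""


def walk_outlines(element, path=()):
    """Yield (menu_path, feed_url) for every outline that carries an xmlUrl."""
    for outline in element.findall("outline"):
        here = path + (outline.get("text", ""),)
        url = outline.get("xmlUrl")
        if url:
            yield here, url
        # Recurse into nested outlines so the menu hierarchy is preserved.
        yield from walk_outlines(outline, here)


def feeds_from_opml(opml_text):
    """Parse an OPML string and return a list of (menu_path, feed_url) pairs."""
    body = ET.fromstring(opml_text).find("body")
    return list(walk_outlines(body)) if body is not None else []


if __name__ == "__main__":
    for menu_path, url in feeds_from_opml(SAMPLE_OPML):
        print(" > ".join(menu_path), "->", url)
```

Each result pairs a feed with the chain of menu items leading to it, which is the information a MythStream parser needs to rebuild the same menu on the TV.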

OPML isn't as standardised as RSS, so it can be difficult to extract all of the information from it without knowing the details of the site it relates to. The <outline> element can have attributes that only relate to that site, which makes data portability a little more difficult. For this sort of application that is not too much of a problem, as we are writing a script to extract data from a specific site. OPML allows us to extract information about the contents and structure of a podcasts site in a simple and efficient way, which means we can bring these contents to new interfaces and new audiences. This has uses other than MythStream as well: the same technique could be adapted for streaming-media-capable clients like Boxee.
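One way to cope with site-specific attributes is to separate the ones a script understands from everything else, so the extras can be ignored or logged instead of breaking the parse. The sketch below is in Python and rests on some assumptions: the `COMMON_ATTRS` set is just the attributes commonly seen on feed outlines, `split_attributes` is a hypothetical helper, and `ouSection` is an invented attribute standing in for whatever a particular site might add.

```python
import xml.etree.ElementTree as ET

# Attributes commonly seen on feed outlines; treating these as the
# "portable" set is an assumption made for this sketch.
COMMON_ATTRS = ("text", "type", "xmlUrl", "htmlUrl")

# Hypothetical outline: "ouSection" is an invented site-specific attribute.
SAMPLE_OUTLINE = (
    '<outline text="Science" type="rss" '
    'xmlUrl="http://example.com/science.xml" ouSection="subjects"/>'
)


def split_attributes(outline):
    """Return (common, extras): known attributes and site-specific leftovers."""
    common = {k: v for k, v in outline.attrib.items() if k in COMMON_ATTRS}
    extras = {k: v for k, v in outline.attrib.items() if k not in COMMON_ATTRS}
    return common, extras


if __name__ == "__main__":
    common, extras = split_attributes(ET.fromstring(SAMPLE_OUTLINE))
    print("common:", common)
    print("site-specific:", extras)
```

A script written for one site can then use the extras where they help, while a more general tool can rely on the common set alone.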

The new script is attached to this post. To use it, follow the instructions in Getting Open University Podcasts on your TV with MythStream. Comments are very welcome, and even though this script relates to OU podcasts it could be adapted to other situations.




I think this would be a great feature for my Mythbuntu, but I can't get it to work. I copied the into the right folder and added the stream as you wrote in the post before. But it tells me to test the parser on the command line. :(
When I run it in a terminal by "perl", it says "Not an ARRAY reference at line 77.". I'm no perl expert, so I don't know what's wrong.


Hi! The structure of the OPML feed has been improved, which has sadly broken my script :( I will fix it and post an updated version here in a few days. Sorry about that.

Hi Liam,

Here's a patch for your perl script:

dug@spug:~/.mythtv/mythstream/parsers$ diff -u
--- 2009-02-06 23:41:21.000000000 +1100
+++ 2009-05-11 21:03:17.000000000 +1000
@@ -74,6 +74,9 @@
if ($outline_element->{outline}) {
# loop round children to find details for channel and its children
$channel{"subchannels"} = ();
+ if (ref($outline_element->{outline}) eq 'HASH') {
+ $outline_element->{outline} = [ $outline_element->{outline} ];
+ }
foreach my $subchannel (@{$outline_element->{outline}}) {
if ($subchannel->{text} eq $channel{"name"}) {
$channel{"url"} = $subchannel->{"htmlUrl"};

If $outline_element->{outline} only contained one element it was becoming a reference to a hash rather than an array... the above just detects this and sticks it in an array reference.

Thanks for the script... the only problem I have is I get about 2 seconds of each podcast's audio before it stops for unknown reasons... nothing in the log... but obviously your script is doing its job since I can drill down the XML hierarchy.

P.S. The html editor seems to have taken out all the patch's indentation but the critical lines are prefaced with a "+". Hope that helps.


Hi Doug, thanks for fixing this! I've patched the script as suggested and uploaded the new one to be attached with this post.
