Extreme mashup: Turning a text RSS feed into a radio station
One of the joys of using an open source operating system like Ubuntu is that you can experiment with all sorts of ideas without constantly purchasing software or coming up against artificial limitations. By chaining some open source packages together we can do some quite interesting things, so it is fun sometimes to try a challenge. In this post I will show you how to take a text RSS feed and make it into an Internet radio broadcast that can be received on a dedicated device, so instead of being stuck in front of a screen you can catch up with your RSS feeds while sunbathing in the garden! The solution here is not intended to be production ready, and might be tough going for beginners, but it should give a basic overview which you can then go and experiment with. I'll be using Icecast2 to stream the broadcast, Ices to feed Icecast2 with files to broadcast, Espeak to generate text-to-speech audio files, and a small custom PHP script to convert the text feed into a format suitable for Espeak.
To start, find yourself a nice RSS feed, preferably with some HTML embedded in the body of each post. Here I'll be using the RSS feed of my blog, which is http://www.greenhughes.com/rssfeed. We'll need to turn this feed into a series of SSML (Speech Synthesis Markup Language, an XML format representing speech output) documents which can be fed into Espeak and turned into audio files. The easiest way to manage all of this is to put together a script to process each item in the feed. I made an unconventional choice of language for the script: PHP, which is normally used on the server side of web applications, but also makes a great scripting language. It is also my favourite computer language! Before we start, we should install some prerequisites. I'm going to generate MP3 files at the end of this process, as this is one of the formats my off-the-shelf Internet radio understands. You could also adapt this process to generate OGG format files, which don't have the same patent concerns, if you have a device or software that can receive this format. The first things to install are PHP for the command line and LAME, which is software that will take the raw audio output from Espeak and turn it into an MP3 file. You can install the packages through Synaptic or type:
sudo apt-get install php5-cli lame
Now make a working directory under your home directory, with a subdirectory inside it to hold the generated files. You can do this easily from the command line (the ~ character is a shorthand way to refer to your home directory):
mkdir -p ~/rssradio/output
Rather than writing code to process the different formats of RSS web feeds ourselves, we can save time by getting this component from the open source world and letting that do the hard work. I used the SimplePie library for this purpose. It can do all sorts of things with RSS feeds beyond the simple processing needed here, but it saves us a lot of work because we don't have to worry about the formats of individual feeds; instead we just ask SimplePie for the information we need. There is a version of SimplePie in the repositories, but unfortunately it appears to be broken, so instead download it from http://simplepie.org/downloads/ and extract the download to your new "rssradio" directory. With these foundations in place we can use a PHP script to process the incoming feed. The script is quite basic, so there is plenty of scope for tinkering and improvements. Copy the script below into a text file named transform_rss.php and save it in your rssradio directory. You also need to run the command chmod u+x transform_rss.php to make sure the script is executable.
#!/usr/bin/php
<?php
/**
 * Convert an RSS Feed into a spoken audio file
 * By Liam Green-Hughes
 */
require 'simplepie_1.2/simplepie.inc';

$url = 'http://www.greenhughes.com/rssfeed';
// Parse the RSS document with SimplePie
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->init();

// loop through items
foreach($feed->get_items() as $item)
{
  $author = $item->get_author();
  echo "Processing: ".$item->get_title()."\n";
  // build the file name "author - title"
  $filename = sprintf('output/%s - %s',
    preg_replace("/[^a-zA-Z0-9\s]/", "", $author->get_name()),
    preg_replace("/[^a-zA-Z0-9\s]/", "", $item->get_title()));

  // output an SSML document
  $fp = fopen($filename.".ssml", 'w');
  fwrite($fp, "<?xml version='1.0'?>\n");
  fwrite($fp, "<speak xmlns='http://www.w3.org/2001/10/synthesis'\n");
  fwrite($fp, " xmlns:dc='http://purl.org/dc/elements/1.1/'\n");
  fwrite($fp, " version='1.0'>\n");
  fwrite($fp, " <metadata>\n<dc:title xml:lang='en'>");
  fwrite($fp, $item->get_title()."</dc:title>\n </metadata>\n");

  // keep only p, em, b and i tags
  $content = strip_tags($item->get_content(),'<p><em><b><i>');
  // map the em, b, and i tags to SSML <emphasis> tag
  $content = str_replace(array('<em>','<b>', '<i>'), '<emphasis>', $content);
  $content = str_replace(array('</em>','</b>', '</i>'), '</emphasis>', $content);
  fwrite($fp, $content);
  // end the document
  fwrite($fp, "</speak>\n");
  fclose($fp);
  // now generate an audio file with espeak
  shell_exec(sprintf('espeak -m -f "%s.ssml" -w "%s.wav"', $filename, $filename));
  // convert to MP3 with lame (title and author are shell-escaped as they are not sanitised)
  shell_exec(sprintf('lame --tt %s --ta %s "%s.wav" "%s.mp3"', escapeshellarg($item->get_title()), escapeshellarg($author->get_name()), $filename, $filename));
  // delete the wav file as it is now not needed
  unlink($filename.".wav");
}
?>
The script breaks down as follows: line 7 imports the SimplePie library for later use. You should double check this line to make sure it refers to a valid file on your system; if you have a later version of SimplePie you may need to change it. The use of 'require' here means the script won't run if that file isn't found. Line 9 is the address of the RSS feed you wish to import; feel free to experiment by changing this to point to other feeds. Lines 10-13 initialise SimplePie and tell it what feed we would like it to process. Line 16 is the start of a loop which will process each of the items found in the feed. A lot of RSS feeds will provide you with something like the most recent ten posts from a blog. You can also specify parameters to get_items() to limit the number of items it will fetch (consult the SimplePie documentation for further details). Line 18 gets the author 'object' which we will use later on to get the name. There may be other information available through this object too; it depends on the feed. Line 21 is where we build up the file name (which gets reused for the various different types of files we generate). It generates a string of the format "author - title" with everything other than letters, digits and whitespace removed, in order to avoid any problems with file names containing invalid characters. The first parameter in the preg_replace function is a regular expression meaning 'anything not an alphanumeric character or space', and the second is what this should be replaced by, which is nothing (I got this from http://newsourcemedia.com/blog/php-remove-non-alphanumeric-characters/).
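To see what that regular expression does to a title, here is a rough shell equivalent. It is only a sketch: the sanitize function name and the sample title are made up for illustration, and sed with a POSIX character class stands in for PHP's preg_replace.

```shell
#!/bin/sh
# Hypothetical helper: strip everything except letters, digits and whitespace,
# mirroring the preg_replace("/[^a-zA-Z0-9\s]/", "", ...) calls in the script.
sanitize() {
    printf '%s' "$1" | sed 's/[^a-zA-Z0-9[:space:]]//g'
}

# Made-up sample title, not taken from the feed
sanitize "What's new: PHP & RSS, part 2!"
echo
```

Note that removed characters leave their neighbouring spaces behind, so a title can end up with doubled spaces in the file name. That is harmless here because the script quotes the file names it passes to Espeak and LAME.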
On line 26 the fun starts with the start of the SSML file being output. I've taken a really simple approach here and just printed out a text file. You could also use something like SimpleXML here. On lines 31-32 the SSML title metadata element is generated from the feed item's title. Line 35 removes all HTML elements from our feed item, apart from the <p> (paragraph), <b> (bold), <i> (italic) and <em> (emphasis) tags. The last three are replaced in lines 37-38 with SSML's <emphasis> tag, so this is a very simple bit of processing to carry over the author's emphasis into the voice file. You could add more processing here to treat these elements differently, or map more HTML elements onto SSML tags (there is a lot of potential here to improve the voice output!). The SSML document for the feed item gets wrapped up on lines 41-42, and on line 44 we get Espeak to "read out" the SSML file we generated. The '-m' option tells it that the file it is being given has markup (it will also read plain text files), the '-f' option specifies the file and the '-w' option says to put the output in a WAV format file instead of reading it aloud. Note the file names given to Espeak are enclosed in double quotes to cope with any spaces in the file name. You could add an option to Espeak to change the voice or accent used in the generated output; this can also be done in a <voice> tag in an SSML file. A WAV file isn't suitable for broadcast on the Internet radio station we will set up later, so we must change it into another format, in this case MP3, and that is done with LAME on line 46. Note the use of the '--tt' and '--ta' options to embed the title and author metadata in the ID3 tags of the MP3 file that gets generated. In line 48 we remove the temporary WAV file and move on to the next feed item.
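If you want to try out different tag mappings without running the whole script, the emphasis step can be approximated from the shell. This is a sketch only: html_to_ssml is a made-up helper name, sed stands in for the str_replace calls, and the extra <strong> mapping is my own addition rather than something the PHP script does.

```shell
#!/bin/sh
# Hypothetical helper: rewrite HTML emphasis tags read from stdin as SSML
# <emphasis> tags. <strong> is an extra mapping not in the original script.
html_to_ssml() {
    sed -E -e 's#<(em|strong|b|i)>#<emphasis>#g' \
           -e 's#</(em|strong|b|i)>#</emphasis>#g'
}

# Made-up sample markup
echo '<p>This is <b>really</b> <strong>very</strong> important</p>' | html_to_ssml
```

The <p> tags pass through untouched, as they do in the script. To hear the effect of a different voice, one option to try is adding something like -v en-us to the espeak command; which voices are available depends on your installation.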
When the script has finished running you should end up with a series of SSML and MP3 files in the 'output' directory. Now it is time to move on to setting up our Internet radio station. In my experiment I just set this up on my own network and it didn't go outside. If you do set this up on a public server you should always make sure you are permitted to broadcast the material contained in every file you generate. Also, if you plan to experiment with transmitting the radio station to a device like an Internet radio, it is worth making sure that your broadcasting computer has a static IP address (an address on the network that does not change). Before continuing, ensure you have the Medibuntu repository set up in your system (it isn't available by default). Instructions can be found on their site at: http://www.medibuntu.org/. To broadcast we need to set up Icecast2, which is server software that streams audio to our listeners over the Internet, and Ices, a program that sends audio files to our Icecast2 server ('server' in this case could refer to a program running on your local machine; it doesn't have to be a separate computer). In a way Icecast2 is the radio transmission tower, capable of transmitting many stations, and Ices is providing an individual radio station. I looked at the instructions at: http://www.howtoforge.com/linux_webradio_with_icecast2_ices2 and in the book "Ubuntu Linux Toolbox" (ISBN: 978-0-470-08293-5) but modified the instructions to transmit MP3 rather than OGG files.
To get Icecast2 installed, start by running this at a terminal:
sudo apt-get install icecast2
Before starting up Icecast2 we need to change the passwords used to access some of its restricted features. Open up the file /etc/icecast2/icecast.xml in a text editor (you will need to have root privileges so try sudo gedit /etc/icecast2/icecast.xml) and look for the section called <authentication>. Within it you will see three passwords defined: a "source" password, which will be used by Ices to connect; an "admin" password, which you can use to log in to Icecast2 over the web and check information such as how many people are listening; and a "relay" password, for functionality we are not using here. You should change all three and keep a note of what they are. Now edit the file /etc/default/icecast2 and change the line "ENABLE=false" to "ENABLE=true". Save, and you should now be able to start up Icecast2 by entering:
sudo /etc/init.d/icecast2 start
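If you would rather not open an editor for the one-line change to /etc/default/icecast2, the same edit can be sketched with sed. The enable_icecast function name is made up for this example, the -i option assumes GNU sed (as shipped with Ubuntu), and the demonstration deliberately works on a scratch file rather than the real one.

```shell
#!/bin/sh
# Hypothetical helper: change ENABLE=false to ENABLE=true in the given file.
enable_icecast() {
    sed -i 's/^ENABLE=false$/ENABLE=true/' "$1"
}

# Demonstrate on a scratch copy rather than the real /etc/default/icecast2
tmp=$(mktemp)
printf '# Defaults for icecast2\nENABLE=false\n' > "$tmp"
enable_icecast "$tmp"
grep '^ENABLE=' "$tmp"
rm -f "$tmp"
```

On the real file the sed command would need root privileges, and Icecast2 would need to be started (or restarted) afterwards for the change to matter.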
You should be able to check in on Icecast2 by entering http://localhost:8000/ in a browser (if you put it on another machine change the localhost part). We are nearly ready to transmit! It is time to set up Ices which will send the MP3 files we generated to Icecast2. Install Ices with the command:
sudo apt-get install ices
Now we need to get Ices configured and set up. First, set up a place for its log files to go:
sudo mkdir /var/log/ices
Copy the example configuration file into your working directory and rename it (notice no sudo for this):
cp /usr/share/doc/ices/examples/ices.conf ices.conf
Open up the ices.conf file in a text editor and look for an entry called <Password> under <Stream>. You should change that to the source password you set up earlier in the Icecast2 configuration file. You should also change the <Name>, <Description> and <Genre> tags. These are all up to you, but "Talk" might be a good choice for Genre. Make a note of the value for the setting <Mountpoint> (it should be '/ices'), as you will need this later to connect to your radio station. We can leave the other settings as they are. We now just need to create a playlist for Ices to go through so it knows what files to send (once it gets to the end of the playlist it will continue from the top). Ices needs full path names for the files, so remember to pass the full path of your directory to this command:
find [full path to your directory e.g. /home/liamgh/rssradio] -name "*.mp3" > playlist.txt
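As a rough illustration of the full-path requirement, the sketch below wraps the find command in a small function that first resolves the directory to an absolute path, so every playlist entry starts with a slash. The make_playlist name is made up, sorting the entries is my own addition, and the demonstration uses empty placeholder files in a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write the absolute path of every MP3 under a directory
# to a playlist file, one per line, in sorted order.
make_playlist() {
    dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    find "$dir" -name '*.mp3' | sort > "$2"
}

# Scratch demonstration with empty placeholder files
demo=$(mktemp -d)
mkdir -p "$demo/output"
touch "$demo/output/b.mp3" "$demo/output/a.mp3" "$demo/output/a.ssml"
make_playlist "$demo" "$demo/playlist.txt"
cat "$demo/playlist.txt"   # lists the two .mp3 files only
rm -rf "$demo"
```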
The playlist is just a text file, so if you want to experiment later you could easily add in other MP3 files, such as jingles, or maybe mix in some existing podcasts. You can now start up your radio station with:
ices -c ices.conf -F playlist.txt
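If you later want to try the jingle idea, one possible approach is to write a second playlist with the jingle placed after every item and start Ices with that file instead. This is a sketch under assumptions: add_jingle is a made-up helper, and the paths in the demonstration are placeholders rather than files that exist.

```shell
#!/bin/sh
# Hypothetical helper: print the playlist ($1) with the jingle path ($2)
# inserted after every entry.
add_jingle() {
    awk -v jingle="$2" '{ print; print jingle }' "$1"
}

# Scratch demonstration; all three paths below are made up
list=$(mktemp)
printf '%s\n' /home/liamgh/rssradio/output/first.mp3 \
              /home/liamgh/rssradio/output/second.mp3 > "$list"
add_jingle "$list" /home/liamgh/rssradio/jingle.mp3
rm -f "$list"
```

Redirecting the output to a new file, rather than overwriting playlist.txt, keeps the original list intact while you experiment.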
You should now be able to listen to your radio station at: http://localhost:8000/ices. One way to listen to the stream is to use Rhythmbox: go to Music -> New Internet Radio Station and enter the address. When you click 'play' you should hear the computerised voice, and see the author and title information displayed. If you have an Internet radio, you might even be able to listen to your new radio station on that. The model I have lets you add new radio station streams to it by logging on to a website provided by the manufacturer. I found that this doesn't check the feed, so it didn't mind me adding an address that was only available inside my network, so now I can listen to RSS feeds on my radio!
This mashup is maybe slightly more complex than some other mashups you will see, but it is made much simpler by having the software we need packaged and easily installable, so all we really have to do is work out how to glue the different pieces together. It also demonstrates how useful RSS feeds are, and gives a hint of the many ways they allow reuse of a site's content: their usefulness is not limited to just a feed reader. Most importantly though, it shows what can be done with some imagination and the vast array of tools open source software provides.