We're hearing computers speak to us more and more these days. It might be the satnav system in your car, the telephone exchange or screen reader software. One development had admittedly passed me by, though: computers with accents!
Normally when we hear speech synthesis it will be in either an American accent or a very clear British accent (known as BBC English). Recently, things have been slowly changing. My friend Spiky recently acquired a TomTom satnav system, which enables you to select the voice you wish to guide you. It could be "Australian Ken", "Irish Kathy" or even John Cleese. Speech synthesis is much more difficult when the phrases being spoken are less predictable. Satnav systems only need a few phrases to be recorded, but a computer text to speech system needs to cope with a whole lot more.
So for a bit of fun to start your week, wouldn't it be a good idea to try this in Linux? Don't worry, you can try this without having to borrow a friend with a strong accent to make your own version of "Australian Ken"; instead we can use an open source speech system called Espeak, which may well be available for your favourite Linux distribution. There are a few speech synthesis systems available for Linux now: not just Espeak but also Festival from the University of Edinburgh and Epos from Charles University, Prague, and the Academy of Sciences, Prague, which is a text to speech engine for the Czech and Slovak languages. The availability of text to speech systems for Linux may prove very interesting to those building in-car systems or those who dream of being able to interrogate a computer through conversation.
Espeak attracted my attention initially as it is a relatively compact implementation of a text to speech engine, weighing in at about one megabyte. This makes it an ideal candidate to run on the Asus EEE PC, an ultramobile device packing a lot of power for its size, but restricted in terms of internal storage. However, you can still try this out on any platform that Espeak will run on, including many versions of Linux, RISC OS, and even Windows.
Firstly, if you want to try this on the Asus EEE you will need to enable the system to get software from more sources (known as repositories), as the standard set of repositories does not include Espeak. To do this I followed the instructions on the EEEuser Wiki: http://wiki.eeeuser.com/addingxandrosrepos. Be especially careful to follow all of the instructions, particularly the section on Pinning Your System, to avoid making changes that might damage the operating system. After you complete these steps you might have to use the command sudo apt-get update to update the list of software available (you can execute these commands in a terminal window, which you reach by pressing Ctrl+Alt+T in easy mode). Now you can install the software with the command sudo apt-get install espeak.
Espeak offers many options to process input and turn it into speech. It can produce output in male and female voices and in different accents, and can also process text in many different languages. For a full set of options type man espeak. A simple way to get going is to type a few words in quotation marks after the command, for example: espeak "hello world!". Other useful options include --voices to list the available voices and -s to adjust how many words are spoken per minute. To make espeak read out the contents of a file use the -f option followed by the file name. So for example, if you had a file named mytext.txt you could get this file read out to you in a female Lancashire accent by using the command:
espeak -v en/en-n-f -s 120 -m -f mytext.txt
The -s 120 option slows down Espeak from its normal rate of 160 words per minute, which can sound a bit quick, to 120 words per minute. The -m option warns espeak that the file might contain markup (you'll need this option if you saved a file from the web and want espeak to read it out).
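If you find yourself typing these options a lot, they can be wrapped in a small shell function. This is only a minimal sketch: the function name speak_file is made up for illustration, the voice name is the one from the command above and may not exist in every version of Espeak (check the --voices listing), and mytext.txt is assumed to be in the current directory. The function prints the command it builds and only runs it if espeak is installed and the file is readable.

```shell
#!/bin/sh
# speak_file VOICE RATE FILE
# Hypothetical helper: prints the espeak command for the given voice,
# rate (words per minute) and text file, then runs it when possible.
speak_file() {
    voice=$1
    rate=$2
    file=$3

    # Show the command so you can see exactly what will be run
    echo "espeak -v $voice -s $rate -m -f $file"

    # Only call espeak if it is installed and the file can be read
    if command -v espeak >/dev/null 2>&1 && [ -r "$file" ]; then
        espeak -v "$voice" -s "$rate" -m -f "$file" ||
            echo "espeak failed, is the voice name correct?" >&2
    fi
}

# Voice name taken from the article; it may differ on your system
speak_file en/en-n-f 120 mytext.txt
```

Changing the second argument is an easy way to experiment with faster or slower speech without retyping the whole command.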
I got Espeak to produce a version of this blog posting from a text file, but instead of outputting it to speakers I used the option to output to a wav format file. The file was then converted to MP3 format using lame. This entire post, including generation of the speech and MP3 file encoding, was done on a retail specification Asus EEE PC. Click here to listen to how it sounds.
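For anyone wanting to repeat the wav and MP3 steps, they might look something like the sketch below. It is not the exact set of commands used for this post: the function name text_to_mp3 and the file name post.txt are assumptions made for illustration. It relies on espeak's -w option, which writes the speech to a wav file instead of playing it, and on lame taking an input file followed by an output file.

```shell
#!/bin/sh
# text_to_mp3 FILE.txt
# Hypothetical helper: turns FILE.txt into FILE.wav with espeak,
# then encodes FILE.wav to FILE.mp3 with lame.
text_to_mp3() {
    txt=$1
    base=${txt%.txt}      # strip the .txt extension
    wav=$base.wav
    mp3=$base.mp3

    # Show the two steps that make up the conversion
    echo "espeak -f $txt -w $wav"
    echo "lame $wav $mp3"

    # Only run them if both tools are installed and the text file is readable
    if command -v espeak >/dev/null 2>&1 &&
       command -v lame >/dev/null 2>&1 &&
       [ -r "$txt" ]; then
        espeak -f "$txt" -w "$wav" && lame "$wav" "$mp3"
    fi
}

# post.txt is an assumed file name for the text of the blog post
text_to_mp3 post.txt
```

The -v, -s and -m options described earlier can be added to the espeak step in the same way if you want a particular voice or speed in the recording.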
It's fun to hear your computer talk, and this technology has many uses, not just in the world of accessibility, but also in turning written information into audio which can be consumed on the move.