TkTTS: Tk Text To Speech

This script makes it possible to listen to journal articles and technical books. It is basically a collection of all the things I found myself doing manually to get text-to-speech programs to pronounce technical text (lots of Greek symbols) and mathematical notation (things raised to powers, order of operations, names of operators) correctly. I even added a simple routine that slices out the reference sections from papers and from between book chapters.
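
As a rough illustration of the kind of pronunciation fix described above, here is a minimal sketch. The replacement table and the function name speakable are hypothetical and are not taken from TkTTS itself; the real script's word list is not shown on this page.

```perl
use strict;
use warnings;

# Hypothetical table of symbols and the words a TTS engine should say instead.
my %spoken = (
    "\x{03B1}" => 'alpha',
    "\x{03B2}" => 'beta',
    "\x{03BB}" => 'lambda',
    "\x{03C0}" => 'pi',
    "\x{2211}" => 'sum of',
    "\x{2264}" => 'less than or equal to',
);

sub speakable {
    my ($text) = @_;
    # Build one alternation from every known symbol.
    my $symbols = join '|', map { quotemeta } keys %spoken;
    # Spell out each symbol, padded so it does not fuse with its neighbours.
    $text =~ s/($symbols)/ $spoken{$1} /g;
    # Read simple powers aloud, e.g. "x^2".
    $text =~ s/\^\s*(-?\w+)/ to the power of $1/g;
    # Tidy up the padding added above.
    $text =~ s/[ \t]{2,}/ /g;
    $text =~ s/^ +| +$//mg;
    return $text;
}

print speakable("E = mc^2 and \x{03BB} \x{2264} 2\x{03C0}r"), "\n";
```

A lookup table keeps the fixes easy to extend: adding a new symbol is one more line in %spoken.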

It is a Gnome2/Nautilus-script GUI frontend for text-to-speech apps with a few extra text processing utilities. It's not very user friendly and has lots of hard-coded bits. Put it in ~/.gnome2/nautilus-scripts and make it executable. Alternatively, you can forget the Tk GUI entirely and use it on the command line by putting it in your PATH and then calling TkTTS.pl with the path to a file.

Relying on third-party system utilities to do the heavy lifting, it converts the following formats to TTS-compatible text: pdf, djvu, ps, epub, html, txt, and the clipboard.
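
One way to hand each format to the right utility is a dispatch table keyed on file extension. This is only a sketch: the function name conversion_command is hypothetical, the argument order for each tool is my assumption about a typical invocation, and the use of cp for plain text is a stand-in. The actual script may call these tools differently.

```perl
use strict;
use warnings;

# Map a lowercase extension to a builder that returns the converter's
# argument list for (input file, output text file).
my %converter = (
    pdf  => sub { ('pdftotext',     $_[0], $_[1]) },
    djvu => sub { ('djvutxt',       $_[0], $_[1]) },
    ps   => sub { ('ps2ascii',      $_[0], $_[1]) },
    epub => sub { ('ebook-convert', $_[0], $_[1]) },
    html => sub { ('html2text', '-o', $_[1], $_[0]) },
    txt  => sub { ('cp',            $_[0], $_[1]) },   # stand-in for a plain copy
);

sub conversion_command {
    my ($infile, $outfile) = @_;
    # Take the text after the last dot as the extension.
    my ($ext) = $infile =~ /\.([^.\/]+)$/ or return;
    my $build = $converter{ lc $ext } or return;
    return $build->($infile, $outfile);
}

# Usage (hypothetical input path):
# my @cmd = conversion_command('paper.pdf', '/tmp/tts_temp.txt');
# if (@cmd) { system(@cmd) == 0 or die "conversion failed: @cmd\n" }
```

Passing the command to system as a list avoids the shell, so file names with spaces or quotes need no extra escaping.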

TkTTS.pl

Using TkTTS and Festival 1.96 on Ubuntu 12.04 w/MATE desktop

I recently bought a new computer and put the latest Ubuntu LTS version on it. But the latest Ubuntu ships with a horrible desktop environment and Gnome has jumped the shark, so I had to install the MATE desktop instead. This meant giving up Nautilus as a file manager, and Nautilus bindings are what TkTTS uses to know which files to TTS. Porting to the Caja file manager was fairly painless though, and you can download a Caja-compatible copy of TkTTS below.

TkTTS_caja.pl
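
For context on what the port involves: Nautilus passes the selected files to a script in the NAUTILUS_SCRIPT_SELECTED_FILE_PATHS environment variable, one path per line, and Caja uses CAJA_SCRIPT_SELECTED_FILE_PATHS in the same way. The sketch below accepts either, plus plain command-line arguments. The function name selected_files is hypothetical and this is not necessarily how either copy of TkTTS does it.

```perl
use strict;
use warnings;

# Return the files to read aloud: command-line arguments win, otherwise
# fall back to whatever the file manager reports as selected.
sub selected_files {
    my ($argv, $env) = @_;
    return @$argv if @$argv;
    for my $var (qw(NAUTILUS_SCRIPT_SELECTED_FILE_PATHS
                    CAJA_SCRIPT_SELECTED_FILE_PATHS)) {
        my $paths = $env->{$var};
        next unless defined $paths && length $paths;
        # The variable holds newline-separated paths; skip empty entries.
        return grep { length } split /\n/, $paths;
    }
    return;
}

my @files = selected_files(\@ARGV, \%ENV);
print "$_\n" for @files;
```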

Unfortunately the version of Festival and the associated libs that ship with Ubuntu Precise are not compatible with the existing high quality voices. It is best just to use Festival 1.96 and the old voices. To do this you just have to remove the Festival version that comes from the Ubuntu repos and manually download and install the old packages. stactrac's post on the Ubuntu Forum's Festival thread got me started. Here's a list of what I needed to install:

http://packages.ubuntu.com/lucid/amd64/libestools1.2/download
http://packages.ubuntu.com/precise/all/festlex-poslex/download
http://mirror.pnl.gov/ubuntu//pool/universe/f/festlex-cmu/festlex-cmu_1.4.0-6_all.deb
http://launchpadlibrarian.net/37213705/libaudiofile0_0.2.6-8ubuntu1_amd64.deb
http://launchpadlibrarian.net/35363331/festival_1.96%7Ebeta-10ubuntu1_amd64.deb

Once you've installed the old packages you can install the high quality Festival voices from the Ubuntu Forums thread just like before.


The code snippet below gives an idea of the Perl modules and third party programs that might be needed.

use strict;
use warnings;
use Encode;
use charnames ':short';
use Clipboard;
use Tk;
use Tk::Menu;
use Tk::Pane; # ships with the Tk distribution
use List::Util qw( reduce );
use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new( 4 ); # up to 4 child processes at once

#CONFIG################################################################
my $norefs = 0; # a very simple attempt to remove references from the end, default 0 off.
my $wordreplace = 0; # scientific notation pronunciation fixes, default 0 off.
my $defaultdocviewer = 'evince'; # xpdf, okular?
my $epubdocviewer = "fbreader"; # calibre
my $htmldocviewer = "true"; # opera? firefox? iceweasel? chromium? nothing(true)?
my $editor = "gedit"; # mousepad, vim, emacs, etc
my $tts = 'festival --tts'; # festival
#my $tts = "swift -f"; # cepstral swift
#my $tts = "flite -f"; # festival lite
#my $tts = "espeak -ven+f4 -p 70 -f"; # espeak is installed in lots of distros
my $pdftotext = 'pdftotext';
my $djvutotext = 'djvutxt';
my $pstotext = 'ps2ascii';
# my $epub2txt = 'epub2text'; # not needed, I built in functionality to TkTTS.
my $epub2txt = 'ebook-convert';
my $html2text = 'html2text';
#my $html2text = 'html2text -width 140';
my $tempfilepath = '/tmp/tts_temp.txt'; # probably shouldn't change the name, dir change is fine.
my $epubtempdir = '/tmp/epub2text'; # if you change this, be careful, I do a `rm -rf $epubtempdir`.
my $homedir = '~/.tktts/';
my $webpage = '/home/superkuh/www/islisteningto.html'; 
my $makewebpage = 0; # leave this at 0 unless you're me/superkuh.
my $titlerepeatremoval = 0; # for any line >10 chars long, remove all instances after the 10th repetition
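
The $titlerepeatremoval option above targets running headers that PDF extraction repeats on every page. A minimal sketch of that idea follows, assuming the thresholds are passed in as parameters. The function name drop_repeated_lines is hypothetical and the real implementation is not shown here.

```perl
use strict;
use warnings;

# Keep the first $maxrepeats copies of any line longer than $minlength
# characters and drop every later copy.
sub drop_repeated_lines {
    my ($text, $minlength, $maxrepeats) = @_;
    my %seen;
    my @kept;
    # The -1 limit keeps trailing empty fields, so a final newline survives.
    for my $line (split /\n/, $text, -1) {
        next if length($line) > $minlength && ++$seen{$line} > $maxrepeats;
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Usage with the limits described in the config comment:
# my $cleaned = drop_repeated_lines($text, 10, 10);
```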

And here's an example function that removes reference sections from technical papers and books. It works for almost all documents, but there are rare false positives that remove more than intended.

sub filterreferences {
	my $texttoedit = shift;
	# whenever "references" followed by a newline is encountered discard all following lines until 
	# encountering words like chapter, introduction, section, or abstract that indicate the start
	# of new content. This fails in ~10% of cases but it's really helpful for the other 90%.
	my $fixedtext = ''; # start empty so an all-references input returns '' rather than undef
	my $inreferencesstate = 0;
	my @lines = split(/\n/, $texttoedit);
	foreach my $line (@lines) {
		if ($line =~ /(chapter|introduction|section|abstract|appendix)/i) {
			$inreferencesstate = 0;
		} elsif ($line =~ /references\s?$/i) { #only if there's nothing after references, as if it's
			$inreferencesstate = 1;        #the heading of a section.
		}
		$fixedtext .= "$line\n" unless $inreferencesstate;	
	}
	$texttoedit = $fixedtext;

	# Perhaps remove all (.+\d{4}), to remove inline references. But how to be sure?
	
	return $texttoedit;
}
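
The open question in the comment above, removing inline references, could be approached with a narrower pattern than (.+\d{4}). The sketch below is only one guess at a workable rule: it removes a parenthesised group only when it starts with a capital letter and ends in a four-digit year from 1800 to 2099. The function name strip_inline_citations is hypothetical, and false positives such as "(N = 2000)" are still possible.

```perl
use strict;
use warnings;

# Remove author-year citations such as "(Smith et al., 2009)" or
# "(Smith 2009; Jones 2011b)" while leaving other parentheticals alone.
sub strip_inline_citations {
    my ($text) = @_;
    $text =~ s/\s*\(\s*[A-Z][^()]*?\b(?:1[89]|20)\d{2}[a-z]?\s*\)//g;
    return $text;
}

# Usage on a hypothetical sentence:
# print strip_inline_citations("Neurons fire (Smith et al., 2009) often."), "\n";
```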
