TkTTS: Tk Text To Speech

This script makes listening to journal articles and technical books easier. It is basically a collection of the things I found myself doing by hand to get text-to-speech programs to pronounce technical text (lots of Greek symbols) and mathematical notation (things raised to powers, order of operations, names of operators) correctly. I even added a simple routine that slices out the reference sections from papers and from between the chapters of books.

TkTTS is a Gnome2/Nautilus-script GUI frontend for text-to-speech apps with a few extra text processing utilities. It's not very user friendly and has lots of hard-coded bits and dependencies.

Put it in ~/.gnome2/nautilus-scripts and make it executable. Alternatively, you can forget the Tk GUI entirely and use it on the command line: put it in your PATH and call TkTTS.pl with the path to a file. Unless you comment out the Tk "use" lines, though, it still depends on Perl/Tk.

./TkTTS.pl /home/superkuh/library/somedocument.pdf

It wraps third-party system utilities that do the heavy lifting for the following formats: pdf, djvu, ps, epub, html, txt, and the Linux clipboard.
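
As a rough sketch of that wrapping idea (not the script's actual code), the choice of converter can be driven by the file extension. The table and the converter_for_file sub are hypothetical names made up for this example; the command names are the defaults from the config block further down. Real invocations also need arguments (ebook-convert, for instance, wants an output file), which this sketch leaves out.

```perl
use strict;
use warnings;

# Hypothetical lookup table: file extension => converter command.
# Command names follow the script's config defaults; the table itself
# is only an illustration.
my %converter_for = (
    pdf  => 'pdftotext',
    djvu => 'djvutxt',
    ps   => 'ps2ascii',
    epub => 'ebook-convert',
    html => 'html2text',
    htm  => 'html2text',
);

# Return the converter command for a path, or undef when the file is
# plain text (or an unknown type) and needs no conversion.
sub converter_for_file {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)$/;
    return undef unless defined $ext;
    return $converter_for{ lc $ext };
}

for my $file ('paper.PDF', 'book.djvu', 'notes.txt') {
    my $cmd = converter_for_file($file);
    printf "%s -> %s\n", $file, defined $cmd ? $cmd : '(read as plain text)';
}
```

Keeping the mapping in one hash means adding a format is a one-line change, and anything not listed simply falls through to being read as plain text.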

TkTTS.pl

Festival of the appropriate version, Calibre, and DjVuLibre might not be in the system repositories. For everything else, on Ubuntu:

sudo apt-get install perl-tk xpdf-utils pstotext html2text

I'll try to use PAR::Packer in the future to bundle working Perl modules. Until then, the non-repository Perl dependencies are listed in the section of the script shown below.


Using TkTTS and Festival 1.96 on Ubuntu 12.04 w/MATE desktop

I recently bought a new computer and put the latest Ubuntu LTS version on it. But the latest Ubuntu ships with a horrible desktop environment and Gnome has jumped the shark, so I had to install the MATE desktop instead. This meant giving up Nautilus as a file manager, and Nautilus bindings are what TkTTS uses to know which files to TTS. Porting to the Caja file manager was fairly painless though, and you can download a Caja-compatible copy of TkTTS below.

TkTTS_caja.pl

Unfortunately the version of Festival and the associated libs that ship with Ubuntu Precise are not compatible with the existing high quality voices. It is best just to use Festival 1.96 and the old voices. To do this you have to remove the Festival version that came from the Ubuntu repos and manually download and install the old packages. stactrac's post on the Ubuntu Forums' Festival thread got me started. Here's a list of what I needed to install:

http://packages.ubuntu.com/lucid/amd64/libestools1.2/download
http://packages.ubuntu.com/precise/all/festlex-poslex/download
http://mirror.pnl.gov/ubuntu//pool/universe/f/festlex-cmu/festlex-cmu_1.4.0-6_all.deb
http://launchpadlibrarian.net/37213705/libaudiofile0_0.2.6-8ubuntu1_amd64.deb
http://launchpadlibrarian.net/35363331/festival_1.96%7Ebeta-10ubuntu1_amd64.deb

Once you've installed the old packages you can install the high quality Festival voices from the Ubuntu Forums thread just like before.


Perl "use ..." and system utility dependencies

The code snippet below gives an idea of the Perl modules and third party programs that might be needed.

use strict;
use warnings;
use Encode; 
use charnames ':short';
use Clipboard;
use Tk; # from perl-tk in repositories
use Tk::Menu;
use Tk::Pane; # in Tk::
use List::Util qw( reduce );
use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new( 4 );

#CONFIG################################################################
my $norefs = 0; # a very simple attempt to remove references from the end, default 0 off.
my $wordreplace = 0; # scientific notation pronunciation fixes, default 0 off.
my $defaultdocviewer = 'evince'; # xpdf, okular?
my $epubdocviewer = "fbreader"; # calibre
my $htmldocviewer = "true"; # opera? firefox? iceweasel? chromium? nothing(true)? xdg-open?
my $editor = "gedit"; # mousepad, vim, emacs, etc
my $tts = 'festival --tts'; # festival
#my $tts = "swift -f"; # cepstral swift
#my $tts = "flite -f"; # festival lite
#my $tts = "espeak -ven+f4 -p 70 -f"; # espeak is installed in lots of distros
my $pdftotext = 'pdftotext';
my $djvutotext = 'djvutxt';
my $pstotext = 'ps2ascii';
my $epub2txt = 'ebook-convert'; # calibre is best for epub
my $html2text = 'html2text';
#my $html2text = 'html2text -width 140';
my $tempfilepath = '/tmp/tts_temp.txt'; # probably shouldn't change the name, dir change is fine.
my $epubtempdir = '/tmp/epub2text'; # if you change this, be careful, I do a `rm -rf $epubtempdir`.
my $homedir = '~/.tktts/';
my $webpage = '/some/dir/here/filename.html'; 
my $makewebpage = 0; # leave this at 0 unless you're me/superkuh.
my $titlerepeatremoval = 0; # for any line >10 chars long, remove all instances after the 10th repetition
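
To give a feel for what the $wordreplace option is for, here is a minimal sketch of that kind of pronunciation fix. It is not the script's actual word list or code: the %spoken table, the speakable sub, and the "to the power of" wording are all assumptions picked for the example.

```perl
use strict;
use warnings;

# Hypothetical replacement table: symbol => spoken form. These entries
# are examples for this sketch, not the script's real list.
my %spoken = (
    "\x{03B1}" => 'alpha',
    "\x{03B2}" => 'beta',
    "\x{03BB}" => 'lambda',
    "\x{03BC}" => 'mu',
    "\x{00B1}" => 'plus or minus',
    "\x{2264}" => 'less than or equal to',
);

sub speakable {
    my ($text) = @_;
    # Spell out simple powers such as x^2 before touching symbols.
    $text =~ s/(\w+)\^(\d+)/$1 to the power of $2/g;
    # Replace each known symbol with its name, padded by spaces.
    my $symbols = join '|', map { quotemeta } sort keys %spoken;
    $text =~ s/($symbols)/ $spoken{$1} /g;
    # Collapse the extra spaces the padding may have introduced.
    $text =~ s/[ \t]{2,}/ /g;
    $text =~ s/^ | $//mg;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print speakable("\x{03BB} = 2\x{03B1} \x{00B1} x^2"), "\n";
```

Padding each replacement with spaces keeps a symbol that is glued to a number (like a coefficient) from being read as part of one word; the cleanup pass afterwards removes the doubled spaces.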

Reference removal. Sort of.

And here's a rushed, buggy function that removes reference sections from technical papers and books. It works for almost all documents, but there are rare false positives that remove more than intended.

sub filterreferences {
	my $texttoedit = shift;
	# Whenever "references" followed by a newline is encountered, discard all following lines until
	# encountering words like chapter, introduction, section, or abstract that indicate the start
	# of new content. This fails in ~10% of cases but it's really helpful for the other 90%.
	my $fixedtext = ''; # start empty so that empty input returns '' rather than undef
	my $inreferencesstate = 0;
	my @lines = split(/\n/, $texttoedit);
	foreach my $line (@lines) {
		if ($line =~ /(chapter|introduction|section|abstract|appendix)/i) {
			$inreferencesstate = 0;
		} elsif ($line =~ /references\s?$/i) { # only if nothing follows "references",
			$inreferencesstate = 1;        # as if it were a section heading.
		}
		$fixedtext .= "$line\n" unless $inreferencesstate;	
	}
	$texttoedit = $fixedtext;

	# Perhaps remove all (.+\d{4}), to remove inline references. But how to be sure?
	
	return $texttoedit;
}
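
The inline-reference idea in the comment above could be tried with a stricter pattern than (.+\d{4}). The following is only a sketch under my own assumptions, not something the script does: strip_inline_citations is a hypothetical helper that only removes parentheticals starting with a capitalised word and ending in a year from 1900 to 2099. It will still produce false positives on things like "(Figure 2004)".

```perl
use strict;
use warnings;

# Hypothetical helper: remove parenthetical author-year citations such as
# "(Smith et al., 2004)" or "(Jones 1998; Lee and Park, 2010a)".
# It needs an uppercase letter right after "(" and a year right before ")",
# and it never crosses nested parentheses, so "(n = 2004)" and
# "(see Figure 3)" are left alone.
sub strip_inline_citations {
    my ($text) = @_;
    $text =~ s/\s*\(\p{Lu}[^()]*?\b(?:19|20)\d{2}[a-z]?\)//g;
    return $text;
}

print strip_inline_citations(
    "as shown previously (Smith et al., 2004), the rate (n = 2004) rose"
), "\n";
```

Anchoring on the year at the closing parenthesis is what keeps this from eating ordinary asides, which is the "how to be sure?" worry; it trades missed citations for fewer accidental deletions.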

