This script makes it possible to listen to journal articles and technical books. It is basically a collection of all the things I found myself doing manually to get text-to-speech programs to pronounce technical notation (lots of Greek symbols) and mathematical notation (things raised to powers, order of operations, names of operators) correctly. I even added a simple routine that slices the reference sections out of papers, and out of books between chapters.
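Fixes like these can be done as plain regex substitutions before the text reaches the TTS engine. Below is a minimal sketch of the idea, assuming the input is a Perl character string. The `%greek` table, the `speakable` function and the exact spoken phrasings are hypothetical illustrations, not the rules TkTTS.pl actually ships.

```perl
use strict;
use warnings;
use utf8;   # the Greek letters below are literal UTF-8 in the source

# Hypothetical subset of Greek letters and their spoken names.
my %greek = (
    'α' => 'alpha',  'β' => 'beta',  'γ' => 'gamma', 'δ' => 'delta',
    'λ' => 'lambda', 'μ' => 'mu',    'π' => 'pi',    'σ' => 'sigma',
    'Δ' => 'Delta',  'Σ' => 'Sigma', 'Ω' => 'Omega',
);
my $greek_class = join '', keys %greek;

sub speakable {
    my ($text) = @_;
    # Spell out Greek letters, padded so they don't fuse with neighbouring words.
    $text =~ s/([$greek_class])/ $greek{$1} /g;
    # Powers: x^2 -> "x squared", x^3 -> "x cubed", anything else -> "to the power of".
    $text =~ s/\^2\b/ squared/g;
    $text =~ s/\^3\b/ cubed/g;
    $text =~ s/\^(\w+)/ to the power of $1/g;
    # Name a few relational operators; two-character ones go first so the
    # "=" inside "<=" or ">=" is not read on its own.
    $text =~ s/[ \t]*<=[ \t]*/ less than or equal to /g;
    $text =~ s/[ \t]*>=[ \t]*/ greater than or equal to /g;
    $text =~ s/[ \t]*=[ \t]*/ equals /g;
    # Tidy up the extra spaces introduced above.
    $text =~ s/[ \t]+/ /g;
    $text =~ s/^ | $//gm;
    return $text;
}
```

Presumably a pass like this is what the `$wordreplace` switch in the config enables, which is why it defaults to off: blanket substitutions can mangle prose that merely contains an `=` or a `^`.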
A Gnome2/Nautilus-script GUI frontend for text-to-speech apps, with a few extra text-processing utilities. It isn't very user-friendly and has lots of hard-coded bits. Put it in ~/.gnome2/nautilus-scripts and make it executable. Alternatively, you can skip the Tk GUI entirely and use it on the command line by putting it in your PATH and calling TkTTS.pl with the path to a file.
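When run as a Nautilus script, the script receives the selected files as arguments, and Nautilus also exports their full paths (local files only), one per line, in `NAUTILUS_SCRIPT_SELECTED_FILE_PATHS`. A minimal sketch of picking the input file for both the Nautilus and the command-line case; `selected_files` is a hypothetical helper, not code from TkTTS.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: work out which file(s) to read aloud.
# Nautilus exports the selected local files as absolute paths, one per line,
# in NAUTILUS_SCRIPT_SELECTED_FILE_PATHS; on the command line we fall back to
# the arguments, e.g. "TkTTS.pl ~/papers/paper.pdf".
sub selected_files {
    my ($env, @argv) = @_;
    my $paths = $env->{NAUTILUS_SCRIPT_SELECTED_FILE_PATHS};
    return grep { length } split /\n/, $paths
        if defined $paths && length $paths;
    return @argv;
}

# e.g. hand $files[0] to the conversion step
my @files = selected_files(\%ENV, @ARGV);
```

The environment variable is preferred here because it carries absolute paths, while the arguments Nautilus passes are typically relative to the folder being browsed.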
Relying on third-party system utilities to do the heavy lifting, it converts the following formats into TTS-compatible text: PDF, DjVu, PS, EPUB, HTML, plain text, and the clipboard.
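Since every format is handled by an external tool, the conversion step can be a simple lookup from file extension to converter. A minimal sketch, assuming each tool is called as `program input output` (pdftotext, djvutxt, ps2ascii and ebook-convert all accept that form). The `%converters` table and `converter_command` are hypothetical names, and HTML is left out because html2text's options vary between implementations.

```perl
use strict;
use warnings;

# Hypothetical table: lowercase extension -> converter program. Every tool
# listed here is called as "program input output" and writes plain text to
# the output file. .txt needs no conversion at all.
my %converters = (
    pdf  => 'pdftotext',
    djvu => 'djvutxt',
    ps   => 'ps2ascii',
    epub => 'ebook-convert',   # picks the output format from the ".txt" suffix
);

# Return the argv list for converting $path into $out, or an empty list
# when the extension is unknown.
sub converter_command {
    my ($path, $out) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/ or return;
    my $prog = $converters{ lc $ext } or return;
    return ($prog, $path, $out);
}

# Usage: the list form of system() sidesteps shell quoting of file names.
# my @cmd = converter_command($file, '/tmp/tts_temp.txt');
# if (@cmd) { system(@cmd) == 0 or die "conversion failed: @cmd\n"; }
```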
TkTTS.pl
The code snippet below gives an idea of the Perl modules and third party programs that might be needed.
use strict;
use warnings;
use Encode;
use charnames ':short';
use Clipboard;
use Tk;
use Tk::Menu;
use Tk::Pane; # in Tk::
use List::Util qw( reduce );
use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new( 4 );

#CONFIG################################################################
my $norefs = 0; # a very simple attempt to remove references from the end, default 0 off.
my $wordreplace = 0; # scientific notation pronunciation fixes, default 0 off.
my $defaultdocviewer = 'evince'; # xpdf, okular?
my $epubdocviewer = "fbreader"; # calibre
my $htmldocviewer = "true"; # opera? firefox? iceweasel? chromium? nothing(true)?
my $editor = "gedit"; # mousepad, vim, emacs, etc
my $tts = 'festival --tts'; # festival
#my $tts = "swift -f"; # cepstral swift
#my $tts = "flite -f"; # festival lite
#my $tts = "espeak -ven+f4 -p 70 -f"; # espeak is installed in lots of distros
my $pdftotext = 'pdftotext';
my $djvutotext = 'djvutxt';
my $pstotext = 'ps2ascii';
# my $epub2txt = 'epub2text'; # not needed, I built in functionality to TkTTS.
my $epub2txt = 'ebook-convert';
my $html2text = 'html2text';
#my $html2text = 'html2text -width 140';
my $tempfilepath = '/tmp/tts_temp.txt'; # probably shouldn't change the name, dir change is fine.
my $epubtempdir = '/tmp/epub2text'; # if you change this, be careful, I do a `rm -rf $epubtempdir`.
my $homedir = '~/.tktts/'; # note: Perl itself does not expand '~', only a shell does
my $webpage = '/home/superkuh/www/islisteningto.html';
my $makewebpage = 0; # leave this at 0 unless you're me/superkuh.
my $titlerepeatremoval = 0; # for any line >10 chars long, remove all instances after the 10th repetition
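The `$titlerepeatremoval` option presumably targets running titles (page headers and footers) that text extraction repeats on every page. A minimal sketch of the behaviour its comment describes, keeping the first 10 occurrences of any line longer than 10 characters and dropping the rest. `remove_repeated_titles` is a hypothetical name, and comparing whitespace-trimmed lines is an assumption.

```perl
use strict;
use warnings;

# Sketch of the $titlerepeatremoval rule from the config comment: any line
# longer than $min_length characters (10 by default) is kept for its first
# $max_repeats occurrences (10 by default) and dropped after that.
# Comparing whitespace-trimmed lines is an assumption made here.
sub remove_repeated_titles {
    my ($text, $max_repeats, $min_length) = @_;
    $max_repeats //= 10;
    $min_length  //= 10;
    my (%seen, @kept);
    for my $line (split /\n/, $text) {
        (my $key = $line) =~ s/^\s+|\s+$//g;
        next if length($key) > $min_length && ++$seen{$key} > $max_repeats;
        push @kept, $line;
    }
    return @kept ? join("\n", @kept) . "\n" : '';
}
```

Short lines are exempt so that repeated one-word lines, page numbers and blank lines are never touched.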
And here's an example function that removes reference sections from technical papers and books. It works for almost all documents, but there are rare false positives that remove more than intended.
sub filterreferences {
    my $texttoedit = shift;
    # Whenever a line ending in "references" (i.e. a references heading) is encountered,
    # discard all following lines until one contains a word like chapter, introduction,
    # section, abstract or appendix that indicates the start of new content.
    # This fails in ~10% of cases but it's really helpful for the other 90%.
    my $fixedtext = '';    # start empty so an all-references input returns '' rather than undef
    my $inreferencesstate = 0;
    my @lines = split(/\n/, $texttoedit);
    foreach my $line (@lines) {
        if ($line =~ /(chapter|introduction|section|abstract|appendix)/i) {
            $inreferencesstate = 0;
        } elsif ($line =~ /references\s?$/i) {
            # Only when nothing follows "references", as with a section heading.
            $inreferencesstate = 1;
        }
        $fixedtext .= "$line\n" unless $inreferencesstate;
    }
    $texttoedit = $fixedtext;
    # Perhaps remove all (.+\d{4}) to remove inline references. But how to be sure?
    return $texttoedit;
}
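The closing comment in `filterreferences` floats stripping inline citations with something like `(.+\d{4})`. That pattern is greedy and would also swallow ordinary parentheticals, so here is a more conservative sketch under stated assumptions: the parenthetical starts with a capital letter, has no nested parentheses or line breaks, holds roughly 80 characters or fewer before the year, and ends in a year from 1800 to 2099. `strip_inline_citations` is hypothetical and will still misfire on things like "(N = 2000)".

```perl
use strict;
use warnings;

# Hypothetical, deliberately conservative version of the "(.+\d{4})" idea.
# A parenthetical is removed only if it starts with a capital letter, has no
# nested parentheses or line breaks, holds at most ~80 characters before the
# year, and ends in a year from 1800 to 2099 (optionally 2004a, 2004b, ...).
sub strip_inline_citations {
    my ($text) = @_;
    $text =~ s/[ \t]*\([A-Z][^()\n]{0,80}?,?[ \t](?:1[89]|20)\d\d[a-z]?\)//g;
    return $text;
}
```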
I enjoy recursion, dissipating local energy gradients, lipid bilayers, particle acceleration, heliophysics instrumentation and generally anything with a high rate of change in current. This site is a combination of my efforts to archive what I find interesting and my shoddy attempts to implement the aforementioned without a budget.
I get all email sent to anything @superkuh.com
TorChat: fri6mj44l5bujjyp
This site was previously located at superkuh.ath.cx, but that subdomain system was shut down.
Then it was at superkuh.com for a while until all data was lost. Now it's back, same place, much less content.
The site is also at superkuh.bit on Namecoin DNS.