From www@deja.com  Sun Sep 10 15:25:47 2000
Message-ID: <8ph1q8$ekq$1@nnrp1.deja.com>
From: oleg@pobox.com
Subject: Sendmail as a Turing machine
Date: Sun, 10 Sep 2000 22:26:33 GMT
Reply-To: oleg@pobox.com
Keywords: sendmail, Turing machine, rewriting system, string rewriting
Newsgroups: comp.lang.functional,comp.mail.sendmail
Summary: Rewriting rules to make sendmail to act as a Turing machine
X-Article-Creation-Date: Sun Sep 10 22:26:33 2000 GMT
Status: OR

Many people have tried to get sendmail to do many wonderful things,
preferably on behalf of the root. This article is not about cracking
and other unseemly activities. We will show however that sendmail as
it stands is indeed capable of executing every possible computation.

Turing machine is a finite control unit operating on a modifiable tape
that has the beginning but no end. Sendmail's (address) re-writing
engine works on a potentially unbounded string. The engine is
controlled by a finite set of rules. The content of the string as well
the previously fired rule (the state) determine the rule to be
executed next. A rule may modify the string-tape.  It appears there is
the perfect match between sendmail and the Turing machine. This
article will show how to get sendmail to operate as a Turing machine,
and give examples of two Turing programs: addition of natural numbers
and a decision function (evenp/oddp).

We will model Turing machine's tape with a character string. Just as
the tape is made of cells, the string will be made of 'tokens'
separated by a character '.' (a single dot). A token is a sequence of
one or more alphanumeric characters; characters @ % ! [ ] may not
appear in a token. A character '@' has a special meaning: it denotes a
blank tape cell. The infinite tape is thus modeled by a finite string
of tokens followed by an _assumed_ infinite sequence of blank cells.
We will never write out this tail of blanks explicitly; rather we will
assume that there is always a blank cell after the right end of every
finite string under re-writing. A character '%' also has a special
meaning: it represents Turing machine's 'read/write head'. That is, %
placed before a token marks the token as current -- the one to be
analyzed by rules and perhaps overwritten.  For example, a string
@.a.bc.%def stands for a Turing machine configuration with a tape
containing a blank, three symbols, and an infinite sequence of
blanks. The read/write head is over the symbol 'def'. A string I.I.%@
represents a tape with two non-blank symbols ('I') at the very
beginning; the head is pointing to the first blank after them.

To run the Turing machine, we first need to locate the sendmail
program and its configuration file. The application itself can usually
be found in /usr/sbin/ or /usr/lib/ (a 'locate' command may help). The
configuration file for sendmail v8.8+ is located in
/etc/mail/sendmail.cf. We will use version 8.9.3 as it allows named
rulesets, which are rather convenient. Besides, if your site is
running sendmail v8.8 or below, your system administrator should be
sued for negligence and reckless disregard of system's
security. Sendmail configuration file is the place for, among other
things, rewriting rules -- the finite control of the Turing
machine. It is quite a hassle to write a correct configuration file
from scratch. Therefore, we will use an existing file, and add our own
rules to it. Of course all the changes will be made to a copy of the
configuration file rather than to /etc/mail/sendmail.cf itself.

Thus we will start by copying /etc/mail/sendmail.cf to a working
directory, e.g., /tmp. We will open /tmp/sendmail.cf in a text editor,
scroll to the very end and add the following lines:

# Move the head to the left
Smove_left
R$* @ . %	\t $@ $1 % @
R$* $- . % @	\t $@ $1 % $2
R$* $- . % $*	\t $@ $1 % $2 . $3
R$*		\t $#error underflow: $1

# Move the head to the right
Smove_right
R$* % $- . $+	\t $@ $1 $2 . % $3
R$* % $-	\t $@ $1 $2 . %@   \t there is always blank at the right
R$* %		\t $@ $1 @ . %@	   \t edge of the tape
R$*		\t $#error no-head: $1



It must be stressed that sendmail -- along with make, syslogd and a
few other UNIX programs -- distinguishes between spaces and
tabs. Although a tab looks like a sequence of blanks, its meaning is
very different. Apparently UNIX founding fathers had terminals that
somehow graphically displayed tabs. In the rules above, spaces are
written as they are, but tabs are denoted as \t. You have to press a
tab key wherever you see this two-character combination.

All the gory details about the rules are explained in a manual
	http://www.sendmail.org/~ca/email/doc8.9/op-sh-5.html#sh-5
Just for reference, sendmail rewriting engine applies a ruleset to a
string. A line Sxxx begins a ruleset named 'xxx', where xxx is a
number or an identifier. A ruleset consists of rules, lines of the
form
   R lhs \t rhs \t comments
The lhs is a pattern made of alphanumeric characters (which must match
the input string literally) and meta-symbols. The characters and
meta-symbols may be separated by spaces. We will use the following
meta-symbols in the lhs:
	  $* matches zero or more tokens of the input string
	  $+ matches one or more tokens
	  $- matches exactly one token

When a ruleset is applied to a string, the rewriting engine scans the
rules starting with the first one, trying to match their lhs patterns
against the input string. The first rule that matches becomes the
active one. The engine then replaces the input string with the rhs of
the active rule, after some processing.  $n combinations in the rhs
(where n is a digit 1..9) are replaced with whatever the n-th
meta-symbol ($+, $-, $*) of the lhs has matched.  The $>name syntax
causes the remainder of the line to be substituted as usual and then
passed as the argument to ruleset 'name'.  The final value of ruleset
'name' then becomes the substitution for the active rule. After the
input string is replaced with the rhs of the active rule, the active
rule is retried against the new current string. If the rule does not
match, the next rule is tried. If the ruleset has no more rules, the
current string becomes the result of the ruleset application.

Symbols $@ or $: placed at the beginning of the rhs change the default
behavior of retrying the active rule until it fails. A $@ prefix
causes the ruleset to return with the remainder of the rhs as the
value. A $: prefix tells the engine not to retry the active rule:
after the rhs replacement is performed, the engine should continue
processing of the ruleset starting with the next rule after the active
one. Prefix $: therefore can be used to avoid continued application of
a rule.

The rules that we have added to /tmp/sendmail.cf above move the "head
of the tape" left or right. To test the rules, we enter at the command
prompt

	/usr/sbin/sendmail -C /tmp/sendmail.cf -bt -v
and then enter the following test at the sendmail's prompt:
    move_left a.b.%c
That is, we are asking sendmail to apply the ruleset 'move_left' to a
string "a.b.%c". Sendmail prints that the ruleset returns a string "a
. % b . c" Indeed, the read-write head denoted by the symbol % moved
left. Note that the space character is universally ignored. Although
sendmail accepts named rules, it prints them out in a trace as numbers
(e.g., ruleset 183). Other tests to try (the expected result follows
the => sign)

	move_left %	=> $# error underflow : %
	move_left @.%	=> % @
	move_left a.@.% => a . % @
	move_right %	=> @ . % @
	move_right %b.c => b . % c
	move_right @.%b => @ . b . % @

move_left,move_left,move_right,move_right,move_right,move_right,move_right
a.b.%c.d => a . b . c . d . @ . % @

You can download the sendmail.cf file with the rules already entered
from
	http://pobox.com/~oleg/ftp/packages/sendmail.cf
The comments at the end of the file point out to more tests to run.


Now we can run real Turing programs.

Program 1: Addition of natural numbers. Two numbers to add are encoded
in unary, in the following format
   @ n1 @ n2 %@
where n1 is a sequence of 0 or more tokens 'a' corresponding to the
first number to add (in unary notation), n2 stands for the second
number. The numbers are terminated with a blank cell; the head is at
the blank cell that terminates the second number. The algorithm is
simple: move the head left until it hits the blank separating the two
numbers, replace this blank with token 'a', move the head right to the
first blank, replace the previous token with the blank, and
stop. Here's the corresponding program

Sadd
R$+. %	        \t $: $1.%@
R$+. %@ $*	\t $: $>move_left $1.%@$2
R$+. %a $*	\t $>move_left $1.%a$2	    \t until the first blank
R$+. %@ $*	\t $: $>move_right $1.%a$2  \t replace blank with 'a'
R$+. %a $*	\t $>move_right $1.%a$2	    \t until the first blank
R$+. %@ $*	\t $: $>move_left $1.%@$2
R$+. %$- $*	\t $@ $1.%@$3		  \t overwrite $2 with a blank

If we add these rules to the /tmp/sendmail.cf file above, run
/usr/sbin/sendmail -C /tmp/sendmail.cf -bt -v and enter
	add @.a.a.a.@.a.a.%@
at the sendmail's prompt, we will see (after a long trace of applied
rulesets)
	  @ . a . a . a . a . a . % @
That is, 3 + 2 is 5. One of the most expensive ways to add two numbers
does seem to work after all.

Other tests to try:
      add @.a.a.@.%@ => @ . a . a . % @  (that is, 2 + 0 = 2)
      add @.@.a.a.%@ => @ . a . a . % @  (that is, 0 + 2 = 2)
      add @.@.%@     => @ . % @ (that is, 0 + 0 = 0)

Program 2: A decision machine. Decide if the number of tokens on the
tape between two delimiting blanks is even or odd. The program is
self-explanatory.

Sevenp
R$* %@ $*       \t $: $>move_left $1%@$2
R$* %@ $*	\t $: $>move_right $1%@$2
R$* %@ $*	\t $@ $>move_right $1 %Yes $2   \t set the answer, halt
R$* %$- $*	\t $@ $>oddp $1%@$3	 \t overwrite $2 with a blank

Soddp
R$* %@ $*	\t $: $>move_left $1%@$2
R$* %@ $*	\t $: $>move_right $1%@$2
R$* %@ $*	\t $@ $>move_right $1 %No $2	\t set the answer, halt
R$* %$- $*	\t $@ $>evenp $1%@$3	 \t overwrite $2 with a blank

In the rules above, a rhs prefix "$: $>ruleset" effectively means a
jump (switch) to the ruleset named 'ruleset'. $@ rhs prefix means
'halt'.

Tests:

evenp @.a.a.a.%@ => @ . No . % @ (three tokens: an odd number)
oddp @.www.sendmail.org.%@  => @ . Yes . % @
evenp @.%@ => @ . Yes . % @   (zero is an even number)
oddp  @.%@  => @ . No . % @
evenp @.prep.ai.mit.edu.%@ => @ . Yes . % @ (four is an even number)

Again, you can download the sendmail.cf file with the rules already
entered from
	http://pobox.com/~oleg/ftp/packages/sendmail.cf

A practical aspect of this article, if required, can be formulated as
follows: one may wish for a tool to check sendmail.cf file. For
example, to see if a given set of re-writing rules yields a result for
_any_ input string. Alas, as the discussion in this article shows,
such a general tool is impossible.