Understanding quine-central
: the source code
This is an explanation of the source code of
quine-central
, a program that produces quine loops. A quine is a program that
prints its own source code. A quine loop is an extension of the same
concept to multiple programs. Each program prints the source code of
the next program in the loop. If you follow the loop far enough, you
get back to the program you started with!
This post's prequel (linked in the left arrow above this post)
explains why the programs generated by quine-central
are
quine loops. This post mainly explains how
quine-central
generates those programs. It might be
useful to read the sequel before reading this post, to understand the
motivation behind some of the parts of this code. I've still tried to
include that context where necessary.
quine-central
generates a quine loop containing programs
in eleven different languages. For simplicity, this article usually
only includes the code for generating programs in three languages:
Haskell, JavaScript, and Racket.
First we define a type with an option for each language we want to generate quines for.
data Languages = Haskell
| Javascript
| Racket
| ...
Each program in the quine loop has two lines. The first line always
defines a function q
and the second line always calls
that function. Let's start by generating the code for the second line
of the output program, the call to q
. In some languages
we need to wrap this call in a main method. In others we can leave it
at the top level.
defn
returns the portion of the second line before the
arguments list, endDefn
the portion after.
-- Start the main part of program
defn Haskell = "main = q \""
defn Javascript = "q(\""
defn Racket = "(q \""
...
-- End main part of program
endDefn Haskell = "\""
endDefn Javascript = "\");"
endDefn Racket = "\")"
...
Note that each result of defn
ends in a quotation mark
while each result of endDefn
starts with one. That's
because each argument passed to q
in the generated code
is a string.
To separate each pair of arguments, we need a close quotation mark to mark the end of the first argument, the character that separates function arguments (a comma in most languages, a space in others), and an open quotation mark to mark the beginning of the next argument.
divider Haskell = "\" \""
divider Javascript = "\",\""
divider Racket = "\" \""
...
driver
takes a language and a list of arguments, then
generates the second line of the output program for that language. For
example, driver Haskell ["foo", "bar", "quux"]
returns a
string containing main = q "foo" "bar" "quux"
,
driver Javascript ["a", "b", "c", "d"]
one containing
"q("a", "b", "c", "d")
.
driver l args = defn l ++
intercalate (divider l) args ++
endDefn l
Now let's generate the first line of the output program, the
definition of q
. In all languages except Perl, which has
implicit function arguments, we'll name q
's arguments
a0
, a1
, ... a{n-1}
(where
{n-1}
is replaced with the number n - 1
, one
less than the number of languages in the loop).
The helper function paramList'
takes the numbers
[0..n-1]
that should be added to the end of each
argument, then builds an argument list for the variables
a0
to a{n-1}
that we'll later insert into a
definition of q
. For example,
paramList Haskell 3 = "a0 a1 a2"
and
paramList Javascript 5
= a0,a1,a2,a3,a4
. In
C and Java we must explicitly specify that each argument is a string.
paramList' Haskell = intercalate " " . map (\n -> "a" ++ show n)
paramList' Javascript = intercalate "," . map (\n -> "a" ++ show n)
paramList' Racket = intercalate " " . map (\n -> "a" ++ show n)
paramList' C = intercalate "," . map (\n -> "char *a" ++ show n)
...
-- Generate a list or arguments to a function such as "a0,a1,..."
paramList Perl _ = ""
paramList lang n = paramList' lang [0..n-1]
Now we need to take a detour. Let's look at the program that a
simplified version of quine-central
produces when Haskell
is the only language in the loop:
q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"\nmain = q \""++a0++"\"""
This program won't compile because the string passed to
q
on the second line contains unescaped double quotes.
Let's escape them:
q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++\"\nmain = q \\\"\"++a0++\"\\\"\""
We could get quine-central
to escape them for us. The
problem is that this is no longer a quine. It prints:
q a0=putStrLn $ a0++"
main = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"
main = q \""++a0++"\"""
Looks like we might need to replace \n
with
\\n
on the second line. We don't want to print a real
newline, just \n
. If we make that change, the program
prints:
q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"\nmain = q \""++a0++"\"""
Still not a quine! The first line of the output and source code are the same, but the second line lost most of its escaping.
The problem is that, when q
prints a0
for
the second time, it doesn't re-escape the characters that were escaped
on the second line of the original program. We could probably handle
this by adding character escaping logic for each language, but I
suspect the problem becomes even more complicated with multiple
languages. For example, the Perl program wraps the arguments passed to
q
in single quotes. This means that we
don't need to escape double quotes when printing that string
in from other languages.
quine-central
's author found a simpler solution: Don't
print any characters that need to be escaped in the first place.
Instead, quine-central
prints code that builds a string
by converting the ASCII codes for the characters in the string into
characters, then joining those characters together. The
sequenceFromString
function generates this code. For
example,
sequenceFromString Haskell "Hello, world\n" = "map toEnum
[72,101,108,108,111,44,32,119,111,114,108,100,10]"
. Note how the result of sequenceFromString
doesn't
contain any characters that have to be escaped.
-- Generate code to emit individual characters comprising string.
-- Used to eliminate escape character issues.
sequenceFromString Haskell s = "map toEnum [" ++ (intercalate "," $
map (\c -> show (fromEnum c)) s) ++ "]"
sequenceFromString Javascript s = "process.stdout.write(String.fromCharCode.apply(null, [" ++
(intercalate "," $ map (\c -> show (fromEnum c)) s) ++
"]));"
sequenceFromString Racket s = "(display (bytes " ++
(intercalate " " $ map (\c -> show (fromEnum c)) s) ++ "))"
...
Now let's build the body of q
.
paramList
generated a list of parameters for
q
, called a0
to a{n-1}
.
arg
generates a string that references a given argument
to q
. In some languages, it generates a statement that
prints q
. In others, it just generates the variable name,
because q
will concatenate the variable's value to other
strings before printing it. For example,
arg Javascript 3 = "process.stdout.write(a3)"
, while
arg Haskell 0 = "a0"
.
-- Print the nth argument to a function
arg Haskell n = "a" ++ show n
arg Javascript n = "process.stdout.write(a" ++ show n ++ ");"
arg Racket n = "(display a" ++ show n ++ ")"
...
argDivide
creates a string representing some code in the
language specified by its first argument. The code is a string
containing the function argument separator in the language specified
by its second argument. In languages where
sequenceFromString
generates code to print its argument,
argDivide
doesn't wrap the result of
sequenceFromString
in string concatenation operators, and
does otherwise.
For example, consider argDivide Haskell Javascript
.
divider Javascript = "\",\""
because, in JavaScript, we
separate arguments passed to a function with commas. The quotation
marks are to wrap the arguments in quotation marks, to make them
strings.
sequenceFromString Haskell "\",\"" = "map toEnum [34,44,34]"
, so
argDivide Haskell Javascript = "++map toEnum [34,44,34]++"
. Note how, because of sequenceFromString
,
"++map toEnum [34,44,34]++"
contains no characters that
have to be escaped in a Haskell string.
argDivide Haskell l = "++" ++
sequenceFromString Haskell (divider l) ++
"++"
argDivide Javascript l = sequenceFromString Javascript (divider l)
argDivide Racket l = sequenceFromString Racket (divider l)
...
argList
uses arg
and
argDivide
to generate code in lang1
that
prints the arguments to a call to q
in
lang2
. For example,
argList Haskell Javascript 3 = "a1++map toEnum [34,44,34]++a2++map
toEnum [34,44,34]++a0"
.
argList lang1 lang2 n = intercalate (argDivide lang1 lang2) $
map (arg lang1) ([1..n-1] ++ [0])
fromTo
prints the definition of q
in the
first language it's given. This q
prints a0
,
then the second line of the quine program in the language
l
(defn l
, then argList _ l n
,
then endDefn l
). For example,
fromTo 3 Haskell Javascript = "q a0 a1 a2=putStrLn $ a0++map toEnum
[10,114,40,34]++a1++map toEnum [34,44,34]++a2++map toEnum
[34,44,34]++a0++map toEnum [34,41,59]"
. If we called this definition of q
on "foo", "bar", and
"quux", it would print:
foo
q("bar","quux","foo");
This looks a lot like the JavaScript program in the quine loop!
fromTo n Haskell l = "q " ++ paramList Haskell n ++
"=putStrLn $ a0++" ++
sequenceFromString Haskell ("\n" ++ defn l) ++
"++" ++
argList Haskell l n ++
"++" ++ sequenceFromString Haskell (endDefn l)
fromTo n Javascript l = "function q(" ++
paramList Javascript n ++ ") {" ++
"process.stdout.write(a0);" ++
sequenceFromString Javascript ("\n" ++ defn l) ++
argList Javascript l n ++
sequenceFromString Javascript (endDefn l ++ "\n") ++
"};"
fromTo n Racket l = "(define (q " ++
paramList Racket n ++
") (begin " ++
"(display a0)" ++
sequenceFromString Racket ("\n" ++ defn l) ++
argList Racket l n ++
sequenceFromString Racket (endDefn l ++ "\n") ++
"))"
...
Almost there. We declare a list of the languages we want to cycle through:
langs = [ Haskell
, Ruby
, Perl
, C
, Python
, Java
, Rust
, OCaml
, Swift
, Racket
, Javascript
]
And now the magic. langs'
contains an infinite list of
the selected languages repeating over and over. First, we use
fromTo
to print a definition of q
in the
first language that, when called, prints a quine loop program in the
second language. Then, we use driver
to print the second
line of the program, the call to q
.
The zipWith (fromTo n) ...
expression determines the
arguments passed to q
. It reduces to a list where the
first element is fromTo n
called on the second and third
language, then on the third and fourth language, all the way back
around to the last and first language. Recall that
fromTo
prints a definition of q
in the first
language passed to it. That definition, when called, prints a quine
program in the second language passed to fromTo
. So the
arguments to driver
, and therefore to the call to
q
, are:
-
A definition of
q
in the second language that prints a quine loop program in the third language -
A definition of
q
in the third language that prints a quine loop program in the third language - ...
-
A definition of
q
in the last language that prints a quine loop program in the first language
main = do
let n = length langs
let langs' = cycle langs
putStrLn $ fromTo n (head langs') (head (tail langs'))
putStrLn $ driver (head langs') $
zipWith (fromTo n)
(take n (tail langs')) (tail (tail langs'))
Let's see why this works by considering a simpler case, when
langs = [Haskell, Javascript, Racket]
.
main
prints a quine loop program in Haskell that looks
like this:
q a0 a1 a2=putStrLn $ a0++map toEnum [10,114,40,34]++a1++map toEnum [34,44,34]++a2++map toEnum [34,44,34]++a0++map toEnum [34,41,59]
main = q "{definition of `q` in JavaScript}" "{definition of `q` in Racket}" "{definition of `q` in Haskell}"
This Haskell program prints the following JavaScript program:
{definition of `q` in JavaScript}
q("{definition of `q` in Racket}","{definition of `q` in Haskell}","{definition of `q` in JavaScript}");
Recall that q
is always defined to print its first
argument, then a call to q
with arguments rotated one to
the left. Therefore this prints the following Racket program:
{definition of `q` in Racket}
(q "{definition of `q` in Haskell}" "{definition of `q` in JavaScript}" "{definition of `q` in Racket}")
Which prints a Haskell program:
{definition of `q` in Haskell}
main = q "{definition of `q` in JavaScript}" "{definition of `q` in Racket}" "{definition of `q` in Haskell}"
But the definition of q
in Haskell is just the first line
of the generated Haskell program, closing the loop.