Understanding `quine-central`: the source code

2020-09-27

← quine-central →

This is an explanation of the source code of quine-central, a program that produces quine loops. A quine is a program that prints its own source code. A quine loop is an extension of the same concept to multiple programs. Each program prints the source code of the next program in the loop. If you follow the loop far enough, you get back to the program you started with!

This post's prequel (linked in the left arrow above this post) explains why the programs generated by quine-central are quine loops. This post mainly explains how quine-central generates those programs. It might be useful to read the sequel before reading this post, to understand the motivation behind some of the parts of this code. I've still tried to include that context where necessary.

quine-central generates a quine loop containing programs in eleven different languages. For simplicity, this article usually only includes the code for generating programs in three languages: Haskell, JavaScript, and Racket.

First we define a type with an option for each language we want to generate quines for.

data Languages = Haskell
               | Javascript
               | Racket
               | ...

Each program in the quine loop has two lines. The first line always defines a function q and the second line always calls that function. Let's start by generating the code for the second line of the output program, the call to q. In some languages we need to wrap this call in a main method. In others we can leave it at the top level.

defn returns the portion of the second line before the arguments list, endDefn the portion after.

-- Start the main part of program
defn Haskell = "main = q \""
defn Javascript  = "q(\""
defn Racket  = "(q \""
...

-- End main part of program
endDefn Haskell = "\""
endDefn Javascript   = "\");"
endDefn Racket   = "\")"
...

Note that each result of defn ends in a quotation mark while each result of endDefn starts with one. That's because each argument passed to q in the generated code is a string.

To separate each pair of arguments, we need a close quotation mark to mark the end of the first argument, the character that separates function arguments (a comma in most languages, a space in others), and an open quotation mark to mark the beginning of the next argument.

divider Haskell = "\" \""
divider Javascript  = "\",\""
divider Racket  = "\" \""
...

driver takes a language and a list of arguments, then generates the second line of the output program for that language. For example, driver Haskell ["foo", "bar", "quux"] returns a string containing main = q "foo" "bar" "quux", driver Javascript ["a", "b", "c", "d"] one containing "q("a", "b", "c", "d").

driver l args = defn l ++
                intercalate (divider l) args ++
                endDefn l

Now let's generate the first line of the output program, the definition of q. In all languages except Perl, which has implicit function arguments, we'll name q's arguments a0, a1, ... a{n-1} (where {n-1} is replaced with the number n - 1, one less than the number of languages in the loop).

The helper function paramList' takes the numbers [0..n-1] that should be added to the end of each argument, then builds an argument list for the variables a0 to a{n-1} that we'll later insert into a definition of q. For example, paramList Haskell 3 = "a0 a1 a2" and paramList Javascript 5 = a0,a1,a2,a3,a4. In C and Java we must explicitly specify that each argument is a string.

paramList' Haskell = intercalate " " . map (\n -> "a" ++ show n)
paramList' Javascript = intercalate "," . map (\n -> "a" ++ show n)
paramList' Racket  = intercalate " " . map (\n -> "a" ++ show n)
paramList' C       = intercalate "," . map (\n -> "char *a" ++ show n)
...

-- Generate a list or arguments to a function such as "a0,a1,..."
paramList Perl _ = ""
paramList lang n = paramList' lang [0..n-1]

Now we need to take a detour. Let's look at the program that a simplified version of quine-central produces when Haskell is the only language in the loop:

q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"\nmain = q \""++a0++"\"""

This program won't compile because the string passed to q on the second line contains unescaped double quotes. Let's escape them:

q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++\"\nmain = q \\\"\"++a0++\"\\\"\""

We could get quine-central to escape them for us. The problem is that this is no longer a quine. It prints:

q a0=putStrLn $ a0++"
main = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"
main = q \""++a0++"\"""

Looks like we might need to replace \n with \\n on the second line. We don't want to print a real newline, just \n. If we make that change, the program prints:

q a0=putStrLn $ a0++"\nmain = q \""++a0++"\""
main = q "q a0=putStrLn $ a0++"\nmain = q \""++a0++"\"""

Still not a quine! The first line of the output and source code are the same, but the second line lost most of its escaping.

The problem is that, when q prints a0 for the second time, it doesn't re-escape the characters that were escaped on the second line of the original program. We could probably handle this by adding character escaping logic for each language, but I suspect the problem becomes even more complicated with multiple languages. For example, the Perl program wraps the arguments passed to q in single quotes. This means that we don't need to escape double quotes when printing that string in from other languages.

quine-central's author found a simpler solution: Don't print any characters that need to be escaped in the first place. Instead, quine-central prints code that builds a string by converting the ASCII codes for the characters in the string into characters, then joining those characters together. The sequenceFromString function generates this code. For example, sequenceFromString Haskell "Hello, world\n" = "map toEnum [72,101,108,108,111,44,32,119,111,114,108,100,10]". Note how the result of sequenceFromString doesn't contain any characters that have to be escaped.

-- Generate code to emit individual characters comprising string.
-- Used to eliminate escape character issues.
sequenceFromString Haskell s = "map toEnum [" ++ (intercalate "," $
    map (\c -> show (fromEnum c)) s) ++ "]"
sequenceFromString Javascript s = "process.stdout.write(String.fromCharCode.apply(null, [" ++
    (intercalate "," $ map (\c -> show (fromEnum c)) s) ++
    "]));"
sequenceFromString Racket s = "(display (bytes " ++
    (intercalate " " $ map (\c -> show (fromEnum c)) s) ++ "))"
...

Now let's build the body of q. paramList generated a list of parameters for q, called a0 to a{n-1}. arg generates a string that references a given argument to q. In some languages, it generates a statement that prints q. In others, it just generates the variable name, because q will concatenate the variable's value to other strings before printing it. For example, arg Javascript 3 = "process.stdout.write(a3)", while arg Haskell 0 = "a0".

-- Print the nth argument to a function
arg Haskell n = "a" ++ show n
arg Javascript n  = "process.stdout.write(a" ++ show n ++ ");"
arg Racket n  = "(display a" ++ show n ++ ")"
...

argDivide creates a string representing some code in the language specified by its first argument. The code is a string containing the function argument separator in the language specified by its second argument. In languages where sequenceFromString generates code to print its argument, argDivide doesn't wrap the result of sequenceFromString in string concatenation operators, and does otherwise.

For example, consider argDivide Haskell Javascript. divider Javascript = "\",\"" because, in JavaScript, we separate arguments passed to a function with commas. The quotation marks are to wrap the arguments in quotation marks, to make them strings. sequenceFromString Haskell "\",\"" = "map toEnum [34,44,34]", so argDivide Haskell Javascript = "++map toEnum [34,44,34]++". Note how, because of sequenceFromString, "++map toEnum [34,44,34]++" contains no characters that have to be escaped in a Haskell string.

argDivide Haskell l = "++" ++
                      sequenceFromString Haskell (divider l) ++
                      "++"
argDivide Javascript l = sequenceFromString Javascript (divider l)
argDivide Racket l  = sequenceFromString Racket (divider l)
...

argList uses arg and argDivide to generate code in lang1 that prints the arguments to a call to q in lang2. For example, argList Haskell Javascript 3 = "a1++map toEnum [34,44,34]++a2++map toEnum [34,44,34]++a0".

argList lang1 lang2 n = intercalate (argDivide lang1 lang2) $
    map (arg lang1) ([1..n-1] ++ [0])

fromTo prints the definition of q in the first language it's given. This q prints a0, then the second line of the quine program in the language l (defn l, then argList _ l n, then endDefn l). For example, fromTo 3 Haskell Javascript = "q a0 a1 a2=putStrLn $ a0++map toEnum [10,114,40,34]++a1++map toEnum [34,44,34]++a2++map toEnum [34,44,34]++a0++map toEnum [34,41,59]". If we called this definition of q on "foo", "bar", and "quux", it would print:

foo
q("bar","quux","foo");

This looks a lot like the JavaScript program in the quine loop!

fromTo n Haskell l = "q " ++ paramList Haskell n ++
                     "=putStrLn $ a0++" ++
                     sequenceFromString Haskell ("\n" ++ defn l) ++
                     "++" ++
                     argList Haskell l n ++
                     "++" ++ sequenceFromString Haskell (endDefn l)
fromTo n Javascript l = "function q(" ++
                        paramList Javascript n ++ ") {" ++
                        "process.stdout.write(a0);" ++
                        sequenceFromString Javascript ("\n" ++ defn l) ++
                        argList Javascript l n ++
                        sequenceFromString Javascript (endDefn l ++ "\n") ++
                        "};"
fromTo n Racket l = "(define (q " ++
                    paramList Racket n ++
                    ") (begin " ++
                    "(display a0)" ++
                    sequenceFromString Racket ("\n" ++ defn l) ++
                    argList Racket l n ++
                    sequenceFromString Racket (endDefn l ++ "\n") ++
                    "))"
...

Almost there. We declare a list of the languages we want to cycle through:

langs = [ Haskell
        , Ruby
        , Perl
        , C
        , Python
        , Java
        , Rust
        , OCaml
        , Swift
        , Racket
        , Javascript
        ]

And now the magic. langs' contains an infinite list of the selected languages repeating over and over. First, we use fromTo to print a definition of q in the first language that, when called, prints a quine loop program in the second language. Then, we use driver to print the second line of the program, the call to q.

The zipWith (fromTo n) ... expression determines the arguments passed to q. It reduces to a list where the first element is fromTo n called on the second and third language, then on the third and fourth language, all the way back around to the last and first language. Recall that fromTo prints a definition of q in the first language passed to it. That definition, when called, prints a quine program in the second language passed to fromTo. So the arguments to driver, and therefore to the call to q, are:

A definition of q in the second language that prints a quine loop program in the third language
A definition of q in the third language that prints a quine loop program in the third language
...
A definition of q in the last language that prints a quine loop program in the first language

main = do
    let n = length langs
    let langs' = cycle langs
    putStrLn $ fromTo n (head langs') (head (tail langs'))
    putStrLn $ driver (head langs') $
      zipWith (fromTo n)
        (take n (tail langs')) (tail (tail langs'))

Let's see why this works by considering a simpler case, when langs = [Haskell, Javascript, Racket]. main prints a quine loop program in Haskell that looks like this:

q a0 a1 a2=putStrLn $ a0++map toEnum [10,114,40,34]++a1++map toEnum [34,44,34]++a2++map toEnum [34,44,34]++a0++map toEnum [34,41,59]
main = q "{definition of `q` in JavaScript}" "{definition of `q` in Racket}" "{definition of `q` in Haskell}"

This Haskell program prints the following JavaScript program:

{definition of `q` in JavaScript}
q("{definition of `q` in Racket}","{definition of `q` in Haskell}","{definition of `q` in JavaScript}");

Recall that q is always defined to print its first argument, then a call to q with arguments rotated one to the left. Therefore this prints the following Racket program:

{definition of `q` in Racket}
(q "{definition of `q` in Haskell}" "{definition of `q` in JavaScript}" "{definition of `q` in Racket}")

Which prints a Haskell program:

{definition of `q` in Haskell}
main = q "{definition of `q` in JavaScript}" "{definition of `q` in Racket}" "{definition of `q` in Haskell}"

But the definition of q in Haskell is just the first line of the generated Haskell program, closing the loop.