BTW, as the author I can say this without offending myself: Lisp deserves a better article. I wrote this quickly, a long time ago. Although I still think the equality thing is a really big deal, I have different and better reasons now. (Article forthcoming.)
> every language gets parsed, turned into an AST (Abstract Syntax Tree), and then compiled or interpreted
This is one of the most widespread misconceptions about compilers. You don't need to build an AST. All the books that give this advice originate from an academic environment and are completely out of touch with reality. Unfortunately, these books can be credited with the terribly bloated compilers that we are using, including Python and Perl.
"You don't need to build an AST. All the books that give this advice originate from an academic environment and are completely out of touch with reality. Unfortunately, these books can be credited with the terribly bloated compilers that we are using, including Python and Perl."
Can you give an example of an open-source compiler (so we can learn how it is done) that embodies an alternate strategy?
Lua, but I'm sure there are many more. The generate-as-you-go technique is best described in Wirth's Algorithms + Data Structures = Programs, first edition (the last chapter about compilers was removed in later editions for some reason).
The last chapter was removed because (from the preface of the 2nd ed.):
"The entire fifth chapter of the first Edition has been omitted. It was felt that the subject of compiler construction was somewhat isolated from the preceding chapters and would rather merit a more extensive treatment in its own volume."
Jack Crenshaw's "Let's Build a Compiler" is a tutorial, in Pascal, on how to do this. My own StoneKnifeForth compiles itself in an AST-free way in less than 120 lines of code, but I haven't released it yet.
Ur-Scheme (http://canonical.org/~kragen/sw/urscheme/) doesn't have any intermediate representation other than the S-expression representation of the code; it uses that to perform closure analysis. Probably an AST is the easiest form to perform this on. The Lua guys came up with an implementation technique for closures that doesn't require building a parse tree first, but it's quite a bit more complicated.
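The generate-as-you-go style being discussed can be sketched as a recursive-descent parser that emits code the moment each construct is recognized, so no AST (or any tree) ever exists. A minimal sketch in Python; the token format and instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`) are invented for illustration, not taken from any of the compilers mentioned above:

```python
# Single-pass "generate-as-you-go" compilation: the parser for
# expr -> term ('+' term)* ; term -> factor ('*' factor)*
# appends stack-machine instructions directly while parsing,
# instead of building an AST and walking it afterwards.

def compile_expr(tokens):
    """Compile a token list, e.g. ['1', '+', '2', '*', 'x'], to stack code."""
    pos = 0
    code = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == '(':
            expr()
            pos += 1                          # skip ')'
        elif tok.isdigit():
            code.append(('PUSH', int(tok)))   # emitted immediately
        else:
            code.append(('LOAD', tok))        # variable reference

    def term():
        nonlocal pos
        factor()
        while peek() == '*':
            pos += 1
            factor()
            code.append(('MUL',))   # emitted as soon as both operands are parsed

    def expr():
        nonlocal pos
        term()
        while peek() == '+':
            pos += 1
            term()
            code.append(('ADD',))

    expr()
    return code

print(compile_expr(['1', '+', '2', '*', 'x']))
# [('PUSH', 1), ('PUSH', 2), ('LOAD', 'x'), ('MUL',), ('ADD',)]
```

The parser's call stack plays the role the AST would otherwise play: operator precedence and nesting are encoded in which procedure is currently executing, which is essentially the technique Wirth and Crenshaw describe.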
Why do you think it's so common? I mean, as a Lisp/Forth guy I can certainly buy the idea that everyone does a stupid thing for a long time, so I could understand if there wasn't a reason, but surely some kinds of high-level optimization are easier if your code has a tree structure, no?
No, it will not make anything easier; it will just bloat the compiler and slow down the whole process. All the kinds of optimizations you probably have in mind are possible at the level of intermediate/pseudo-code, and should be done there, together with the other optimization techniques.
Interesting. Do you have pointers to any papers/implementations of optimizing compilers without abstract syntax trees? Also, are you still able to implement optimizations with the same effort? I'd gladly trade 10x longer compile time for 10% faster execution time.
> We can look at specific examples if you wish.
Common Subexpression Elimination, for example. Not just syntactically equal expressions but also cases like this:
a1 = b + c
a2 = c + b // eliminated
And how do you do loop fusion & interchange?
Edit: I now realise that we may be talking about different abstract syntax trees. I thought that you meant no intermediate data structures. But you probably meant that you don't need a parse tree, you can translate to the intermediate language directly from the source. If that's the case then here's another challenge: how do you produce error messages for type errors?
Intermediate code will obfuscate the code's intent. Why require the optimizer to reverse-engineer what you meant from a description at a different (lower) level of abstraction?
On the philosophical level, "code's intent" is to run on a machine using machine language. Practically speaking, I've never seen any problem that couldn't be solved by analyzing intermediate code.
It's usually a stack-based language, very close to assembly but with a lot of simplifications, e.g. no registers. Because of these simplifications, intermediate code is easier to optimize and to generate native code from.
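A toy illustration of why such a register-free stack IR makes native-code generation simple: every IR instruction expands to a fixed instruction sequence, so the code generator is one table-driven pass. Both the IR and the x86-like output below are invented simplifications, not any real compiler's format:

```python
# Template-based code generation from a stack IR. Because the IR has no
# registers, each opcode lowers to the same snippet every time; the only
# variation is substituting immediate operands into the template.

TEMPLATES = {
    'PUSH': ['push {0}'],
    'ADD':  ['pop rbx', 'pop rax', 'add rax, rbx', 'push rax'],
    'MUL':  ['pop rbx', 'pop rax', 'imul rax, rbx', 'push rax'],
}

def gen(ir):
    asm = []
    for instr in ir:
        op, *args = instr
        for line in TEMPLATES[op]:
            asm.append(line.format(*args))   # substitute operands, if any
    return asm

# '2 + 3 * 4' as stack IR, lowered one instruction at a time:
for line in gen([('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]):
    print(line)
```

The price of this simplicity is naive output (every value round-trips through the machine stack); real backends add register allocation on top, but the correctness of the translation stays this local.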
I see "first to do X" as neutral. Maybe it's indicative of more innovation and goodness elsewhere, but maybe it's done better in the next iteration or language.
"Best to do X" is clearly good, though, and that is listed as part of the 1st reason.
An important reason not mentioned is library support. For practical reasons many programmers want not just "a good language" but a language where they don't have to re-invent the wheel or forge new territory (of course, that can be satisfying too, if one has the time). This is likely the case with any language that has been around a while.
I don't know if this is the case with Lisp. Did that reason seem not important enough for the list, or are libraries/frameworks lacking?
The topic of libraries comes up frequently when Lisp is mentioned. If a task or problem arises that isn't one of the main problems you're trying to solve with your program, you would consider using a library to accomplish it. If such a library is available, go ahead and use it. If there isn't one, Lisp programmers can spend a short amount of time implementing only what they need, without generalizing the task. This isn't quite re-inventing the wheel, since you have a unique set of constraints, and it's certainly not as onerous a task as writing a generalized library. There are those (Edi Weitz, for example) who do tend to implement complete libraries, and the Lisp community is of course grateful for that.
Libraries and frameworks can be useful, but they also have the effect of levelling the playing field. Lisp empowers the programmer to 'discover' a natural solution to their problem in a unique way.