Random thoughts about Syntactic Meta Programming (I)

I should write this blog long time ago, but I am so adddicted to Fan that I don’t have time to write it, programming is much more fun than blogging.

Anyway, better late than never, XD.

What’s syntactic meta programming?

What’s meta programming?

Meta programming is an interesting but also challenging domain, the essential idea is that “program as data”. Wait, you may wonder that in Von Neumann architecture, program is always data, so to be more precise, meta programming is kinda “program as structured data”, the structured data should be easy to manipulate and generate. Think about Lisp, since it does not have any concrete syntax, its program is always S-expression, a hierachical data structure which is easy to manipulate and process.

Meta-program at different layers

When you write a compiler, the program should have different representations in different stages, think about the ocaml compiler workflow

Raw String -->  Token Stream -->
Parsetree --> Typedtree --> Lambda -->
ULambda --> C-- --> Mach --> Linear -->

So, at different stages, the program as a structured data can be processed in different ways.

You can insert plugins per level, for example, the c macros mainly does the token stream transformation, but there is a problem with the token stream that it is not a structured data.

Ther earlier stage you do the transformation, the easier it is to be mapped to you original source program, the later stage you do the transformation, the compiler do more program analysis, but it’s harder to map to the original program. So each stage has its use case.

Here we only talk about syntactic meta programming(SMP), where the layer is in the parsetree or called Abstract Syntax and we only talk about the host language OCaml (OCaml is really a great language, you should have a try!), but some high level design choices should be applied to other host languages as well.

The essential part of SMP

I suggest anyone who are interested in SMP should learn Common Lisp, there are so many brilliant ideas there and forgotten by people outside the community. And two books are really fun, one is On Lisp, the other is Let Over Lambda .

The essential part of SMP is Quasi-Quotation. There is a nice paper introduces the benefits of Quasi-Quotation: Quasiquotation in Lisp.

Here we only scratch its surface a tiny bit: “Quasiquotation is a parameterized version of ordinary quotation, where instead of specifying a value exactly, some holes are left to be filled in later. A quasiquotation is a template.”, breifly, quasi-quotation entitiles you the ability to abstraction over code.

As the paper said, a typical use of quasiquotation in a macro definition looks like

(defmacro (push expr var)
 `(set! ,var (cons ,expr ,var)))

Here the “`” introduces a quasi-quotaion, and “,” introduces a parameter(we also call it anti-quote), there are a number of languages which supports quasiquotation except the lisp family, but none of them are even close to Lisp.

One challenging part lies not in quote part, it lies in anti-quote part, however. In lisp, you can antiquote everywhere, suppose you are writing Template Haskell, you can write some thing like this

[| import $module |]

In lisp, it allows very fine-grained quasi-quote.

The other challegning part is nested quosi-quotation. Since meta-program itself is a normal program, when you do meta programming a lot in Common Lisp, you will find you wrote a lot of duplicated meta-programs, here nested quasi-quotation came to rescue.

Discussing nested quasi-quotation may goes beyond the scope of the first blog, but you can have a taste here

(defmacro (def-caller abbrev proc)
 `(defmacro (,abbrev var expr)
    `(,`,proc (lambda (,var) ,expr))))

Some defects in Lisp Style Macors

Though I really enjoyed Lisp Macros, to be honest, the S-expression as concrete syntax to represent a program is not the optimal way to express ideas.

For the extreme flexibility, you have to pay that for each program you use a sub-optimal concrete syntax.

The second problem is that Lisp is a dynamically typed language, though currently practical type system can help catch only some trivial errors, but they do help a lot.

For a sufficient smart compiler, like SBCL, they did type inference or constraint propgation, and that emits really helpful warnings, the type checking may not be that important there, but that depends on the compiler implementation, some young implementations, like clojure, the compiler is not smart enough to help diagnose, yet.

The third problem is that Lisp macros ignore locations totally, when you process the raw S-expression, no location is kept, in some domains, code generation, for example, location is not that important since you only emit a large trunk of code, in other domains, Ast transformation, location is important to help emit helpful error messages. Keeping location correct is very tedious but necessary, IMHO. Some meta programming system, Template Haskell, ignores locations as well.

How to do SMP in rich syntax language

Now let’s go back to OCaml, the great language XD.

It is the same as Lisp, you have to encode the Ast in the host language, you can encode the ocaml’s Ast using S-expression as well.

S-expression is a viable option, Felix adopts this mechanism. The advantage of using S-exprssion to encode the S-expression is that you can reach the maximum code reuse and don’t need to fight against the type system from time to time.

For example, in Camlp4, once you want to get the location of an Ast node, you have to fix its type, so if have to write a lot of bolierpolate code like this

val loc_of_expr: expr -> loc
val loc_of_ctyp: ctyp -> loc
val loc_of_patt: patt -> loc

Things turn out to be better with type class support in Haskell, but that’s another story.

Think about the case you want to use a semi ; to connect two Ast node, you have to write things like

let sem e1 e2 =
   let _loc = Loc.merge (loc_of_expr e1 ) (loc_of_expr e2) in
   Sem(_loc, e1,e2)

Everytime you want to fetch the location, you have to fix its type, that’s too bad, the API to process the Syntax is too verbose

But using Algebraic Data Type does have some advantages, the first is pattern match (with exhuastive check), the second is type checking, we do tell some difference between Ast.expr and Ast.patt, and that helps, but you can not tell whether it’s an expresson of type int or type boolean, for example

(Int "3" : expr)
(String "3" :expr)

MetaOCaml can guarantees the type correctness, but there is always a trade off between expressivity and type safety. Anyway, in a staticly typed language, i.e, OCaml, the generated program is always type checked.

So, in OCaml or other ML dialects , you can encode the Abstract Syntax using one of those: untyped s-expression, partial typed sum types, records, GADT, or mixins of records and sum types. there is another unique solution which exists in OCaml, variants.

We will discuss it further in the next post.

Quasi-quotation in OCaml

Quasi-quotation in lisp is free, since the concrete syntax is exactly the same as abstract syntax.

(+ a 3 4) ;; program

`(+ a 3 4) ;; data 

There is a paper which summarizes how to do quasi-quotation in rich syntax language: Why it’s nice to be quoted.

Unlike Lisp, the different between program and data is obvious

3 (* program *)
`Int (_loc, "3") (* data *)

"3" (* program *)
`String (_loc, "3") (* data *)

(Here we use a qutoe “`” to denote that it’s an Ast )

Let’s take a look at the parsing phase (for simplicity, we ignore the locations).

When you do the parsing, the normal behavior is as follows:

 "3 + 4"
 ==> to the Ast 
`App ((`App ((`Id (`Lid "+")), (`Int "3"))), (`Int "4"))

But to do the quasi-quotation, you need to turn the Ast itself into data, so you need to encode the Ast using the Ast itself

==> to the Ast
`App ((`App ((`Id (`Lid "+")), (`Int "3"))), (`Int "4"))

==> to the Data
     ((`Vrn "App"),
              ((`Vrn "App"),
                (`App ((`Vrn "Id"), (`App ((`Vrn "Lid"), (`Str "+"))))))),
            (`App ((`Vrn "Int"), (`Str "3"))))))),
   (`App ((`Vrn "Int"), (`Str "4"))))

So, to do it once for all, we needs a function (for simplicty)

val meta_expr: expr^0 -> expr^1 

Luckily since expr^1 is a subset of expr^0, so you get the belowing function for free

val meta_expr: expr^1 -> expr^2 

Actually you may find that the category expr^2 is exactly the same as expr^1, so once you have expr^0 -> expr^1, you have expr^0 -> expr^n. (antiquotation will be discussed later).

So the problem only lies into how to write the function expr^0->expr^1, you need to encode the Ast using the Ast itself, this requires that the Ast should be expressive enough to express itself. This is alwasy not true, suppose you use the HOAS, HOAS is not expressive enough to express itself.

If you mixin the records with sum types, you have to express both records and sum types, the Ast lifting is neither easy to write, nor easy to read, with locations, it becomes even more cmoplex, the best case is to do it automatically and once for all.

Suppose you only use sum types, luckily we might find that only five tags are expressive enough to express this function expr^0 -> expr^1, here are five tags

App Vrn Str Tup Com

Here Tup means “tuple”, and Com means “Comma”.

The minimal, the better, this means as long as the changes to the Abstract Syntax Tree does not involves the five tags, it will always work out of the box.

So, to design the right Ast for meta programming, the first thing is to keep it simple, don’t use Records or other complex data types , Sum types or polymorphic variants are rich enough to express the who syntax of ocaml but itself is very simple to do the Ast Lifting.

In the next blog, we may discuss tThe right way to design an Abstract Syntax Tree for SMP.