Discussions on the Syntactic Meta Programming(wg-camlp4 list)

There are some interesting discussions in the wg-camlp4 mailing list, I wrote a long mail yesterday, I cleaned it a bit, pasted it here 


 I rewrite the whole camlP4(named Fan) from scratch, building the quotation kit and throw away the crappy grammar parser, so plz believe me that I do understand the whole technology stack of camlP4, if we could reach some consensus, I would be happy to handle over the maintaining of  Fan, Fan does not loose any feature compared with camlP4, in fact it has more interesting featrues.
   Let’s begin with some easy, not too technical parts which has a significant effect on user experience though:
   1. Performance
          Performance does matter, it’s a shame that  the most time spent in compiling the ocaml compiler is dedicated to camlP4, but it is an engineering problem, currently compiling Fan only takes less than 20s, and it can be improved further
   2. Building issues
        The design of having side effects by dynamic loading is generically a bad idea, in Fan the dynamic loading only register some functionality the Fan support, it does not have any other side effect, each file stands alone says which (ppx , or filters, or syntax) it want to use with a good default option. so the building is always something like ‘-pp fan pluging1 plugin2 plugin3’, the order of pulgings does not matter, also, loading all the plugins you have does not have any side effect, even better, you can do the static linking all the plugins you collected, the building process is simplified.  
  3. Grammar Extension (Language namespace)
       I concur that grammar extension arbitrarily is a bad idea, and I agree with Gabrier that so far only the quotation(Here  quotation means delimited DSL, quosi-quotation means Lisp style macros) is modular, composable, and  I also agree with Gabrier -ppx should not be used to do syntax overriding (this should not be called syntax extension actually), that’s a terrible idea to do syntax overriding, since the user never understand what’s going on underly without reading the Makefile. So here some my suggestion is that some really conevenient syntax extesion, i.e, (let try.. in) should be merged to the built in parser. quotations does not bring too much heavy syntax (imho). In Fan, we proposed the concept of a hierarchical language name space, since once quotation is heavily used, it’s really easy to introduce conflict, the language namespace querying is exactly like java package namespace, you can import, close import to save some typing.
    Here is a taste
     {:.Fan.Lang.Meta.expr| a + b |} ——> 
      `App (`App ((`Id (`Lid “+”)), (`Id (`Lid “a”)))), (`Id (`Lid “b”)))
     {:.Fan.Lang.Meta.N.expr| a + b |}  —–>
         (_loc, (`Id (_loc, (`Lid (_loc, “+”)))),
           (`Id (_loc, (`Lid (_loc, “a”)))))),
      (`Id (_loc, (`Lid (_loc, “b”))))) 
 the .Fan.Lang.Meta.expr the first ‘.’ means it’s from the absolute namespace,  the N.expr shares exactly the same syntax without location, though
   4. Portable to diffierrent compiler extensions(like LexiFi’s fork of ocaml)
       I am pretty sure it’s pretty easy to do in Fan, only Ast2pt (dumping the intemediate Ast into Parsetree) part need to be changed to diffierent compilers.

Now let’s talk about some internal parts of SMP.
Quasi-Quotation is the essential part of SMP,  I am surprised so far that the discussion silently ignores the quasi-quotation, Leo’s answer of writing   three parsers is neither satisfying nor practical(imho). 
Camlp4 is mainly composed of two parts, one is the extensible parser and the other significant part is Ast Lifting. Since we all agree that extensible parser increases the complexity too much, let’s simply ignore that part.
The Ast Lifting are tightly coupled with the design of the Abstract Syntax Tree.  People complain about that Camlp4 Ast is hard to learn and using quasi-quotation to do the pattern match is a bad idea.
Let me explain the topic a bit:
    Camlp4Ast is hard to learn, I agree, it has some alien names that nobody understand what it  means, quosi-quotation is definitely a great idea to boom the meta-programming, but my experience here is for very very small Ast fragment, using the Abstract Syntax Tree directly, otherwise Quasi-quotation is a life saver to do the meta programming.
   Luckily the quotation kit has nothing to do with the parser part, it’s simply several functions(I did some simplify a bit) which turns a normal runtime  
value into an Ast node generically, such kind functions are neither easy to write nor easy to read,the idea case is that it should be generated once for all, and all the data types in normal ocamlshould be derived automatically(some ADT with functions can not be derived). I bet it’s mostly likely a nightmare if we maintain 3 parsers for the ocaml grammar while two other parsers dumping to a meta-level
   So, how to make Ast Lifting easier, 
        The first guideline is “Don’t mixing with records”, 
         Once you encoding AST with records, you have to encode the records in the meta level which increases the complexity without bringing any new features, it’s simply not worthwhile.
        The second guideline is “Don’t do any syntax desugaring” , syntax desguaring makes the semantics of syntax meta programming a bit weird. Syntax desguaring happens everywhere in Parsetree, think about the list literals, it uses the syntax desuaring, if you don’t use any syntax desugaring, for example, you want to match the bigarray access, you simply needed to match `Bigarray(..)’ instead of 
                     {txt= Ldot (Ldot (Lident “Bigarray”, array), (“get”|”set” as gs)) ;_};_},
       The third guideline is to make it as uniform as possible
       This not only helps the user, but it helps the meta-programming over types to derive some utility types. Take a look at my Ast encoding in Fan https://github.com/bobzhang/Fan/blob/master/src/Ast.ml (it needs to be polished, plz don’t panic when you see variants I use here)
      The initial Ast has locations and ant support, but here we derive 3 other Asts thanks to my very regular design. AstN is the Ast without locations, the locations are important, but it is simply not too much helpful when you only do the code generation, but it complicates the expanded code a lot), AstA is the Ast without antiquotations(simply remove the ant branch), it is a subtype of Ast(thanks to the choice we use variants here), AstNA is the Ast without neither locations nor antiquotations), it is a subtype of AstN.  In practice, I found the Ast without locations is particular helpful when you only do the code generation, it simplifies this part significantly. The beautiful part is that  all the four Ast share the same grammar with the same quosiquotatoin mechanism, as I showed .Fan.Lang.N.expr and .Fan.Lang.expr
    I don’t know how many parsers you have to maintain to reach such a goal or it’s never going to happen.
    Using variants to encode the intermediate ast has a lots of other benefits, but I don’t want to cover it in such a short mail.
   So, my proposal is that the community design an Intermediate Ast together, and write a built-in parser to such Intermediate Ast then dump to Parsetree, but I am for that Parsetree still needs to be cleaned a bit but not too much change .  I do appreciate you can take something away from Fan, I think the Parsetree is not the ideal part to do SMP, HTH



Random thoughts about Syntactic Meta Programming (I)

I should write this blog long time ago, but I am so adddicted to Fan that I don’t have time to write it, programming is much more fun than blogging.

Anyway, better late than never, XD.

What’s syntactic meta programming?

What’s meta programming?

Meta programming is an interesting but also challenging domain, the essential idea is that “program as data”. Wait, you may wonder that in Von Neumann architecture, program is always data, so to be more precise, meta programming is kinda “program as structured data”, the structured data should be easy to manipulate and generate. Think about Lisp, since it does not have any concrete syntax, its program is always S-expression, a hierachical data structure which is easy to manipulate and process.

Meta-program at different layers

When you write a compiler, the program should have different representations in different stages, think about the ocaml compiler workflow

Raw String -->  Token Stream -->
Parsetree --> Typedtree --> Lambda -->
ULambda --> C-- --> Mach --> Linear -->

So, at different stages, the program as a structured data can be processed in different ways.

You can insert plugins per level, for example, the c macros mainly does the token stream transformation, but there is a problem with the token stream that it is not a structured data.

Ther earlier stage you do the transformation, the easier it is to be mapped to you original source program, the later stage you do the transformation, the compiler do more program analysis, but it’s harder to map to the original program. So each stage has its use case.

Here we only talk about syntactic meta programming(SMP), where the layer is in the parsetree or called Abstract Syntax and we only talk about the host language OCaml (OCaml is really a great language, you should have a try!), but some high level design choices should be applied to other host languages as well.

The essential part of SMP

I suggest anyone who are interested in SMP should learn Common Lisp, there are so many brilliant ideas there and forgotten by people outside the community. And two books are really fun, one is On Lisp, the other is Let Over Lambda .

The essential part of SMP is Quasi-Quotation. There is a nice paper introduces the benefits of Quasi-Quotation: Quasiquotation in Lisp.

Here we only scratch its surface a tiny bit: “Quasiquotation is a parameterized version of ordinary quotation, where instead of specifying a value exactly, some holes are left to be filled in later. A quasiquotation is a template.”, breifly, quasi-quotation entitiles you the ability to abstraction over code.

As the paper said, a typical use of quasiquotation in a macro definition looks like

(defmacro (push expr var)
 `(set! ,var (cons ,expr ,var)))

Here the “`” introduces a quasi-quotaion, and “,” introduces a parameter(we also call it anti-quote), there are a number of languages which supports quasiquotation except the lisp family, but none of them are even close to Lisp.

One challenging part lies not in quote part, it lies in anti-quote part, however. In lisp, you can antiquote everywhere, suppose you are writing Template Haskell, you can write some thing like this

[| import $module |]

In lisp, it allows very fine-grained quasi-quote.

The other challegning part is nested quosi-quotation. Since meta-program itself is a normal program, when you do meta programming a lot in Common Lisp, you will find you wrote a lot of duplicated meta-programs, here nested quasi-quotation came to rescue.

Discussing nested quasi-quotation may goes beyond the scope of the first blog, but you can have a taste here

(defmacro (def-caller abbrev proc)
 `(defmacro (,abbrev var expr)
    `(,`,proc (lambda (,var) ,expr))))

Some defects in Lisp Style Macors

Though I really enjoyed Lisp Macros, to be honest, the S-expression as concrete syntax to represent a program is not the optimal way to express ideas.

For the extreme flexibility, you have to pay that for each program you use a sub-optimal concrete syntax.

The second problem is that Lisp is a dynamically typed language, though currently practical type system can help catch only some trivial errors, but they do help a lot.

For a sufficient smart compiler, like SBCL, they did type inference or constraint propgation, and that emits really helpful warnings, the type checking may not be that important there, but that depends on the compiler implementation, some young implementations, like clojure, the compiler is not smart enough to help diagnose, yet.

The third problem is that Lisp macros ignore locations totally, when you process the raw S-expression, no location is kept, in some domains, code generation, for example, location is not that important since you only emit a large trunk of code, in other domains, Ast transformation, location is important to help emit helpful error messages. Keeping location correct is very tedious but necessary, IMHO. Some meta programming system, Template Haskell, ignores locations as well.

How to do SMP in rich syntax language

Now let’s go back to OCaml, the great language XD.

It is the same as Lisp, you have to encode the Ast in the host language, you can encode the ocaml’s Ast using S-expression as well.

S-expression is a viable option, Felix adopts this mechanism. The advantage of using S-exprssion to encode the S-expression is that you can reach the maximum code reuse and don’t need to fight against the type system from time to time.

For example, in Camlp4, once you want to get the location of an Ast node, you have to fix its type, so if have to write a lot of bolierpolate code like this

val loc_of_expr: expr -> loc
val loc_of_ctyp: ctyp -> loc
val loc_of_patt: patt -> loc

Things turn out to be better with type class support in Haskell, but that’s another story.

Think about the case you want to use a semi ; to connect two Ast node, you have to write things like

let sem e1 e2 =
   let _loc = Loc.merge (loc_of_expr e1 ) (loc_of_expr e2) in
   Sem(_loc, e1,e2)

Everytime you want to fetch the location, you have to fix its type, that’s too bad, the API to process the Syntax is too verbose

But using Algebraic Data Type does have some advantages, the first is pattern match (with exhuastive check), the second is type checking, we do tell some difference between Ast.expr and Ast.patt, and that helps, but you can not tell whether it’s an expresson of type int or type boolean, for example

(Int "3" : expr)
(String "3" :expr)

MetaOCaml can guarantees the type correctness, but there is always a trade off between expressivity and type safety. Anyway, in a staticly typed language, i.e, OCaml, the generated program is always type checked.

So, in OCaml or other ML dialects , you can encode the Abstract Syntax using one of those: untyped s-expression, partial typed sum types, records, GADT, or mixins of records and sum types. there is another unique solution which exists in OCaml, variants.

We will discuss it further in the next post.

Quasi-quotation in OCaml

Quasi-quotation in lisp is free, since the concrete syntax is exactly the same as abstract syntax.

(+ a 3 4) ;; program

`(+ a 3 4) ;; data 

There is a paper which summarizes how to do quasi-quotation in rich syntax language: Why it’s nice to be quoted.

Unlike Lisp, the different between program and data is obvious

3 (* program *)
`Int (_loc, "3") (* data *)

"3" (* program *)
`String (_loc, "3") (* data *)

(Here we use a qutoe “`” to denote that it’s an Ast )

Let’s take a look at the parsing phase (for simplicity, we ignore the locations).

When you do the parsing, the normal behavior is as follows:

 "3 + 4"
 ==> to the Ast 
`App ((`App ((`Id (`Lid "+")), (`Int "3"))), (`Int "4"))

But to do the quasi-quotation, you need to turn the Ast itself into data, so you need to encode the Ast using the Ast itself

==> to the Ast
`App ((`App ((`Id (`Lid "+")), (`Int "3"))), (`Int "4"))

==> to the Data
     ((`Vrn "App"),
              ((`Vrn "App"),
                (`App ((`Vrn "Id"), (`App ((`Vrn "Lid"), (`Str "+"))))))),
            (`App ((`Vrn "Int"), (`Str "3"))))))),
   (`App ((`Vrn "Int"), (`Str "4"))))

So, to do it once for all, we needs a function (for simplicty)

val meta_expr: expr^0 -> expr^1 

Luckily since expr^1 is a subset of expr^0, so you get the belowing function for free

val meta_expr: expr^1 -> expr^2 

Actually you may find that the category expr^2 is exactly the same as expr^1, so once you have expr^0 -> expr^1, you have expr^0 -> expr^n. (antiquotation will be discussed later).

So the problem only lies into how to write the function expr^0->expr^1, you need to encode the Ast using the Ast itself, this requires that the Ast should be expressive enough to express itself. This is alwasy not true, suppose you use the HOAS, HOAS is not expressive enough to express itself.

If you mixin the records with sum types, you have to express both records and sum types, the Ast lifting is neither easy to write, nor easy to read, with locations, it becomes even more cmoplex, the best case is to do it automatically and once for all.

Suppose you only use sum types, luckily we might find that only five tags are expressive enough to express this function expr^0 -> expr^1, here are five tags

App Vrn Str Tup Com

Here Tup means “tuple”, and Com means “Comma”.

The minimal, the better, this means as long as the changes to the Abstract Syntax Tree does not involves the five tags, it will always work out of the box.

So, to design the right Ast for meta programming, the first thing is to keep it simple, don’t use Records or other complex data types , Sum types or polymorphic variants are rich enough to express the who syntax of ocaml but itself is very simple to do the Ast Lifting.

In the next blog, we may discuss tThe right way to design an Abstract Syntax Tree for SMP.