Syntactic Meta-Programming in OCaml (II)

In this post, we continue discussing syntactic meta-programming
following last post.

My years of experience in different meta-program system(Common Lisp,
Template Haskell, Camlp4) tell me that quosi-quotation is the most
essential part in syntactic meta programming. Though all three claims
they have quosi-quotation support. But Template Haskell’s
quosi-quotation falls far behind either Camlp4 or Common Lisp. For a
decent quosi-quotation system, first, nested quotation and
anti-quotation is necessary, second, like Lisp, every part should be
able to be quoted and antiquoted except keywords position, that’s to
say, each part of the code fragment can be parametrized.

For the notation, we denote Ast^0 as the normal Ast, Ast^1 as Ast
encoding Ast^0, the same as Ast^n.

So in this post, we discuss the quosi-quotation first.

The implementation of quosi-quotation heavily relies on the
implementation of the compiler, so let’s limit the scope of how to get
quosi-quotation done to OCaml.

Let’s ignore the antiquote part, and focus the quote part first, the
essential of quosi-quotation is to encode the Ast using Ast itself in
the meta level: there are different technologies to implement
quosi-quotations, to my knowledge, I summarized three here:

Raw String manipulation

This is the most intuitive way, given a string input, the normal
way of parsing is transform it into a parsetree,

val parse: string -> ast

To encode the meta-level ast, we can do the unparsing again,
assume we have an unparsing function which unparse the ast

val unparse: ast -> string

so after the composition of parse and unparse, you transformed a
string into the meta-level

(parse "3")
- `Int "3"
unparse(parse "3")
- "`Int \"3\""

Then you can do parse again, after parse(unparse (parse "3")),
we managed to lift the Ast in the meta level. There are serious
defects with this way, First, it’s very brittle, since we are doing
string manipulation in different levels, second, after unparsing,
the location is totally lost, location is one of the most tedious
but necessary part for a practical meta programming system, third,
there is no easy way to integrate with antiquot. This technique is
quite intuitive and easy to understand, but I don’t know any
meta-system do it this way, so feel free to tell me if you know
anyone does similar work 😉

Maintaining different parsers

Unlike the string manipulation, it write different parsers for
different actions. Suppose we are in OCaml, if we want to support
quosi-quotations in such syntax categories

sig_item, str_item, patt, expr, module_type, module_expr, class_type
class_expr, class_sig_item, class_str_item, with_constr, binding, rec_binding,

And you want the quosi-quotaion appears in both expr and patt
positions, then you have to write 14 x (2+1) parsers, the parser can
not be re-usable, if you want to support overloaded quotations (I
will talk about it later), then you have to roll your own parser
again. Writing parser is not hard, but it’s not fun either, and
keeping sync up different parsers is a nightmare.

To make things worse, once anti-quotation is considered, for each
category, there are three parsers to write, but anti-quot makes
them slightly different. To be honest, this way is impractical.

Ast Lifting

Another mechanism to do quosi-quotation is that imaging we have a
powerful function:

val meta: ast^0 -> ast^1

This seems magic, but it’s possible even though in OCaml we don’t
have generic programming support, since we have the definition
of ast.

The problem with this technique is that it requires an explicit
Ant tag in the ast representation, since at ast^0 level, you
have to store Ant as intermediate node which will be removed when
applied meta function.

Let’s walk through each phase in Fan

Think about how such piece of code would be parsed in Fan:

{:expr| $x + y|}

For the first phase (I removed the location for simplicity)

      ( `Id ( `Lid ( "+")),
        `Ant ( {cxt = "expr"; sep = None; decorations = ""; content = "x"})),
     `Id ( `Lid ( "y")))

Here Ant exists only as intermediate node, it will be eliminated
in the meta-step

after applied with meta function

(Filters.ME.meta_expr _loc (t expr "$x + y"));
- : FanAst.expr =
     (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
           (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
              (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
                 (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))),
                  `Str (, "+")))),
         `Ant (, {cxt = "expr"; sep = None; decorations = ""; content = "x"}))),
     (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
      `App (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))), `Str (, "y"))))   

(t is a parsing function)
Here we see that Ant node is still kept and it will be
filtering, now we can filter the Ant node into a normal Ast,

(Ant.antiquot_expander ~parse_patt ~parse_expr)#expr (Filters.ME.meta_expr _loc (t expr " $x + y"));
- : FanAst.expr =
     (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
           (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
              (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
                 (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))),
                  `Str (, "+")))),
         `Id (, `Lid (, "x")))),
     (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
      `App (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))), `Str (, "y"))))   

(location in the meta-level is ignored)
If we want to share the same grammar between the Ast^n(n=0,1,2,...),
Ast lifting (a function of type Ast^0 -> Ast^1) is necessary.


We see the three techniques introduced here to do the
quosi-quotation, Fan adopts the third one, suppose we pick the
third one, let’s discuss what kind of Ast representation we need to
make life easier.

As we discussed previously, introducing records in the Abstract Syntax
brings in un-necessary complexity when you want to encode the Ast
using the Ast itself since you have to express the record in the
meta-level as well.

Another defect with current Parsetree is that it was designed without
meta-programming in mind, so it does not provide an Ant tag in all
syntax categories, so in the zero stage Ast^0, you can not have an
Ast node $x in the outermost, since it’s semantically incorrect in
Ast^0, but syntactically correct in Ast^n(n=0,1,2,...)

The third defect with the Parsetree is that it’s quite irregular,
so you can not do any meta-programming with the parsetree itself, for
example, stripping all the location from the Ast node to derive a new
type without locations, deriving a new type without anti-quot tags (we
will see that such ability is quite important in Fan)

The fourth defect is more serious from the point of view of
semantics, since in OCaml, there is no way to express absolute path,
when you do the Ast lifting, the time you define Ast lifting is
different from the time you use the quotations

Camlp4’s Ast is slightly better than Parsetree, since it does not
introduce records to increase the complexity.

However, Camlp4’s Ast can not express the absolute path which
results in a semantics imprecise, another serious implementation
defect is that it tries to encode the anti-quote using both two
techniques: either explicit Ant tag or via string mangling, prefix
the string with \\$:, and Camlp4’s tag name is totally not

Think a bit further , about syntactic meta-programming, what we
really care about is purely syntax, Int "3"= should not be different whether it is of type =expr or patt, if we take a
location of ast node, we should not care about whether its type is
expr or patt or str_item, right?

If we compose two ast node using semi syntax ;, we really don’t
care about whether it’s expr node or patt node

let sem a b = {| $a; $b |}

The code above should work well under already syntax categories as
long as it support `Sem tag.

Changing the underlying representation of Ast means all existing
code in Camlp4 engine can not be reused, since the quotation-kit no
longer apply in Fan, but the tough old days are already gone, Fan
already managed to provide the whole quotation kit from scratch. In
the next post we will talk about the underly Ast using polymorphic
variants in Fan, and argue why it’s the right direction.

Thanks for your reading!(btw, there’s a bug in Emacs org/blog, sorry for posting several times)