Code layout of Fan (a metaprogramming tool for OCaml)

Fan is a metaprogramming tool for OCaml. I believe it would be an
invaluable tool for the community when it’s production ready.

If you are attracted by the power of Camlp4 while frustrated with the
complexity, or slowness, I would be very glad that you could join and contribute.

It’s already ready for accepting external contribution, since the
development of Fan would be pinned to OCaml 4.01.0 for a while.

1 Code layout

The source is scattered in four directories
common, treeparser, src, cold, and unitest

In the common directory, all sources are written in vanilla OCaml,
it defines basic primitives which dumps Fan’s abstract syntax into
OCaml’s abstract syntax.

Note that all compiler related modules is isolated in this
directory. That said, in the future, if we would like to support
different patched compiler, for example, metaocaml or Lexifi’s mlfi
compiler, only this directory needs to be touched.

In the treeparser directory, it is also written in vanilla Ocaml,
it defines the runtime of the parsing structure.

The src directory is written in Fan’s syntax.
Keep in mind that Fan’s syntax is essentially the same as OCaml’s
syntax, except that it allows quotations, and other tiny
differences (parens around tuple is necessary, not optional).

The cold directory is a mirror of src directory in vanilla
ocaml
, though it is much verbose compared with src.

So when you get the source tree, the initial build is typically

ocamlbuild cold/fan.native

The first command ocamlbuild cold/fan.native would build a binary
composed of modules from common, treeparser and cold. Since
they are all written in vanilla ocaml, no preprocessors are
needed for the initial compilation.

If you are a third party user, that’s pretty much all you need to
know. As a developer of Fan, the next command is

./re cold fan.native

This shell script would symlink _build/cold/fan.native to
_build/boot/fan, which would be used by compiling src.

Now you can compile the src directory

ocamlbuild src/fan.native

The command ocamlbuild cold/fan.native would build a binary
composed of modules from common, treeparser and cold.

Note that at this time src is a mirror of cold, after preprocessored
by fan, the produced binary src/fan.native should be the same as
cold/fan.native.

Now you can

./re src fan.native

But this does not make much sense since src/fan.native is exactly
the same as ./cold/fan.native.

Okay, in most cases, your development would be in such directories:
common, treeparser, src and unitest.

If you only touch common, treeparser or unitest, commit the changes
and send me a pull request.

If you also touch the files in src directory, you should mirror
those changes back to cold directory. Here we go:

./snapshot

Yes, now all the changes in src will be mirrored back to cold
directory.
For a simple change, commit and done. For a complex change that you
are not sure whether it would break anything or not, try to run:

./hb fan.native

The command above would first build src/fan.native using the
current preprocessor _build/boot/fan.

When it’s done, it would first remove the directories _build/src,
and _build/common, _build/treeparser. Then it would set
_build/boot/fan to be the new build preprocessor src/fan.native.
After that it would call ocamlbuild src/fan.native to build a new
preprocessor based on the existing preprocessor.

Then it would compare the two preprocessors, if they are exactly the
same, it means we manage to have a successful bootstrap. There is a
large chance that your change is correct.

Then

make test

If everything goes well, it’s safe to commit now.

When the bootstrap fails, generally two cases: 1. the comparison
does not tell you the two preprocessors are the same, the normal
workflow is to repeat the command ./hb fan.native again. 2. It
fails to compile, since you always have cold/fan.native compiled,
fall back to such preprocessor and see where you made wrong.

Syntactic Meta-Programming in OCaml (II)

In this post, we continue discussing syntactic meta-programming
following last post.

My years of experience in different meta-program system(Common Lisp,
Template Haskell, Camlp4) tell me that quosi-quotation is the most
essential part in syntactic meta programming. Though all three claims
they have quosi-quotation support. But Template Haskell’s
quosi-quotation falls far behind either Camlp4 or Common Lisp. For a
decent quosi-quotation system, first, nested quotation and
anti-quotation is necessary, second, like Lisp, every part should be
able to be quoted and antiquoted except keywords position, that’s to
say, each part of the code fragment can be parametrized.

For the notation, we denote Ast^0 as the normal Ast, Ast^1 as Ast
encoding Ast^0, the same as Ast^n.

So in this post, we discuss the quosi-quotation first.

The implementation of quosi-quotation heavily relies on the
implementation of the compiler, so let’s limit the scope of how to get
quosi-quotation done to OCaml.

Let’s ignore the antiquote part, and focus the quote part first, the
essential of quosi-quotation is to encode the Ast using Ast itself in
the meta level: there are different technologies to implement
quosi-quotations, to my knowledge, I summarized three here:

Raw String manipulation

This is the most intuitive way, given a string input, the normal
way of parsing is transform it into a parsetree,

val parse: string -> ast

To encode the meta-level ast, we can do the unparsing again,
assume we have an unparsing function which unparse the ast

val unparse: ast -> string

so after the composition of parse and unparse, you transformed a
string into the meta-level

(parse "3")
- `Int "3"
unparse(parse "3")
- "`Int \"3\""

Then you can do parse again, after parse(unparse (parse "3")),
we managed to lift the Ast in the meta level. There are serious
defects with this way, First, it’s very brittle, since we are doing
string manipulation in different levels, second, after unparsing,
the location is totally lost, location is one of the most tedious
but necessary part for a practical meta programming system, third,
there is no easy way to integrate with antiquot. This technique is
quite intuitive and easy to understand, but I don’t know any
meta-system do it this way, so feel free to tell me if you know
anyone does similar work ;-)

Maintaining different parsers

Unlike the string manipulation, it write different parsers for
different actions. Suppose we are in OCaml, if we want to support
quosi-quotations in such syntax categories

sig_item, str_item, patt, expr, module_type, module_expr, class_type
class_expr, class_sig_item, class_str_item, with_constr, binding, rec_binding,
match_case,

And you want the quosi-quotaion appears in both expr and patt
positions, then you have to write 14 x (2+1) parsers, the parser can
not be re-usable, if you want to support overloaded quotations (I
will talk about it later), then you have to roll your own parser
again. Writing parser is not hard, but it’s not fun either, and
keeping sync up different parsers is a nightmare.

To make things worse, once anti-quotation is considered, for each
category, there are three parsers to write, but anti-quot makes
them slightly different. To be honest, this way is impractical.

Ast Lifting

Another mechanism to do quosi-quotation is that imaging we have a
powerful function:

val meta: ast^0 -> ast^1

This seems magic, but it’s possible even though in OCaml we don’t
have generic programming support, since we have the definition
of ast.

The problem with this technique is that it requires an explicit
Ant tag in the ast representation, since at ast^0 level, you
have to store Ant as intermediate node which will be removed when
applied meta function.

Let’s walk through each phase in Fan

Think about how such piece of code would be parsed in Fan:

{:expr| $x + y|}

For the first phase (I removed the location for simplicity)

`App(`App
      ( `Id ( `Lid ( "+")),
        `Ant ( {cxt = "expr"; sep = None; decorations = ""; content = "x"})),
     `Id ( `Lid ( "y")))

Here Ant exists only as intermediate node, it will be eliminated
in the meta-step

after applied with meta function

(Filters.ME.meta_expr _loc (t expr "$x + y"));
- : FanAst.expr =
`App
  (,
   `App
     (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
      `App
        (,
         `App
           (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
            `App
              (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
               `App
                 (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))),
                  `Str (, "+")))),
         `Ant (, {cxt = "expr"; sep = None; decorations = ""; content = "x"}))),
   `App
     (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
      `App (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))), `Str (, "y"))))   

(t is a parsing function)
Here we see that Ant node is still kept and it will be
filtering, now we can filter the Ant node into a normal Ast,

(Ant.antiquot_expander ~parse_patt ~parse_expr)#expr (Filters.ME.meta_expr _loc (t expr " $x + y"));
- : FanAst.expr =
`App
  (,
   `App
     (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
      `App
        (,
         `App
           (, `App (, `Vrn (, "App"), `Id (, `Lid (, "_loc"))),
            `App
              (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
               `App
                 (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))),
                  `Str (, "+")))),
         `Id (, `Lid (, "x")))),
   `App
     (, `App (, `Vrn (, "Id"), `Id (, `Lid (, "_loc"))),
      `App (, `App (, `Vrn (, "Lid"), `Id (, `Lid (, "_loc"))), `Str (, "y"))))   

(location in the meta-level is ignored)
If we want to share the same grammar between the Ast^n(n=0,1,2,...),
Ast lifting (a function of type Ast^0 -> Ast^1) is necessary.

Summary

We see the three techniques introduced here to do the
quosi-quotation, Fan adopts the third one, suppose we pick the
third one, let’s discuss what kind of Ast representation we need to
make life easier.

As we discussed previously, introducing records in the Abstract Syntax
brings in un-necessary complexity when you want to encode the Ast
using the Ast itself since you have to express the record in the
meta-level as well.

Another defect with current Parsetree is that it was designed without
meta-programming in mind, so it does not provide an Ant tag in all
syntax categories, so in the zero stage Ast^0, you can not have an
Ast node $x in the outermost, since it’s semantically incorrect in
Ast^0, but syntactically correct in Ast^n(n=0,1,2,...)

The third defect with the Parsetree is that it’s quite irregular,
so you can not do any meta-programming with the parsetree itself, for
example, stripping all the location from the Ast node to derive a new
type without locations, deriving a new type without anti-quot tags (we
will see that such ability is quite important in Fan)

The fourth defect is more serious from the point of view of
semantics, since in OCaml, there is no way to express absolute path,
when you do the Ast lifting, the time you define Ast lifting is
different from the time you use the quotations

Camlp4′s Ast is slightly better than Parsetree, since it does not
introduce records to increase the complexity.

However, Camlp4′s Ast can not express the absolute path which
results in a semantics imprecise, another serious implementation
defect is that it tries to encode the anti-quote using both two
techniques: either explicit Ant tag or via string mangling, prefix
the string with \\$:, and Camlp4′s tag name is totally not
meaningful.

Think a bit further , about syntactic meta-programming, what we
really care about is purely syntax, Int "3"= should not be different whether it is of type =expr or patt, if we take a
location of ast node, we should not care about whether its type is
expr or patt or str_item, right?

If we compose two ast node using semi syntax ;, we really don’t
care about whether it’s expr node or patt node

let sem a b = {| $a; $b |}

The code above should work well under already syntax categories as
long as it support `Sem tag.

Changing the underlying representation of Ast means all existing
code in Camlp4 engine can not be reused, since the quotation-kit no
longer apply in Fan, but the tough old days are already gone, Fan
already managed to provide the whole quotation kit from scratch. In
the next post we will talk about the underly Ast using polymorphic
variants in Fan, and argue why it’s the right direction.

Thanks for your reading!(btw, there’s a bug in Emacs org/blog, sorry for posting several times)