ASTs in Go - Part 1

"What's an AST", you ask? It's an Abstract Syntax Tree - also called just Syntax Tree.

What? That's not enough of an explanation? Oh, ok, I guess I'll back up a bit 😅

I'm not overly academic, so I'm going to come at this from a slightly different perspective than the Wikipedia definition (though that definition is quite good and worth a read).

Do you remember diagramming sentences in elementary school? (If my bringing that up caused you pain, I'm sorry - I hated this in school and it seemed pointless... until now, I guess.) This is kind of what ASTs remind me of. It's a shift from "normal programming" thinking, where the focus is on the meaning of the keywords and logic being used at runtime, the inputs/outcomes of functions, etc. With ASTs, the focus is on the structure more so than the meaning.

For example, say I have a struct type defined like this:

type Thing struct {
	FirstName string
	LastName  string
}

The AST representation of that would look something like this:

[Figure: AST for type Thing]
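If you'd like to see that tree for yourself, here's a minimal sketch that parses the Thing struct and dumps every node using the standard go/parser and go/ast packages. The package name demo, the filename thing.go, and the helper parseThing are made up for illustration; the parser just needs a complete source file to chew on.

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"log"
	"os"
)

// src is the Thing struct wrapped in a package clause, because the parser
// expects a complete Go source file. The package name "demo" is made up.
const src = `package demo

type Thing struct {
	FirstName string
	LastName  string
}
`

// parseThing (a hypothetical helper) parses src into an *ast.File.
func parseThing() (*token.FileSet, *ast.File, error) {
	fset := token.NewFileSet()
	// The filename is only used for position info; "thing.go" is made up.
	file, err := parser.ParseFile(fset, "thing.go", src, 0)
	return fset, file, err
}

func main() {
	fset, file, err := parseThing()
	if err != nil {
		log.Fatal(err)
	}
	// ast.Fprint writes an indented dump of every node in the tree.
	if err := ast.Fprint(os.Stdout, fset, file, nil); err != nil {
		log.Fatal(err)
	}
}
```

The dump is verbose, but you should be able to spot the nested nodes for the type declaration, the struct, and each of its fields.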

Now that we have that out of the way, I'd like to go over these introductory topics in this part-one-of-who-knows post:

  • Why use an AST?
  • What can we do with ASTs?
  • Some examples using ASTs

Why use an AST?

ASTs are useful when you want to work through the structure of a type/file/package but you don't actually need the types and such to be accessible. In our example above, a program parsing the Thing struct into an AST would not have access to the Thing type (to create new instances, etc...) but would only have the AST. So make sure not to conflate this with reflection (which I'm a huge fan of using). They are two different things, although there are things that you can do with both such as parsing out field names for a struct.
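To make the overlap with reflection concrete, here's a rough sketch that pulls the field names out of Thing using only its source text, so the program never has the Thing type itself. The helper structFields, the package name demo, and the filename thing.go are my own inventions for this example, not anything from a library.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

// src holds Thing as plain text; this program never declares the type.
const src = `package demo

type Thing struct {
	FirstName string
	LastName  string
}
`

// structFields (a hypothetical helper) returns the field names of the
// struct type called typeName found in code.
func structFields(code, typeName string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "thing.go", code, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok || spec.Name.Name != typeName {
			return true // not our type declaration; keep descending
		}
		st, ok := spec.Type.(*ast.StructType)
		if !ok {
			return false // right name, but not a struct
		}
		for _, field := range st.Fields.List {
			// One field entry can declare several names ("A, B string").
			for _, ident := range field.Names {
				names = append(names, ident.Name)
			}
		}
		return false
	})
	return names, nil
}

func main() {
	names, err := structFields(src, "Thing")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names) // prints [FirstName LastName]
}
```

With reflection you would need a compiled Thing value to ask these questions; here a string of source code is enough.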

While this may sound limiting, there are actually some benefits. One benefit is SPEED - parsing source into an AST is extremely fast, whereas compilation is less so (though still hella fast in Go 😉), since compilation is generally a step that happens after parsing into an AST.

This lack of compilation is, in itself, another benefit in my opinion. This allows us some flexibility and efficiency. If I want to parse a file and only care about information contained within that file, I don't need to parse all other packages that file may import! We can prove this out by embedding a bogus type in our Thing struct, and it would still successfully parse into an AST:

type Thing struct {
    FirstName string
    LastName string
    Banana
}

The parser (conveniently accessible via the parser package) will just assume that this Banana type is defined elsewhere and will not error out when generating an AST. If your program that is processing the AST then attempts to find where this Banana thing is defined and what its tree is, that's when you'd hit a problem!
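Here's a small sketch you could run to check that claim. It parses a Thing that embeds the undefined Banana type and lists the embedded fields it finds. The helper embeddedTypes, the package name demo, and the filename thing.go are assumptions made for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

// Banana is not declared anywhere in this source.
const src = `package demo

type Thing struct {
	FirstName string
	LastName  string
	Banana
}
`

// embeddedTypes (a hypothetical helper) parses code and returns the type
// names of embedded fields, i.e. fields that have a type but no name.
func embeddedTypes(code string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "thing.go", code, 0)
	if err != nil {
		return nil, err
	}
	var embedded []string
	ast.Inspect(file, func(n ast.Node) bool {
		st, ok := n.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range st.Fields.List {
			if len(field.Names) != 0 {
				continue // a regular named field
			}
			// Only handles unqualified names such as Banana, not pkg.Banana.
			if ident, ok := field.Type.(*ast.Ident); ok {
				embedded = append(embedded, ident.Name)
			}
		}
		return true
	})
	return embedded, nil
}

func main() {
	embedded, err := embeddedTypes(src)
	if err != nil {
		log.Fatal(err) // not reached: an undefined type is not a syntax error
	}
	fmt.Println(embedded) // prints [Banana]
}
```

The parser only checks syntax, so it is up to whatever processes the tree afterwards to decide whether an unknown Banana is a problem.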

Another benefit of ASTs is that we can do interesting things based on the information contained within. So we can parse tags, extract field names, potentially have programs that write code for us, etc... Your imagination is honestly the limit! So this leads us to:

What can we do with ASTs?

We've kind of already covered what we cannot do with ASTs given that we don't have runtime-level access to instances and such, so what can we do?

  • We can look for specific formatting & naming (e.g. for linting) - as we've said, we don't have to compile and then reflect over all the types when we can just tell the parser package to parse a target package to give us an AST to process!
  • We can auto-generate code - e.g. let's say you have an interface and you want to auto-generate a stub implementation of said interface.
  • ASTs can be used to determine what modules/packages are actually used in order to "tree shake" the final bundle or executable to make it as lean as possible.
  • Let's say you wanted to build a dependency graph showing what is imported by what - parsing ASTs would be a solid approach.
  • Parse comments (could be for linting of comment structure, could be for documentation purposes, etc...)
  • Auto-transform a go structure to another data type (e.g. maybe auto-generate a base YAML configuration file from a struct definition)
  • Code augmentation, such as automatically reformatting or adding header/footer comments. In this scenario, you would actually change the tree instead of just reading it!
  • Likely lots more I've not even thought of!
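As a taste of the dependency-graph idea from the list above, here's a minimal sketch that lists the import paths of a single file. It uses the parser's ImportsOnly mode so the rest of the file is never parsed. The sample source, the helper importPaths, and the filename demo.go are made up for illustration; a real tool would walk every file in a package.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"log"
	"strconv"
)

// src is a made-up file with a plain import and an aliased import.
const src = `package demo

import (
	"fmt"
	str "strings"
)

func Shout(s string) { fmt.Println(str.ToUpper(s)) }
`

// importPaths (a hypothetical helper) returns the import paths found in code.
func importPaths(code string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly tells the parser to stop after the import declarations.
	file, err := parser.ParseFile(fset, "demo.go", code, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, imp := range file.Imports {
		// Path.Value is the quoted literal from the source, so unquote it.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, path)
	}
	return paths, nil
}

func main() {
	paths, err := importPaths(src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(paths) // prints [fmt strings]
}
```

Collect these lists per file and you have the edges of an import graph, without compiling anything.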

Can we see some examples?

Sure! Before I continue on with the next parts of this series and walk through how we can go about processing ASTs in order to accomplish some task, let's look at some existing resources using this approach!

  • gomock (via its mockgen CLI tool) allows you to auto-generate mocks (with a controller mechanism for setting up expectations for calls, evaluating args, providing return values, etc...) for testing. I love this package and find it inspiring!
  • swaggo is a CLI tool that auto-generates swagger documentation based on the comments on your API endpoint handlers. I use it, and like it quite a bit. If you check out their parser-related files, you'll see they are using AST parsing to capture the information needed to auto-generate the documentation - pretty cool!
  • The Go Playground parses the code you submit from the web interface to perform validation and setup based on the AST data! Search for parser. in its source to see the places it's used.
  • Many, many more! Your JS bundlers are using ASTs. Many of your linters are using ASTs. Your compilers are for sure using ASTs, but most of us aren't writing compilers on the reg so I've held off on mentioning that until last.

Conclusion (of part 1)

There are so many possibilities that open up once you learn to work with ASTs, and I'm only just beginning to experiment and learn in this arena! It's pretty exciting - and I have a specific use-case I've been working through, where I'm using ASTs to meet the need. I will be covering more specifics of this use-case and showing AST parsing/traversal in the coming posts as we continue this series - please stick around for more!

- Bradley