meain/blog

Mar 15, 2023 . 2 min

Splitting and joining using tree-sitter

EDIT: I've since updated the code to support leading commas. You can find that version here

Another day, another blog post on tree-sitter. I'm not sure whether you have read any of my previous posts on tree-sitter, but I have a few of them.

First, let me show you what this one is about. Here is a recording of the code in action.

You can use this to split the arguments of a function definition or call across multiple lines or join them back into one, and to do the same with the items in a JSON document.

You might think it is a simple job that a "split by comma" can do, but you would be wrong. What if you have a , in a string argument? What if one of the arguments is a function call that has arguments of its own, or a list within a list? What if the user just puts a lot of empty newlines in between? The function deals with all of this by letting things built by smarter people (tree-sitter) deal with it. We just ask it for the list of arguments, which we then arrange the way they are supposed to be.
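To make the failure concrete, here is a small Python sketch (Python only for illustration, the actual code is Emacs Lisp). `naive_split` is the "split by comma" approach. `nesting_aware_split` is a hypothetical hand-rolled stand-in that tracks quotes and brackets; it is not how tree-sitter works, and it ignores things like Go raw strings and comments, which is exactly why handing the job to a real parser is the better idea.

```python
def naive_split(args: str) -> list[str]:
    """Split on every comma, ignoring strings and nesting."""
    return [part.strip() for part in args.split(",")]


def nesting_aware_split(args: str) -> list[str]:
    """Split on top-level commas only (hypothetical stand-in for a parser)."""
    pairs = {"(": ")", "[": "]", "{": "}"}
    closers = set(pairs.values())
    items, current, depth, quote = [], [], 0, None
    i = 0
    while i < len(args):
        ch = args[i]
        current.append(ch)
        if quote:
            if ch == "\\" and i + 1 < len(args):
                # Keep the escaped character so \" does not end the string.
                current.append(args[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in pairs:
            depth += 1
        elif ch in closers:
            depth -= 1
        elif ch == "," and depth == 0:
            current.pop()  # the comma is a separator, not part of the item
            items.append("".join(current).strip())
            current = []
        i += 1
    tail = "".join(current).strip()
    if tail:
        items.append(tail)
    return items


ARGS = 'fmt.Sprintf("%s, %s", a, b), []int{1, 2}, name'
print(naive_split(ARGS))          # cuts through the string, the call and the slice
print(nesting_aware_split(ARGS))  # three items, one per actual argument
```

Even this longer version only covers the cases listed above. Every new language or literal syntax would need another special case, whereas the syntax tree already knows where each argument starts and ends.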

You can find the code for this in my dotfiles. Once you have this file in your Emacs, just make sure to load it and then you can call tree-surgeon-split-join, which alternates between single-line and multiline. As of now it only supports Go, Rust and JSON, because that is what I need. Mostly just Go, as its formatter does not split long lines. That said, it should be easy (just a line) to add support for more languages or more things that you can split and join. You can add them in tree-surgeon-split-join-settings if you understand what is going on, or you can reach out to me if you need any help.

Let me explain how it works so that you can edit this for your use case. We'll walk through an example with a Go buffer. For Go, the config is ((argument_list parameter_list) . t), which means we can split and join argument_list and parameter_list nodes. The t indicates that a trailing comma is required. First, we look for all the nodes of these types that contain the cursor (ref). Once we have those, we pick the one closest to the cursor and get its direct children. The text of these children is then extracted and used as the basis for splitting or joining (ref). This gets used in the body of the let expressions. When joining, we create a single line with all the items joined, and when splitting, we put the items on separate lines with proper indentation. And that is pretty much it.
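The steps above can be sketched in a few lines of Python. This is an illustration and not a port of the Emacs Lisp code: `Node`, `SETTINGS`, `closest_node` and `split_join` are hypothetical names, the nodes are built by hand instead of coming from tree-sitter, and checking for a newline to decide between splitting and joining is an assumption on my part.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a syntax-tree node (not a tree-sitter type)."""
    type: str
    start: int  # offset of the opening delimiter
    end: int    # offset just past the closing delimiter
    items: list[str] = field(default_factory=list)  # text of direct children


# Mirrors the Go entry described above: splittable types, trailing comma flag.
SETTINGS = {"go": (("argument_list", "parameter_list"), True)}


def closest_node(nodes, point, types):
    """Return the smallest node of an allowed type that contains point."""
    enclosing = [n for n in nodes if n.type in types and n.start <= point <= n.end]
    return min(enclosing, key=lambda n: n.end - n.start, default=None)


def join_items(items, opener, closer):
    """Render all items on a single line."""
    return opener + ", ".join(items) + closer


def split_items(items, opener, closer, indent, trailing_comma):
    """Render one item per line, indented one tab deeper than indent."""
    lines = [f"{indent}\t{item}," for item in items]
    if lines and not trailing_comma:
        lines[-1] = lines[-1][:-1]  # drop the comma after the last item
    return "\n".join([opener, *lines, indent + closer])


def split_join(source, nodes, point, lang="go", indent=""):
    """Toggle the closest splittable node between one line and many."""
    types, trailing_comma = SETTINGS[lang]
    node = closest_node(nodes, point, types)
    if node is None:
        return source
    text = source[node.start:node.end]
    opener, closer = text[0], text[-1]
    if "\n" in text:  # assumption: already multiline means "join"
        new = join_items(node.items, opener, closer)
    else:
        new = split_items(node.items, opener, closer, indent, trailing_comma)
    return source[:node.start] + new + source[node.end:]


SOURCE = "add(1, mul(2, 3))"
NODES = [
    Node("argument_list", 3, 17, ["1", "mul(2, 3)"]),  # arguments of add
    Node("argument_list", 10, 16, ["2", "3"]),         # arguments of mul
]
print(split_join(SOURCE, NODES, point=4))   # cursor on "1": splits add's arguments
print(split_join(SOURCE, NODES, point=12))  # cursor inside mul: splits only mul's
```

Picking the smallest enclosing node is what makes the cursor position matter: with the cursor inside the nested call only the inner list changes, and the outer one is left alone.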


I'm planning to move this to a separate plugin, hence the name tree-surgeon, but that will need more time, and I want to make sure I figure out the scope of the plugin first. The idea is that it will be a generic set of tree-sitter based buffer operations. I already have a few others in my config that I could extract out as well.
