I have been working on tokf for a while now. It is a tool that compresses the output of CLI commands for consumption by AI models (Claude Code, Gemini CLI, etc.). Feedback has been trickling in, and I have tried to act on it promptly.

Recently I sat down to work through the feedback from orgoj on GitHub. One request was adding the ability for tokf to interact with a permission resolver like Dippy. Straightforward enough: delegate the decision to Dippy, pass along the original request, done.

The other request was much more interesting: don’t use hand-rolled bash parsing, use something tried and tested. Use an AST parser so we can be sure about correctness. This matters because we are talking about security — if someone is able to inject calls through malformed input, those strings end up in LLMs, and the damage from a prompt injection could be wide. It’s why tokf already has a certain level of filter inspection as a precaution.
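To make the risk concrete, here is a small illustrative sketch (Python, not tokf’s actual Rust code) of why hand-rolled parsing is unsafe: checking a command line’s first word with naive whitespace splitting misses a second command hidden behind a separator. Even this stdlib-`shlex` version falls well short of a real AST parser, which is the point of what follows.

```python
import shlex

def naive_first_word(cmd: str) -> str:
    # Hand-rolled approach: split on whitespace, trust the first token.
    return cmd.split()[0]

def split_commands(cmd: str) -> list[list[str]]:
    # Real tokenization: shlex with punctuation_chars=True emits ;, |, &
    # (and their doubled forms) as separate tokens, so the line can be
    # split into its individual simple commands.
    lex = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    commands, current = [], []
    for tok in lex:
        if tok in {";", ";;", "|", "||", "|&", "&", "&&"}:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands

payload = 'ls "$(date)"; rm -rf ~'
print(naive_first_word(payload))                # ls -- looks harmless
print([c[0] for c in split_commands(payload)])  # ['ls', 'rm']
```

A permission check that only looks at the naive first word would approve the whole line as an `ls`; proper tokenization exposes the trailing `rm`.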

tree-sitter-bash - first attempt

My first implementation pulled in tree-sitter and its bash grammar. It is trusted by major players like GitHub, so I assumed it was correct. The implementation went smoothly and seemed to work well.

When I shared the changeset with orgoj, his feedback surprised me: tree-sitter-bash isn’t necessarily that accurate, and I hadn’t tested it rigorously. He recommended Parable, a more precise AST parser by ldayton, who makes a great case for why existing solutions fall short: many open parsing bugs across many of the tools.

The problem with using Parable directly is that it’s a Python library. I can’t expect users to pull in Python when installing a Rust binary. So I decided to try a different approach: have Claude Code use the test cases from Parable and build a new parser from scratch.

Claude Code on autopilot - building rable

I wanted to keep it as hands-free as possible, prompting only for refactors in between. Within 13 hours of project start, 93.6% of the test cases were passing. By hour 17, we were at 100% compatibility with the Parable test suite.

The result was functionally similar to Parable but architecturally messy. The separation of concerns was wrong: parsing logic had leaked into the S-expression output module to compensate for data the lexer wasn’t extracting properly.

Next I compared rable’s output against bash-oracle, a tool that uses bash itself as the reference parser. I gathered new test cases from the disagreements — cases where rable and bash-oracle differed and bash-oracle didn’t produce an error. This uncovered more issues. Some were fixable with the existing code, but complexity was starting to pile up.
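The harvesting loop described above fits in a few lines. This is an illustrative Python sketch with stand-in parsers, not the actual tooling; `differential_test` and both callables are hypothetical names:

```python
def differential_test(inputs, parse, oracle):
    # Harvest new test cases: keep inputs where the candidate parser and
    # the oracle disagree, but only when the oracle parsed successfully
    # (an oracle error usually just means the input was invalid bash).
    new_cases = []
    for src in inputs:
        expected = oracle(src)
        if expected is None:  # oracle rejected the input
            continue
        if parse(src) != expected:
            new_cases.append((src, expected))
    return new_cases

# Stand-in parsers for demonstration: the oracle rejects "bad", and the
# candidate mishandles anything containing double quotes.
oracle = lambda s: None if s == "bad" else s.split()
candidate = lambda s: s.replace('"', "").split()
cases = differential_test(['echo "a b"', "bad", "echo ok"], candidate, oracle)
print(cases)  # [('echo "a b"', ['echo', '"a', 'b"'])]
```

Each harvested pair becomes a regression test: the input plus the oracle’s output as the expected result.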

At that point I stepped back and tried integrating the parser into tokf to see what was actually usable. There were a couple of false starts. The AST wasn’t giving consumers what they needed — you had to manually calculate code positions, or re-parse sections to extract the right data. I fed this back into the rable session and got an improved AST. That’s when I started seeing the real problem clearly: the lexer wasn’t doing enough, and the S-expression layer was doing too much.
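The fix for “consumers had to calculate positions themselves” is to put byte spans on every node. A minimal Python sketch of that idea (an illustrative node shape, not rable’s actual types):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                  # e.g. "command", "word", "redirect"
    start: int                 # byte offset of the node's first byte
    end: int                   # one past the node's last byte
    children: list["Node"] = field(default_factory=list)

    def text(self, source: str) -> str:
        # With a span on every node, consumers slice the original source
        # instead of recalculating positions or re-parsing sections.
        return source[self.start:self.end]

src = "echo hello"
ast = Node("command", 0, 10, [Node("word", 0, 4), Node("word", 5, 10)])
print(ast.children[1].text(src))  # hello
```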

From there I directed Claude towards cleaning up the architecture: moving decomposition logic into the lexer and keeping the S-expression module focused purely on output. The complexity washed away, a much cleaner design emerged, and the remaining errors went with it. By session 3, rable was at 99.9% compliance with all test cases.

That remaining 0.1%? A single cosmetic difference in command substitution formatting — bash’s internal parser inserts a space before ) in $(cmd <<heredoc &\n...\n ) that rable omits. The two forms are semantically identical, so I’m comfortable calling it done.

Integrating rable in tokf

After all the back and forth, I got rable integrated into tokf. This meant removing the dependency on tree-sitter-bash, which in turn removed several transitive dependencies. tokf now handles full AST parsing with a single focused dependency. It’s fast, reliable, and well tested.

Rable vs Tree-sitter-bash

To see how the two parsers actually compare, I ran both against 1,783 test cases from the Parable test corpus and the bash-oracle fuzzing suite — covering words, redirects, here-docs, arithmetic, arrays, extglob, and dozens of edge cases.

Metric          Rable    Tree-sitter-bash
Tests passed    1,782    1,673
Accuracy        99.9%    93.8%
Unique passes   109      0

Note on methodology: Rable’s bar is higher — a test passes only if the output is a byte-for-byte S-expression match with the reference parser. Tree-sitter’s bar is just “parsed without ERROR or MISSING nodes.” Even with that easier standard, rable wins by 6 points.
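The two pass criteria can be stated as code. A hedged Python sketch over toy S-expression strings; `ERROR` and `MISSING` are tree-sitter’s standard names for its error-recovery nodes:

```python
def sexp_exact_match(output: str, reference: str) -> bool:
    # rable's bar: the emitted S-expression must equal the reference
    # parser's output byte for byte (surrounding whitespace aside).
    return output.strip() == reference.strip()

def parsed_without_error(output: str) -> bool:
    # tree-sitter's far weaker bar: the printed tree merely contains no
    # ERROR or MISSING nodes anywhere.
    return "(ERROR" not in output and "(MISSING" not in output

ts_tree = "(program (command name: (word) (ERROR)))"
print(parsed_without_error(ts_tree))  # False
print(sexp_exact_match("(program (command))", "(program (command))"))  # True
```

A tree can easily clear the second bar while failing the first, which is why the 6-point gap understates the difference.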

Where tree-sitter struggles

Category                TS failures
Word boundaries         16
Arithmetic              11
Here-documents          9
Extglob                 7
Redirect formatting     6
Heredoc formatting      6
Variable fd redirects   5
Process substitution    5
Other                   43

Tree-sitter-bash produces ERROR nodes on inputs that are valid bash — particularly around arithmetic expressions, here-documents, extended globbing, and edge cases in word splitting. Rable handles all of these correctly.

Comparing Parable to rable

To close the circle on Parable, I also exposed Python bindings for rable and ran a benchmark. rable is 8.1x faster, but that’s beside the point: the languages are very different, and speed was never the goal. More importantly, rable only exists because Parable existed first. Great work = great inspiration.
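For the curious, a best-of-N timing harness is all this kind of comparison needs. A Python sketch with a trivial stand-in workload; `parable.parse` and `rable.parse` in the comment are assumed names for the real bindings:

```python
import time

def best_of(fn, corpus, repeats=5):
    # Best-of-N wall-clock time for one full pass over the corpus;
    # taking the minimum damps scheduler and warm-up noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for src in corpus:
            fn(src)
        best = min(best, time.perf_counter() - start)
    return best

# With the real bindings this would be roughly:
#   speedup = best_of(parable.parse, corpus) / best_of(rable.parse, corpus)
# (both names assumed). A trivial stand-in keeps the sketch runnable:
corpus = ["echo hi", "ls | wc -l", "cat <<EOF\nx\nEOF"]
assert best_of(len, corpus) >= 0.0
```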

What next

I’m looking at building something similar to Dippy but in Rust, with tighter integration into tokf and possibly other tooling around hooks. Let’s see what tomorrow brings.