LLM coding tools eat tokens, and a lot of those tokens are noise: git output, build logs, test runner chatter. Your actual code is a fraction of what gets sent.

I have been mulling over ways to lower the context use of different CLI tools. I started by approaching the problem narrowly: how do you improve the token density of code coverage information? jacoco-coverage-inspector was a first attempt.

Then I stumbled across RTK. RTK is a CLI proxy that reduces the number of tokens propagated to an LLM coding tool during use (it uses hooks to transparently rewrite commands). The idea is great, and the implementation is well done.

I looked at the code and appreciated that it is in Rust and quite well written. But it created an itch, one that I just needed to scratch: everything, including the whole parsing approach, is written in Rust, which means you have to rebuild the binary every time you add a filter. I started thinking about what the process entails. How does a filter actually work?

It is a pipeline from A to B: selectors, ways of breaking inputs down, and accumulators plus deduplication. We can express that with configuration, and if worse comes to worst, we can always drop into a bit of Lua to script our way through the hardest parts.
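To make that mental model concrete, here is a minimal Python sketch of such a pipeline — my own illustration of the idea, not tokF's actual code. It keeps lines matching any selector pattern, breaks them down by extracting a capture group, and deduplicates what it accumulates:

```python
import re

def run_pipeline(lines, selectors, dedupe=True):
    """Toy filter pipeline: keep lines matching any selector regex,
    extract the first capture group when present, then deduplicate."""
    seen = set()
    out = []
    for line in lines:
        for pattern in selectors:
            m = re.search(pattern, line)
            if not m:
                continue
            piece = m.group(1) if m.groups() else line.strip()
            if dedupe and piece in seen:
                break  # already accumulated, drop the duplicate
            seen.add(piece)
            out.append(piece)
            break
    return out

raw = [
    "warning: unused variable `x`",
    "warning: unused variable `x`",
    "error[E0308]: mismatched types",
]
print(run_pipeline(raw, [r"^(error\[\w+\]: .+)", r"^(warning: .+)"]))
```

The hypothetical `run_pipeline` helper collapses the two identical warnings into one line; anything a regex cannot express cleanly is where a Lua hook would step in.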

From Itch to Tool

I do a lot of vibe coding these days, and I am not ashamed of it. Great planning and good ideation go a long way, and so does good engineering: get it done right the first time and make sure it keeps working. That way you can keep iterating quickly and let value accumulate.

I started by building out a backlog of GitHub issues containing the first ideas on how it should work. The meat was designed in collaboration with Claude, using RTK as inspiration while providing my own vision for the implementation and learning about the needs from the existing codebase.

The first issue was setting up the repository, tokf, including all of the best practices from the start.

Then it started getting more fun, though not for me but for Claude Code (CC from here on out). I had it work through the issues one by one: building out the codebase, preparing the parts, and reviewing its own code (I ran multiple agents to review the code, then had CC fix the issues they found).

How tokF Works

The base idea of tokF is configuration: everything lives in structured TOML files that define the pipeline and drive the state machine inside tokF.

A simple example that only rewrites the command without filtering output, for git show:

command = "git show"

# Override: use --stat for compact summary; full diff can be thousands of lines
run = "git show --stat {args}"

[on_success]
output = "{output}"

[on_failure]
tail = 5

Without --stat, git show dumps the full diff — potentially thousands of lines. This single override keeps the output compact.

For something with actual meat, here is the filter for git push:

# git-push.toml — Trivial filter (Level 1)
# Raw: 15 lines of "Enumerating objects...", "Counting objects...", etc.
# Filtered: "ok ✓ main" (1 line)

command = "git push"

# Check full output for special cases first
match_output = [
  { contains = "Everything up-to-date", output = "ok (up-to-date)" },
  { contains = "rejected", output = "✗ push rejected (try pulling first)" },
]

[on_success]
skip = [
  "^Enumerating objects:",
  "^Counting objects:",
  "^Delta compression",
  "^Compressing objects:",
  "^Writing objects:",
  "^Total \\d+",
  "^remote:",
  "^To ",
]

extract = { pattern = '(\S+)\s*->\s*(\S+)', output = "ok ✓ {2}" }

[on_failure]
tail = 10

This filter removes a number of lines (you can see which ones in the skip section), which reduces token usage significantly: the LLM doesn't need to know how many objects we wrote or which remote we are pushing to.

The other notable section is match_output — pattern-matched strings like “Everything up-to-date” collapse to a single token-light response, rather than passing through raw git output.

extract uses a pattern to select a specific piece of content (here, the branch push info) and outputs it in the designed format.
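To see the three mechanisms working together, here is a small Python sketch of what the git-push config above does to raw output — an illustration of the semantics, not tokF's actual implementation:

```python
import re

# Skip patterns and special cases taken from the git-push.toml above.
SKIP = [r"^Enumerating objects:", r"^Counting objects:", r"^Delta compression",
        r"^Compressing objects:", r"^Writing objects:", r"^Total \d+",
        r"^remote:", r"^To "]
MATCH_OUTPUT = [("Everything up-to-date", "ok (up-to-date)"),
                ("rejected", "✗ push rejected (try pulling first)")]

def filter_push(raw: str) -> str:
    # match_output: whole-output special cases are checked first
    for needle, replacement in MATCH_OUTPUT:
        if needle in raw:
            return replacement
    # skip: drop the progress chatter line by line
    kept = [line for line in raw.splitlines()
            if not any(re.match(p, line) for p in SKIP)]
    # extract: pull the branch info out of whatever survived
    m = re.search(r"(\S+)\s*->\s*(\S+)", "\n".join(kept))
    if m:
        return f"ok ✓ {m.group(2)}"  # {2} = second capture group
    return "\n".join(kept)

raw = """Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 300 bytes
To github.com:me/tokf.git
   abc1234..def5678  main -> main"""
print(filter_push(raw))
```

Five lines of git chatter collapse into a single `ok ✓ main`, which is exactly the trade the filter is making: the agent learns the push succeeded and on which branch, and nothing else.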

Features along the way

As I kept using the tool (saying "I" feels wrong; CC did), I started seeing issues pop up. Sometimes the output wouldn't be parsed correctly, and it took real effort to find out why. My natural response was to build a history component that stores information about the last few commands, so I can debug or retrieve some of the filtered information.

The number of commands kept increasing, and every time a command was added, a new test case had to be written in Rust. Since tokF is all about configuration and the actual tests all looked fairly similar, I decided to set up a small testing framework that lets me configure tests in TOML, just as I had the filtering pipelines.

Building the filters can be tedious, and I never meant to do it by hand; the idea was that CC should be able to do so. For this I built a Skill (a Claude Code concept: a reusable prompt that teaches an agent a specific workflow), explaining to CC (or another agent) how to build, validate, and set up a new filter.

Then there were cases where I was blocking the LLM from doing its own data extraction. Anyone comfortable on the command line knows you can use | to pipe the output of one command into another, splitting text, selecting values, and then doing something else like summing them. The way I was rewriting commands ended up preventing the LLM from using pipes. Adding a check so that piped commands are not rewritten fixed it.
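Such a check can be surprisingly small. Here is one possible sketch in Python (my own illustration, not tokF's Rust code), using the standard shlex tokenizer so that a | inside quotes does not count as a pipe:

```python
import shlex

def contains_pipe(command: str) -> bool:
    """Detect an unquoted shell pipe, so a piped command can be
    passed through unmodified instead of being rewritten."""
    lex = shlex.shlex(command, punctuation_chars=True)
    return "|" in list(lex)

print(contains_pipe("git log --oneline | wc -l"))  # real pipe: don't rewrite
print(contains_pipe("grep 'a|b' notes.txt"))       # quoted |, safe to rewrite
```

With punctuation_chars=True the tokenizer emits | as its own token, while a quoted 'a|b' stays a single word, so only genuine pipelines are passed through untouched.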

Advertising

This was probably one of the hardest parts for me. I am not a great advertiser, so figuring out how to do it ended up being the same story all over again: set up a website (I tested Antigravity, an LLM-powered IDE, for that, and you can judge the results yourself), then finalize the automations around it (changelog publishing, website analytics).

I started publishing posts about the tool, sharing it with people, and finding ways to get the word out, such as adding it to awesome lists.

What I found is that getting the word out is hard work; it isn't just pushing a button. Traction can be had, but it seems to come in ebbs and flows. It has only been a couple of days, but there have already been ups and downs. I will definitely keep at it and will report on it in the future.

Conclusion

This started two days ago; the first issue was opened by Claude at 10 on the 18th of February 2026. From there Claude Code took over: setting it up, adding more features, and fixing issues.

A minute here, a minute there to get things done: when I go for coffee, when I get a drink of water, but mostly after work, when there is some time for tokF. What is the actual outcome?

My current tokf gain output:

tokf gain summary
  total runs:     3071
  input tokens:   985,737 est.
  output tokens:  98,226 est.
  tokens saved:   887,511 est. (90.0%)

This is definitely an improvement on my previous usage. A 90% reduction in tokens means less context bloat and fewer compactions, so the model stays on task longer and I get more done. The results are there, but the best part is the fun along the way.

In the end, I really want to thank Patrick, who built RTK in the first place. The itch I wanted to scratch was so strong that I ended up having a lot of fun along the way. The outcome of this project is still unknown; it might take off, it might burn out, but the fun I had will be remembered.