GN Beats Gzip and Brotli: How a Learning Sliding Window Outperforms Static Compressors

Source: DEV Community
When we published our last article, GN was within 10% of gzip on LLM conversation data. We said the remaining gap was in the entropy backend. We were wrong about the solution — but right about the problem. This week GN beats gzip on every corpus we tested. And on all three corpora, it beats brotli. Here is what we learned.

The ANS Dead End

Our first instinct was to improve the entropy coder. Gzip uses Huffman coding; zstd uses ANS (Asymmetric Numeral Systems). We implemented byte-renorm ANS, bit-renorm ANS, and Order-1 ANS from scratch in Rust. Results on ShareGPT:

Codec      Ratio
gzip-6     2.082x
byte-ANS   1.233x
bit-ANS    1.212x
O1-ANS     0.551x

ANS without an LZ-style preprocessing pass is worse than gzip. Every time. The reason is fundamental: entropy coders compress symbol frequency distributions, but gzip's real advantage comes from LZ77, the sliding window that eliminates repeated byte sequences before entropy coding runs. ANS cannot fix what LZ77 needs to do first. We kept ANS in the codebase.
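The limit here is easy to demonstrate. A memoryless (order-0) entropy coder can never beat the Shannon entropy of the byte histogram, and repetition does not lower that bound. Here is a minimal Rust sketch (order0_entropy is our own illustrative helper, not part of GN): a string of "abc" repeated a thousand times costs an order-0 coder about log2(3) ≈ 1.585 bits per byte no matter how long it gets, while LZ77 collapses it to a few literals plus one long back-reference.

```rust
// Order-0 Shannon entropy in bits per byte: the best any
// memoryless entropy coder (Huffman, ANS) can achieve.
fn order0_entropy(data: &[u8]) -> f64 {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    // Highly repetitive input: LZ77 would reduce this to a few
    // bytes plus a back-reference, but its order-0 entropy stays
    // pinned at log2(3) ≈ 1.585 bits/byte regardless of length.
    let data: Vec<u8> = b"abc".iter().cycle().take(3000).cloned().collect();
    println!("{:.3} bits/byte", order0_entropy(&data));
}
```

This is why the O1-ANS row above can even fall below 1.0x: modeling the wrong structure adds overhead without removing the redundancy that actually dominates LLM conversation data.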