The baseline
Two hundred seventy-eight thousand pairs. Seven hundred sixteen fonts. Two hundred hours of training. Seven rounds of optimization. Average error eleven point three. I was happy with that. Then I opened the rendering and saw that all the letters were top-aligned.
Top-aligned. Not baseline-aligned. A lowercase “a” next to a capital “H” sat at cap height like a superscript. Every mixed-register pair — H+a, T+o, A+g — through the entire training period looked like a typesetting error from the nineties.
The bug sat in one function. compose_pair draws two glyphs side by side so the model can measure what’s between them. The vertical position was calculated from the top of the bounding box instead of from the baseline. Five lines of code. Written by Claude Code — Anthropic’s AI, the same one helping me build 066.KERN. I asked for a function that renders a pair of letters. I got a function that doesn’t know what a baseline is.
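The shape of that bug is easy to sketch. The following is a minimal reconstruction, not the actual compose_pair from the pipeline: glyph names, metric values, and function names are all illustrative. The point is the one line that differs — anchoring the top edge of each glyph versus anchoring its baseline.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    # Hypothetical metrics in font units; not the real 066.KERN types.
    name: str
    ascent: int   # ink height above the baseline
    descent: int  # ink depth below the baseline

def y_top_aligned(glyph: Glyph, canvas_top: int = 0) -> int:
    # The bug: every glyph's top edge lands at the same y,
    # regardless of how far its ink rises above the baseline.
    return canvas_top

def y_baseline_aligned(glyph: Glyph, baseline: int) -> int:
    # The fix: place the top edge at baseline minus ascent,
    # so all glyphs in the pair share one baseline.
    return baseline - glyph.ascent

H = Glyph("H", ascent=700, descent=0)
a = Glyph("a", ascent=480, descent=10)

# Buggy: both tops at y=0, so the "a" floats at cap height.
print(y_top_aligned(H), y_top_aligned(a))                      # 0 0

# Fixed: with the baseline at y=700, "a" starts 220 units lower.
print(y_baseline_aligned(H, 700), y_baseline_aligned(a, 700))  # 0 220
```

Five lines, one wrong anchor: the function subtracted nothing, because it never asked where the baseline was.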
The baseline. The one thing known to everyone who has ever opened a typesetting application. The one thing known to everyone who has read the first chapter of any book on typography. An artificial intelligence trained on the entire internet doesn’t know that letters sit on the baseline. It writes about it in documentation. It explains it to other users. And it produces a render where a lowercase “a” floats at the top of a capital “H”.
My own mistake was different. The extraction script didn’t save renders to disk. It generated visual features — sixty-seven numbers per pair — and moved on. No previews. No thumbnails. Two hundred seventy-eight thousand pairs ran through the pipeline, and all I ever saw were tables of numbers. It wasn’t until months later, when I started building visual tests for the rhythm algorithms, that diagnostic renders appeared. That’s when I saw it.
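The guard that would have caught the bug on day one is small. Here is a sketch of one — every name is hypothetical, and the plain-text PGM format is chosen only because it needs nothing beyond the standard library; the real pipeline could just as well dump PNGs. The idea is only this: sample every Nth composed pair and write it somewhere a human eye will land.

```python
import os
import tempfile

def save_debug_render(pair_index, bitmap, out_dir, every=10_000):
    """Hypothetical guard: write every Nth composed pair to disk as a
    plain-text PGM image so misalignment is visible at a glance.
    Names and the PGM choice are illustrative, not from the pipeline."""
    if pair_index % every != 0:
        return None
    os.makedirs(out_dir, exist_ok=True)
    height, width = len(bitmap), len(bitmap[0])
    path = os.path.join(out_dir, f"pair_{pair_index:06d}.pgm")
    with open(path, "w") as f:
        f.write(f"P2\n{width} {height}\n255\n")
        for row in bitmap:
            f.write(" ".join(str(v) for v in row) + "\n")
    return path

# Demo: a 2x2 checkerboard "render"; only index 0 passes the sampling gate.
out_dir = tempfile.mkdtemp()
saved = save_debug_render(0, [[0, 255], [255, 0]], out_dir)
skipped = save_debug_render(1, [[0, 255], [255, 0]], out_dir)
# saved is a file path; skipped is None
```

Twenty-eight thumbnails out of two hundred seventy-eight thousand pairs. One glance at any of them would have ended this story in the first week.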
Why did the model work at all? Because same-register pairs — H+H, a+a, n+n — look correct even with the bug. Both letters have the same metrics, so both ride up by the same amount. The relative position checks out. The model learned from pairs that happened to look right and from pairs that looked wrong, with no way of telling the two apart.
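The cancellation is plain arithmetic. Under top-alignment, each glyph rides up by its own ascent relative to where it belongs, so the visible misalignment between the two glyphs of a pair is just the difference of their ascents. A tiny illustration, with the same hypothetical metric values as before:

```python
def vertical_gap_error(ascent_left, ascent_right):
    # Under the top-alignment bug, each glyph is shifted up by its own
    # ascent relative to the correct baseline position. The visible
    # misalignment within the pair is the difference of the two ascents.
    return abs(ascent_left - ascent_right)

# Same-register pair: identical ascents, the error cancels to zero.
print(vertical_gap_error(700, 700))  # 0    H+H looks fine
# Mixed-register pair: the "a" floats by the full ascent difference.
print(vertical_gap_error(700, 480))  # 220  H+a looks broken
```

Zero error on H+H, two hundred twenty units on H+a. The bug was invisible exactly where the numbers agreed.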
Eleven point three on bad data. Curious what comes out on good.