
dtthinky!


Hello, I usually go by the "dt.thinky!" alias, though my name is Doğukan Tuna.

Currently, I'm working on megakernels, optimizers, and kernel optimizations.

You may enjoy reading some of the articles I've written.

Check out The Connectionism Codex on Substack

📝

My notes, experiments and things I find worth sharing:

🧠

Neural Networks & New Kinds!

Compression is how I think about learning. The tighter a model can compress its inputs, the more structure it has actually found. Kolmogorov complexity makes this precise — it measures the length of the shortest program that produces a given output, which turns out to be the theoretical floor for any compressor.

The ultimate compressor

K(X) = length of the shortest program that outputs X

For any computable lossless compressor C and all strings X:

K(X) ≤ |C(X)| + K(C) + O(1)

via the simulation argument: a universal machine, given a description of C and the compressed output C(X), can decode it and print X
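A quick sanity check on that inequality, as a minimal sketch that isn't from my notes: zlib stands in for an arbitrary C, and the compressed length is a computable upper bound on description length, up to the constant cost of describing the compressor itself.

```python
# Sketch: any real lossless compressor gives a computable upper bound on K(X).
# zlib is an arbitrary stand-in for C here; the strings are illustrative.
import os
import zlib

structured = b"ab" * 5000        # highly regular: a very short program prints it
random_ish = os.urandom(10000)   # effectively incompressible

for name, x in [("structured", structured), ("random", random_ish)]:
    bound = len(zlib.compress(x, level=9))   # |C(X)|, an upper bound on K(X) + O(1)
    print(f"{name}: |X| = {len(x)} bytes, |C(X)| = {bound} bytes")
```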

The catch

K(X) is uncomputable — you can never know the true shortest program.

But a deep network is a finite parallel computer that approximates it with bounded resources.

MAGICAL!

Why neural nets are compressors

Neural nets can simulate arbitrary programs

They are small computers — circuits wired by data

SGD searches over the space of programs they can express
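To make the "SGD searches over programs" point concrete, here is a toy sketch; the task, network size, and hyperparameters are my own illustrative choices, not anything prescribed above. Cross-entropy loss measured in bits is the code length the model assigns to the labels, so every SGD step that lowers it is finding a shorter description of the data.

```python
# Sketch: negative log-likelihood in bits = code length under the model,
# so minimizing it with SGD is compressing the labels given the inputs.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(1024, 8)
y = (X.sum(dim=1) > 4).long()            # a simple structured rule to be "compressed"

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()          # NLL in nats; divide by ln 2 for bits

for step in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(f"step {step}: {loss.item() / math.log(2):.3f} bits per label")
```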

Micro-Kolmogorov complexity

Fix an architecture, then fit a network with SGD — the bit-length of the resulting weights is a practical proxy for description length:

micro-K(f) ≈ bit-length of weights in a fixed architecture

min_{f ∈ F} [ loss(f) + λ · micro-K(f) ]

Shorter description length → better generalization.
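One way to turn "bit-length of the weights" into an actual number, sketched below under assumptions of my own (an 8-bit uniform grid and zlib, purely for illustration): quantize the trained weights and see how many bits a generic compressor needs to store them. This measures the proxy rather than optimizing it; inside the objective above the penalty needs a differentiable surrogate, for example weight decay, which corresponds to coding the weights under a Gaussian prior.

```python
# Sketch of a micro-K proxy: quantize a model's weights, then compress them.
# The 8-bit grid and zlib are illustrative choices, not a prescribed recipe.
import zlib
import numpy as np
import torch

def micro_k_bits(model: torch.nn.Module, n_bits: int = 8) -> int:
    """Rough description length of the weights: uniform quantization + zlib."""
    blobs = []
    for p in model.parameters():
        w = p.detach().cpu().numpy().ravel()
        scale = np.abs(w).max() + 1e-12
        levels = 2 ** (n_bits - 1) - 1
        q = np.round(w / scale * levels).astype(np.int8)   # values in [-127, 127]
        blobs.append(q.tobytes())
    return 8 * len(zlib.compress(b"".join(blobs), level=9))

model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
print(f"micro-K proxy ≈ {micro_k_bits(model)} bits")
```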

I'm deeply invested in methods that make learning systems compress harder and generalize further.