

Dogukan Tuna - Personal Website
Hello, I usually go by the "dt.thinky!" alias, though my name is Doğukan.
I'm a research engineer working on frontier AI algorithms and agent systems for the physical sciences and the physical world. My research spans RL algorithms for LLMs, RL training infrastructure, GPU kernels, and multi-agent infrastructure. Currently, I'm a founding AI engineer at Manuel AI, where I work on AI for HVAC and energy domains, and at Ultraresearch, where I work on chip design, semiconductors, lithography, and energy systems.
You may enjoy reading some of the articles I've written.
My notes, experiments and things I find worth sharing:
On this website:
Research!
A running log of posts, experiments, and longer notes from the work.
Verified Replay Distillation (VRD) recipe for continual learning in verifiable domains
autoresearch-mamba: Karpathy-Style Autoresearch for Mamba-2, Mamba-3, and Hybrid Mamba-Transformer MoE
Mem-RLM — Memory-Augmented Inference for Recursive Language Models
On Compression, Computation and the Space Between
Defeating Nondeterminism in LLM Inference: Reproducing Batch-Invariant Ops (RMSNorm & Tiled Matrix Multiplication) in JAX
Streaming deepagents and task delegation with real-time output
Energetics of Allosteric Communication in Ubiquitin Revealed by Hybrid MCTS-Langevin Simulations
Neural Networks & New Kinds!
Compression is how I think about learning. The tighter a model can compress its inputs, the more structure it has actually found. Kolmogorov complexity makes this precise: it measures the length of the shortest program that produces a given output, which is, up to an additive constant that depends only on the compressor, a lower bound on the output length of any computable compressor.
The ultimate compressor
K(X) = length of the shortest program that outputs X
For any computable compressor C and all strings X:
K(X) ≤ |C(X)| + K(C) + O(1)
via the simulation argument — run C inside a universal machine
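The bound can be seen in miniature with an off-the-shelf compressor. The sketch below is an illustration, not part of the original argument: it uses Python's zlib as the computable compressor C, and the helper name compressed_bits and the two test strings are assumptions chosen for the example. The compressed size only upper-bounds K(X) up to the constant cost of describing the decompressor; it says nothing about how tight that bound is.

```python
import random
import zlib


def compressed_bits(data: bytes, level: int = 9) -> int:
    """Bit-length of zlib's output for `data`.

    Plays the role of |C(X)| in K(X) <= |C(X)| + K(C) + O(1).
    """
    return 8 * len(zlib.compress(data, level))


# 10,000 bytes with an obvious short description: "repeat 'ab' 5000 times".
structured = b"ab" * 5000

# 10,000 pseudo-random bytes from a seeded generator (seed is arbitrary).
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(10_000))

raw_bits = 8 * len(structured)
print("raw bits:         ", raw_bits)
print("structured bits:  ", compressed_bits(structured))
print("random-like bits: ", compressed_bits(random_like))
```

The repetitive string shrinks to a small fraction of its raw size, while the pseudo-random bytes do not shrink under zlib. Those bytes do have a short program (the generator plus its seed), which zlib cannot find; that gap between a practical compressor and K(X) is the point of the inequality.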
The catch
K(X) is uncomputable: no algorithm can return the true shortest program for every X.
But a deep network is a finite parallel computer that approximates it with bounded resources.
MAGICAL!
Why neural nets are compressors
Neural nets can simulate arbitrary programs
↓
They are small computers — circuits wired by data
↓
SGD searches over the space of programs they can express
Micro-Kolmogorov complexity
Fix an architecture, then fit a network with SGD — the bit-length of the resulting weights is a practical proxy for description length:
micro-K(f) ≈ bit-length of weights in a fixed architecture
min_{f ∈ F} [ loss(f) + λ · micro-K(f) ]
Shorter description length → better generalization.
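A toy version of the penalized objective above, as a minimal sketch under loud assumptions: the fixed architecture is a cubic polynomial, the family F is just two hand-picked weight vectors searched exhaustively (not SGD), and micro-K is approximated by quantizing each weight to a grid and charging a simple variable-length integer code. The names int_code_bits, micro_k, objective, and select, the grid step, and the data are all hypothetical choices for illustration.

```python
def int_code_bits(n: int) -> int:
    """Bits to store integer n: 1 bit for zero, else a gamma-style code plus sign."""
    if n == 0:
        return 1
    return 2 * abs(n).bit_length() + 1


def micro_k(weights, step: float = 0.01) -> int:
    """Proxy description length: quantize weights to multiples of `step`, sum code lengths."""
    return sum(int_code_bits(round(w / step)) for w in weights)


def predict(weights, x: float) -> float:
    """Fixed architecture: polynomial with coefficients weights[0] + weights[1]*x + ..."""
    return sum(w * x**i for i, w in enumerate(weights))


def mse(weights, xs, ys) -> float:
    """Mean squared error of the polynomial on the data."""
    return sum((predict(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def objective(weights, xs, ys, lam: float) -> float:
    """loss(f) + lambda * micro-K(f)."""
    return mse(weights, xs, ys) + lam * micro_k(weights)


def select(candidates, xs, ys, lam: float):
    """Exhaustive argmin over a finite family F (stands in for SGD here)."""
    return min(candidates, key=lambda w: objective(w, xs, ys, lam))


# Noisy samples of y = 2x + 1 (noise values are made up for the example).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-3.1, -0.9, 1.0, 3.1, 4.9]

simple = (1.0, 2.0, 0.0, 0.0)            # the underlying line, zeros are cheap to encode
wiggly = (1.02, 1.93, -0.01, 0.02)       # slightly lower training error, more bits

for lam in (0.0, 0.001):
    best = select([simple, wiggly], xs, ys, lam)
    print(f"lambda={lam}: picked {best}, micro-K={micro_k(best)} bits")
```

With no penalty the search prefers the weight vector with the lower training error; with a small positive λ the shorter description wins. How well this proxy tracks generalization in real networks is an empirical question the sketch does not settle.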
I'm deeply invested in methods that make learning systems compress harder and generalize further.





