
dtthinky!
Hello, I usually go by the "dt.thinky!" alias, though my name is . Currently I'm working on high-compute reinforcement learning (RL) research on superlearners that rely on an experience stream that grows at runtime. My focus is continual model-based RL, experience replay and streaming, and general digital/physical environments. You may enjoy reading some of the articles I've written.
My notes, experiments and things I find worth sharing:
On this website:
Build!
Experiential AI system that grows a superlearner from experience at runtime
Research!
A running log of posts, experiments, and longer notes from the work.
Verified Replay Distillation (VRD) recipe for continual learning in verifiable domains
autoresearch-mamba: Karpathy-Style Autoresearch for Mamba-2, Mamba-3, and Hybrid Mamba-Transformer MoE
Mem-RLM — Memory-Augmented Inference for Recursive Language Models
On Compression, Computation and the Space Between
Defeating Nondeterminism in LLM Inference: Reproducing Batch-Invariant Ops (RMSNorm & Tiled Matrix Multiplication) in JAX
Streaming deepagents and task delegation with real-time output
Energetics of Allosteric Communication in Ubiquitin Revealed by Hybrid MCTS-Langevin Simulations
Neural Networks & New Kinds!
I like compression as a lens on learning systems, representation learning, and generalization. Kolmogorov complexity gives a clean way to think about how much structure a model can capture and how we might reason about concise descriptions of data.
Kolmogorov complexity as the ultimate compressor:
K(X) = length of the shortest program that outputs X
If C is a computable compressor, then for all X:
K(X) ≤ l(C(X)) + K(C) + O(1)
proof: the simulation argument: a program made of a decompressor for C plus the compressed string C(X) outputs X, and its length is l(C(X)) + K(C) + O(1)
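To make the bound concrete, here is a tiny sketch of my own (an illustration, not code from any of the posts; zlib stands in for the computable compressor C, and the constant K(C) + O(1) for the decompressor is dropped):

```python
import os
import zlib

def k_upper_bound_bits(x: bytes) -> int:
    # zlib plays the role of the computable compressor C; the bit-length of
    # C(x) upper-bounds K(x), up to the omitted constant K(C) + O(1) for
    # the decompressor itself.
    return 8 * len(zlib.compress(x, 9))

structured = b"abab" * 1000      # highly regular string
random_ish = os.urandom(4000)    # incompressible with overwhelming probability

print(k_upper_bound_bits(structured), "bits vs", 8 * len(structured), "raw bits")
print(k_upper_bound_bits(random_ish), "bits vs", 8 * len(random_ish), "raw bits")
```

The structured string compresses to a small fraction of its raw length, while the random bytes barely compress at all: the more structure the compressor can find, the tighter the upper bound on K(X).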
Computing K(X)
* Undecidable / not computable
* A deep NN/transformer is a parallel computer with finite resources
MAGICAL!
* NNs can simulate and run programs!
↓
They are little computers
↓
They're circuits, and circuits are computing machines
↓
SGD searching over program space!
micro Kolmogorov complexity
fitting a NN with SGD, we can compute our miniature Kolmogorov compressor
micro-K(f) ≈ bit-length of the weights within a fixed architecture
min_{f ∈ F} [ loss(f) + λ · micro-K(f) ]
lower description length ⇒ better generalization
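To make these last few notes concrete, a toy sketch of my own (not from the posts above; the quantization grid, the entropy-coding proxy for bit-length, λ, and the linear model are all assumptions chosen for illustration): quantize the weights, take their empirical entropy as micro-K(f), and add it to the loss.

```python
import numpy as np

def micro_k_bits(w: np.ndarray, step: float = 0.01) -> float:
    # Crude description length of the weights: quantize to a fixed grid and
    # count the bits an ideal entropy coder would need for those symbols.
    q = np.round(w / step).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(w.size * -(p * np.log2(p)).sum())

def objective(w: np.ndarray, x: np.ndarray, y: np.ndarray, lam: float = 1e-2) -> float:
    # loss(f) + λ · micro-K(f) for a linear model f(x) = x @ w.
    mse = float(np.mean((x @ w - y) ** 2))
    return mse + lam * micro_k_bits(w)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
w_simple = np.array([1.0, -1.0, 0, 0, 0, 0, 0, 0])   # short description
y = x @ w_simple
w_messy = w_simple + rng.normal(scale=0.05, size=8)  # fits almost as well, costs more bits

print(objective(w_simple, x, y))  # low loss, low description length
print(objective(w_messy, x, y))   # similar loss, higher micro-K penalty
```

In this toy case the sparse weight vector wins the combined objective even though both fit the data about equally well, which is the "lower description length ⇒ better generalization" intuition in miniature.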
I'm extremely bullish on methods that make learning systems more efficient and more useful.