Miscellaneous

Notes, transcripts, and other writing that doesn’t fit elsewhere.

The Information Bottleneck: Naomi on Understanding Training

A Kempner Fellow at Harvard and incoming BU professor on why training dynamics matter more than final weights. Topics include grokking and phase transitions, what the loss curve hides, data quality, multilingual interlingua, sparse autoencoders and interpretability, language as the main modality, and why non-determinism across training runs is harder to deal with than it looks.

The Information Bottleneck: Stefano Ermon on Diffusion Language Models

Stanford professor and Inception AI CEO Stefano Ermon on why diffusion models are better suited for language generation than autoregressive transformers. Topics include the history of score-based diffusion, discrete diffusion theory, Mercury 2's inference speed, hardware constraints, and what the architecture means for AGI timelines.

Brainstorm Session: Generalizing Backprop

A lab discussion led by Yoshua Bengio on whether backpropagation can be extended beyond smooth, differentiable computations. Topics include why REINFORCE does not scale well, the memory problem for biological credit assignment, intermediate rewards and learned credit machinery, discrete actions and entropy, and why deep network loss surfaces tend to be smoother than expected.

World Modeling Workshop 2026: Day 1

Notes from six talks at Mila on world models, covering work from Schmidhuber's 1990 model to LeCun's JEPA to robots that learn from video. There was broad agreement that the field is at an important moment, but not much agreement beyond that.

World Modeling Workshop 2026: Panel Discussion

LeCun, Bengio, Shirley Ho, Shuran Song, and Alessandro Lazaric in conversation at the World Modeling Workshop 2026, moderated by Randall Balestriero. Topics include input-space vs. latent-space modeling, where supervision for hierarchical representations comes from, error accumulation in neural simulators, causality and agency, and when world models are more useful than policies.