A week of links
This week, language, language models and replication:
How To Become A Mechanistic Interpretability Researcher: So much great material in here, even if you’re just interested in getting across LLM foundations.
From that list, ARENA’s AI Safety course is fantastic - again, even if you’re just interested in LLM foundations.
Do Machine Learning Models Memorize or Generalize?: A great explainer on grokking.
After hearing it mentioned on Dwarkesh’s podcast episode with Sholto Douglas and Trenton Bricken, I’ve been reading The Symbolic Species by Terrence Deacon. I’ve learnt a lot about language, although I must admit that my eyes glaze over (as always) during the extended discussions of brain anatomy. Definitely worth the read (and the podcast episode is worth a listen too).
On the process and value of direct close replications: A rejoinder to Shafir and Cheek’s (2024) commentary on Chandrashekar et al. (2021): So often, failures to replicate experimental results are met with a load of waffle about context, precise experimental conduct and the like. Shafir and Cheek’s commentary is one such example; Chandrashekar and Feldman provide a fantastic response.