The Math Behind Online Softmax
Understanding the mathematical principles behind online softmax, an optimization technique used in Flash Attention to efficiently compute softmax in chunks.
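As a rough illustration of what computing softmax in chunks means, here is a minimal Python sketch of an online softmax normalizer over a single row of scores. It assumes the row arrives as a list of chunks. The sketch keeps a running maximum m and a running sum d of exp(x - m). When a new chunk raises the maximum to m', the old sum is rescaled by exp(m - m') before that chunk's terms are added. The function names (`naive_softmax`, `online_softmax_stats`, `online_softmax`) are hypothetical helpers for this sketch and do not come from any library. FlashAttention also applies the same rescaling to its running output accumulator, which this sketch leaves out.

```python
import math
from typing import Iterable, List, Tuple


def naive_softmax(xs: List[float]) -> List[float]:
    # Reference "safe" softmax: subtract the global max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax_stats(chunks: Iterable[List[float]]) -> Tuple[float, float]:
    # Single streaming pass over the chunks (hypothetical helper).
    # m: running max of all elements seen so far.
    # d: running normalizer, sum of exp(x - m) over all elements seen so far.
    # Assumes finite inputs; a leading chunk of only -inf values would yield NaN.
    m = float("-inf")
    d = 0.0
    for chunk in chunks:
        if not chunk:
            continue
        m_new = max(m, max(chunk))
        # Rescale the old sum to the new max, then add this chunk's terms.
        # On the first chunk, exp(-inf - m_new) == 0.0, so d starts from zero.
        d = d * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in chunk)
        m = m_new
    return m, d


def online_softmax(chunks: List[List[float]]) -> List[float]:
    # After the stats pass, normalize every element with the final (m, d).
    m, d = online_softmax_stats(chunks)
    return [math.exp(x - m) / d for chunk in chunks for x in chunk]


if __name__ == "__main__":
    scores = [1.0, 3.0, -2.0, 0.5, 4.0, 2.5]
    chunks = [scores[0:2], scores[2:4], scores[4:6]]
    print(online_softmax(chunks))
    print(naive_softmax(scores))
```

The rescaling factor exp(m - m') corrects every previously accumulated term in one multiplication. The chunked result therefore matches the standard max-subtracted softmax up to floating-point rounding, and the statistics pass never needs more than one chunk's exponentials at a time.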
A step-by-step tutorial to code up your own GRPO Trainer.
Do LLMs recognize Medical Definitions?
There’s been a never-ending debate about whether LLMs understand the data they process, so much so that people have started to debate what understanding...
Mechanistic Interpretability: What’s superposition?
Mechanistic interpretability is an emerging area of research in AI focused on understanding the inner workings of neural networks. LLMs and diffusion models have taken the...