AI (6)
- Gated Delta Net Attention: A Deep Dive into the Linear Attention Mechanism Powering Qwen3.5
- Speculative Decoding Tutorial
- The Math Behind Online Softmax
- The One Big Beautiful Blog on Group Relative Policy Optimization (GRPO)
- Do LLMs recognize Medical Definitions?
- Mechanistic Interpretability: What's Superposition?