
Paper · Advanced · Technical

Attention Is All You Need

Vaswani A, Shazeer N, Parmar N, et al.

What to Read Next

Scaling Laws for Neural Language Models

The paper that made the empirical case for 'bigger is better', showing that model performance improves predictably with more parameters, data, and compute.

A Very Gentle Introduction to Large Language Models without the Hype

Clear, accessible overview of what LLMs actually are and how they work — no jargon.

Tool · Intermediate

Hugging Face

The hub for open-source AI models, datasets, and demos.
