Neural-Net Intuition, LLMs & AI Capstone

Watch: Attention Weights

You just implemented scaled dot-product attention. Here is what those matrices actually do. Attention lets each word look at every word in the sentence (including itself), score how much each one matters, and rebuild itself as a weighted blend of them.

Press play. Watch the word sat attend across the cat sat: the raw scores become softmax weights that sum to 1, the thickest line points to the word that matters most, and sat absorbs its context, mostly from cat.

Spotted a problem in this lesson? Report it

Code · runs in your browser

Output

Back Next lesson

Watch: Attention Weights

This lesson is locked

Best on a laptop