Neural-Net Intuition, LLMs & AI Capstone
Watch: Attention Weights
You just implemented scaled dot-product attention. Here is what those matrices actually do. Attention lets each word look at every word in the sentence (including itself), score how much each one matters, and rebuild itself as a weighted blend of them.
Press play. Watch the word sat attend across the cat sat: the raw scores become softmax weights that sum to 1, the thickest line points to the word that matters most, and sat absorbs its context, mostly from cat.
Lesson complete. Nice work.
Code · runs in your browser
Output
This lesson is locked
Lessons open one at a time. Finish the previous lesson to unlock this one.