Neural-Net Intuition, LLMs & AI Capstone

Backprop & Gradient Checking

You trained a toy model by nudging weights downhill. But where do the gradients come from, and how do you know they are right? This lesson answers both, and the second answer, gradient checking, is the single habit that separates working neural nets from silently broken ones.

Backprop is just the chain rule. Take one example through a tiny model: predict z = w * x, then measure squared error L = (z - y)**2. To improve w you need dL/dw. Compute it in two hops and multiply:

dL/dz = 2 * (z - y)        # how loss moves with the prediction
dz/dw = x                  # how the prediction moves with w
dL/dw = dL/dz * dz/dw = 2 * (z - y) * x

That product IS backpropagation, in miniature: each layer multiplies the gradient flowing back by its own local derivative. Stack a million of these and you have a deep network.

Now, is your formula correct? Verify it numerically. The definition of a derivative is a tiny nudge, so estimate it with a central finite difference and check the analytic gradient agrees:

numeric = (L(w + h) - L(w - h)) / (2 * h)   # h tiny, e.g. 1e-5
assert abs(analytic - numeric) < 1e-6   # they should agree to many digits

If they disagree, your backward pass has a bug, every serious DL codebase gradient-checks for exactly this reason.

Build two functions. loss(w, x, y) returns (w*x - y)**2. grad(w, x, y) returns the analytic gradient dL/dw = 2*(w*x - y)*x. The hidden tests will gradient-check your grad against a finite difference of your own loss across many random points, so a wrong formula cannot pass. Press Run to see analytic vs numeric line up.

Your turn

Write loss(w, x, y) returning the squared error (w*x - y)**2, and grad(w, x, y) returning its analytic gradient with respect to w, which by the chain rule is 2*(w*x - y)*x. Your grad will be gradient-checked against a central finite difference of your own loss at many random points, so it must be the true derivative, not an approximation or a constant.

Spotted a problem in this lesson? Report it

Code · runs in your browser

Output

Back Next lesson

Backprop & Gradient Checking

This lesson is locked

Best on a laptop