One formula turns a blob of random numbers into a point shaped like a real embedding: normalize(V·√Λ·z + m).
It looks opaque, so let's watch it happen — one piece at a time, in 2D. The maths is the same in 1024 dimensions; the picture is just flatter. We'll build up to what the actual layers of a real model look like.
Embeddings are compared by cosine similarity, which depends only on the angle between two vectors, not on their lengths. So length is throwaway, and every vector is really just its tip on the unit circle (a sphere, in 1024-D). The last step of the formula, normalize, is just "put the tip on the circle." Learn it first, and it won't be a mystery later.
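To make "length is throwaway" concrete, here is a minimal sketch in plain Python. The helper names `normalize` and `cosine` and the example vectors are assumptions for illustration, not anything the page itself defines.

```python
import math

def normalize(v):
    """Scale a 2D vector to length 1, i.e. put its tip on the unit circle."""
    length = math.hypot(v[0], v[1])
    if length == 0.0:
        raise ValueError("the zero vector has no direction to normalize")
    return (v[0] / length, v[1] / length)

def cosine(a, b):
    """Cosine of the angle between a and b; both lengths divide out."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(a[0], a[1]) * math.hypot(b[0], b[1]))

a = (3.0, 4.0)
a_long = (30.0, 40.0)   # same direction as a, ten times longer
b = (1.0, 0.0)

print(normalize(a))        # approximately (0.6, 0.8)
print(normalize(a_long))   # approximately the same point on the circle
print(cosine(a, b), cosine(a_long, b))   # both approximately 0.6
```

Stretching `a` by ten changes neither its normalized tip nor its cosine with `b`, which is why only the position on the circle matters.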
You just saw where points end up (the circle). Now watch one random point z get there — step through the three operations.
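The same walk can be sketched in code: stretch, rotate, shift, and then the final normalize. This is a hedged 2D illustration. It assumes V is a plain rotation by an angle `theta` and √Λ is a pair of per-axis scale factors, and the knob values are made up.

```python
import math

def stretch(z, s1, s2):
    """sqrt(Lambda): scale each axis of z by its own factor."""
    return (s1 * z[0], s2 * z[1])

def rotate(p, theta):
    """V: rotate p counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def shift(p, m):
    """+m: move the point by the center m."""
    return (p[0] + m[0], p[1] + m[1])

def normalize(p):
    """Put the tip on the unit circle."""
    length = math.hypot(p[0], p[1])
    return (p[0] / length, p[1] / length)

z = (1.0, 1.0)                        # one "random" point, fixed here for clarity
step1 = stretch(z, 2.0, 0.5)          # (2.0, 0.5)
step2 = rotate(step1, math.pi / 2)    # about (-0.5, 2.0)
step3 = shift(step2, (3.0, 0.0))      # about (2.5, 2.0)
x = normalize(step3)                  # about (0.781, 0.625), length 1

for name, p in [("stretch", step1), ("rotate", step2), ("shift", step3), ("normalize", x)]:
    print(f"{name:>9}: ({p[0]:.3f}, {p[1]:.3f})")
```

The order matters: stretching after the rotation would scale along the wrong axes, and shifting before the stretch would scale the center as well.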
One point followed the chain. A whole cloud of noise follows the same chain — so the round blob becomes a tilted, off-center ellipse on the circle. Drag the knobs above and watch this react.
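Here is a rough sketch of the cloud version, assuming the noise is standard normal and V is a 2D rotation. Before normalization, such a cloud has mean m and covariance V Λ Vᵀ, and the code checks that numerically. The function names and knob values are illustrative assumptions.

```python
import math
import random

def sample_blob(n, s1, s2, theta, m, seed=0):
    """Push n standard-normal 2D points through stretch, rotate, shift."""
    rng = random.Random(seed)
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for _ in range(n):
        a = s1 * rng.gauss(0.0, 1.0)   # stretch along the first axis
        b = s2 * rng.gauss(0.0, 1.0)   # stretch along the second axis
        pts.append((c * a - s * b + m[0], s * a + c * b + m[1]))  # rotate, shift
    return pts

def to_circle(pts):
    """Normalize every point onto the unit circle."""
    return [(x / math.hypot(x, y), y / math.hypot(x, y)) for x, y in pts]

def mean_and_cov(pts):
    """Sample mean and covariance entries (cxx, cxy, cyy) of 2D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cxx = sum((x - mx) ** 2 for x, _ in pts) / n
    cyy = sum((y - my) ** 2 for _, y in pts) / n
    cxy = sum((x - mx) * (y - my) for x, y in pts) / n
    return (mx, my), (cxx, cxy, cyy)

theta = math.radians(30)
blob = sample_blob(20000, s1=0.4, s2=0.1, theta=theta, m=(1.0, 0.5))
mean, (cxx, cxy, cyy) = mean_and_cov(blob)
on_circle = to_circle(blob)   # what actually gets compared by cosine

print(mean)            # close to m = (1.0, 0.5)
print(cxx, cxy, cyy)   # close to the entries of V Λ Vᵀ
```

With Λ = diag(0.16, 0.01) and a 30° rotation, the entries of V Λ Vᵀ work out to roughly 0.1225, 0.065, and 0.0475, so the printed covariance should land near those values.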
If three knobs fully describe a blob, you should be able to recreate one by hand. Here's a hidden target (gray). Tune the knobs until your green blob sits on top of it.
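Matching by hand has a numerical counterpart: estimate the knobs from the target's mean and covariance. The sketch below assumes you can see the target before it is normalized (the projection onto the circle throws away radial information), and it uses the closed-form eigen-decomposition of a 2×2 covariance. `fit_knobs` and the hidden target values are hypothetical.

```python
import math
import random

def fit_knobs(pts):
    """Estimate (m, theta, s1, s2) from un-normalized 2D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cxx = sum((x - mx) ** 2 for x, _ in pts) / n
    cyy = sum((y - my) ** 2 for _, y in pts) / n
    cxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form eigen-decomposition of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)   # tilt of the long axis
    mid = 0.5 * (cxx + cyy)
    rad = math.hypot(0.5 * (cxx - cyy), cxy)
    s1 = math.sqrt(mid + rad)                # sqrt of the larger eigenvalue
    s2 = math.sqrt(max(mid - rad, 0.0))      # sqrt of the smaller eigenvalue
    return (mx, my), theta, s1, s2

# Hypothetical hidden target: theta = 0.6 rad, s1 = 0.5, s2 = 0.2, m = (1.2, -0.4).
rng = random.Random(1)
c, s = math.cos(0.6), math.sin(0.6)
target = []
for _ in range(20000):
    a, b = 0.5 * rng.gauss(0.0, 1.0), 0.2 * rng.gauss(0.0, 1.0)
    target.append((c * a - s * b + 1.2, s * a + c * b - 0.4))

m_hat, theta_hat, s1_hat, s2_hat = fit_knobs(target)
print(m_hat, theta_hat, s1_hat, s2_hat)   # close to the hidden values
```

The tilt is only recoverable up to a half turn, since an ellipse rotated by 180° looks identical, so `theta` comes back in the range (-π/2, π/2].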
You matched a single tidy blob. But real data curves, and one ellipse can't bend. The fix: chop the cloud into K patches and give each its own √Λ, V, m — the same formula, once per patch. That's "local PCA".
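A small sketch of the per-patch idea, under loud assumptions: the "real data" is a made-up noisy arc, and patches are cut by sorting points along their polar angle and splitting into K equal chunks. That is a simplification chosen to keep the example deterministic. A clustering step such as k-means is a common alternative, and the page does not say which it uses.

```python
import math
import random

def fit_knobs(pts):
    """Mean, tilt, and axis lengths (m, theta, s1, s2) of one patch."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cxx = sum((x - mx) ** 2 for x, _ in pts) / n
    cyy = sum((y - my) ** 2 for _, y in pts) / n
    cxy = sum((x - mx) * (y - my) for x, y in pts) / n
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    mid = 0.5 * (cxx + cyy)
    rad = math.hypot(0.5 * (cxx - cyy), cxy)
    return (mx, my), theta, math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))

def local_pca(pts, k):
    """Split pts into k contiguous patches along polar angle; fit each one."""
    ordered = sorted(pts, key=lambda p: math.atan2(p[1], p[0]))
    n = len(ordered)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [fit_knobs(chunk) for chunk in chunks]

# Made-up curved data: a noisy arc that no single ellipse can follow.
rng = random.Random(2)
curve = []
for _ in range(4000):
    t = rng.uniform(0.2, math.pi - 0.2)
    curve.append((math.cos(t) + 0.02 * rng.gauss(0.0, 1.0),
                  math.sin(t) + 0.02 * rng.gauss(0.0, 1.0)))

global_fit = fit_knobs(curve)      # one ellipse for everything
patches = local_pca(curve, k=4)    # four ellipses, one per patch

print("global short axis:", round(global_fit[3], 3))
for m, theta, s1, s2 in patches:
    print("patch tilt:", round(theta, 2), "short axis:", round(s2, 3))
```

The single global ellipse has to be fat to cover the bend, while each patch gets a thin ellipse with its own tilt, so the tilts change as you move along the arc.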
That's the whole story. √Λ stretch, V rotate, +m shift a ball of noise, normalize onto the sphere — that's one blob. Real embeddings are many such blobs tiling a curved surface. And the model's own layers are just different settings of the same knobs: the word layer (WE) shrinks every patch to a single dot (a lookup table — each word is one fixed vector), while the final layer (L24) fattens them into spread, overlapping clouds (context has smeared each word out).
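The "dot versus cloud" contrast can be mimicked with a single extra multiplier on the stretch. This is only a toy sketch: the `spread` knob and every number in it are invented for illustration and are not measured from any model's WE or L24 layer.

```python
import math
import random

def blob_angles(spread, n=2000, seed=0):
    """Angles on the circle of n samples from one blob, with the stretch scaled by spread."""
    rng = random.Random(seed)
    s1, s2, theta, m = 0.4, 0.1, math.pi / 3, (1.0, 0.0)   # illustrative knobs
    c, s = math.cos(theta), math.sin(theta)
    angles = []
    for _ in range(n):
        a = spread * s1 * rng.gauss(0.0, 1.0)
        b = spread * s2 * rng.gauss(0.0, 1.0)
        x, y = c * a - s * b + m[0], s * a + c * b + m[1]
        angles.append(math.atan2(y, x))   # the angle is all that survives normalize
    return angles

def angular_std(angles):
    """Standard deviation of the angles, in radians."""
    mean = sum(angles) / len(angles)
    return math.sqrt(sum((t - mean) ** 2 for t in angles) / len(angles))

dot = angular_std(blob_angles(spread=0.0))     # every sample collapses onto m
cloud = angular_std(blob_angles(spread=1.0))   # samples smear along the circle

print("spread 0.0:", dot)     # 0.0, a single fixed vector like a lookup table
print("spread 1.0:", cloud)   # clearly larger than zero
```

With the stretch at zero the formula returns the same point every time, which is the lookup-table case. Turning the stretch up spreads the samples along the circle.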