Grokking

Date: 15/01/2025

Briefly, this post is about grokking and how it is prolly double descent, or perhaps not. Most of these notes are borrowed from this paper.

Elevator pitch version

Grokking (nope, it has nothing to do with X’s Grok) is the phenomenon where validation accuracy shoots up much later in training than train accuracy does, i.e. long after the model has already fit the training set. Kinda sounds similar to double descent, where the test accuracy initially improves, then worsens (this is where practitioners initially said “A’ight man, this is where we stop”) and then MIRACULOUSLY improves again.
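To make the "much later" part concrete, here is a minimal sketch that measures the gap between train and validation accuracy first crossing a threshold. Everything in it is an assumption for illustration: the helper names (`first_step_reaching`, `grokking_delay`), the 0.95 threshold, and the toy curves, which are made up and not taken from the paper.

```python
def first_step_reaching(acc_curve, threshold):
    """Return the index of the first accuracy >= threshold, or None if never reached."""
    for step, acc in enumerate(acc_curve):
        if acc >= threshold:
            return step
    return None


def grokking_delay(train_acc, val_acc, threshold=0.95):
    """Number of logged steps between train and val accuracy first reaching threshold.

    Returns None if either curve never reaches the threshold.
    """
    train_step = first_step_reaching(train_acc, threshold)
    val_step = first_step_reaching(val_acc, threshold)
    if train_step is None or val_step is None:
        return None
    return val_step - train_step


# Toy, made-up curves (NOT real results): train accuracy saturates early,
# validation accuracy stays near chance for a while and then jumps.
train_acc = [0.20, 0.60, 0.99, 1.00, 1.00, 1.00, 1.00, 1.00]
val_acc = [0.10, 0.10, 0.12, 0.15, 0.20, 0.50, 0.97, 0.99]

delay = grokking_delay(train_acc, val_acc)
print(f"val accuracy caught up {delay} logged steps after train accuracy")
```

A large positive delay is the grokking signature. A plain threshold crossing like this cannot tell grokking apart from double descent by itself, since it ignores whether validation accuracy dipped along the way.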

The Paper

The authors of the paper (henceforth referred to as dem authors) piggy-back on the viewpoint that all a neural network learns is patterns: “pattern learning”. With this viewpoint, dem authors put forth two claims:

…. TODO


References