I'm starting to think loss is harmful. Our loss has been a flat 2.2 for the past five days training GPT-2 1.5B. Yet according to human testing, it's been getting noticeably better every day. It's now good enough to amuse /r/dota2: reddit.com/r/DotA2/comments/…

Dec 31, 2019 · 11:40 PM UTC

The dota2 data is only 0.73% of the overall training data (73MB out of 10GB). Yet the bot is adept enough to convince /r/dota2 that it's talking about dota. Again: Loss has been a flat 2.2 for the last five days. And five days ago, the model wasn't this good.
The history of science shows that when something seems out of place, we should pay close attention. Loss != quality. This topic deserves thorough analysis. And as far as I know, no one has done it yet. Otherwise people wouldn't be using loss as a quality metric.
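For context, a flat cross-entropy loss corresponds to a fixed per-token perplexity. A minimal sketch of that conversion, assuming the reported 2.2 is mean cross-entropy in nats (the usual convention for GPT-2 training logs):

```python
import math

def perplexity(mean_xent_nats: float) -> float:
    """Per-token perplexity implied by a mean cross-entropy loss in nats."""
    return math.exp(mean_xent_nats)

# Loss 2.2 nats ~ perplexity 9.0: on average, as uncertain as a
# uniform choice over about 9 candidate tokens.
print(perplexity(2.2))  # ~9.03
```

So a stable 2.2 means aggregate next-token uncertainty isn't moving, even while sampled-output quality apparently is.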
Replying to @theshawwn
Need to train a model to detect the 'getting better' trend here.
Replying to @theshawwn
I wonder if there is a connection to "deep double descent" openai.com/blog/deep-double-…
Replying to @theshawwn @heghbalz
This would suggest that the weights still travel toward a "better" arrangement despite near-constant loss. I thought that at that point it's more of a random walk.
Isn't that why we have eval measures beyond the loss, e.g., accuracy, F1, FID, etc.? To tell us how *good* the weights are? In GANs at least, it's normal not to trust the average loss much and to look at eval measures instead.
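The loss-vs-metrics gap can be made concrete: two models can assign the same probability to the correct token (hence identical cross-entropy) while only one would pick it greedily. A toy sketch with made-up probabilities:

```python
import math

CORRECT = 0  # index of the correct next token

# Both models give the correct token probability 0.4, so their
# cross-entropy losses are identical...
probs_a = [0.4, 0.3, 0.3]  # ...but model A's greedy pick is correct
probs_b = [0.4, 0.5, 0.1]  # ...while model B's greedy pick is wrong

loss_a = -math.log(probs_a[CORRECT])
loss_b = -math.log(probs_b[CORRECT])
acc_a = int(max(range(len(probs_a)), key=probs_a.__getitem__) == CORRECT)
acc_b = int(max(range(len(probs_b)), key=probs_b.__getitem__) == CORRECT)
```

Same loss, different greedy accuracy, which is exactly why a separate eval metric can move while the loss stays flat.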
Replying to @theshawwn
Something I don't understand here: the optimizer's goal is to minimize the loss, no? So how can it converge toward better solutions without actually reducing the loss? What tells it "this is a better solution" if the only measure it has for the direction is the gradient of the loss?
Replying to @theshawwn
Is the loss evaluated on the dota2 data only or on the whole corpus? It may be progressing on dota2 while stalling on the rest.
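One way to check this would be to break the eval loss out by data subset instead of averaging over the whole corpus. A minimal sketch with a hypothetical `per_subset_loss` helper over made-up per-example losses:

```python
from collections import defaultdict

def per_subset_loss(examples):
    """Mean loss per data subset; examples are (subset_name, loss) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for subset, loss in examples:
        sums[subset] += loss
        counts[subset] += 1
    return {s: sums[s] / counts[s] for s in sums}

# Hypothetical per-example eval losses: dota2 improving, rest flat.
evals = [("dota2", 1.9), ("dota2", 2.0), ("rest", 2.25), ("rest", 2.2)]
print(per_subset_loss(evals))
```

If the dota2 slice were dropping while the other 99.27% of the data held steady, the overall average would barely move, which could explain a flat 2.2 alongside visibly better dota chat.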
This Tweet was deleted by the Tweet author.
Dude! I didn't expect to see you here of all places