Blog
Just things I like to talk about and wanted to share.
-
Why should BERT trust its own confidence scores to decide when to stop thinking?
-
Why does BERT need twelve layers to classify “I love this movie” as positive?
-
Why are we burning GPU hours to answer “2 + 2 = 4”?
Jules Belveze