Blog
Just things I like to talk about and wanted to share.
- Why does BERT need twelve layers to classify “I love this movie” as positive?
- Why are we burning GPU hours to answer “2 + 2 = 4”?