bertsqueeze
Project
Bert-squeeze is a repository that provides code to reduce the size of Transformer-based models or to decrease their latency at inference time.
It gathers a non-exhaustive set of techniques, such as distillation, pruning, quantization, and early exiting. The repo is built with PyTorch Lightning and Transformers.
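To make one of these techniques concrete, the sketch below shows the core idea behind post-training quantization: map float weights to 8-bit integer codes using one symmetric, per-tensor scale, then map them back at inference time. It is a minimal pure-Python illustration under that assumption, not bert-squeeze's API or its exact quantization scheme. The functions `quantize_int8` and `dequantize` are hypothetical names chosen for this example, and real implementations operate on tensors rather than lists.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization (illustrative sketch, hypothetical helper).

    Maps each float to an integer code in [-127, 127] using a single scale
    chosen so that the largest absolute weight maps to +/-127.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any non-zero scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes (hypothetical helper)."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes)  # [50, -127, 0, 100]
```

Storing each code in one byte instead of a 32-bit float cuts weight storage roughly 4x. Because the scale is derived from the largest absolute weight, every value falls inside the representable range, so the per-weight rounding error is at most `scale / 2`; the small weight `0.003` above, for instance, collapses to `0`.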