KeyNote Talk
in
Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants
Deploying efficient translation at every level of the stack
Kenneth Heafield
Abstract:
Practical efficient neural networks combine several optimizations ranging from assembly code to network structure. Yet most papers about optimization start with an unoptimized baseline, omitting comparison even with simple methods like using a smaller network. Shared tasks force a different mentality, where each idea has to prove its worth against a highly optimized baseline. This informs our work on fast and small machine translation with latency under 20 ms for an average sentence. The models are now deployed in Firefox.
Chat is not available.