Attention Is All You Need

Source: arXiv:1706.03762 (December 6, 2017)

Curated on April 18, 2023

The Transformer is a new, simple network architecture that relies solely on attention mechanisms, dispensing with the recurrent and convolutional neural networks commonly used in encoder-decoder sequence transduction models. Experiments on two machine translation tasks show that the Transformer is superior in quality, more parallelizable, and significantly faster to train than these traditional models. It achieves a 28.4 BLEU score on the WMT 2014 English-to-German translation task, outperforming the existing best results, including ensembles, by over 2 BLEU. The Transformer also sets a new single-model state-of-the-art BLEU score of 41.8 on the WMT 2014 English-to-French translation task after training for just 3.5 days on eight GPUs, a fraction of the training cost of the best models in the literature. The Transformer's success extends to other tasks as well, such as English constituency parsing, demonstrating its potential for generalization.
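The attention mechanism at the heart of the architecture is scaled dot-product attention, defined in the paper as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch of that formula (the function name and toy matrix shapes here are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to keep softmax gradients stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy example: 2 queries attending over 3 key/value pairs, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

The full model stacks this primitive into multi-head attention and combines it with position-wise feed-forward layers and positional encodings, which is what makes the architecture parallelizable across sequence positions.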
