Dissecting the Transformer
Dataiku
JULY 20, 2020
In the previous blog post, we saw how attention works and how it improved neural machine translation systems. Now we are going to unveil the secrets behind the power of the most famous NLP models of today (a.k.a. BERT and friends): the Transformer. In this second part, we dive into the details of this architecture with the aim of building a solid understanding of how its different components operate and interact.