A new paper from researchers at Microsoft proposes a novel neural network architecture called Retentive Networks (RetNets) that could supersede Transformers as the go-to architecture for large language models.