Efficient Attention Mechanisms for Long-Context Transformers
Research Paper • 2024
Standard self-attention in transformers scales quadratically with sequence length, limiting their application to long documents. We survey recent approaches that achieve linear or near-linear complexity in sequence length.
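The quadratic cost comes from materializing the n × n attention score matrix. Below is a minimal NumPy sketch of standard scaled dot-product attention that makes this explicit; the function name, shapes, and example sizes are illustrative and not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard self-attention: builds an (n, n) score matrix,
    hence O(n^2) time and memory in sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) -- the quadratic bottleneck
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # (n, d)

# Doubling the sequence length quadruples the score matrix.
n, d = 1024, 64
Q = K = V = np.random.randn(n, d)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```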
Deep Learning · Transformers · Research