Deep Learning with Yacine

How Minimax-01 Achieves 1M Token Context Length with Linear Attention (MIT)
Yacine Mahdid
Apr 1

I've dug into the internals of an MIT-licensed MoE system that uses Linear Attention (Lightning Attention) to extend its context length to 1M input tokens.
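The full post walks through Lightning Attention itself; as a quick orientation, here is a minimal sketch of plain (non-causal) linear attention in NumPy, showing the associativity trick that makes the cost linear in sequence length. The ELU+1 feature map and the toy shapes are illustrative assumptions, not MiniMax-01's exact kernel.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Minimal non-causal linear attention sketch.

    Softmax attention computes softmax(QK^T)V, which costs O(n^2 * d).
    Linear attention replaces softmax with a feature map phi, so the
    product can be re-associated as phi(Q) @ (phi(K)^T @ V),
    costing O(n * d^2) -- linear in the sequence length n.
    """
    def phi(x):
        # ELU(x) + 1 keeps features positive; a common choice in
        # linear-attention papers (an assumption here, not MiniMax-01's kernel).
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qf, Kf = phi(Q), phi(K)        # (n, d) feature-mapped queries/keys
    kv = Kf.T @ V                  # (d, d) summed over the sequence once
    z = Kf.sum(axis=0)             # (d,)   normalizer accumulated over keys
    return (Qf @ kv) / (Qf @ z[:, None] + eps)

# Toy usage: n = 8 tokens, d = 4 head dimension.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```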

Read →