<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[learning with yacine ]]></title><description><![CDATA[a newsletter untangling the complexity of deep learning and the brain! 
ideal for scholars, industry practitioners or toddlers.]]></description><link>https://www.yacinemahdid.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Xq9Y!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704b7ed5-15f6-4fb8-b8f4-46d68ab893d2_200x200.png</url><title>learning with yacine </title><link>https://www.yacinemahdid.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 11:29:15 GMT</lastBuildDate><atom:link href="https://www.yacinemahdid.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Yacine Mahdid]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[yacinemachinelearning@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[yacinemachinelearning@substack.com]]></itunes:email><itunes:name><![CDATA[Yacine Mahdid]]></itunes:name></itunes:owner><itunes:author><![CDATA[Yacine Mahdid]]></itunes:author><googleplay:owner><![CDATA[yacinemachinelearning@substack.com]]></googleplay:owner><googleplay:email><![CDATA[yacinemachinelearning@substack.com]]></googleplay:email><googleplay:author><![CDATA[Yacine Mahdid]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Self-Distillation Is Taking Over LLM Post-Training (w/ the Researchers Behind It)]]></title><description><![CDATA[very pumped for this one shout out to Jonas H&#252;botter (ETH Zurich) and Idan Shenfeld (MIT) for blessing us with their knowledge]]></description><link>https://www.yacinemahdid.com/p/why-self-distillation-is-taking-over</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/why-self-distillation-is-taking-over</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 28 Apr 2026 14:49:45 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/195756828/1ff710c2ddebb078f95b0d2b99197d13.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>I had an awesome 
time interviewing @IdanShenfeld and @jonashubotter from MIT and ETH Zurich about self-distillation.</p><p>it&#8217;s a very promising post-training paradigm where the model acts as its own teacher by conditioning on environment feedback or demonstrations.</p><p>we cover the SDPO algo for reinforcement learning with rich feedback and SDFT for continual learning without forgetting, along with many applications.</p><p>we dig into how it works, why it&#8217;s simpler and faster than GRPO, and where this is already showing up in production systems.</p><h1>table of contents:</h1><p>0:00 - what is self distillation</p><p>2:50 - idan (MIT) and jonas (ETH Zurich) introduction and motivation</p><p>18:40 - different perspectives on on-policy self-distillation (presentation)</p><p>36:00 - metacognition and specificity in self-distillation</p><p>37:24 - very long, hard tasks and self-distillation</p><p>42:00 - continual learning with self-distillation (presentation)</p><p>1:16:50 - what is next in this research direction?</p><p>1:20:00 - is there any experience with subjective feedback?</p><p>1:22:50 - quality vs quantity of feedback?</p><p>1:26:40 - in what settings would self-distillation struggle vs GRPO?</p><h1>my random thoughts on the paradigm</h1><p>I think this is it. I think this will have the same impact CoT had when it was first introduced, and given its strong performance against SFT I would be very surprised if it is not already woven into the main closed-source models.</p><p>It is also becoming clearer to me that the traditional, very rigid boundaries between pre-, mid- and post-training are starting to collapse a bit, and that the reality is more of a mix that depends heavily on the expected performance of the model.</p><p>on a more meta note, I think having the model be its own teacher just makes so much sense. 
like I was looking at my kids and I realized that each one of them has this bias of looking at the next one up in terms of age for clues about how to do things.</p><p>they have this innate fascination with HOW the one who is a bit older is doing things, not even the eldest.</p><p>and I don&#8217;t think it&#8217;s just a fun quirk of nature.</p><p>the &#8220;policy&#8221; of the next-in-line kid is technically very close to wherever the base kid is at the moment. yes, they can learn from me or my wife, but we are so much more advanced that it&#8217;s a bit hard for them to understand how we are doing things (even though our movements are more precise).</p><p>it&#8217;s much easier to copy the one who is a bit older, even with a slightly flawed policy, and to listen to their feedback, because the younger kid can just understand them better.</p><p>it reminded me of the example idan gave from robotics: if a smaller model just listens to the big teacher model, it will hit points in the data it can&#8217;t recover from, because it is going to make mistakes that are just not possible for the larger model.</p><p>anyway, this paradigm has a beautiful kernel of truth that is fundamental in my view, and it is a very exciting angle to get up to speed with!</p><h1>papers</h1><p>the slides were super crisp, really cool of them to share!</p><p>if you are interested in digging more into the literature, here are a few papers that are worth checking out:</p><p>&#128204; Reinforcement Learning via Self-Distillation (SDPO): <a href="https://arxiv.org/abs/2601.20802">https://arxiv.org/abs/2601.20802</a></p><p>&#128204; Self-Distillation Enables Continual Learning (SDFT): <a href="https://arxiv.org/abs/2601.19897">https://arxiv.org/abs/2601.19897</a></p><p>&#128204; Aligning Language Models from User Interactions: <a href="https://arxiv.org/abs/2603.12273">https://arxiv.org/abs/2603.12273</a></p><p>&#128204; RL&#8217;s Razor: Why Online Reinforcement Learning Forgets Less: <a 
href="https://arxiv.org/abs/2509.04259">https://arxiv.org/abs/2509.04259</a></p><p>&#128204; Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs: <a href="https://arxiv.org/abs/2410.08020">https://arxiv.org/abs/2410.08020</a></p><p>enjoy my guys &#127801;</p>]]></content:encoded></item><item><title><![CDATA[The Science of Learning Math (and Anything Else) with Justin Skycak]]></title><description><![CDATA[you gonna have to take a whole day off for this one guys it's 3h wide with knowledge]]></description><link>https://www.yacinemahdid.com/p/the-science-of-learning-math-and</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/the-science-of-learning-math-and</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Wed, 15 Apr 2026 13:02:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/194233213/34af818dbc63f8db7b897b0e2c71077e.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>hey folks, first time posting this sort of interview as a podcast (shout out to the three anons on twitter who requested it).</p><p>now that I think of it the format does make sense since me and the interviewee are mostly chatting and exchanging ideas!</p><p>will be posting more of these in the future!</p><p>in this episode I had the pleasure of spending not 1 but 2 days interviewing the learning legend <a href="https://www.justinmath.com">justin skycak</a>.</p><p>we talked about his quite impressive self-learning journey (3000h of math in high school) all the way to how he hand-curated the initial knowledge graph for math academy to make that process more efficient.</p><p>I like justin because I understand his inner desire for learning.</p><p>I had the exact same &#8220;awakening&#8221;, let&#8217;s say, in my tiny bedroom when I was a teenager. 
I realized while doing some homework that I loved learning and found an immense sense of peace within it.</p><p>this realization kind of got lost in the midst of winter depression for a while until it was awakened again with a burning flare that has never died down since (<a href="https://www.yacinemahdid.com/p/how-to-stop-having-fomo-as-a-curious">more trauma dump here</a>).</p><p>this understanding that there is a meta-learning skill is so fun because then the whole world kind of opens up for you.</p><p>you encounter a hard problem and don&#8217;t even know how to get started? &#8594; great let&#8217;s use &#8220;learning&#8221;.</p><p>you got an opportunity that you are underprepared for? &#8594; great let&#8217;s use &#8220;learning&#8221;.</p><p>you want to make something seemingly impossible happen? &#8594; great let&#8217;s use &#8220;learning&#8221;.</p><p>what I like about justin&#8217;s story is that it is very concrete and touches math, which has far-reaching usefulness in life.</p><p>so here is the very lively 3h discussion where we touch on these topics:<br>0:00:00 - intro<br>0:02:10 - justin&#8217;s background<br>0:05:45 - 3000h math self study in high school<br>0:11:45 - what a day looked like for that 3000h stretch<br>0:16:10 - meta-learning vs pure math learning<br>0:21:50 - when did you get into cognitive neuro?<br>0:29:55 - how did the fundamental math help in your research projects<br>0:43:10 - what does the math academy learning system look like<br>0:47:34 - how did you guys build the 2000 topic knowledge graph<br>1:01:15 - would LLMs be useful as an interface to that knowledge graph for the students?<br>1:10:46 - how does the FIRe spaced repetition algorithm work?<br>1:17:34 - would the same knowledge graph structure work for physics or other topics?<br>1:34:05 - how do you understand the subject vs the curriculum<br>1:35:50 - is there a connection between studying math and learning a sport?<br>1:42:00 - do you think doing and teaching math require different skills?<br>1:56:25 - could you get understanding without automaticity?<br>2:05:35 - do you see any upside of confusion in learning?<br>2:14:11 - learning math as an adult?<br>2:19:20 - how to fill the motivation gap after learning the fundamentals?<br>2:24:10 - how should teaching math for kids and adults balance fundamentals and creativity?<br>2:33:55 - is it ever too late to learn math seriously?<br>2:46:00 - mastery learning vs ultra learning<br>2:51:30 - top-down vs bottom-up<br>2:53:40 - mastery learning for domains without a hierarchical structure?<br>2:56:30 - neurodivergence / adhd for structured math learning?<br>3:06:20 - will amateur mathematicians augmented with technology be able to contribute to research?<br>3:14:37 - what are you most excited about right now in terms of learning</p><p>enjoy my guys! :)</p>]]></content:encoded></item><item><title><![CDATA[within the chrysalis of prime intellect]]></title><description><![CDATA[Meditation at the prime intellect office on the shapes of artificial intelligence to come.]]></description><link>https://www.yacinemahdid.com/p/within-the-chrysalis-of-prime-intellect</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/within-the-chrysalis-of-prime-intellect</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 24 Nov 2025 15:34:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p7us!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was walking in the steep streets of san francisco in the quiet morning hours. 
actually it was 10h and I was up since 6h because of jetlag, but I didn&#8217;t want to come banging at prime intellect headquarters too early.</p><p>I was a bit surprised how calm the city was on a wednesday morning. I always pictured san francisco as being extremely agitated.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p7us!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p7us!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p7us!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p7us!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p7us!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p7us!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg" width="900" height="506" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:506,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!p7us!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 424w, https://substackcdn.com/image/fetch/$s_!p7us!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 848w, https://substackcdn.com/image/fetch/$s_!p7us!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!p7us!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d8af29f-cf85-460a-9e46-eaf379f0fad3_900x506.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>the street of the headquarters was actually even calmer, like a really nice residential area. as I walked in I was welcomed by a researcher that just recently joined the team. he told me that they were moving offices soon as the place was getting a bit cramped.</p><p>I understood what he meant after moving out of the small inner courtyard and looking at the packed computer room where a team of absolutely focused engineers and researchers stood in various configurations. 
I was looking at a few rows of people I can only describe as not actually being there; they were in one of the purest forms of flow there is.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V9CP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V9CP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V9CP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg" width="900" height="460" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!V9CP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V9CP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7156607e-7979-4400-8c31-e99df8cd13c2_900x460.jpeg 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">heyyyy yo what&#8217;s up yall what&#8217;s popping!</figcaption></figure></div><p>now. I&#8217;m a researcher at heart. I understand this particular state where you feel like mentally crawling over a shape thinly tangible but which terrain you start to map out. there is hardly anything in life I like more than to drag my metaphysical body in these spaces.</p><p>to not disturb these fine folks in their sacred trance I made my way upstairs. I realized too late that the situation wasn&#8217;t much better. a few designers were discussing and exchanging short sentences right in between making iterations on some kind of launch video (which in my humble paleolithic opinion looked pretty lit).</p><p>I sat down quietly at the dinner table for a few minutes trying to collect my initial feelings about this team. one that I&#8217;ve been interacting with cybernetically for quite some time now.</p><p>from where I was I could see the bay. 
a large boat was dragging its immense hull through the water. the contrast between this large expanse and this crowded monastery made me feel that I was in a cocoon, a chrysalis of sorts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1G6V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1G6V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1G6V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg" width="900" height="506" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:506,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!1G6V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!1G6V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d5a91e-82a9-48e8-ac3f-422120ce77fb_900x506.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">uh guys where&#8217;s the restroom at</figcaption></figure></div><p>as I was meditating about what the hell that even meant, a meeting suddenly started led by <a href="https://x.com/samsja19">sami</a> who also leads the research effort at prime.</p><p>the rumbling of all these people suddenly snapping out of their flow subspace and congregating around me was interesting. not that I mattered much. It&#8217;s just that I was on a kind of stool right in front of the screen (which I didn&#8217;t even notice).</p><p>getting right into it, sami showed us how their new intellect-3 model (100B+ MoE which was almost done training on 512 gpus) was generating some interesting SoTA results.</p><p>results which I usually only hear about in blog posts or tech reports online to my great excitement.</p><p>I zoned out for about 5 min thinking about how weird all of this was. 
here we were all excited about something we couldn&#8217;t touch, see or even understand entirely.</p><p>artificial intelligence has this interesting property that the substrate these researchers were working on isn&#8217;t the end artifact (like the final model weights).</p><p>It&#8217;s not like software engineering or woodworking where the artist is sculpting, chipping away at their canvas until the vision is realized.</p><p>as I was watching these guys looking intently at sami I realized that they looked like chemists or physicists building the machinery that creates the environment the desired result needs to grow into. and we were all looking at the result of this machinery producing state-of-the-art results in the realm of intelligence.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ey8M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ey8M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 424w, https://substackcdn.com/image/fetch/$s_!ey8M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 848w, https://substackcdn.com/image/fetch/$s_!ey8M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ey8M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ey8M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png" width="900" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!ey8M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 424w, https://substackcdn.com/image/fetch/$s_!ey8M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 848w, https://substackcdn.com/image/fetch/$s_!ey8M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ey8M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b13be96-9926-4f0d-8874-922524a1afd3_900x720.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">tip top dancing on the infra while building the harness for verifying the unverifiable</figcaption></figure></div><p>something about seeing the team that made it possible in a physical space was also pretty odd. 
because none of these researchers was entirely responsible for building this end system, but they all kind of were, each in their own way.</p><p>as I snapped out of wherever I was I realized that surprisingly the presentation wasn&#8217;t a victory reel built to showcase how cool their results were. the team had already switched gears to discuss with great excitement the shape of things to come.</p><p>a disproportionate amount of time was now being spent on thinking and talking and hypothesizing about what they wanted to do for the next intelligence they wanted to grow.</p><p>I have a lifelong fascination with learning in all its facets and the last 10-minute discussion triggered by <a href="https://x.com/kalomaze">kalomaze</a> about how it could be possible to push continual learning stayed with me well into the night.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SMvt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SMvt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 424w, https://substackcdn.com/image/fetch/$s_!SMvt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 848w, https://substackcdn.com/image/fetch/$s_!SMvt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 1272w, 
https://substackcdn.com/image/fetch/$s_!SMvt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SMvt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png" width="489" height="257" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7321e40-696c-43b7-9224-df964d7a687c_489x257.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:257,&quot;width&quot;:489,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!SMvt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 424w, https://substackcdn.com/image/fetch/$s_!SMvt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 848w, https://substackcdn.com/image/fetch/$s_!SMvt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 1272w, 
https://substackcdn.com/image/fetch/$s_!SMvt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7321e40-696c-43b7-9224-df964d7a687c_489x257.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">&#8230;man&#8230; it could work</figcaption></figure></div><p>It reminded me of all these evenings I&#8217;ve spent singularly focused on detecting consciousness in my little lab at the very top of the montreal general hospital. 
knowing that whatever blockers I was facing I just needed to talk it out with a colleague and figure out the shape of it and that at some point one of these ideas would allow me to hold onto something in that darkness.</p><p>sometimes I wonder why I didn&#8217;t just continue chipping away in that trance-like way of living forever and then I remember I decided to build a startup and have three kids.</p><p>I love startups btw. the energy in a healthy one is very palpable, especially when the team is holding one of these rare gems of value they know they can actively refine. yet, it&#8217;s usually hard to keep the flow of value coming out without dropping said gem. the metaphorical pressure is just too much.</p><p>I always had this dream of doing both research and startup in some shape or form. finding such a gem with value pouring out in waterfalls all around me and pooling in such a way that I can meditate until I die on some of these hidden mysteries of life.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6zlw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6zlw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 424w, https://substackcdn.com/image/fetch/$s_!6zlw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 848w, 
https://substackcdn.com/image/fetch/$s_!6zlw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 1272w, https://substackcdn.com/image/fetch/$s_!6zlw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6zlw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png" width="851" height="360" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:851,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!6zlw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 424w, https://substackcdn.com/image/fetch/$s_!6zlw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 848w, 
https://substackcdn.com/image/fetch/$s_!6zlw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 1272w, https://substackcdn.com/image/fetch/$s_!6zlw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff03903a1-f923-45cf-ad6f-e59708549ecb_851x360.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">just let me soak up in value while I think about what it takes to think about what it 
takes.</figcaption></figure></div><p>I got a glimpse of that dream and how to make it a reality at prime intellect. that team is actually a mashup of researchers and engineers who are chipping away at superintelligence with the energy of a startup and the stillness of research. all while shaping their organization to grow and capture these streams of value.</p><p>yes folks we&#8217;re talking business now. I know I know. money sucks, but superintelligence isn&#8217;t exactly a cheap endeavor.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PN47!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PN47!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PN47!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PN47!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PN47!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PN47!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg" width="900" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!PN47!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PN47!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PN47!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PN47!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff741264-865b-475f-a7c1-3840df89f086_900x484.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I was zoning out again as I was looking at these words on a slide labelled mission in the applied research meeting. I don&#8217;t even remember which day it was. 
my stay was such a blur.</p><p>but I remember thinking <em>&#8220;what an interesting combination of concepts to strive for...&#8221;</em></p><p>I quickly saw by the discussion that ensued on that slide that they were actually serious about the whole lot:</p><ul><li><p>bringing about superintelligence using tools they built.</p></li><li><p>providing these tools to everyone.</p></li><li><p>doing that work in the open.</p></li></ul><p>It was getting hard not to love these guys honestly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!55Vt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!55Vt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 424w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 848w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!55Vt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg" width="900" height="508" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:508,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!55Vt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 424w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 848w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!55Vt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f726f8f-6724-4292-a18e-14cce93478cf_900x508.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">so much taste baked into one place</figcaption></figure></div><p>later that night I was sitting in an overcrowded in-n-out on fisherman&#8217;s wharf with my double-double, thinking that there was something about the current era of machine learning that felt electric to the practically minded souls (of which I&#8217;m a proud member).</p><p>the empirical nature of machine learning shines with such a gleam.</p><p>you can just try things and get feedback pretty rapidly at a scale and breadth that I never thought we would get to. 
back in the prehistoric machine learning time of 2019 all the ai projects were such a miserable slog of issues and problems and blockers.</p><p>now problems still exist but it feels as if, running fast enough, you can just leap past a whole bunch of them. these fine prime folks were running a whole lot and preparing to accelerate some more it seemed.</p><p>this pattern of intense focus on that multifaceted mission wasn&#8217;t isolated within the cocoon of prime intellect; the second meeting I got invited to was for the RL residency where a few researchers distributed all across the world got together to chat about various projects they were working on.</p><p>I love student researchers man. one of the coolest slices of our society.</p><p>they are usually such interesting individuals; each one I met was a world. all have extreme uniqueness in how they go about things. very hard for them to be anything but themselves because they are thinking so deeply about doing the seemingly most random technical feat at the boundary of what is known.</p><p>they barely have any brain power left to fake or conform much.</p><p>one of them was showing us his 20-page draft notion page with the most radiant smile I&#8217;ve seen in a while. another researcher spoke maybe 42 words on a topic that was at the fringe of RL finetuning yet everyone understood. another was making a whole book environment and his neighbor in the google meet was making a universal verifier. jesus man I was pumped.</p><p>they were all thinking and working and tinkering on the problems littering the path to this superintelligence mission with such zeal. It was beautiful.</p><p>each project stretching the possible a tiny bit while being grounded in empirical findings these guys have been bulldozing in the past weeks. the interesting bit is that these jolly souls weren&#8217;t even full-time employees. 
they were working on pushing reality in between their PhD work, or right after they&#8217;d clocked out of their dev jobs, or at night when the stillness is conducive to these oversized ideas making contact.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5duR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5duR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 424w, https://substackcdn.com/image/fetch/$s_!5duR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 848w, https://substackcdn.com/image/fetch/$s_!5duR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 1272w, https://substackcdn.com/image/fetch/$s_!5duR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5duR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png" width="600" height="431" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:431,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!5duR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 424w, https://substackcdn.com/image/fetch/$s_!5duR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 848w, https://substackcdn.com/image/fetch/$s_!5duR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 1272w, https://substackcdn.com/image/fetch/$s_!5duR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb55efc29-e22b-4349-bf40-ab0716d0e8fe_600x431.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">they got the most refined sense of humor too </figcaption></figure></div><p>one of these researchers found a hilarious result that appeared when there was a visual moral dilemma on some closed source models. the personality of each model was surprisingly on-point, glowing through its weights when confronted in visual space with two bad moral outcomes.</p><p>and here we were all looking at this quirk of intelligence and thinking about ways of bending it to our will some more using anime filters.</p><p>I&#8217;ll be honest I felt good. not only because I&#8217;d escaped the most random winter storm to hit montreal in november no no (a bit yes), but because I was surrounded by such an interesting breed of people who are all pushing in the same beautiful direction.</p><p>the last time I felt so intellectually good was in 2018 when I was at a conference about transformers at Mila with the one and only yoshua bengio. 
he was being asked in a hundred different ways if attention was truly all we needed to scale intelligence by a whole crowd of very ecstatic academics.</p><p>funnily enough the man had one of the weirdest prosthetic casts on his leg and at first I thought his whole leg was robotic or something. yet no one cared.</p><p>people were eagerly grasping and crawling over what he had to say, anything really that he could show or hint towards the path forward.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yIU9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yIU9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 424w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 848w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 1272w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yIU9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png" width="640" height="360" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!yIU9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 424w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 848w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 1272w, https://substackcdn.com/image/fetch/$s_!yIU9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b45af4-abb2-49f4-93b5-23dfe6d8bba5_640x360.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>in the days that passed in the san francisco bay I had many other conversations with the denizens of prime intellect, all ablaze with desire to make things move. <a href="https://x.com/creet_z">creet_z</a> who comes from their OSS effort (<a href="https://www.creetz.com/grpo.html">check his blog post on GRPO btw very sharp</a>), <a href="https://x.com/asharoraa">ash</a> who has a superb operational mind and gave me one of the best compliments I ever had (almost got a nosebleed from it), <a href="https://x.com/willccbb">will brown</a> with a mystical third eye for RL, <a href="https://x.com/vincentweisser">vincent</a> and <a href="https://x.com/johannes_hage">johannes</a> who have such a clear vision of what prime intellect&#8217;s shape should be and the many others I 
took out of flow states with my left-field questions.</p><p>as someone who dislikes groups in general, I kinda liked these unique people and wished I could spend a whole week with each of them. they all felt so unique yet all came together in these microstructures that blended their skill sets and perspectives into a coherent whole.</p><p>this whole setup felt reminiscent of some words from karpathy that I read in a medium blog post about modules melting into an optimal whole (waaaaaaay back in the software 2.0 days).</p><p>it felt like these folks were fusing their intellect into a strange and intense logical machinery to melt orbs of intelligence to create a goddamn whole lot of babelic trowels.</p><p>pretty rad if you ask me.</p><p>while I was there I decided to sample a bit more of the city and its surroundings by extending my tiny digital tentacles. I dined with big tech researchers (some breaking NDAs), chatted with people building in ai labs (big and small), guys doing startups late at night and academic researchers at stanford on a secret mission to decode the brain (<a href="https://www.enigmaproject.ai/jobs">they&#8217;re hiring btw</a>).</p><p>through each of these individual personalities, I could see they were reaching into a similar subspace in various ways. somewhere they all felt something gigantic slowly rumbling over.</p><p>each one of them interpreting its shape through the lens of their own ideals and thinking about harnessing it for their own purpose.</p><p>as my plane took to the beautiful san francisco sky I was still thinking about the shape of what we were all building.</p><p>not only at prime intellect but at all the different labs around this city and the world. thousands upon thousands of minds iterating on the ideal of intelligence to usher in its metamorphosis.</p><p>maybe the agitation that I pictured in my head about this place was real after all. happening not in the street but in the curvatures of this new idea looking to take form. 
all these offices, apartments, co-working places focused inward in rearranging how intelligence is done.</p><p>whether open, closed, decentralized or monolithic, something was moving within the chrysalis and about to take flight.</p>]]></content:encoded></item><item><title><![CDATA[how to stop having FOMO as a curious engineer: hold a thread]]></title><description><![CDATA[little worthless advice to learn passionately without boxing yourself]]></description><link>https://www.yacinemahdid.com/p/how-to-stop-having-fomo-as-a-curious</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/how-to-stop-having-fomo-as-a-curious</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 11 Nov 2025 22:14:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!aiL8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aiL8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aiL8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!aiL8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!aiL8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aiL8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aiL8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg" width="600" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/faff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Suggestion] Add Danish cookie tin that stores crafting supplies (needle,  thread, balls of wool, etc.) similar to the Tackle box : r/2007scape&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Suggestion] Add Danish cookie tin that stores crafting supplies (needle,  thread, balls of wool, etc.) similar to the Tackle box : r/2007scape" title="Suggestion] Add Danish cookie tin that stores crafting supplies (needle,  thread, balls of wool, etc.) 
similar to the Tackle box : r/2007scape" srcset="https://substackcdn.com/image/fetch/$s_!aiL8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 424w, https://substackcdn.com/image/fetch/$s_!aiL8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 848w, https://substackcdn.com/image/fetch/$s_!aiL8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!aiL8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaff12eb-ea97-4c7a-8542-ca320dfd9002_600x450.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" 
fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I received quite a lot of distressed queries from a pretty funny population of engineers: the curious kind.</p><p>those with such an insatiable desire to learn that they end up in a peculiar situation where they feel completely lost.</p><p>it starts to dawn on them that this beautiful universe is infinite in the way it&#8217;s shaped and that their mortal shell will not hold long enough to understand any sizeable portion of it.</p><p>so then they feel stuck and they end up either in constant FOMO or jumping from one thing to another like a puppy in a buffet.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9HGb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9HGb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9HGb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!9HGb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9HGb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9HGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg" width="275" height="183" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:183,&quot;width&quot;:275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Puppy Play: Best Games for Building Skills - PD Insurance&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Puppy Play: Best Games for Building Skills - PD Insurance" title="Puppy Play: Best Games for Building Skills - PD Insurance" srcset="https://substackcdn.com/image/fetch/$s_!9HGb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9HGb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 
848w, https://substackcdn.com/image/fetch/$s_!9HGb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9HGb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa87d64e5-8aa4-409e-b401-b2fad5ef64b9_275x183.jpeg 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">weeeeeeeeeeeeeeeeeeeeeeeeeeee</figcaption></figure></div><p>what I usually tell these precious rare souls is that they haven&#8217;t found their thread yet and that they should look closely.</p><p>and that&#8217;s what I&#8217;m going to show you how to do right about now with a rather convoluted story.</p><h2>short trauma dump about my seasonal affective disorder</h2><p>I get SAD during winter.</p><p>like real SAD.</p><p>it&#8217;s better now that I&#8217;m literally pilled up all winter on high doses of vitamin D.</p><p>but there was a time right in between high school and university that it was pretty bad.</p><p>I was walking the beautiful winter soggy path between the subway station and this very inspiring and colorful pre-university school (CEGEP for those that speak quebecois).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NBvk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NBvk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 424w, 
https://substackcdn.com/image/fetch/$s_!NBvk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 848w, https://substackcdn.com/image/fetch/$s_!NBvk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 1272w, https://substackcdn.com/image/fetch/$s_!NBvk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NBvk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png" width="800" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Missing Canadian teenagers may have link to ISIS | CNN&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Missing Canadian teenagers may have link to ISIS | CNN" title="Missing Canadian teenagers may have link to ISIS | CNN" srcset="https://substackcdn.com/image/fetch/$s_!NBvk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 424w, 
https://substackcdn.com/image/fetch/$s_!NBvk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 848w, https://substackcdn.com/image/fetch/$s_!NBvk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 1272w, https://substackcdn.com/image/fetch/$s_!NBvk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4bb6c49-c073-465a-883c-f6dd5c0ad89e_800x450.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>god damn.</p><p>everything was gray.</p><p>the trees were gray the clouds were gray the road was full of all kinds of grayish grime the road was gray the metro was gray I believe I had a gray shirt and when I got inside the cafeteria I saw different tones of grayish brown.</p><p>I was looking forward to jackshit.</p><p>not even to be done with my day because the road home would be darkish gray with the everlasting winter dome hovering above my head.</p><p>I proceeded to move FSD-style to the third floor where I found an empty spot to just open up my books.</p><p>now that day I came in a bit earlier than I usually do because in my grayish luck I didn&#8217;t realize that my morning class was canceled.</p><p>so I mechanically opened up my books.</p><p>and this is where I caught a glimpse of my thread.</p><p>a sparkling glimmer that looked almost blinding to my very SAD eyes.</p><p>it was a physics book if I remember correctly about electricity and magnetism. nothing cutting edge that would blow fulfilled people&#8217;s boots away.</p><p>but for me it was a glimmer of something I truly enjoyed at a cellular level.</p><p>I was reading, doing some exercises, reading some more, correcting my little mistakes and I was having a jolly time.</p><p>I felt a true hue of something passing through my little parched and wrinkled gray soul.</p><p>and suddenly as I was reaching inside the depths of the universe through the easiest magnetic field questions you could think of I latched onto my thread.</p><p>I couldn&#8217;t even tell at that time what fabric it was made of. but I felt so so good holding it.</p><p>THEN something even more remarkable happened.</p><p>some of my friends saw me and came to sit with me. smile in hand they all were surrounding me like a warm and earnest social hug.</p><p>now I love these guys. 
truly great people with most likely a much more well-balanced mind than I ever had.</p><p>but at that time it was like a whole fat gray persian rug was thrown on my head and was suddenly unrolling its beautiful gray motifs over my eyes.</p><p>that particular group of friends didn&#8217;t give two shits about physics let alone any particular subject. they were purely social beings of the highest degree. they were into exchanging and having fun and making amazing social plans.</p><p>now. I&#8217;m not a recluse or an antisocial mole dweller.</p><p>I like people I like to chat.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oPC1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oPC1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!oPC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg" width="1200" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;He was in mystic delirium': was this hermit mathematician a forgotten  genius whose ideas could transform AI &#8211; or a lonely madman? | Mathematics |  The Guardian&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="He was in mystic delirium': was this hermit mathematician a forgotten  genius whose ideas could transform AI &#8211; or a lonely madman? | Mathematics |  The Guardian" title="He was in mystic delirium': was this hermit mathematician a forgotten  genius whose ideas could transform AI &#8211; or a lonely madman? 
| Mathematics |  The Guardian" srcset="https://substackcdn.com/image/fetch/$s_!oPC1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oPC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ed41158-5c1f-4969-bc75-57d0ec9b60d2_1200x1200.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">buuuuuuuuut</figcaption></figure></div><p>but at that particular moment in time I would have spelunked and wiggled my body down to a comfy hole if I could just open that goddamn book and study just a few more minutes.</p><p>I then did something that was totally out of character which was to excuse myself because I had to go to a non-existent class on the fourth floor.</p><p>I got up, took the escalator to the fourth floor then another floor then I walked down a corridor then I climbed a whole bunch of stairs into this secluded tower that hosted the math classes.</p><p>I walked some more and I would have continued to walk through the wall if I wasn&#8217;t at the literal furthest end of this beautiful gray building.</p><p>I opened my books, studied some more and fucking loved every minute of it. I didn&#8217;t stop at the physics stuff god no. 
I did my math problems I read that novel I had to write a report on I read and worked and studied aahhhhhhhhhhh man.</p><p>it was during this stretch of time and the weeks that followed while I was sitting in my gray puddle of SADness that I realized that I loved learning.</p><p>I loved it to a ridiculous degree.</p><p>not just the physics or the math or any subject really.</p><p>it was this act of learning, this meta concept of learning, that brought me boundless joy.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_k37!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_k37!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_k37!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_k37!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_k37!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_k37!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg" width="1200" height="1142" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;I need more knowledge : r/dankmemes&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="I need more knowledge : r/dankmemes" title="I need more knowledge : r/dankmemes" srcset="https://substackcdn.com/image/fetch/$s_!_k37!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_k37!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_k37!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_k37!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75d27956-6dfd-4d86-ae58-39deaf117e04_1200x1142.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">hell yeah man let&#8217;s learn some STUFF</figcaption></figure></div><p>felt like a 3 yo davinci discovering that painting existed, hands and feet completely smudged with a joyful mess of colors.</p><p>I got better over the years at allowing myself space to enjoy this learning act while not ending up like a crazed oakenshield alone in his room filled with gold.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!LmhR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LmhR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LmhR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg" width="480" height="360" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Thorin Oakenshield - Gold - 
YouTube&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Thorin Oakenshield - Gold - YouTube" title="Thorin Oakenshield - Gold - YouTube" srcset="https://substackcdn.com/image/fetch/$s_!LmhR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LmhR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c7cd6d-896c-43d8-badb-10153237ef4a_480x360.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 
15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">look at all my learning man look at it it&#8217;s beautiful don&#8217;t touch anything thanks gtfo</figcaption></figure></div><p>but that thread that I grasped that day I never ever ever let it go.</p><h2>what can you do with this thread?</h2><p>great question buddy.</p><p>nothing, the thread is useless.</p><p>hope it was useful &#127801;</p><p>best,</p><p>yacine mahdid</p><div><hr></div><p>for more learning topics do consider subscribing haha no man I&#8217;m joking.</p><p>but for real this thread that I clasped with my rattling hands from the jaws of the universe has no purpose.</p><p>a bit like the color purple or a triangular shape has no purpose.</p><p>these little threads that compose our existence are higher level concepts and constructs that are layered in our slice of the world.</p><p>if you look closely in different situations you can see them there, holding up our human conceptualization of what&#8217;s going on.</p><p>remember that group of friends? boy did they find their thread. I don&#8217;t even know what the name of the thread they found was but it must have been something like &#8220;social connection&#8221; or &#8220;friendship through sharing&#8221; or whatever.</p><p>I&#8217;m very happy for them. 
my thread was something else.</p><p>with that thread firmly in my hands I started to make my moves but always by looking for its very distinct shape and color in whatever I was doing.</p><p>sometimes it was vibrant, layering a whole room in thick woolly texture.</p><p>other times it was there thin and barely visible (you had to squint hard).</p><p>and a few times it was just not there, severed by accident or sabotaged.</p><p>that thread helped me finally decide what I wanted to do with my life which was to learn about learning. </p><p>I went into cell and molecular biology in uni to really understand how cells worked. I was in particular very interested in neurons and how they manage to create this concept of learning I loved so much.</p><p>I was voraciously learning anything I could get my hands on about the brain and its functions.</p><p>I pushed my learning skills academically and outside of a classroom for the pure thrill of it.</p><p>there wasn&#8217;t a single subject where I didn&#8217;t find the thread right there for me to hold. even in things I couldn&#8217;t give two lints about (like cooking) I found it solidly anchored on the oven and the pans.</p><p>there were situations though where this thread was so weak that it was a waste of time to look for it. 
 </p><p>so I cut myself out of these setups pretty quickly.</p><p>from the outside it looked sudden but from my perspective, thread in hand, it was nonsensical to continue wasting my days when I could just go where the thread was so much more there.</p><h2>what I did with this thread</h2><p>I did a whole lot of research in different labs</p><p>I published scientific articles and ran some experiments that tested the limits of how I understood how to learn</p><p>I taught hundreds of people: in person, online, 1-1, in groups, video, blogs, power point, via dms, anything goes</p><p>I got into artificial intelligence where the blueprint of this thread is dissected to its first principles and the learning stream can be controlled and oriented towards useful tasks</p><p>I also found this thread with such a gleam in entrepreneurship (albeit totally by accident) which led me to build a cool ai company in the industrial supply chain space that solves a real problem</p><p>I also became a father and found that thread with all its primal radiance within these babbling learning masters</p><p>more generally it gave me these little secret superpowers:</p><ul><li><p>even when I go into seemingly absolutely random tangents I never feel I&#8217;m off track because I see that thread as clear as day.</p></li><li><p>I rarely feel any sort of FOMO because what matters to me is if this thread is there in the situation I&#8217;m in. there is a massive advance in thermodynamic computing? that&#8217;s very cool man I&#8217;m looking at latent reasoners right now.</p></li><li><p>It gives me a silver lining whenever I work on the most boring tasks known to man (grant writing) because I can always bring it back to this higher level concept that is dear to my heart.</p></li></ul><h2>what now? what do I do yacine!!!</h2><p>get depression.</p><p>go into debt if you have to.</p><p><em>(note: do maybe delete this dumb joke before publishing)</em></p><p>it&#8217;s rather simple. 
you just need to look and peer and look at all the stuff that truly brings you a sizeable amount of joy in life.</p><p>shed for a minute everyone else&#8217;s expectations and look deep. shed also the conventions about topics, which are very arbitrary delimitations made by others.</p><p>these threads linger and appear across subjects.</p><p>maybe it&#8217;s a thread called &#8220;creation&#8221; maybe it&#8217;s something about &#8220;community&#8221; maybe it&#8217;s &#8220;power&#8221; or maybe it&#8217;s also this &#8220;meta-learning&#8221; thingy I got going?</p><p>I can&#8217;t know for you and these threads have this wondrous quality of not having clear-cut names.</p><p>but once you have it in your hands you would find it ridiculous to let go.</p><p>hope it helps buddy godspeed</p><p></p>]]></content:encoded></item><item><title><![CDATA[Hierarchical Reasoning Model assembly manual for toddlers]]></title><description><![CDATA[fun time for the whole family.]]></description><link>https://www.yacinemahdid.com/p/hierarchical-reasoning-model-assembly</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/hierarchical-reasoning-model-assembly</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 09 Sep 2025 17:21:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BUwL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BUwL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!BUwL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BUwL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png" width="1280" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:231028,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BUwL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 424w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 848w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 1272w, https://substackcdn.com/image/fetch/$s_!BUwL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbcb65d8-3e58-42da-b9be-741b67c3b00b_1280x720.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>hello toddlers today we are assembling the hierarchical reasoning model (HRM) by sapient intelligence step by step.</p><p>it&#8217;s a very fun kit you&#8217;ll see</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q8ut!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q8ut!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 424w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 848w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 1272w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!q8ut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png" width="1365" height="615" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:615,&quot;width&quot;:1365,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!q8ut!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 424w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 848w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 1272w, https://substackcdn.com/image/fetch/$s_!q8ut!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec6d4612-d373-4c1a-a688-fa64dbc019f2_1365x615.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">we are going to construct all this, so take a good look here.</figcaption></figure></div><p>you will be able to use this architecture on all your favorite puzzles like:</p><ul><li><p>your grandma&#8217;s sudoku</p></li><li><p>your dad&#8217;s sunday newspaper maze</p></li><li><p>your uncle francois chollet&#8217;s arc-agi 1 test set</p></li></ul><p>on that last puzzle set you can even get about 37% of them correct in like 12h.</p><p>It&#8217;s great value for the price, but watch out kids, some steps require adult supervision, so grab one of those.</p><p>also please don&#8217;t eat the puzzle embeddings at the end. 
 </p><p>they are small and absolutely important (you can also choke on them so you are legally warned here).</p><p><strong>&#9888;&#65039; disclaimer for adults:</strong></p><p>for a complete assessment of HRM&#8217;s usefulness, do check out the full video by our collaborator <a href="https://x.com/juliarturc">Julia Turc</a>:</p><div id="youtube2-cQ8J0gvWbn8" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;cQ8J0gvWbn8&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/cQ8J0gvWbn8?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1>step 0: get a 27M encoder block</h1><p>the first block you should reach for is the encoder-only block which is about 27M parameters in size.</p><p>the block looks like the one on the left:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nBKE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nBKE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 424w, https://substackcdn.com/image/fetch/$s_!nBKE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 848w, 
https://substackcdn.com/image/fetch/$s_!nBKE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 1272w, https://substackcdn.com/image/fetch/$s_!nBKE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nBKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png" width="1064" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!nBKE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 424w, https://substackcdn.com/image/fetch/$s_!nBKE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 848w, 
https://substackcdn.com/image/fetch/$s_!nBKE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 1272w, https://substackcdn.com/image/fetch/$s_!nBKE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5753b9fb-8914-4b1f-bff5-c5ad4153183f_1064x673.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">it&#8217;s usually green but block color availability might vary by regions</figcaption></figure></div><p>remember 
kids we are building a puzzle-solving model not a language model (yuck).</p><p>those suck, kids. you hate them because they are ugly and pompous and really wasteful when you think of it.</p><p>if a language model was a thing, it would be a dictionary of the effective functions to reconstruct the human representation of the world.</p><p>BOOOOOOORING bleh.</p><p>anyway, you are going to feed the full puzzle all at once and return it transformed with the proper solution added so no need for a decoder.</p><p>like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yZ2m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yZ2m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 424w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 848w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 1272w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yZ2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png" width="940" height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10da05a0-c280-485b-95a9-76697623146e_940x556.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:940,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!yZ2m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 424w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 848w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 1272w, https://substackcdn.com/image/fetch/$s_!yZ2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10da05a0-c280-485b-95a9-76697623146e_940x556.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">we are only dealing with grid puzzles here, which are fed all at once as they should be.</figcaption></figure></div><h1>step 1: plug the 27M block to input and output source</h1><p>now carefully plug the encoder into the input block and the output block like so:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i_mT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!i_mT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 424w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 848w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 1272w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i_mT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png" width="1436" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1436,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!i_mT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 424w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 848w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 1272w, https://substackcdn.com/image/fetch/$s_!i_mT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a08bc1e-dec1-435e-855e-dfd2ca2324ca_1436x482.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">ask your adult of choice to configure the 27M block with RoPE, GLU, RMSNorm and Adam-atan2 please.</figcaption></figure></div><p>the input block will turn your puzzles into fun embeddings and the output block will spit out a solution for the puzzle the model is working on.</p><p>you can turn the power on and test it.</p><p>do notice how much the output puzzle kinda sucks.</p><p>it&#8217;s because this 27M encoder model is fixed depth which is poopoo when dealing with puzzles that require making mistakes and backtracking.</p><p>there is an easy fix for this.</p><h1>step 2: add a latent recurrence wire</h1><p>now carefully unplug the input and output blocks and add the thin latent recurrence wire.</p><p>like so, very simple modification:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LSUl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LSUl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 424w, https://substackcdn.com/image/fetch/$s_!LSUl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 848w, 
https://substackcdn.com/image/fetch/$s_!LSUl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 1272w, https://substackcdn.com/image/fetch/$s_!LSUl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LSUl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png" width="780" height="292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:780,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!LSUl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 424w, https://substackcdn.com/image/fetch/$s_!LSUl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 848w, 
https://substackcdn.com/image/fetch/$s_!LSUl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 1272w, https://substackcdn.com/image/fetch/$s_!LSUl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faaf70855-fc6c-48b4-bcec-12c30ac398c9_780x292.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>at first glance the wire might look like a chain-of-thought wire from these disgusting large-language-models. 
yuck no.</p><p>those are slimy and no good at all.</p><p>no no no my sweet child we are dealing with latent reasoning here.</p><p>this is pure, sweet, like the hug from your mom after coming back from school.</p><p><em>as a legal disclaimer we have to say that the HRM didn&#8217;t invent this latent reasoning and we can see the exact same pattern in the universal transformer.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W6E8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W6E8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 424w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 848w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 1272w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W6E8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png" width="1055" height="652" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1055,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!W6E8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 424w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 848w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 1272w, https://substackcdn.com/image/fetch/$s_!W6E8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a45436f-f18e-44f3-b218-071a5506563d_1055x652.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">we are effectively doing the one on the left right now.</figcaption></figure></div><p>adding this latent reasoning loop changed the architecture from one with fixed depth to one with variable depth.</p><p>the model can now technically loop however long it wants.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c3B2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c3B2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 424w, 
https://substackcdn.com/image/fetch/$s_!c3B2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 848w, https://substackcdn.com/image/fetch/$s_!c3B2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 1272w, https://substackcdn.com/image/fetch/$s_!c3B2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c3B2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png" width="1373" height="274" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:274,&quot;width&quot;:1373,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!c3B2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 424w, 
https://substackcdn.com/image/fetch/$s_!c3B2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 848w, https://substackcdn.com/image/fetch/$s_!c3B2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 1272w, https://substackcdn.com/image/fetch/$s_!c3B2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83d972cf-1ad7-449b-8216-02ac86d406de_1373x274.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">look, this one has four loops when we unroll it through time.</figcaption></figure></div><p>now test it. it&#8217;s better isn&#8217;t it? but wait&#8230;</p><p>oh no! this poor model got lost.</p><p>poor thing doesn&#8217;t remember what to do.</p><p>we can&#8217;t leave it like this, let&#8217;s help it out.</p><h1>step 3: add a recall wire</h1><p>models like this need to be told to always keep an eye on their original input.</p><p>so take the recall wire and connect it back to the model.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BicF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BicF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 424w, 
https://substackcdn.com/image/fetch/$s_!BicF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 848w, https://substackcdn.com/image/fetch/$s_!BicF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 1272w, https://substackcdn.com/image/fetch/$s_!BicF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BicF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png" width="1326" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:1326,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!BicF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 424w, 
https://substackcdn.com/image/fetch/$s_!BicF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 848w, https://substackcdn.com/image/fetch/$s_!BicF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 1272w, https://substackcdn.com/image/fetch/$s_!BicF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaef4aa5-3ee1-46d8-bb3d-96206613691a_1326x320.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">it won&#8217;t be lost ever again.</figcaption></figure></div><p>now this is called a recurrent network with recall and it&#8217;s a good thing.</p><p>but your mom didn&#8217;t buy you this kit to build a <strong>good</strong> <strong>model</strong>, she wants you to do <strong>great</strong> <strong>things</strong> &#8482;&#65039;.</p><p>let&#8217;s look at rats.</p><h1>step 4: rats and funny pranks</h1><p>now listen close because this is important.</p><p>big important adults, which we call neuroscientists, had a job for rats.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4btg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4btg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 424w, 
https://substackcdn.com/image/fetch/$s_!4btg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 848w, https://substackcdn.com/image/fetch/$s_!4btg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 1272w, https://substackcdn.com/image/fetch/$s_!4btg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4btg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png" width="788" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:788,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!4btg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 424w, 
https://substackcdn.com/image/fetch/$s_!4btg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 848w, https://substackcdn.com/image/fetch/$s_!4btg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 1272w, https://substackcdn.com/image/fetch/$s_!4btg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5a88af2-49a1-4cc1-a3b6-d585ca5dfc6f_788x784.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>they gave the rats the task to find a funny surprise in two buckets in two different rooms.</p><p>the trick for the rats was that the funny surprise was switched in the two rooms (what a funny little prank).</p><p>now, the neuroscientists weren&#8217;t just pulling the legs of the rats for fun, they had an important job to do.</p><p>they were secretly looking inside parts of the rat brain (called the hippocampus, looks like a seahorse yehaaaw) to see how the rat was learning about the prank.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ixcz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ixcz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 424w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 848w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 1272w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ixcz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png" width="797" height="536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:797,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:215117,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ixcz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 424w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 848w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 1272w, https://substackcdn.com/image/fetch/$s_!ixcz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2324270a-65d4-468d-9910-8c4307fb2dfd_797x536.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption">big red means big collaboration and friendship</figcaption></figure></div><p>what they saw was that some parts of the brain were working and talking to each other to learn the trick!!!!:</p><ul><li><p>some parts were working rush rush rush because they were very busy and had lots of rapid information to handle.</p></li><li><p>others were working sloooooooowly like an old adult and helping the busy parts get the work done.</p></li></ul><p>this great teamwork made it possible for these poor little rats to not get too confused and understand the jolly-ol&#8217; prank from these funny 
neuroscientists.</p><p>now we too love collaboration and working with friends.</p><p>let&#8217;s add collaboration to our model!</p><h1>step 5: hierarchical reasoning</h1><p>now, I will need you to take a deep breath and be a big boy/girl ok?</p><p>yacine has been pulling a fun little prank on you since the beginning (just like with the rats!).</p><p>we are going to remove some of the puzzle pieces we put in because they weren&#8217;t the right ones to put in the first place! (oh no!)</p><p>don&#8217;t get mad, you will see it&#8217;s a good thing.</p><p>you will need to take two smaller identical encoder-only blocks and place them as follows:</p><ul><li><p>one of the blocks needs to be wired to the input (with recall and recurrence wires); we&#8217;ll call this one the low-level block.</p></li><li><p>and the other block needs to be wired to the low-level block (with recall and recurrence) and to the output; we&#8217;ll call this one the high-level block.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C6cU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C6cU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 424w, https://substackcdn.com/image/fetch/$s_!C6cU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 848w, 
https://substackcdn.com/image/fetch/$s_!C6cU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 1272w, https://substackcdn.com/image/fetch/$s_!C6cU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C6cU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png" width="1449" height="455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:455,&quot;width&quot;:1449,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:47788,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C6cU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 424w, 
https://substackcdn.com/image/fetch/$s_!C6cU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 848w, https://substackcdn.com/image/fetch/$s_!C6cU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 1272w, https://substackcdn.com/image/fetch/$s_!C6cU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa75695ad-5e3a-474b-9568-3c8a5099ada2_1449x455.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">like so</figcaption></figure></div><p>you might be a bit confused here, which one is my low-level block and which is my high-level block???</p><p>&#8220;IS THIS ANOTHER PRANK YACINE&#8221; I hear you say, fist full of yacinego&#8482;&#65039; blocks.</p><p>no my dear child, a block becomes a high-level or low-level block because of its wiring!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ExvH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ExvH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 424w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 848w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 1272w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ExvH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png" width="1071" height="623" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:623,&quot;width&quot;:1071,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163344,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ExvH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 424w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 848w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 1272w, https://substackcdn.com/image/fetch/$s_!ExvH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed54bca5-6bef-45af-8ea7-7109952b331e_1071x623.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">the two modules only behave as high and low level once they are trained (plots c and e).</figcaption></figure></div><p>the l module will be the one that will talk a lot: it will recur with itself T times before giving its work to the h module, which will review it. 
the h module will then give gentle guidance to the l module, and this whole exchange repeats N times.</p><p>in total now your model will talk and work in friendship on your puzzle N*T times.</p><p>good news also for your mom and dad&#8217;s credit card, you don&#8217;t need to run back-propagation-through-time with this model!</p><p>you can just use a one-step gradient at the very end of an N*T cycle (only the last update of each module gets a gradient, everything before it is treated as a constant)!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-5q5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-5q5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 424w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 848w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 1272w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-5q5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png" width="1414" height="563" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1414,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-5q5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 424w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 848w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 1272w, https://substackcdn.com/image/fetch/$s_!-5q5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a26f136-ea9d-4935-a402-96b3c59679be_1414x563.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">it just works!!</figcaption></figure></div><p>now, if you test this model on sudoku and mazes you might be a bit disappointed with the result.</p><p>we are missing an important element, let&#8217;s add it.</p><h1>step 6: add outer loop wire</h1><p>now carefully take two recurring wires from the kit and plug them like so:</p><ul><li><p>plug one from the h module&#8217;s end state to its starting state</p></li><li><p>and another one from the l module&#8217;s end state to its starting state</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!owIA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!owIA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 424w, https://substackcdn.com/image/fetch/$s_!owIA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 848w, https://substackcdn.com/image/fetch/$s_!owIA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 1272w, https://substackcdn.com/image/fetch/$s_!owIA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!owIA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png" width="1456" height="746" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100199,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!owIA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 424w, https://substackcdn.com/image/fetch/$s_!owIA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 848w, https://substackcdn.com/image/fetch/$s_!owIA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 1272w, https://substackcdn.com/image/fetch/$s_!owIA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b8931e9-7c56-4ab0-b3b2-f218361d831b_1547x793.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">don&#8217;t forget to enable the deep supervision switch to avoid gradient leaking through segments. there is no warranty for what will happen if you don&#8217;t.</figcaption></figure></div><p>the model is now able to do something very important, which is to spend more time thinking about a problem.</p><p>kind of like when someone asks you a difficult riddle.</p><p>you don&#8217;t like to be rushed, do you?</p><p>well the model is the same, it can now spend more time thinking on a puzzle.</p><p>but wait a second&#8230;</p><p>how long should we let the model think?</p><p>that&#8217;s where the road lights come in, let&#8217;s add them.</p><h1>step 7: add road lights</h1><p>take the tiny road light (the one with the green and red lights) and plug it at the end of the model like so:</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PfgF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PfgF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 424w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 848w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 1272w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PfgF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png" width="1113" height="524" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:1113,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PfgF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 424w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 848w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 1272w, https://substackcdn.com/image/fetch/$s_!PfgF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe29f8ecd-e8cb-4fcf-a32f-d1becaf2e79a_1113x524.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">it&#8217;s the q head on the right.</figcaption></figure></div><p>with this road light the model will be able to know when it&#8217;s time to stop thinking and when to continue for another cycle.</p><p>this way if it hits an easy puzzle it can do it quick quick.</p><p>if it hits a very difficult puzzle it can sloooooow doooooown and do a lot more thinking.</p><p>It&#8217;s great! 
It&#8217;s a good thing!</p><p>with the magic of reinforcement learning (q-learning, hence the q head), it can even learn to use the road light on its own (just make sure to wire it to the loss properly).</p><p>now you have a very good model for your grandma&#8217;s sudoku and your dad&#8217;s sunday mazes.</p><p>but uncle chollet&#8217;s arc-agi puzzles are still too hard!</p><p>how come?</p><p>if you look carefully at uncle chollet&#8217;s puzzle set you will see that he only gives you two examples for each puzzle.</p><p>&#8220;JUST TWO??? NOT FAIR&#8221; I hear you say.</p><p>worry not my dear child, this trusty kit has got you covered.</p><h1>step 8: data-augmentation-framework&#8482;&#65039;</h1><p>thanks to the data-augmentation-framework&#8482;&#65039; you can turn any puzzle with limited instructions into a full dataset to train your model!</p><p>the only thing you need to do is to carefully lift and insert your HRM model into the provided case, like so:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z_aV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z_aV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 424w, https://substackcdn.com/image/fetch/$s_!z_aV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 848w, 
https://substackcdn.com/image/fetch/$s_!z_aV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 1272w, https://substackcdn.com/image/fetch/$s_!z_aV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z_aV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png" width="1456" height="598" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:598,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:141531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z_aV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 424w, 
https://substackcdn.com/image/fetch/$s_!z_aV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 848w, https://substackcdn.com/image/fetch/$s_!z_aV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 1272w, https://substackcdn.com/image/fetch/$s_!z_aV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F895d4dcd-c664-431d-a7b4-3f4a41a3e592_1502x617.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">this fits like a glove.</figcaption></figure></div><p>the data-augmentation-framework&#8482;&#65039; works by taking your few examples and doing all sorts of funny changes (300-1000 of them) like wiggling the puzzles a bit, rotating them, flipping them and even changing the colors.</p><p>how fun for the dataset!</p><p>after all these silly augmentations, the model's answers get the same changes applied in reverse to clean up the mess a bit.</p><p>because at the end of the day, even if we had a lot of fun, we need to be serious and get the job done.</p><p>so it will take all of the possible puzzle answers and count those that look the same.</p><p>the two answers that were counted the most are what we are going to give to uncle chollet.</p><p><em>legal disclaimer: the HRM didn&#8217;t invent the data augmentation methodology for arc-agi; it was already tried in a previous network that didn&#8217;t use pre-training called compressARC</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dy9j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dy9j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 424w, https://substackcdn.com/image/fetch/$s_!dy9j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 848w, 
https://substackcdn.com/image/fetch/$s_!dy9j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 1272w, https://substackcdn.com/image/fetch/$s_!dy9j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dy9j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png" width="1001" height="346" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1001,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153794,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dy9j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 424w, 
https://substackcdn.com/image/fetch/$s_!dy9j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 848w, https://substackcdn.com/image/fetch/$s_!dy9j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 1272w, https://substackcdn.com/image/fetch/$s_!dy9j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc58f14b8-136e-4eab-a6d4-a03a3f08a119_1001x346.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>we are almost there kiddo, almost. we just need one tinyyyyyy piece to make everything work.</p><h1>step 9: the last piece of the puzzle</h1><p>look into your kit, you should see one tiny tiny piece which is the puzzle embedding layer.</p><p>take this piece, resist the urge to taste it, and plug it right at the beginning where your input block is, like so:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NgEN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NgEN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 424w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 848w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 1272w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NgEN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png" width="1064" height="624" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:624,&quot;width&quot;:1064,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:145342,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/173177312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NgEN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 424w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 848w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 1272w, https://substackcdn.com/image/fetch/$s_!NgEN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69c0d659-0174-47e3-af9e-8c89cdb70b23_1064x624.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>this tiny piece will take each transformation the data-augmentation-framework&#8482;&#65039; has done on the arc-agi puzzles and will create a super duper secret name for it.</p><p>this secret name is very useful for the HRM model to understand what it needs to do, because remember this isn&#8217;t a bulbous disgusting hunk of language model junk (bleh bad bad).</p><p>this is a small and humble latent reasoning model; it needs all the help we can provide to do a good job.</p><p>nothing more nothing less, as our lord and savior intended.</p><p>and voila now with this kit you are able to 
solve all these puzzles!</p><p>how fun!!!!!!!!</p><p><em>don&#8217;t forget to ask your parents to get you the next yacinego kit, where we will add a moral judgment circuit into these disgusting language models!</em></p>]]></content:encoded></item><item><title><![CDATA[get your A in college and stop wasting your life]]></title><description><![CDATA[it's not complicated.]]></description><link>https://www.yacinemahdid.com/p/get-your-a-in-college-and-stop-wasting</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/get-your-a-in-college-and-stop-wasting</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 25 Aug 2025 19:11:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!r1j7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r1j7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r1j7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 424w, https://substackcdn.com/image/fetch/$s_!r1j7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!r1j7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!r1j7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r1j7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg" width="760" height="552" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:760,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;22 Fun and Creative Painting Ideas for Toddlers - Okinja ELC&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="22 Fun and Creative Painting Ideas for Toddlers - Okinja ELC" title="22 Fun and Creative Painting Ideas for Toddlers - Okinja ELC" srcset="https://substackcdn.com/image/fetch/$s_!r1j7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!r1j7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 848w, https://substackcdn.com/image/fetch/$s_!r1j7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!r1j7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde6838a5-a41d-4f1c-8a68-2ab154f80b93_760x552.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">easier than hand painting</figcaption></figure></div><p>the school year is starting back soon and I got reminded by a master's student in germany how brutal it is for some.</p><blockquote><p>so this sem, I doubled down, stopped all outdoor activities, spent every hour in learning the subjects and practicing math to the point I got saturated staying in my room (subject is Signal Processing) and I still got failed (I panicked in the exam of concepts I was familiar with and forgot them totally). <br><br>This is really bothering me as I thought I will get done with subjects and start writing papers as I want to become a Research Engineer.</p></blockquote><p>my heart goes out to these poor students for whom failing a class is a likely stressor throughout the year.</p><p>I tutored all throughout my undergrad and grad time and I&#8217;ve seen firsthand how it can turn people into molten stress balls.</p><p>so here are my simple mindshift steps to make sure there is 0% chance you will fail a class ever again.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!of_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!of_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 424w, 
https://substackcdn.com/image/fetch/$s_!of_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 848w, https://substackcdn.com/image/fetch/$s_!of_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 1272w, https://substackcdn.com/image/fetch/$s_!of_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!of_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png" width="1456" height="536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99541,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171903270?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!of_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 424w, https://substackcdn.com/image/fetch/$s_!of_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 848w, https://substackcdn.com/image/fetch/$s_!of_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 1272w, https://substackcdn.com/image/fetch/$s_!of_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd09d16ea-6232-4e11-a434-81c55b69e078_1680x618.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">unhinged advice brought to you by a 3.93 gpa dean&#8217;s honour list boy who took 13 credits too many</figcaption></figure></div><h3>disclaimer:</h3><ol><li><p>please don&#8217;t turn into an asshole</p></li><li><p>always anchor your actions in fairness</p></li><li><p>this is mostly applicable to adversarial grading setups; it doesn&#8217;t work for research or practical projects.</p></li></ol><h3>step 1: burn your current grades mindset down</h3><p>let&#8217;s get one thing straight; you are currently paying for this education (you and your dads and moms).</p><p>hard earned money.</p><p>not only that but with actual time, your non-refundable life essence.</p><p>the way you are likely conceptualizing this transaction is the following:</p><ol><li><p>you pay the school money and time</p></li><li><p>in exchange there is a teacher who will teach you about the subject</p></li><li><p>if you do a good job at learning the topics the teacher will award you points</p></li><li><p>if you get enough points you pass the class and can move on to other classes</p></li><li><p>you can even get enough points to get these golden A grades that are such a pride for you and your genetic dynasty</p></li></ol><p>my sweet summer child, what is this.</p><p>this isn&#8217;t 1943, this isn&#8217;t how any of this works.</p><p>you take all of these nice ideas and bury them deep deep deep.</p><h3>step 2: flip this mindset up</h3><p>it&#8217;s the year 2025 of our lord jesus guys.</p><p>everything is available online to such an extent that the only gap between you and understanding something is your drive to learn the topic.</p><p>you aren&#8217;t paying to learn. 
</p><p>you are paying to have a stamp of approval from a university that you have a validated baseline understanding of a set of topics.</p><p>the way they can give you that stamp without diluting the value of the stamp is to grade you *objectively.</p><p>you are giving money and time to get the stamp, not to get the learning.</p><p>the deal is actually the following:</p><ol><li><p>you pay the stamp provider money and time.</p></li><li><p>in exchange they assign a whole roster of teachers to GATE access to that final stamp.</p></li><li><p>if you do a good job at following the teacher's expectations they will remove fewer points in the class.</p></li><li><p>if you have enough points left at the end of the class you can move on to the next gate.</p></li><li><p>if you do a very good defensive job you can even get these A grades which give you access to even better stamps!</p></li></ol><p>*now now I know what you are thinking.</p><p>what is this bastardization of the learning institution? where are these golden values of learning and curiosity that should be instilled into every pupil's heart?</p><p>hold that thought because it&#8217;s going to get worse.</p><h3>step 3: you start with an A grade and your job is to defend that grade</h3><p>with this mindset in your noodle basket, the next thing you need to understand is that every class starts with an A grade.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FVVV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FVVV!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 424w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 848w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 1272w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FVVV!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif" width="640" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pepe Happy GIF - Pepe Happy - Discover &amp; Share GIFs&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pepe Happy GIF - Pepe Happy - Discover &amp; Share GIFs" title="Pepe Happy GIF - Pepe Happy - Discover &amp; Share GIFs" 
srcset="https://substackcdn.com/image/fetch/$s_!FVVV!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 424w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 848w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 1272w, https://substackcdn.com/image/fetch/$s_!FVVV!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d9cac8a-103c-4e25-9197-ebc395b874cf_640x640.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" 
class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">nice work buddy.</figcaption></figure></div><p>your only job throughout the semester is to guard this grade from the teachers that have the mandate from the powers-that-be to chip away at your beautiful and perfect grade.</p><p>now, they are only doing their job and 99% of teachers are doing it fairly.</p><p>they will tell you flat out how they are going to remove your points, like in excruciating detail.</p><p>any modifier like curving down or up they usually also mention during the very first classes.</p><p>you can even drop out of the class in most unis without any penalty during the first few weeks if you don&#8217;t like the deal.</p><p>this chipping away at your sweet beautiful grade is anchored in fairness. </p><p>it&#8217;s your job to understand how the chipping is done.</p><h3>step 4: plan your defense</h3><p>this process usually starts way before the semester starts, during the selection of the class.</p><p>you got options and agency here. </p><p>you can decide if this semester will be a confusing mess or an uphill negotiation battle or a topic focused season or an embarrassingly dumb-easy one.</p><p>it&#8217;s your life guys. </p><p>there&#8217;s many permutations you can do but overall here are the principles I think you should adopt:</p><ul><li><p>avoid at all costs courses that are taught by a difficult teacher. 
if you like the topic they teach, well nice just self-learn it.</p></li><li><p>select courses that make sense for your stamp (aka avoid courses on lunar cycles just because they are easy).</p></li><li><p>try to pick topics that overlap in main subjects to get some transfer learning happening during the study sessions.</p></li><li><p>if you can&#8217;t avoid a difficult teacher, make sure to spread them out so that you don&#8217;t get a whole semester in hell.</p></li></ul><p>don&#8217;t forget that all of the above is subjective.</p><p>some people click better with a teacher than others or maybe you really fucking love lunar cycles.</p><p>just make sure you feel comfortable with your choices here.</p><h3>step 5: understand how the points are removed</h3><p>understand this for each of these beautiful hand-picked organically sourced classes you took ownership of going through.</p><p>it&#8217;s absurdly simple to figure this part out:</p><ol><li><p>read that syllabus and study it.</p></li><li><p>figure out when points are up for deduction and in what proportion.</p></li><li><p>figure out for each point deduction event (i.e. assignment, midterm, take-home, finals, bonus exercise, quiz, attendance, lab) how the points are deducted. sometimes you can&#8217;t have all the details too early.</p></li><li><p>whenever applicable try to get exemplars of these point deduction events (legally) like an old exam, a prior assignment with corrections or quizzes.</p></li><li><p>whenever in doubt you go knock on these teachers&#8217; doors and you flat out ask them.</p></li></ol><p>I can&#8217;t emphasize enough how important this is folks. </p><p>you don&#8217;t have to yolo this. 
</p><p>it&#8217;s not a mystery how points are removed from your grade; there is a direct reasoning trace behind the grading scheme.</p><p>just get this information and document it for each one of your classes.</p><h3>step 6: fortify your fortification</h3><p><em>you study for an assignment or an exam to not have your previous points deducted.</em></p><p>repeat this every time you open these pricey books.</p><p>this means a few things:</p><ul><li><p>the bulk of your study session should be aimed directly at defending your points. don&#8217;t go on to read a random chapter for the sake of it. is this in the grading scheme in any sort of way??? NO??? WHY YOU DO THIS THEN!</p></li><li><p>sorry, but it needs to be emphasized folks, your time is short before the grade chipping starts and the stress starts growing so make sure that all your study actions are aligned with the point removal ritual.</p></li><li><p>whenever you don&#8217;t know if something is going to be on the exam, just ask man. worst case you are mildly annoying and they are going to say &#8216;maybe&#8217;. you take that as a yes and you move on.</p></li><li><p>move most of your learning to exercises that tightly mirror how you are getting graded. </p><ul><li><p>multiple-choice knowledge exam? &#8594; flash cards</p></li><li><p>open-ended question? &#8594; outcome-based project learning</p></li><li><p>short objective question? &#8594; drill all the exercises multiple times.</p></li></ul></li></ul><p>if there is a topic that resonates with your toddler soul with a vibrant note use it to decompress. but make a clear separation between that type of learning and the stamp type.</p><p>btw as soon as you know that your A is defended 100%, move on to the less defended class. 
</p><p>there&#8217;s no point in reinforcing the murud-janjira with taller brick walls.</p><h3>step 7: keep your eye on fairness</h3><p>now comes the touchy subject which can be culturally very difficult to understand for some.</p><p>here we go:</p><blockquote><p>your teacher is a human being and can make mistakes.</p></blockquote><p>oh boy, yes and it happens much more than you think.</p><p>the situation can get very very very sticky very fast if you don&#8217;t have your eye on fairness.</p><p>which means that for every point that is deducted it is your god given imperative to understand why.</p><p>this is one of the few spots in life where you are literally owed an explanation.</p><p>AND BY THE GREAT MONAD UP ABOVE YOU ARE GOING TO GET IT.</p><p>this is the portion of the article where I will emphasize two conflicting things:</p><ol><li><p>you got to stand up for yourself</p></li><li><p>don&#8217;t be an asshole</p></li></ol><p>since the situation can get emotional and very adversarial fast just try your best ok.</p><p>here&#8217;s some guidance:</p><ol><li><p>when you get your corrected assignment/exam look at each of your deductions.</p></li><li><p>try to understand why you got points deducted.</p></li><li><p>if you do understand, don&#8217;t snivel: take the loss and fortify your defense.</p></li><li><p>if you do not understand you go see whoever corrected it for an explanation. </p></li><li><p>if the explanation has a logical flaw you will need to give your argument as to why that is and get your points back.</p></li><li><p>public forums are great for this if you have a knack for insightful and polite argumentative writing, but they are more emotionally charged for the teacher.</p></li><li><p>whenever a TA committed a mistake and doesn&#8217;t want to budge, you NEED to escalate to the teacher and get to the bottom of it. 
doesn&#8217;t matter if it&#8217;s just 2 points, these are yours.</p></li></ol><p>this is the hardest realization for a polite and shy student.</p><p>you sometimes have to stand the hell up and point to a situation which is unfair and sometimes it&#8217;s not a fun discussion to have.</p><p>some ways to alleviate these tricky situations are the following:</p><ol><li><p>whenever you find a potentially ambiguous question in an exam, even if it&#8217;s multiple-choice, always write why you think the question is ambiguous and detail your thought process for the ambiguity.</p></li><li><p>network with your peers to ensure that the fairness was evenly applied and to garner some group power modifier when dealing with a very difficult teacher with a <em>penchant</em> for unfairness.</p></li><li><p>build a relationship with the teacher by not only showing up for your points but also for your questions. they know they can&#8217;t just shoo you away because guess who&#8217;s coming back next week for office hours?</p></li></ol><p>this pay-back crusade in the holy name of fairness is an integral part of the university experience.</p><h3>final step: make use of your reclaimed time for real learning</h3><p>now that you stopped wasting your time on bumbling around for your grade you can invest these newfound hours into something that has a high learning potential like:</p><ol><li><p><strong>get involved in labs:</strong> you most likely have access to labs in your uni, go volunteer in there to witness the limit of human knowledge.</p></li><li><p><strong>get an internship:</strong> the stamp is for getting a job, try to secure that job beforehand my guy.</p></li><li><p><strong>sign up in a student club:</strong> you will be able to network with driven folks, work on cool cv-worthy projects and get proximity with the industry.</p></li><li><p><strong>build your own cool projects:</strong> you have access to much more than just gates and stamps in uni. 
make use of the facilities and all your perks in there to make something you are proud of.</p></li></ol><p>anyway hope this is useful folks, have a fantastic 2025-2026 school year!</p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[how to stop feeling lost in tech: the wafflehouse method]]></title><description><![CDATA[you need to sit for this one for 48h]]></description><link>https://www.yacinemahdid.com/p/how-to-stop-feeling-lost-in-tech</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/how-to-stop-feeling-lost-in-tech</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Wed, 20 Aug 2025 20:50:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sTSG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sTSG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sTSG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sTSG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!sTSG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sTSG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sTSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg" width="1456" height="964" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:964,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Cora (restaurant) - Wikipedia&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cora (restaurant) - Wikipedia" title="Cora (restaurant) - Wikipedia" srcset="https://substackcdn.com/image/fetch/$s_!sTSG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sTSG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!sTSG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sTSG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f6c9d93-4947-49dd-ab21-557d5ed9b74a_4928x3264.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>these past few weeks I&#8217;ve received a massive amount of dms from CS students being absolutely lost.</p><div 
class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c1Hw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c1Hw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 424w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 848w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 1272w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c1Hw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png" width="779" height="187" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:187,&quot;width&quot;:779,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60389,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171501635?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c1Hw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 424w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 848w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 1272w, https://substackcdn.com/image/fetch/$s_!c1Hw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7b38e46-66a9-491b-85db-c9c5ea857877_779x187.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>like to a level of lost I&#8217;m feeling their stress and anxiety in my own chest.</p><p>on top of that some of you are going through the absolute worst time of your life I don&#8217;t know how you guys are 
still standing up but I&#8217;m in awe.</p><p>I found myself reiterating the same advice to at least 7 people now, so I&#8217;ll write it all here for those who are too crippled to ask for help.</p><p>hope it&#8217;s useful folks and keep it up.</p><p>never surrender.</p><h3>Hey I&#8217;m in {{UNIVERSITY or JOB}} and I feel lost can you help me learn {{AI/ML/MATH}} so I can get {{DREAM JOB}}</h3><p>I&#8217;m receiving variations of this, but the common element is always that the person feels completely lost.</p><p>lost because AI will take their job, lost because it seems there is too much to learn, lost because they don&#8217;t know if they actually like this whole tech thing, etc.</p><p>hold on.</p><p>hold it right there.</p><h3>what do you want to be in like 5 years? who are you and what do you do?</h3><p>most had no idea. </p><p>they had the vaguest picture of what they would be doing in 5 years.</p><p>and trying to figure it out by looking at technology is the wrong way to look at it.</p><p>it would be like obsessing intensely over hammers, nails and saws to build a house. tech is the same; it&#8217;s just a set of tools to do something.</p><h3>the wafflehouse method &#8482;&#65039;</h3><p>enter my favorite method to get your life untangled, which I call the wafflehouse method (bear with me, I swear this blog post is useful).</p><p>I&#8217;ll detail here how to use it, but remember that untangling your life goals is a deeply personal process. 
</p><p>take inspiration from this, but feel free to modify so that it feels good.</p><h4>step 0: take two days off in the middle of the week</h4><p>call in sick, cancel your meetings and plans for two days.</p><p>straight in the middle of the week.</p><p>don&#8217;t tell your kids, wife, moms or friends that you are taking off.</p><p>you are going to go on a deeply personal introspective adventure and the only one invited is you.</p><p>48h to find yourself in the whole year is the least you can do.</p><h4>step 1: [day 1] it&#8217;s time to vomit everything at your favorite spot</h4><p>sorry for the image but yes that&#8217;s what you gonna do.</p><p>pick your favorite spot, the one you feel very good to be at, and can spend a whole day by yourself.</p><p>can&#8217;t give you advice on picking a spot just make sure that wherever you are going has deep significance for yourself.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KXHB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KXHB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 424w, https://substackcdn.com/image/fetch/$s_!KXHB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 848w, https://substackcdn.com/image/fetch/$s_!KXHB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 
1272w, https://substackcdn.com/image/fetch/$s_!KXHB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KXHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png" width="1060" height="660" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1060,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1405925,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171501635?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KXHB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 424w, https://substackcdn.com/image/fetch/$s_!KXHB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 848w, 
https://substackcdn.com/image/fetch/$s_!KXHB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 1272w, https://substackcdn.com/image/fetch/$s_!KXHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e09321d-b5ea-4498-9618-a076bca1c743_1060x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>mine was this bench.</p><p>now sit alone with your notebook and be quiet for a while.</p><p>try to imagine this one image: yourself 5 
years from now.</p><p>not how you are going to get there from the current trajectory you are following, but how you would feel proud to be.</p><p>imagine it vividly. it may still be very abstract, but try to dive a bit into what is going on in this vision.</p><p>does it have to be tech related?</p><p>no absolutely not, on the contrary.</p><p>imagine yourself at the pinnacle where you really feel pride bursting within your chest. this looks different for everyone. </p><p>try to remove anything attached to others&#8217; expectations of you. doesn&#8217;t matter what your moms and dads want of you if you feel like a husk inside.</p><p>now that you have something to latch upon, write and write and write. write everything, how you feel now, how you feel then, what is different from now, what you regret, the things you think are not realistic, that one hunk of shame you have been carrying over forever, all these fears and hopes, lay them all down.</p><p>write and write until there is nothing more to be said.</p><p>write it all up until you have the colors and shapes of the things you want to come to pass.</p><p>it doesn&#8217;t have to be clear for now, it just needs to be vivid.</p><p>when you are satisfied with this painting, it&#8217;s time to head home for a good day&#8217;s rest, you are going to need it.</p><h4>step 2 : [day 2] enter the wafflehouse</h4><p>now it&#8217;s time to enter the wafflehouse (or whatever regional equivalent you have; mine is called cora).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wR2b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!wR2b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wR2b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg" width="850" height="850" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:850,&quot;width&quot;:850,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Cora D&#233;jeuners et d&#238;ners Montr&#233;al - Menu, avis &amp; plus ao&#251;t 2025&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cora D&#233;jeuners et d&#238;ners Montr&#233;al - Menu, avis &amp; plus ao&#251;t 2025" 
title="Cora D&#233;jeuners et d&#238;ners Montr&#233;al - Menu, avis &amp; plus ao&#251;t 2025" srcset="https://substackcdn.com/image/fetch/$s_!wR2b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wR2b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab750c08-7308-43a4-a745-3273343ac1bd_850x850.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" 
width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>beautiful architecture</p><p>make sure that wherever you are going can tolerate you for a whole day because you are going to be squatting there from opening until closing.</p><p>bring your computer, your notepad and this image of yourself.</p><h4>step 3 : [day 2] refine the image into a 5-year goal</h4><p>now that you are comfortably sitting with your beverage of choice it&#8217;s time to refine the image you painted into 2-3 clear goals that happen in 5 years&#8217; time.</p><p>the goals need to be very specific and detailed.</p><p>if you feel absolutely ridiculous writing them down, it&#8217;s maybe too much, but don&#8217;t hold back. 
the goals need to match the image.</p><p>write as much detail as you can about each of these goals and make sure that there are as few goals as possible while still retaining the essence of the image.</p><h4>step 4 : [day 2] time for interpolation on 3-year goals</h4><p>now that you have these 2-3 five-year goals, interpolate what you need to achieve in 3 years&#8217; time to be on track to hit them.</p><p>it&#8217;s ok if your interpolation is a bit crooked, you just need to trace a course here.</p><p>if the 3-year goals feel ridiculous, feel free to change the 5-year ones and adjust the knob of your dream self.</p><h4>step 4 [day 2] bring it down down still on 1-year goals</h4><p>now you do the same for the 1-year goals.</p><p>you are for sure going to do some back and forth between the 1-3-5 year goals, adjusting things as you go.</p><p>try to keep the number of goals as low as possible.</p><p>if they sprawl around, it might be best for you to reduce the scope of your image a bit. for instance trying to be a nobel prize winner marathon karateka black belt might be too much.</p><p>if you see your goals proliferating, focus on one aspect, the one that is most important to you.</p><p>I know it&#8217;s tough to make a call here, but you have to be somewhat pragmatic at some point.</p><p>btw the goals aren&#8217;t starting on like january 1st, they start now now.</p><h4>step 5 [day 2] 100s of nodes in a directed acyclic graph</h4><p>now you are going into the details for real. 
you&#8217;re going to make a DAG.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wi-A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wi-A!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 424w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 848w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 1272w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wi-A!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif" width="680" height="410" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:680,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Directed Acyclic Graph (DAG) Overview &amp; Use Cases | 
Hazelcast&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Directed Acyclic Graph (DAG) Overview &amp; Use Cases | Hazelcast" title="Directed Acyclic Graph (DAG) Overview &amp; Use Cases | Hazelcast" srcset="https://substackcdn.com/image/fetch/$s_!Wi-A!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 424w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 848w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 1272w, https://substackcdn.com/image/fetch/$s_!Wi-A!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ecefb58-5438-4d7d-990c-4ad2615a9d95_680x410.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>they usually look like this. notice that there are no cycles (for the rest of the blog, picture the node on the left as being at the top and the node on the right as being at the bottom).</p><p>the top is where you are at. put some numbers in there about your current situation (e.g. maybe you are broke).</p><p>at the bottom put your 1-year goals (e.g. maybe you are not broke anymore).</p><p>now in between write as many steps as you think you need to take in a year's time to go from your current situation to the end goals (remember there is usually more than one goal) based on what you have right now and the assumption that luck is against you.</p><p>this last bit is important. your whole plan is based on the assumption that every damn day you wake up will be a bad day.</p><p>the sun is shining exactly 0 times for 5 years.</p><p>create 100s of these steps that flow logically from your current node to the end nodes.</p><p>some will be more vague than others. 
these are your knot points.</p><h4>step 6 [day 2] : unknot the knot points with knotledge</h4><p>for each of your knot point nodes, research to the best of your current ability how to unknot it.</p><p>sometimes it&#8217;s a knot because it&#8217;s actually 5 nodes mushed together and you don&#8217;t know it&#8217;s 5 steps.</p><p>sometimes it&#8217;s a knot because it&#8217;s the wrong step and you are acting on a false prior.</p><p>doesn&#8217;t matter, try to untangle them as much as possible so that you have 100s of very highly defined steps from start to finish.</p><p>as highly super duper detailed as possible.</p><h4>step 7 [day 2] : make your monthly goals</h4><p>this month.</p><p>not the next month which will start in 3 weeks.</p><p>now. this month now now.</p><p>take whatever first steps there are in this directed acyclic graph and put them as stuff to do in what you have left of the month.</p><p>here you have to be realistic about what you can chew, but remember that these tasks are what bring you closer to what you want to be.</p><p>that image that made you feel good, remember?</p><p>yeah that&#8217;s the path you are rolling on now so take as many tasks as you can from the list and put them in the month.</p><p>cancel other stuff if you can to have more time to work on these.</p><h4>step 8 [day 2] : make your weekly goals</h4><p>you thought we would stop at monthly!</p><p>NO!</p><p>we&#8217;re going down still!</p><p>take whatever tasks you picked for the month and bring them into your weekly goals for this week.</p><p>remember where we are right now? at the wafflehouse? in the middle of the week? 
well good news you got 3 days left to get these tasks done.</p><p>and before I see any of you start to spend 3 weeks making the perfect notion dashboard dingy NO!</p><p>you take whatever you have right NOW I don&#8217;t care if it&#8217;s on a notebook.</p><p>you spend 0 minutes optimizing your project management here, you take all the time into actually doing the tasks towards a future you are proud of.</p><h4>step 9 [day 2] : make your daily goal</h4><p>you should be getting dangerously close to getting evicted from the wafflehouse so take the little time that is left to look at your weekly goals and pick a few tasks to work on tomorrow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hEI0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hEI0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hEI0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hEI0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hEI0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hEI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg" width="390" height="280" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:280,&quot;width&quot;:390,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;10+ Hundred Angry Waitress Royalty-Free Images, Stock Photos &amp; Pictures |  Shutterstock&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="10+ Hundred Angry Waitress Royalty-Free Images, Stock Photos &amp; Pictures |  Shutterstock" title="10+ Hundred Angry Waitress Royalty-Free Images, Stock Photos &amp; Pictures |  Shutterstock" srcset="https://substackcdn.com/image/fetch/$s_!hEI0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hEI0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hEI0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!hEI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F38192cf0-31a3-47d9-8d08-cb81ee0d280d_390x280.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>the waitress should be looking at you like this.</p><p>at this point things should be so concrete that even if you were to forget your yearly goals it wouldn&#8217;t matter at all.</p><p>every day from now on you will have two categories of things to do:</p><ol><li><p>stuff that actually brings you closer to where you want to be in life </p></li><li><p>and 
the other stuff.</p></li></ol><p>things that your mom wants you to do, the stuff that this one friend always puts on you or that your boss decided you were responsible for.</p><p>I&#8217;m not saying to throw all of this in the garbage (. . .).</p><p>but you have to consciously decide that you have a plan that is yours and that it is up to you to get it done.</p><p>good day, bad day.</p><h4>step 10 : step out of the wafflehouse</h4><p>now you&#8217;re done, it&#8217;s time to work and this work is your life's work.</p><p>you don&#8217;t need to parade this mega plan to your relatives, it&#8217;s a deeply personal one.</p><p>it really doesn&#8217;t matter because they should start to see the difference anyway since this plan is one where you will transform yourself into what you truly want to be.</p><p>every day before going to bed take some of the stuff from this week and put it in your bucket for the next day.</p><p>when you wake up look at the tasks your past self is imploring you to get done and oblige.</p><p>every week on a quiet sunday afternoon, tend to this plan like a garden. 
</p><p>look at the work done in the week, check what needs to be done in the month, revise the directed acyclic graph or the 1-3-5 year goals a bit.</p><p>make the small tweaks based on what you learned from the week that passed.</p><p>and when you still feel good about the trajectory and this image still brings a smile to your soul, pick a few tasks for next week.</p><h4>disclaimer:</h4><p>you have to be careful though.</p><p>if you actually do this you get a very stressful side effect:</p><ul><li><p>there is no one to blame but yourself</p></li></ul><p>it&#8217;s not your mom, it&#8217;s not your teacher, it&#8217;s not your friends, it&#8217;s you.</p><p>from that point going forward you have your life in your hands and it&#8217;s time to act like it.</p><div><hr></div><p>if this was useful to you in any way, hit me up periodically with an update.</p><p>you got this.</p>]]></content:encoded></item><item><title><![CDATA[muon optimizer explained to a toddler]]></title><description><![CDATA[there's no way you won't get it]]></description><link>https://www.yacinemahdid.com/p/muon-optimizer-explained-to-a-toddler</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/muon-optimizer-explained-to-a-toddler</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 19 Aug 2025 21:16:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kIl_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>adamw is as of 2025 the most used optimizer to train deep neural networks</p><p>adamw is momentum-based though</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!545K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!545K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 424w, https://substackcdn.com/image/fetch/$s_!545K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 848w, https://substackcdn.com/image/fetch/$s_!545K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 1272w, https://substackcdn.com/image/fetch/$s_!545K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!545K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png" width="853" height="408" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:408,&quot;width&quot;:853,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:156432,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!545K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 424w, https://substackcdn.com/image/fetch/$s_!545K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 848w, https://substackcdn.com/image/fetch/$s_!545K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 1272w, https://substackcdn.com/image/fetch/$s_!545K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe028d6fb-1869-4a83-b8e2-d8feeb101c65_853x408.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>empirically, momentum-based optimizers tend to favor a few update directions over the rest</p><p>the update matrices look suspiciously low-rank</p><p>it&#8217;s like trying to roll a boat down the cost plane</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kIl_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kIl_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 424w, 
https://substackcdn.com/image/fetch/$s_!kIl_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 848w, https://substackcdn.com/image/fetch/$s_!kIl_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 1272w, https://substackcdn.com/image/fetch/$s_!kIl_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kIl_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png" width="788" height="477" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9377af40-9305-4fdc-976f-ec57a417750f_788x477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:477,&quot;width&quot;:788,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:560996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kIl_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 
424w, https://substackcdn.com/image/fetch/$s_!kIl_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 848w, https://substackcdn.com/image/fetch/$s_!kIl_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 1272w, https://substackcdn.com/image/fetch/$s_!kIl_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9377af40-9305-4fdc-976f-ec57a417750f_788x477.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>muon&#8217;s contributor thinks that if we could dampen these dominant directions and boost the rarer ones it would lead to better convergence</p><p>kinda bringing the boat back to a ball shape</p><p>muon&#8217;s contributor says that if we run svd on the momentum update matrix and throw away the singular values (keeping only the singular vectors) then we get roughly unit length in all directions</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!76nR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!76nR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 424w, https://substackcdn.com/image/fetch/$s_!76nR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 848w, https://substackcdn.com/image/fetch/$s_!76nR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 1272w, https://substackcdn.com/image/fetch/$s_!76nR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!76nR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png" width="553" height="549" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:553,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92155,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!76nR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 424w, https://substackcdn.com/image/fetch/$s_!76nR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 848w, https://substackcdn.com/image/fetch/$s_!76nR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 1272w, https://substackcdn.com/image/fetch/$s_!76nR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56950569-5c72-43d8-b6c4-4a793ebad904_553x549.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>can&#8217;t run an svd on each update step though, it&#8217;s too costly</p><p>we don&#8217;t have to be that precise so we can look at faster iterative algorithms</p><p>newton-schulz is one such algorithm</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!218B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!218B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 424w, https://substackcdn.com/image/fetch/$s_!218B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 848w, https://substackcdn.com/image/fetch/$s_!218B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 1272w, https://substackcdn.com/image/fetch/$s_!218B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!218B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png" width="647" height="583" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:647,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77354,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!218B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 424w, https://substackcdn.com/image/fetch/$s_!218B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 848w, https://substackcdn.com/image/fetch/$s_!218B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 1272w, https://substackcdn.com/image/fetch/$s_!218B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F719bb54a-d91b-4aa7-a56c-5c029b1c68b2_647x583.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>basically if you take an odd matrix polynomial and apply it to a matrix, it&#8217;s like applying it straight to the singular values while U and V.T are left unchanged</p><p>would be neat to apply the sign function to the singular values, because then we would just have 1s on that diagonal</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OU-P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OU-P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 424w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 848w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 1272w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OU-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png" width="564" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:564,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99493,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OU-P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 424w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 848w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 1272w, https://substackcdn.com/image/fetch/$s_!OU-P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff1736d3d-9694-4ec0-ba40-6cafb7701c91_564x618.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>can&#8217;t use sign directly because it&#8217;s not a polynomial, even though it&#8217;s odd</p><p>we can approximate the sign function though with a cubic/quintic polynomial if we compose the polynomial with itself multiple times (not a great fit, but close enough on [-1,1])</p><p>5 iterations (or steps) are good enough</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!-Xx6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-Xx6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 424w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 848w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 1272w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Xx6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png" width="910" height="353" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:353,&quot;width&quot;:910,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:62432,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Xx6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 424w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 848w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 1272w, https://substackcdn.com/image/fetch/$s_!-Xx6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1d4e6f-a75c-4cb2-a5d3-f898b340d9f3_910x353.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>if we normalize the update matrix so that its singular values fall in [0,1], we can be sure that the iterated polynomial stays sign-like and pushes them toward 1</p><p>actually, empirically, we don&#8217;t even need to land exactly on 1: if the singular values end up in roughly [0.7,1.3], that&#8217;s good enough</p><p>the cursed quintic gives us one ugly-looking sign function but has all the right characteristics + it&#8217;s fast</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G1YW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!G1YW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 424w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 848w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 1272w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G1YW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png" width="848" height="340" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:340,&quot;width&quot;:848,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!G1YW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 424w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 848w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 1272w, https://substackcdn.com/image/fetch/$s_!G1YW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7186b1c9-b570-47a0-960c-57295caa5b32_848x340.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>its parameters are <code>a, b, c = (3.4445, -4.7750, 2.0315)</code></p><p>that&#8217;s muon</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9SQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9SQk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 424w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 848w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 1272w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9SQk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png" width="528" height="355" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76457db3-3869-4510-b914-6c61e29ad78c_528x355.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:355,&quot;width&quot;:528,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:97169,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/171404611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!9SQk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 424w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 848w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 1272w, https://substackcdn.com/image/fetch/$s_!9SQk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76457db3-3869-4510-b914-6c61e29ad78c_528x355.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>ps: </p><ul><li><p>nesterov momentum works better than plain sgd-momentum</p></li><li><p>muon only works on the 2d weights of hidden dense linear layers</p></li><li><p>so no biases and no embeddings</p></li><li><p>convolutional filters are fine, you just need to flatten them to 2d</p></li><li><p>also, empirically, the first and last layers should use adamw</p></li></ul><p>32 min step-by-step breakdown here:</p><div id="youtube2--Cto66pAUXQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;-Cto66pAUXQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div 
class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/-Cto66pAUXQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>big shout out to SJ and Adrian for supporting the content!</p><p>also, more reading for a deep dive:</p><p>&#128204; overview: https://rixhabh.notion.site/Muon-2302d1eca8f6801c88e3f80e43c316da?pvs=74</p><p>&#128204; Keller blog: https://kellerjordan.github.io/posts/muon/</p><p>&#128204; Jeremy blog: https://jeremybernste.in/writing/deriving-muon</p><p>&#128204; newton-schulz overview: https://docs.modula.systems/algorithms/newton-schulz/</p>]]></content:encoded></item><item><title><![CDATA[Next Frontier for LLM is Quality Long Context]]></title><description><![CDATA[Which is much harder than you think.]]></description><link>https://www.yacinemahdid.com/p/next-frontier-for-llm-is-quality</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/next-frontier-for-llm-is-quality</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 26 May 2025 13:02:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5rZo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With the recent release of new frontier LLMs in 2025, three things became increasingly clear:</p><ol><li><p>Code generation via LLMs is very valuable.</p></li><li><p>Coding benefits from improved long context.</p></li><li><p>Long context is still hard to get right.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!5rZo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5rZo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 424w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 848w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 1272w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5rZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png" width="823" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:823,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5rZo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 424w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 848w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 1272w, https://substackcdn.com/image/fetch/$s_!5rZo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fca89a9-5360-4399-abdd-17d9ec85401e_823x540.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Coding + long context might also be why we are seeing such a large increase in tokens processed, at least at Google.</figcaption></figure></div><p>It&#8217;s also not just about having a very long context LLM, because you can theoretically get there just by using aggressive positional embedding interpolation and a hybrid architecture that makes it possible (like Scout).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ReY_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ReY_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 424w, 
https://substackcdn.com/image/fetch/$s_!ReY_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 848w, https://substackcdn.com/image/fetch/$s_!ReY_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 1272w, https://substackcdn.com/image/fetch/$s_!ReY_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ReY_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png" width="732" height="418" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:732,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129424,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/164248613?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ReY_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 
424w, https://substackcdn.com/image/fetch/$s_!ReY_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 848w, https://substackcdn.com/image/fetch/$s_!ReY_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 1272w, https://substackcdn.com/image/fetch/$s_!ReY_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57fe0b52-4bc2-4458-a75c-d6e45a99f4b6_732x418.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" 
x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">10M is a big jump in capability, but performance-wise it&#8217;s been a bit lackluster.</figcaption></figure></div><p>There are multiple reasons why long context is hard:</p><ol><li><p>It&#8217;s difficult to find quality and coherent long context data that is relevant for users. The benchmarks on this front are still in their early days.</p></li><li><p>The quadratic time complexity of self-attention makes training on long inputs cumbersome. Linear attention mechanisms can remove that bottleneck, but at a cost in model performance.</p></li><li><p>Training on shorter inputs and then testing on longer ones leads to a sharp performance decline in traditional RoPE-based systems. Positional interpolation helps, but there are limits.</p></li></ol><p>More about this on my recent review of the subject here:</p><div id="youtube2-PqTYjSbnP9o" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;PqTYjSbnP9o&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/PqTYjSbnP9o?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>From RAG-based systems to coding in mature repositories, the clear business case for LLMs is being able to ingest a large amount of context without breaking and still produce quality output. The best models at the moment struggle beyond a certain point.</p><p>From what I&#8217;ve seen and read, the field will most likely figure out the recipe to create models with <strong>high-quality 1M-10M</strong> context length in 2025. Yes, Scout can theoretically already do 10M, but I would qualify that as low-quality context. 
</p><p>The solution will most likely involve heavy use of hybridization of the layers, both for positional embedding (like <a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/">Scout</a>&#8217;s mix of RoPE and NoPE) and attention layers (like <a href="https://arxiv.org/abs/2501.08313">Minimax</a>&#8217;s mix of regular and linear attention).</p><p>It&#8217;s a hard problem, but after watching that full interview with Nikolay Savinov from DeepMind, I think the path is somewhat clear:</p><div id="youtube2-NHMJ9mqKeMQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;NHMJ9mqKeMQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/NHMJ9mqKeMQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The emphasis on how much solid inference work is needed to make the model work makes me think that Google might still be using a non-linear attention mechanism, and that a breakthrough is currently being researched on that front to unlock a useful 10M context length.</p><p>In any case, I think this concerted effort to increase context length will have a tremendous impact on multiple AI use cases like RAG-based systems, reasoning models, and coding agents. It will most likely fuel the next big phase of intelligence growth in LLMs, but it will for sure require new ideas.</p><div><hr></div><p>Interested in AI Engineering? 
Do check out the <a href="https://scrimba.com/the-ai-engineer-path-c02v?via=yacineMahdid">AI courses at Scrimba</a>; these are pretty solid for beginners!</p><p>Have a great week &#128075;</p>]]></content:encoded></item><item><title><![CDATA[How Minimax-01 Achieves 1M Token Context Length with Linear Attention (MIT)]]></title><description><![CDATA[I dug into the internals of an MIT-licensed MoE system that makes use of Linear Attention (Lightning Attention) to extend its context length to 1M input tokens.]]></description><link>https://www.yacinemahdid.com/p/how-minimax-01-achieves-1m-token</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/how-minimax-01-achieves-1m-token</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 01 Apr 2025 19:34:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kj1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve recently dug into <a href="https://arxiv.org/pdf/2202.10447">linear attention</a> to understand why these attention mechanisms haven&#8217;t gained much steam compared to softmax attention (which has a quadratic algorithmic complexity).</p><p>They should theoretically be better since their cost grows only linearly with sequence length, which allows for larger context lengths.</p><p>However, <a href="https://arxiv.org/abs/2205.14135">FlashAttention</a> managed to get some more juice out of the quadratic softmax attention and extend the context length by sheer GPU optimization (which is great).</p><p>This is where I got into <a href="https://arxiv.org/pdf/2405.17381">Lightning Attention</a>, which cleverly mixes both the FlashAttention type of optimization and the linear attention type of algorithm. 
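</p><p>To make the quadratic vs. linear distinction concrete, here&#8217;s a minimal NumPy sketch. It is my own illustration, not code from the Lightning Attention or Minimax papers. It assumes a non-causal setting and a positive feature map (<code>elu(x) + 1</code>) picked just for the example, and the function names are hypothetical. It computes the same kernelized attention two ways, and the only difference is the order of the matrix products:</p><pre><code>import numpy as np


def feature_map(x):
    # elu(x) + 1, written without branching: keeps every feature positive.
    # This particular map is an assumption made for the sketch.
    return np.maximum(x, 0.0) + np.exp(np.minimum(x, 0.0))


def quadratic_form(q, k, v):
    # Left product first: builds the full (n, n) matrix, so cost is O(n^2 * d).
    scores = feature_map(q) @ feature_map(k).T
    return (scores @ v) / scores.sum(axis=-1, keepdims=True)


def linear_form(q, k, v):
    # Right product first: only a (d, d) summary is built, so cost is O(n * d^2).
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v               # (d, d) summary of keys and values
    norm = fq @ fk.sum(axis=0)  # (n,) row normalizers
    return (fq @ kv) / norm[:, None]


rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 128, 16))  # n = 128 tokens, d = 16 dims
print(np.allclose(quadratic_form(q, k, v), linear_form(q, k, v)))
</code></pre><p>Both functions return the same values up to floating-point error, but the second one never materializes the n-by-n matrix, so its cost grows linearly with the sequence length n. LLMs need the causal version of this, which is where the GPU-level tricks start to matter. 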
</p><p>A large-scale productionalization of lightning attention was recently published by <a href="https://arxiv.org/abs/2501.08313">Minimax</a> as an <a href="https://github.com/MiniMax-AI/MiniMax-01?tab=MIT-1-ov-file">MIT-licensed model (open-weight)!</a></p><p>With some pretty neat results!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kj1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kj1t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 424w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 848w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1272w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png" width="1319" height="663" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1319,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/160255672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kj1t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 424w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 848w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1272w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let&#8217;s go through the high-level theory here and figure out how the authors put together a large-scale linear attention model.</p><h2><strong>TLDR: The Main Findings</strong></h2><p>The Minimax-01 model integrates Lightning Attention with a Mixture of Experts (MoE) whose GPU usage has been heavily optimized. Lightning Attention is hybridized with softmax attention at a 7:1 ratio: one softmax attention layer follows every seven Lightning Attention layers. 
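</p><p>As a quick illustration of what that interleaving looks like, here&#8217;s a tiny sketch. The helper name and the 16-layer depth are made up for the example, and the default of one softmax layer after every seven Lightning Attention layers is my reading of the Minimax-01 paper, not something taken from their code:</p><pre><code>def hybrid_layer_pattern(num_layers, lightning_per_softmax=7):
    # Every (lightning_per_softmax + 1)-th layer uses softmax attention;
    # all the others use lightning (linear) attention.
    period = lightning_per_softmax + 1
    return [
        "softmax" if (i + 1) % period == 0 else "lightning"
        for i in range(num_layers)
    ]


pattern = hybrid_layer_pattern(16)
print(pattern.count("lightning"), pattern.count("softmax"))
</code></pre><p>With 16 layers this gives 14 Lightning Attention layers and 2 softmax layers, so the vast majority of the depth runs at linear cost. 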
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1gI0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1gI0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 424w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 848w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 1272w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1gI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png" width="609" height="756" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:609,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89078,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/160255672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1gI0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 424w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 848w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 1272w, https://substackcdn.com/image/fetch/$s_!1gI0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05507377-d013-43c9-b83c-23cd02af7d4c_609x756.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This combination allows for:</p><ul><li><p><strong>Big Context Lengths:</strong> Handling sequences of up to 4 million tokens.</p></li><li><p><strong>Efficient Scaling:</strong> Combining linear attention with MoE to achieve massive scale without compromising performance. 
A whole lot of GPU optimization had to happen behind the scenes to make this work (section 3 of the paper).</p></li><li><p><strong>On-par with closed source:</strong> The model manages to be on par with most closed-source softmax-based models on standard benchmarks, and it excels at long-context tasks.</p></li></ul><p>The model is huge, with 456 billion parameters in total, and they even managed to slap a vision module on top to make it multimodal.</p><h3><strong>The Problem with Softmax Attention</strong></h3><p>Most leading large transformer-based models use softmax attention.</p><p>This type of attention has great performance. However, it has <strong>quadratic computational complexity</strong> with respect to the sequence length. This means that as the input sequence grows, the computational and memory requirements increase dramatically: doubling the input length roughly quadruples the cost of the attention computation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k5lk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k5lk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 424w, https://substackcdn.com/image/fetch/$s_!k5lk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 848w, https://substackcdn.com/image/fetch/$s_!k5lk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 1272w, 
https://substackcdn.com/image/fetch/$s_!k5lk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k5lk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png" width="1456" height="713" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:713,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Understanding The Self-Attention Mechanism&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Understanding The Self-Attention Mechanism" title="Understanding The Self-Attention Mechanism" srcset="https://substackcdn.com/image/fetch/$s_!k5lk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 424w, https://substackcdn.com/image/fetch/$s_!k5lk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 848w, https://substackcdn.com/image/fetch/$s_!k5lk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 1272w, 
https://substackcdn.com/image/fetch/$s_!k5lk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f41a56a-8b4a-4fbb-9124-82c01f95fc35_3399x1665.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Cool illustration taken from this blog post:</em></p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:144158040,&quot;url&quot;:&quot;https://newsletter.theaiedge.io/p/understanding-the-self-attention&quot;,&quot;publication_id&quot;:1238074,&quot;publication_name&quot;:&quot;The AiEdge 
Newsletter&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9c582-b22b-45c5-a64e-a9105824fb01_1067x1067.png&quot;,&quot;title&quot;:&quot;Understanding The Self-Attention Mechanism&quot;,&quot;truncated_body_text&quot;:null,&quot;date&quot;:&quot;2024-04-30T15:01:25.765Z&quot;,&quot;like_count&quot;:15,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:24785675,&quot;name&quot;:&quot;Damien Benveniste&quot;,&quot;handle&quot;:&quot;damienbenveniste&quot;,&quot;previous_name&quot;:&quot;Damien&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2157884e-8c5d-4cde-ab33-455fa623975d_2060x2061.jpeg&quot;,&quot;bio&quot;:&quot;I specialize in building large scale end to end Machine Learning capabilities. After a PhD in Physics, I have been a Data Scientist, ML Engineer and Software Engineer for the past 10 years. Until recently, I was a Machine Learning Tech Lead at Meta.\n&quot;,&quot;profile_set_up_at&quot;:&quot;2022-12-12T05:52:55.920Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1194547,&quot;user_id&quot;:24785675,&quot;publication_id&quot;:1238074,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:1238074,&quot;name&quot;:&quot;The AiEdge Newsletter&quot;,&quot;subdomain&quot;:&quot;aiedge&quot;,&quot;custom_domain&quot;:&quot;newsletter.theaiedge.io&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A newsletter for continuous learning about Machine Learning applications, Machine Learning System Design, MLOps, the latest techniques and news. 
\nSubscribe and receive a free Machine Learning book PDF!&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f6e9c582-b22b-45c5-a64e-a9105824fb01_1067x1067.png&quot;,&quot;author_id&quot;:114530889,&quot;theme_var_background_pop&quot;:&quot;#121BFA&quot;,&quot;created_at&quot;:&quot;2022-12-12T05:18:13.224Z&quot;,&quot;email_from_name&quot;:&quot;&#128075;  Damien from the AiEdge&quot;,&quot;copyright&quot;:&quot;AiEdge&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false}}],&quot;twitter_screen_name&quot;:&quot;TheAIEdge&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://newsletter.theaiedge.io/p/understanding-the-self-attention?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!kRD-!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6e9c582-b22b-45c5-a64e-a9105824fb01_1067x1067.png" loading="lazy"><span class="embedded-post-publication-name">The AiEdge Newsletter</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Understanding The Self-Attention Mechanism</div></div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 years ago &#183; 15 likes &#183; 
Damien Benveniste</div></a></div><p>This quadratic complexity becomes a severe bottleneck when dealing with long sequences, like those found in document-level tasks, long-form question answering, or high-resolution images. </p><h3><strong>Addressing the Bottleneck with FlashAttention</strong></h3><p>To tackle the softmax attention bottleneck, Dr. Tri Dao introduced FlashAttention in 2022, a method that aims to make attention mechanisms more efficient. </p><p>The core realization was that most of the inefficiency in softmax attention was in reading and writing to memory, not in calculating attention per se.</p><p>This means that by just doing a better job at managing the GPU, massive gains could be had with nothing changing on the model side.</p><p>FlashAttention did that by:</p><ol><li><p><strong>IO-Aware Computation:</strong> It optimizes the attention computation by minimizing data movement between the GPU&#8217;s high-bandwidth memory (HBM) and on-chip SRAM, leveraging tiling techniques to reuse data efficiently.</p></li><li><p><strong>Exact Attention Calculation:</strong> Unlike some approximate methods, FlashAttention computes the exact attention, ensuring no loss in model quality. That was a pretty big deal, effectively a free lunch!</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e5lS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e5lS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!e5lS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 848w, https://substackcdn.com/image/fetch/$s_!e5lS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!e5lS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e5lS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg" width="1185" height="471" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:471,&quot;width&quot;:1185,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!e5lS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!e5lS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 848w, https://substackcdn.com/image/fetch/$s_!e5lS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!e5lS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83786f2c-4026-4ce3-b243-5746a69bb44c_1185x471.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Even compared to linear attention mechanisms, FlashAttention was able to handle longer sequences with a smaller memory footprint, just by being more efficient in how it uses the GPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l1RJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l1RJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 424w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 848w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l1RJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg" width="1185" height="366" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:366,&quot;width&quot;:1185,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!l1RJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 424w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 848w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!l1RJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69570eb6-8c06-45d4-afb7-32b5abf07c02_1185x366.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Main Impact:</strong></p><ul><li><p><strong>Speed Improvements:</strong> FlashAttention significantly speeds up the attention computation, making it feasible to train models with longer sequences.</p></li><li><p><strong>Memory Efficiency:</strong> It reduces the memory requirements, allowing for larger batch sizes and models.</p></li></ul><p>FlashAttention was then integrated into large models, and research into better GPU utilization for attention continued. We even got <a href="https://arxiv.org/abs/2307.08691">FlashAttention-2</a>.</p><h3><strong>Where Linear Attention Falls Short</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xYk4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xYk4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 424w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 848w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 1272w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xYk4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png" width="1159" height="569" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:569,&quot;width&quot;:1159,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75947,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/160255672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xYk4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 424w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 848w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 1272w, https://substackcdn.com/image/fetch/$s_!xYk4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e013419-ca4f-4d6c-8a63-b7d5d8ccecbc_1159x569.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Difference between softmax attention and linear attention.</em></p><p>While linear attention methods like <strong>Linformer</strong> and <strong>Performer</strong> offer theoretical computational advantages, they often face challenges in practice:</p><ol><li><p><strong>Performance Gap:</strong> Linear attention models struggled to match the performance of softmax-based Transformers, particularly in tasks requiring complex reasoning or long-range dependencies, and even more so once FlashAttention had optimized softmax attention!</p></li><li><p><strong>Training Inefficiency:</strong> Some linear attention methods, especially those relying on cumulative summation (cumsum), suffer from inefficient training due to sequential dependencies, which hampers their scalability. This makes their theoretical advantage essentially useless once a practical GPU implementation is needed.</p></li></ol><h3><strong>Combining Both with Lightning Attention</strong></h3><p><strong>Lightning Attention</strong> addresses the shortcomings of both traditional and linear attention mechanisms. </p><p>It achieves this by doing 3 things:</p><ol><li><p><strong>Hybrid Approach:</strong> Lightning Attention divides the attention computation into intra-block and inter-block operations. It uses conventional attention for intra-blocks and linear attention for inter-blocks, effectively combining the strengths of both approaches. It leverages a divide-and-conquer methodology.</p></li><li><p><strong>Tiling Technique:</strong> By adopting a tiling strategy similar to FlashAttention, Lightning Attention minimizes the need for cumsum operations, overcoming the training inefficiency of traditional linear attention. That&#8217;s where the intra and inter blocks come in. Do check this <a href="https://youtu.be/Q3GgbfGTnVc?si=mupxICLDElGrtc8a">nice video for a refresher on that methodology.</a></p></li><li><p><strong>IO-Friendly Implementation:</strong> The method is designed to be hardware-efficient, leveraging GPU parallelism and minimizing memory access overhead.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sDku!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sDku!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!sDku!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sDku!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sDku!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sDku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg" width="699" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:699,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!sDku!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!sDku!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sDku!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sDku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef620da8-993a-48ec-bf00-ccfe2cf8b16b_699x600.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Figure 4: Lightning Attention splits the attention computation into intra-block and inter-block operations, using different strategies for each.</p><h2><strong>The Core Results of the MiniMax-01 Model</strong></h2><p>The <strong>MiniMax-01</strong> paper introduces a model architecture that integrates Lightning Attention with a Mixture of Experts (MoE) to achieve scalability and good performance. </p><p>Interestingly, it&#8217;s not a full lightning attention model but rather a hybrid one, with 1 softmax attention block for every 7 lightning attention blocks!</p><p>When only lightning attention is used, the model falls short on retrieval tasks like needle-in-a-haystack (NIAH).</p><p>Here are the key highlights:</p><ol><li><p><strong>Context Window Expansion:</strong> The model supports context windows of up to <strong>1 million tokens</strong> during training and can extrapolate to <strong>4 million tokens</strong> during inference.</p></li><li><p><strong>Scalability:</strong> The model scales efficiently, with <strong>456 billion total parameters</strong> and <strong>45.9 billion activated parameters per token</strong>. However, it requires quite a recipe to get it right. The full recipe is given in sections 4 and 5 (pre-training and post-training), and wow.</p></li><li><p><strong>Performance:</strong> The model matches the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while having a much larger context window (<strong>20-32 times longer</strong>).</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kj1t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kj1t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 424w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 848w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1272w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png" width="1319" height="663" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1319,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134792,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/160255672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!kj1t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 424w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 848w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1272w, https://substackcdn.com/image/fetch/$s_!kj1t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc50b5f0c-9739-49f7-b107-774761219f6b_1319x663.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>3 Key Takeaways for Transformer-Based Deep Learning</strong></h2><ol><li><p><strong>Attention Mechanism Evolution:</strong> These results demonstrate that hybrid approaches can overcome the limitations of traditional and linear attention mechanisms. Expect to see more innovations that blend different attention strategies.</p></li><li><p><strong>Scalability and Efficiency:</strong> As models grow in size and complexity, efficient scaling solutions like MoE and linear attention will become increasingly important. These techniques will be crucial for developing models that are both powerful and sustainable. It also seems that hybridizing different techniques is a sensible way to reach the right scale.</p></li><li><p><strong>Long-Context Understanding:</strong> The ability to process longer sequences might change how some applications that currently rely on RAG systems are set up.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i_JV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i_JV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 424w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 848w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i_JV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg" width="1344" height="516" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:516,&quot;width&quot;:1344,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!i_JV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 424w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 848w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!i_JV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3eb2d9c0-20c3-4e21-bd97-e3512f190dcd_1344x516.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Figure 6: This NIAH figure was pretty funny, haha.</p><p>Overall, pretty cool methodology.</p><p>There seems to be something fundamental about softmax that is still required to make large language models perform, but linear attention mechanisms bring a new perspective to these models.</p><p>I wouldn&#8217;t be surprised if the new Gemini 2.5 integrated linear attention in some way!</p>]]></content:encoded></item><item><title><![CDATA[You shouldn't build fully autonomous agent.]]></title><description><![CDATA[That's a big no no.]]></description><link>https://www.yacinemahdid.com/p/you-shouldnt-build-fully-autonomous</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/you-shouldnt-build-fully-autonomous</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 10 Mar 2025 15:34:21 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!BhHQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>That&#8217;s what the folks at Hugging Face concluded in their latest paper, &#8220;<a href="https://huggingface.co/papers/2502.02649">Fully Autonomous AI Agents Should Not be Developed</a>&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BhHQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BhHQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 424w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 848w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 1272w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BhHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png" width="961" height="387" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:387,&quot;width&quot;:961,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:213581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/158774868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BhHQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 424w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 848w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 1272w, https://substackcdn.com/image/fetch/$s_!BhHQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc2ca992-edd1-4d22-b36b-64c83904aefe_961x387.png 1456w" sizes="100vw" 
fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Two things are surprising in the paper:</p><p><strong>The first surprising thing </strong>is obviously the claim that you shouldn&#8217;t build a fully autonomous agent. These are systems that can write and subsequently run their own code in order to achieve their task.</p><p>Here we aren&#8217;t talking about coding agents; those are still fine. A fully autonomous agent isn&#8217;t bounded by a fixed set of tools and can use coding agents to build more tools for itself.</p><p>HuggingFace researchers even did a full grid analysis on multiple dimensions. 
No matter which dimension you want to remove, the risk-reward ratio for a fully autonomous agent isn&#8217;t looking too good.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jU9d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jU9d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 424w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 848w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 1272w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jU9d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png" width="802" height="891" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f216979-61c1-449a-a9c3-ed13193249be_802x891.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:891,&quot;width&quot;:802,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:373478,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/158774868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jU9d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 424w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 848w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 1272w, https://substackcdn.com/image/fetch/$s_!jU9d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f216979-61c1-449a-a9c3-ed13193249be_802x891.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Like all things engineering, the sweet spot seems to be to start with the simplest possible solution and add complexity later if need be.</p><p>This sentiment from HuggingFace is, surprisingly, also echoed in an interview from <a href="https://www.youtube.com/watch?v=LP5OCa20Zpg&amp;ab_channel=Anthropic">Anthropic</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5lT-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5lT-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 424w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 848w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 1272w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5lT-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png" width="1305" height="752" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:752,&quot;width&quot;:1305,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1068048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/158774868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5lT-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 424w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 848w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 1272w, https://substackcdn.com/image/fetch/$s_!5lT-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d1a55ce-6892-4765-b7ab-f9e14e924582_1305x752.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I don&#8217;t know if these two engineers were briefed before the interview, but their candidness about the fact that agents are not really working in production is hilarious. Peak AI comedy, highly recommend watching the full thing.</p><p><strong>The second</strong> surprising thing is more semantic: there is no clear definition of what an agent is. </p><p>You might be saying &#8220;Yes, it&#8217;s because of these pesky AI companies hyping up the AI field with nonsense&#8221; and you would be partially right.</p><p>But the issue runs much deeper than that, with this quote being the eye opener:</p><blockquote><p>The question what is an agent? is embarrassing for the agent-based computing community in just the same way that the question what is intelligence? is embarrassing for the mainstream AI community. 
<br><br>The problem is that although the term is widely used, by many people working in closely related areas, it defies attempts to produce a single universally accepted definition</p></blockquote><p>This quote is from <em>Wooldridge &amp; Jennings in their book from 1995</em>.</p><p><strong>1995</strong>, folks.</p><p>This term is at the same level of murkiness as AGI, intelligence or consciousness at this point.</p><p>I believe this requires engineers working in the field to use much more precise words for what they are building (workflows, prompt-chaining, orchestrator-evaluators, etc.).</p><p>A good place to start learning this vocabulary is <a href="https://www.anthropic.com/engineering/building-effective-agents">this blog post by Anthropic</a>, which is very clear and devoid of hype (it&#8217;s related to the interview mentioned earlier):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rwhk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rwhk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 424w, https://substackcdn.com/image/fetch/$s_!rwhk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 848w, https://substackcdn.com/image/fetch/$s_!rwhk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rwhk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rwhk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png" width="1456" height="683" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:683,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:177311,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.yacinemahdid.com/i/158774868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rwhk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 424w, https://substackcdn.com/image/fetch/$s_!rwhk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 848w, 
https://substackcdn.com/image/fetch/$s_!rwhk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 1272w, https://substackcdn.com/image/fetch/$s_!rwhk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90ff5e7b-36c8-4f87-a189-799924f00284_1784x837.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The team has been gracious enough to provide a lot of examples (text and code) of what each of the patterns looks like in what they call 
workflows.</p><p>btw, if you are new to agents, workflows, or agentic systems, I made a tutorial on them last week using the resources above:</p><div id="youtube2-UMYKjT9exb4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;UMYKjT9exb4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/UMYKjT9exb4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em>ps: if you know a thumbnail guy, send me a DM, mine suck.</em></p><div><hr></div><p>All in all, there isn&#8217;t anything too new here on the software engineering side with the statement from HuggingFace.</p><p>The old adage still holds in the stochastic realm of AI engineering:</p><blockquote><p>Build the least complex solution possible that meets the requirements.</p></blockquote><p>If you want to test new tech, like a swarm of AI agents, do it in a hobby project, not in production.</p>]]></content:encoded></item><item><title><![CDATA[AI Engineering is Stochastic Software Development]]></title><description><![CDATA[It's not related to ML.]]></description><link>https://www.yacinemahdid.com/p/ai-engineering-is-stochastic-software</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/ai-engineering-is-stochastic-software</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Fri, 21 Feb 2025 16:11:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bO-X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been surprised by how many people think that AI engineering is related in some way to 
machine learning.</p><p>There is very very little in common between the two fields.</p><p>If we were to do the concentric circle diagram it would be the red X:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bO-X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bO-X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 424w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 848w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 1272w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bO-X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png" width="1028" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1028,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:513013,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bO-X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 424w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 848w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 1272w, https://substackcdn.com/image/fetch/$s_!bO-X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4bb2ea0-f8c9-4fa4-abb0-317cdd416cc5_1028x670.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">the arrow is a JS async request</figcaption></figure></div><p>I&#8217;ll explain what I mean by that and why this confusion happens.</p><p>btw, if you are interested in learning more about AI engineering, like in a hands-on manner, <a href="https://scrimba.com/the-ai-engineer-path-c02v?via=yacineMahdid">do check out this great course by Scrimba</a>!</p><h2>Why are they not related?</h2><p>They should be, at a glance, because we just showed that generative AI is within the concentric circle right under deep learning.</p><p>However, an AI engineer isn&#8217;t touching something fundamental in machine learning which is the weights of the model.</p><p>Generally, an AI engineer will modulate the input to a trained foundational model to gather the output they need.</p><p>In some rare occurrences, the AI engineer will dip their toe in the machine learning world by finetuning the model.</p><p>But the models are so big now and so complicated to train thanks to being 
mostly a mixture of experts that a smart AI engineer avoids this like the plague.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fxrd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fxrd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 424w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 848w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 1272w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fxrd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png" width="803" height="682" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:682,&quot;width&quot;:803,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:143711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fxrd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 424w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 848w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 1272w, https://substackcdn.com/image/fetch/$s_!fxrd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08b4c21-8859-4f1d-9505-6b77d366eadb_803x682.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Best of luck finetuning this soup of linear algebra lol</figcaption></figure></div><p>By the time you are done finetuning, there is a new model out there that would make your use case easier to build.</p><p>Truth is, with one or multiple foundational models you can stitch together solutions that are highly intricate and useful.</p><h2>Why are they confused together?</h2><p>They are confused together for three main reasons:</p><ol><li><p>People who don&#8217;t know how to program understandably put this in the same bucket as ML magic.</p></li><li><p>From the outside, the techniques for AI engineering look so complicated that they are intuitively put in the same bucket as the rest of ML.</p></li><li><p>There is some element of &#8220;randomness&#8221; in classical machine learning systems that is equated with the tools AI engineers are putting together.</p></li></ol><p>But when you look at the differences in the workflow the AI engineer goes through versus the 
machine learning engineer/scientist, it&#8217;s very clear that they are at opposite ends of a research/engineering spectrum.</p><p>Machine learning starts with data and involves studying it, transforming it, and training a model to generate the desired output.</p><p>It finishes with a touch of engineering: productionizing the trained model so the same flow runs for the customer.</p><p>AI engineering starts with a trained foundational model (aka big parameters). The engineer then uses (or not) the customer&#8217;s data to orchestrate a solution that generates something of value. This system is then deployed into production.</p><h2>AI Engineering is Software Development of the Stochastic sort.</h2><p>This is a very weird thing to say, but it does feel like that.</p><p>You are dealing with the equivalent of a black box API whose documentation is still being actively studied and whose output changes for no apparent reason (even if the inputs stay the same).</p><p>This black box API, however, is very good at doing a whole lot of things since it has internet-scale knowledge of all things human.</p><p>It behaves in surprising ways, accepts any input, and always outputs something (no matter how ridiculous the request is).</p><p>Managing this stochasticity, this uncertainty, while wiring the input/output in a way that generates value is the core skill of an AI engineer.</p><p>Taming this stochasticity while balancing automation is currently a very difficult thing to do. 
Should you automate a whole workflow using these foundational AI systems or use them more as a co-pilot?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cLt1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cLt1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 424w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 848w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 1272w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cLt1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png" width="874" height="843" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:874,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:246205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cLt1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 424w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 848w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 1272w, https://substackcdn.com/image/fetch/$s_!cLt1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbfc156e-3151-4a17-9ef1-17b0a60eedec_874x843.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Hugging Face says no to automation</figcaption></figure></div><p>Being able to balance the value against the risks of generation is at the core of AI engineering.</p><h1>How to get good at AI engineering?</h1><p>Build AI solutions. That&#8217;s it.</p><p>It would be like asking how to get good at frontend development. </p><p>Should you start by learning the deep computer science theory underlying the DOM, browser, and networking?</p><p>Yes, at some point. But to start, you should build a bunch of frontend solutions!</p><p>Good luck! &#128122;</p>]]></content:encoded></item><item><title><![CDATA[Learning to program? In 2025? 
Why??]]></title><description><![CDATA[AI can do it for you!]]></description><link>https://www.yacinemahdid.com/p/learning-to-program-in-2025-why</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/learning-to-program-in-2025-why</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Wed, 19 Feb 2025 16:11:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nvhm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is going to be a sensitive, nuanced, and complicated topic:</p><blockquote><p>Should you learn to program in 2025?<br><br><em>hell no - Jensen Huang CEO of NVIDIA</em></p></blockquote><p>Especially given two market shifts:</p><ol><li><p>Increased adoption of AI to write code.</p></li><li><p>It is difficult to find a job as a programmer.</p></li></ol><p>Are the two things linked? 
Maybe a bit, but I see them as two separate events intertwined by narratives.</p><p>Here are my thoughts&#8230; </p><p>spoiler: you should and if you want to get started with the <strong>coding aspect</strong> check out this Scrimba course: <a href="https://scrimba.com/learn-python-c03?via=yacineMahdid">learn Python (5.6h and free)</a>.</p><h1>First off&#8230; What&#8217;s coding and what&#8217;s programming?</h1><blockquote><p>Coding is the process of writing instructions for a program, while programming is the process of designing and developing a program.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nvhm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nvhm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 424w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 848w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 1272w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nvhm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png" width="464" height="479.01176470588234" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:680,&quot;resizeWidth&quot;:464,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Coding Vs Programming For Beginners: What Is The Difference? - Goodcore&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Coding Vs Programming For Beginners: What Is The Difference? - Goodcore" title="Coding Vs Programming For Beginners: What Is The Difference? 
- Goodcore" srcset="https://substackcdn.com/image/fetch/$s_!nvhm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 424w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 848w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 1272w, https://substackcdn.com/image/fetch/$s_!nvhm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c13bca0-0ff0-4f87-b076-2e5c93f4c3cb_680x702.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><p>Take any experienced enough programmer and they can feel this difference in their bones.</p><p>The junior programmer usually has no idea these are two separate things. To them, coding and programming are interchangeable terms for sitting in front of the computer and writing lines of code (which may or may not work).</p><p>A truly senior programmer will be able to look at all the context fed through their senses and start formalizing and refining a solution.</p><p>This solution will be reworked and improved while they pluck context and add it to the requirements and constraints.</p><p>Once this flow starts, there is a moment where everything clicks in the senior programmer's head. They feel satisfaction and they &#8220;know&#8221; in their gut that this system is &#8220;right&#8221;.</p><p>The number of lines of code written could be 0, yet they feel they achieved their purpose because they understand that the coding aspect will be a deterministic process.</p><p>Yes, there is back and forth between that planning and the coding, but in general it follows an overarching blueprint.</p><h1>What LLMs are good at?</h1><p>LLMs right now are great at producing code in constrained environments. 
You still need to steer them in the right direction, and sometimes they do hallucinate, but if you have a good enough vision of what you want to build, they are helpful co-pilots.</p><p>The <strong>more precise the context</strong>, the <strong>more generally solvable the task</strong>, <strong>the better they are at generating</strong> a solid working piece of code.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ybck!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ybck!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 424w, https://substackcdn.com/image/fetch/$s_!ybck!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 848w, https://substackcdn.com/image/fetch/$s_!ybck!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 1272w, https://substackcdn.com/image/fetch/$s_!ybck!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ybck!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png" width="988" height="609" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:609,&quot;width&quot;:988,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192332,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ybck!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 424w, https://substackcdn.com/image/fetch/$s_!ybck!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 848w, https://substackcdn.com/image/fetch/$s_!ybck!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 1272w, https://substackcdn.com/image/fetch/$s_!ybck!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3f3460-0444-40d5-a9b1-f85887ed2cd8_988x609.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">for better or for worse hahaha</figcaption></figure></div><h1>What they aren&#8217;t good at?</h1><p>LLMs aren&#8217;t going to figure out for you what you should be building.</p><p>They won&#8217;t figure out by themselves the unseen context that is business critical and add it to the requirements.</p><p>They won&#8217;t understand what context has become suddenly irrelevant.</p><p>They won&#8217;t plan and formalize the solution in a way that makes sense for the specific stakeholders in your business.</p><h1>You need a programmer for that.</h1><p>Programming is formalized problem-solving. 
</p><p>There is a whole art that is applied to make sure that whatever information problem we need to solve as humans can be deterministically achieved by a computer.</p><p>The complex part here is understanding the problem in all its constraints and requirements.</p><p>Being able to articulate that clearly and succinctly is what a good programmer does.</p><p>Do you remember what&#8217;s needed for an LLM to generate accurate code?</p><blockquote><p>Context.</p></blockquote><p>So logically yes, you should 100% become good at programming. Do use an LLM co-pilot for coding as you develop your programming skills.</p><p>But focus your big brain on the user&#8217;s needs and on having stellar communication skills with other humans.</p><p>That&#8217;s what good programmers already do.</p>]]></content:encoded></item><item><title><![CDATA[Going too deep in R1...]]></title><description><![CDATA[Why Deepseek R1 KL divergence looks like that?]]></description><link>https://www.yacinemahdid.com/p/going-too-deep-in-r1</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/going-too-deep-in-r1</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 17 Feb 2025 16:11:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IxuW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been studying DeepSeek&#8217;s R1 in depth this past month and I like the paper.</p><p>Something about it is just joyful from a research perspective, especially the level of transparency displayed.</p><p>One element that caught my eye is their implementation of KL divergence as a penalty term in their GRPO formula:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!IxuW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IxuW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 424w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 848w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 1272w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IxuW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png" width="1155" height="552" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:1155,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:152202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IxuW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 424w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 848w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 1272w, https://substackcdn.com/image/fetch/$s_!IxuW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a439a1-f32a-4e7f-b31b-8b742fd1fe7e_1155x552.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Try to find the KL divergence in there</figcaption></figure></div><p>For those who don&#8217;t know, KL Divergence is simply a statistic that measures the difference between two probability distributions. It&#8217;s tightly related to cross-entropy and entropy.</p><p>It&#8217;s used in R1 to &#8220;anchor&#8221; the model that is learning to reason with a &#8220;reference&#8221; model (Deepseek V3 in that case). 
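</p><p>To make that definition concrete, here is a minimal Python sketch (not taken from the R1 paper) that computes KL divergence on toy distributions and checks its link to cross-entropy and entropy: KL(p || q) = H(p, q) - H(p). The three-token vocabulary, the probabilities, and the names reference, close_policy and far_policy are all assumptions made up for illustration.</p>

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q is positive wherever p is."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
reference = [0.7, 0.2, 0.1]     # stand-in for the frozen reference model
close_policy = [0.6, 0.3, 0.1]  # a policy that stayed near the reference
far_policy = [0.1, 0.2, 0.7]    # a policy that drifted far away

for name, policy in [("close", close_policy), ("far", far_policy)]:
    kl = kl_divergence(policy, reference)
    # The same number, obtained as cross-entropy minus entropy.
    gap = cross_entropy(policy, reference) - entropy(policy)
    print(f"{name}: KL = {kl:.4f}, H(p, q) - H(p) = {gap:.4f}")
```

<p>The policy that drifted away from the reference pays a much larger KL than the one that stayed close, and that gap is what a KL penalty charges for.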
</p><p>This means that if the policy model starts to behave very differently than the reference model it needs to have a strong performance to justify it.</p><p>This term is used differently in PPO vs GRPO:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k30K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k30K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 424w, https://substackcdn.com/image/fetch/$s_!k30K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 848w, https://substackcdn.com/image/fetch/$s_!k30K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 1272w, https://substackcdn.com/image/fetch/$s_!k30K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k30K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png" width="1456" height="683" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:683,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293147,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k30K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 424w, https://substackcdn.com/image/fetch/$s_!k30K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 848w, https://substackcdn.com/image/fetch/$s_!k30K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 1272w, https://substackcdn.com/image/fetch/$s_!k30K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1a05e88-86c8-484c-8e77-bd5548c4f813_1458x684.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The first thing to notice is that they shifted KL away from the reward term compared to PPO and directly into the policy model training.</p><p>In PPO the formula where KL divergence is being applied looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C2Zp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C2Zp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 424w, 
https://substackcdn.com/image/fetch/$s_!C2Zp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 848w, https://substackcdn.com/image/fetch/$s_!C2Zp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 1272w, https://substackcdn.com/image/fetch/$s_!C2Zp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C2Zp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png" width="1289" height="366" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:366,&quot;width&quot;:1289,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:297134,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C2Zp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 424w, 
https://substackcdn.com/image/fetch/$s_!C2Zp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 848w, https://substackcdn.com/image/fetch/$s_!C2Zp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 1272w, https://substackcdn.com/image/fetch/$s_!C2Zp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bfdb006-1476-42aa-a1a4-8c1048b7f515_1289x366.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">As you can see it&#8217;s within the reward.</figcaption></figure></div><p>In the GRPO the KL portion looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EttB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EttB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 424w, https://substackcdn.com/image/fetch/$s_!EttB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 848w, https://substackcdn.com/image/fetch/$s_!EttB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 1272w, https://substackcdn.com/image/fetch/$s_!EttB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EttB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png" width="1042" height="198" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:198,&quot;width&quot;:1042,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121549,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EttB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 424w, https://substackcdn.com/image/fetch/$s_!EttB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 848w, https://substackcdn.com/image/fetch/$s_!EttB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 1272w, https://substackcdn.com/image/fetch/$s_!EttB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc13f4eb7-2400-40bc-9a3d-b2c2183b7a8e_1042x198.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">This is directly negatively applied to R1 policy optimization.</figcaption></figure></div><p>Widely different implementation!</p><p>You might just nod and say &#8220;Fair enough, that&#8217;s KL divergence&#8221;. 
But it looks nothing like it&#8230; How did they come up with that penalty term?</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qENm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qENm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 424w, https://substackcdn.com/image/fetch/$s_!qENm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 848w, https://substackcdn.com/image/fetch/$s_!qENm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 1272w, https://substackcdn.com/image/fetch/$s_!qENm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qENm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png" width="435" height="81" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:81,&quot;width&quot;:435,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qENm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 424w, https://substackcdn.com/image/fetch/$s_!qENm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 848w, https://substackcdn.com/image/fetch/$s_!qENm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 1272w, https://substackcdn.com/image/fetch/$s_!qENm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b57a40-2e11-4ef5-8eaf-e52f12aee735_435x81.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">As a reminder, KL divergence initially looks like this.</figcaption></figure></div><p>If you go down the rabbit hole, like I did, you find an interesting truth that is at the root of the deep learning mindset. More on that at the end. 
</p><p>You can check out my video on the subject if you want a walkthrough:</p><div id="youtube2-iHf6mMiiNOw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;iHf6mMiiNOw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/iHf6mMiiNOw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The TLDR is that it&#8217;s empirically a very good Monte Carlo approximation of KL divergence, with less variance than the version from the PPO algorithm while still being unbiased.</p><p>The proof of that isn&#8217;t even another research paper; it&#8217;s just a blog post that one of the co-founders of OpenAI, John Schulman, posted in 2020:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6gbK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6gbK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 424w, https://substackcdn.com/image/fetch/$s_!6gbK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 848w, 
https://substackcdn.com/image/fetch/$s_!6gbK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 1272w, https://substackcdn.com/image/fetch/$s_!6gbK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6gbK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png" width="1456" height="716" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:777445,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6gbK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 424w, https://substackcdn.com/image/fetch/$s_!6gbK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 848w, 
https://substackcdn.com/image/fetch/$s_!6gbK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 1272w, https://substackcdn.com/image/fetch/$s_!6gbK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd88ceee0-5baf-4128-b93e-50e0ef083828_1766x869.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">One of the three blog posts he has lol</figcaption></figure></div><p>Yet, the flow is elegant and I suggest you take a 
look at it (slowly): http://joschu.net/blog/kl-approx.html</p><p>In a nutshell, here is the flow:</p><ol><li><p>We can&#8217;t compute KL divergence exactly in practice: it is too expensive for RL purposes, and we don&#8217;t need an exact value anyway. A sample-based approximation is good enough for the training procedure.</p></li><li><p>The first approximation, k1, is <code>log(q/p)</code>, which is what <strong>PPO</strong> uses. It is unbiased but has high variance: individual samples of k1 can be negative even though the true KL never is (it lies in [0, &#8734;)).</p></li><li><p>The second approximation Schulman proposes, k2, is <code>0.5*(log(p/q))^2</code>. It is a biased estimator, but it has lower variance than k1 because every sample is non-negative. The bias stays small because the expectation of k2 is an f-divergence, like KL, and one of the properties of f-divergences (with a differentiable f) is that when the probability distributions <strong>p</strong> and <strong>q</strong> are very similar, they all look roughly like KL.</p></li><li><p>The third approximation, k3, is <code>(p/q - 1) - log(p/q)</code>, which is what we find in GRPO (a bit rearranged). It is an unbiased estimator because it is <strong>k1 + a control variate (p/q - 1)</strong>. 
That control variate has zero expectation under q, so it doesn&#8217;t change the bias of k1, and since log is concave, k3 is never negative, which is where the lower variance comes from.</p></li></ol><p>When Schulman ran the simulation he got the following results:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mKZD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mKZD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 424w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 848w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 1272w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mKZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png" width="1452" height="577" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:1452,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:229554,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mKZD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 424w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 848w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 1272w, https://substackcdn.com/image/fetch/$s_!mKZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c93a4f-4d80-42a3-af2c-f0029b00e645_1452x577.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This shows that k3 seems to be strictly better, while k2 is not performing too badly either except when the true KL is getting larger.</p><h3>What&#8217;s the takeaway?</h3><p>A good portion of deep learning is very empirical. Is PPO wrong for using k1 instead of k3? Maybe, maybe not.</p><p>It might not even impact the training that much and that high variance penalty term might be just good enough.</p><p>Should k2 be used in GRPO or PPO? 
Maybe, maybe not.</p><p>It&#8217;s all about tradeoffs and validating empirically what yields the best model at the end of the day.</p><p>When you dig deep enough into the internals of deep learning systems, you realize how many thousands of tiny design decisions were made to get to the result.</p><p>Beautiful to gaze upon as a researcher.</p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[The TLDR on DeepSeek R1]]></title><description><![CDATA[Simpler than I originally thought!]]></description><link>https://www.yacinemahdid.com/p/the-tldr-on-deepseek-r1</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/the-tldr-on-deepseek-r1</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Mon, 03 Feb 2025 18:34:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/QdEuh2UVbu0" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone,</p><p>Hope you are doing fantastic today!</p><p>I read the DeepSeek R1 paper last week. 
<br>I found the method, results, and just the general ideas within that paper super interesting.</p><p>I&#8217;ve made a walk-through over here if you are interested in going through it step by step:</p><div id="youtube2-QdEuh2UVbu0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;QdEuh2UVbu0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/QdEuh2UVbu0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Here is the TLDR and a map to follow along:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yy_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yy_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 424w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 848w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png" width="1043" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1043,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:601120,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yy_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 424w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 848w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The end output of the R1 research is three families of reasoning models (all open source):</p><ul><li><p>DeepSeek R1-Zero</p></li><li><p>DeepSeek R1</p></li><li><p>DeepSeek R1 distilled into Llama and Qwen</p></li></ul><p>All three model creation pipelines start from DeepSeek V3, a huge 600+B parameter model (also open source).</p><p>The research is 
doing post-training to induce reasoning into DeepSeek V3. Aka, try to make the model talk to itself in order to be better at complex tasks like coding, engineering, and problem-solving.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dw6L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dw6L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 424w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 848w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 1272w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dw6L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png" width="769" height="466" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:466,&quot;width&quot;:769,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dw6L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 424w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 848w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 1272w, https://substackcdn.com/image/fetch/$s_!dw6L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2713c5f7-30a4-446d-bcd4-671ea0b41c0c_769x466.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As seen above, DeepSeek R1's performance is on par with OpenAI o1, which is closed source. Arguably, they replicated OpenAI's methodology.</p><p>The interesting first result is that the <strong>DeepSeek R1-Zero</strong> pipeline involves reinforcement learning with no human in the loop. 
</p><p>To prompt the model, they used this very simple prompt:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l1Oa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l1Oa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 424w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 848w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 1272w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l1Oa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png" width="764" height="196" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:196,&quot;width&quot;:764,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l1Oa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 424w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 848w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 1272w, https://substackcdn.com/image/fetch/$s_!l1Oa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a95167-e6b2-4567-80d7-905bcfe34e16_764x196.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>By adding a proper reward function on top of that prompt, the model was able to learn to think longer and develop reasoning over time.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!yssl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yssl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 424w, https://substackcdn.com/image/fetch/$s_!yssl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 848w, https://substackcdn.com/image/fetch/$s_!yssl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 1272w, https://substackcdn.com/image/fetch/$s_!yssl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yssl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png" width="585" height="363" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2536b49-f548-439d-b370-1b278ff655db_585x363.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:363,&quot;width&quot;:585,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73733,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yssl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 424w, https://substackcdn.com/image/fetch/$s_!yssl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 848w, https://substackcdn.com/image/fetch/$s_!yssl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 1272w, https://substackcdn.com/image/fetch/$s_!yssl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2536b49-f548-439d-b370-1b278ff655db_585x363.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This increase in reasoning in the model is correlated directly with an increase in performance over time for DeepSeek-R1-Zero.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hd28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hd28!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 424w, 
https://substackcdn.com/image/fetch/$s_!Hd28!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 848w, https://substackcdn.com/image/fetch/$s_!Hd28!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 1272w, https://substackcdn.com/image/fetch/$s_!Hd28!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hd28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png" width="805" height="713" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:713,&quot;width&quot;:805,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140994,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hd28!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 424w, 
https://substackcdn.com/image/fetch/$s_!Hd28!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 848w, https://substackcdn.com/image/fetch/$s_!Hd28!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 1272w, https://substackcdn.com/image/fetch/$s_!Hd28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d56aaf2-26f7-4026-9b58-8828dfd46c11_805x713.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The magic of DeepSeek R1-Zero not requiring human feedback during the reinforcement learning process hinges on Group Relative Policy Optimization (GRPO) and their reward function.</p><p>For GRPO we have the following:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jmoS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jmoS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 424w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 848w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 1272w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jmoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png" width="794" height="266" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:266,&quot;width&quot;:794,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50514,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jmoS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 424w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 848w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 1272w, https://substackcdn.com/image/fetch/$s_!jmoS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74aedc7a-b992-4091-8ea3-85ebeb27221a_794x266.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Which basically says:</p><ul><li><p>for each output sampled in the group, take the ratio of its probability under the current policy to its probability under the old policy (pi), multiply it by that output's group-normalized reward (A), and use the result as the optimization objective.</p></li><li><p>don&#8217;t take too big a step compared to the old policy though (clip part).</p></li><li><p>don&#8217;t diverge too much from the DeepSeek V3 parameters (KL divergence part).</p></li></ul><p>For the reward:</p><p>The reward function at this point in the research was fully deterministic: rule-based checks scored the answer and made sure the reasoning stayed within the think tags:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J23B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!J23B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 424w, https://substackcdn.com/image/fetch/$s_!J23B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 848w, https://substackcdn.com/image/fetch/$s_!J23B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 1272w, https://substackcdn.com/image/fetch/$s_!J23B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J23B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png" width="774" height="277" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:277,&quot;width&quot;:774,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!J23B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 424w, https://substackcdn.com/image/fetch/$s_!J23B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 848w, https://substackcdn.com/image/fetch/$s_!J23B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 1272w, https://substackcdn.com/image/fetch/$s_!J23B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b23d7fe-09ec-4edd-975c-ca6c6a56a211_774x277.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a big deal as it breaks with the idea that we require reinforcement learning from human feedback (RLHF) to induce reasoning in models.</p><p>However, DeepSeek R1-Zero's reasoning was difficult to read, so the researchers posited that fine-tuning DeepSeek V3 prior to doing reinforcement learning would help the performance.</p><p>To do so, they built a fairly complicated multistep training pipeline that generates both reasoning and non-reasoning data (see the map below; I go into <a href="https://youtu.be/QdEuh2UVbu0">painstaking detail in the video</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yy_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yy_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 424w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!yy_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png" width="1043" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1043,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:601120,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!yy_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 424w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!yy_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!yy_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf800d6c-7a41-49b0-b5d8-5e13620cb6f1_1043x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the end they get a solid model, on par with o1, and with an intelligible reasoning context:</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ntPd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ntPd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 424w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 848w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 1272w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ntPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png" width="731" height="626" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:626,&quot;width&quot;:731,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:141462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ntPd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 424w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 848w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 1272w, https://substackcdn.com/image/fetch/$s_!ntPd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68d6be16-2ce1-47b6-8d44-3d7211ea4f88_731x626.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To make the models broadly usable, they distilled R1 into smaller models such as Qwen and Llama. 
The results are also very performant and MUCH smaller:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Okxp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Okxp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 424w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 848w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Okxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png" width="772" height="407" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:407,&quot;width&quot;:772,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90593,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Okxp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 424w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 848w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 1272w, https://substackcdn.com/image/fetch/$s_!Okxp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35db9a44-2919-4ee4-a3fa-9d0c1b236e29_772x407.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Finally, when they tried to run their reinforcement learning pipeline on a smaller model vs using distillation, they realized that distillation was better:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PZya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PZya!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 424w, 
https://substackcdn.com/image/fetch/$s_!PZya!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 848w, https://substackcdn.com/image/fetch/$s_!PZya!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 1272w, https://substackcdn.com/image/fetch/$s_!PZya!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PZya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png" width="779" height="187" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b30251f-f531-441c-b605-d41291e7b04a_779x187.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:187,&quot;width&quot;:779,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38168,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PZya!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 424w, 
https://substackcdn.com/image/fetch/$s_!PZya!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 848w, https://substackcdn.com/image/fetch/$s_!PZya!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 1272w, https://substackcdn.com/image/fetch/$s_!PZya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b30251f-f531-441c-b605-d41291e7b04a_779x187.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The bottom line for the distillation part is that reinforcement learning seems to &#8220;heighten&#8221; a reasoning core that is already present in the very large models but is not natively as sharp in the smaller ones.</p><p>You can find all these models on <a href="https://ollama.com/library/deepseek-r1">Ollama</a> and play with them easily on your own GPUs:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W1lV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W1lV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 424w, 
https://substackcdn.com/image/fetch/$s_!W1lV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 848w, https://substackcdn.com/image/fetch/$s_!W1lV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 1272w, https://substackcdn.com/image/fetch/$s_!W1lV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!W1lV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png" width="792" height="536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:536,&quot;width&quot;:792,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72585,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W1lV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 424w, 
https://substackcdn.com/image/fetch/$s_!W1lV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 848w, https://substackcdn.com/image/fetch/$s_!W1lV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 1272w, https://substackcdn.com/image/fetch/$s_!W1lV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07f5b2-37b3-45eb-a035-b5688e5964a8_792x536.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>That&#8217;s it!</h2><p>Hope you enjoyed the TLDR! For more detailed information, I suggest you check out the tutorial or the paper directly:</p><p>&#128204; https://arxiv.org/pdf/2501.12948</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.yacinemahdid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">By the way, do consider becoming a premium member of this community. I&#8217;m preparing member-only guides and giving priority to answering research or academic questions: </p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Have a great week, everyone! 
&#128075;</p><p></p>]]></content:encoded></item><item><title><![CDATA[Advice to an Undergraduate Researcher]]></title><description><![CDATA[Short advice for young machine learning researchers (and my former-self).]]></description><link>https://www.yacinemahdid.com/p/advice-to-an-undergraduate-researcher</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/advice-to-an-undergraduate-researcher</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Sun, 15 Sep 2024 14:03:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!g4Fs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello everyone &#128075;</p><p>I hope you are enjoying the last remnant of summer (if you are in the northern hemisphere that is).</p><p>In this edition I wanted to share a bunch of advice I would give to my former young undergraduate self or any undergraduate interested in research.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g4Fs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g4Fs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g4Fs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!g4Fs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g4Fs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g4Fs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:417566,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g4Fs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 424w, https://substackcdn.com/image/fetch/$s_!g4Fs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!g4Fs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!g4Fs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11278062-8ee3-442a-ab54-92052d83ad6e_2016x1512.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This thought was prompted by this message I&#8217;ve received:</p><blockquote><p><em>Hi Yacine!</em></p><p><em>My name's 
[BLANKED OUT], and I'm a 19 year old 2nd year Undergraduate at University of [BLANKED OUT] studying computer engineering. I recently joined one of my campus' research labs as an intern, it focuses on ML applications within healthcare and medicine.</em></p><p><em>I'm still unfamiliar with many of the concepts we utilize, but I luckily have some coding experience.</em></p><p><em>I was wondering if you have any advice for a beginner in this field?</em></p><p><em>How did you become interested and so familiar with the machine learning concepts you teach in your videos?</em></p><p><em>I'm eager to make meaningful additions to my colleagues work but I currently feel like I'm taking up more of their time having them explain certain things to me-- rather than helping them with their research.</em></p></blockquote><p>By the way, he&#8217;s talking about&nbsp;<a href="https://www.youtube.com/@deeplearningexplained">these tutorials I post on machine-learning topics</a>&nbsp;that&nbsp;interest me.</p><p>My answers seemed useful to this bright student, so I thought I would share them here more broadly!</p><h1>The Answers</h1><p>Hey man,</p><p>First of all, big props for reaching out. This is the right mindset to have, especially if you are doing something as difficult as contributing to research while doing a bachelor's degree.</p><p>I&#8217;ll split this answer in two, just so that you have the right context for it.</p><ol><li><p>I&#8217;ll first tell you how I got involved in research as an undergraduate, because I was in the exact same situation as you.</p></li><li><p>Then I&#8217;ll tell you what I would do if I could do it again.</p></li></ol><h3>A1: What did I do to contribute early on?</h3><p>I got into a lab doing lots of physiological signal analysis and I was responsible for a portion of the coding.</p><p>The first thing I did was try to learn as much as possible about how everything worked.</p><p>I was mainly trying to build my knowledge of the 
lab bottom up and try to help in whatever way I could in whatever project came up.</p><p>I didn&#8217;t always have the full context of a project I was helping out, I was mostly glad to have a specific project I could help with.</p><p>The very first project I was the full owner of was a Java project which involved lots of signal acquisition and processing with the Android SDK.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_oF2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_oF2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 424w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 848w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 1272w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_oF2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png" width="1066" height="625" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:625,&quot;width&quot;:1066,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:117125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_oF2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 424w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 848w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 1272w, https://substackcdn.com/image/fetch/$s_!_oF2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F644575ec-1540-4afa-adf2-208e6d4d059c_1066x625.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These types of signals.&#128070;</p><p>I went down a bit of a rabbit hole on some of the technical aspects I had to help with, which was very cool.</p><p>It allowed me to get a very practical grasp of how everything was collected, processed, or analyzed in some of the lab projects.</p><p>As I gained more technical knowledge, I started proposing ways in which we could get interesting results by modifying some of the analyses.</p><p>I also made sure to optimize the code I was building so that the throughput of the lab was better, for instance by making heavy use of parallelization later on (<a href="https://youtu.be/Ru3Gd5eGyPg">we had access to a supercluster for free, which was awesome</a>).</p><p>Anyway, at the end of it, <a href="https://www.researchgate.net/profile/Yacine-Mahdid">I was able to publish quite a few papers which were mostly &#8220;assist&#8221; papers where I was a co-author</a>. 
I was the main contributor to the general codebase of the lab and I became the go-to researcher for all the tricky technical problems.</p><h3>A2: What would I do to contribute in retrospect?</h3><p>Okay, so that&#8217;s what I did. In retrospect, I would do it a bit differently.</p><p>I would absorb the context the lab is operating in wayyyyyy earlier.</p><p>Looking back, there was a lot of wasted technical effort in the process I followed. I learned a lot, but I could have been more impactful faster.</p><p>To get the context of the lab faster, I would do the following:</p><ul><li><p>Take one of the papers the lab already published (like the best recent one) and break it down into its technical components.</p></li><li><p>I would ask very targeted questions about why some stuff was done in a certain way and how I could replicate the results.</p></li><li><p>Once I had a good grasp of that blueprint, I would have a very good idea of how the lab is structured to generate science-worthy results:</p><ul><li><p>Who did they collaborate with?</p></li><li><p>Who made this code?</p></li><li><p>Why is it set up like this (sometimes it&#8217;s silly like &#8220;the intern only knew how to do it in R&#8221;)?</p></li><li><p>Why did they choose this analysis over another one?</p></li><li><p>What have they tried that failed and what changed?</p></li></ul></li></ul><p>All of that meta-information is absorbable in under a month, and you will quickly get up to speed with how the lab functions.</p><p>THEN, with that context, I would help whoever is most collaborative in the lab with their experiment and focus on two areas:</p><ul><li><p>the mundane repetitive tasks that are annoying to do: in our case, it was cleaning up the EEG data we collected.</p></li><li><p>the coding tasks to push analysis forward: given your background, the coding part is where you can bring some assistance. 
Since you already understand the setup you should be able to contribute quickly.</p></li></ul><p>After that, I would check what I&#8217;m most interested in and double down on that area. This way, it will provide me with growth and allow the lab to benefit from my learning experience.</p><p>btw have fun during the process too, that&#8217;s the most important part.</p><h3>A3: How I Became Interested in Machine Learning?</h3><p>To be frank, I&#8217;m not directly interested in machine learning.</p><p>I&#8217;m interested in learning in general, as a meta-concept. I&#8217;ve started my bachelor's in cell biology and I was fascinated by neurons.</p><p>So fascinated that I interned in a neurobiology lab during my summer studying memory in Aplysia (it&#8217;s a sea slug). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h2YX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h2YX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h2YX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h2YX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!h2YX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h2YX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg" width="482" height="482" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:800,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Brown Sea Hare (Aplysia californica) - Spanglers' Scuba&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Brown Sea Hare (Aplysia californica) - Spanglers' Scuba" title="Brown Sea Hare (Aplysia californica) - Spanglers' Scuba" srcset="https://substackcdn.com/image/fetch/$s_!h2YX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h2YX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h2YX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!h2YX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf9f4dce-3613-40f5-8a51-ae723f650cba_800x800.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>beautiful!</p><p>I was doing my best to cut out a protein with multiple ubiquitin-binding domains to check how this would affect it.
This was all part of a greater study about mice, genes, and memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vwd2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vwd2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 424w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 848w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 1272w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vwd2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png" width="339" height="596.0196078431372" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:807,&quot;width&quot;:459,&quot;resizeWidth&quot;:339,&quot;bytes&quot;:283950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vwd2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 424w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 848w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 1272w, https://substackcdn.com/image/fetch/$s_!Vwd2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2636e77e-3a87-4be1-aeaf-8ff8fe0b9aa9_459x807.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That&#8217;s my protein! &#128070;</p><p>I was fascinated, and I went on to learn about memory in humans in a psychiatric lab.
I had a project where I was trying to teach the memory palace technique to patients with schizophrenia.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bL1h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bL1h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 424w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 848w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 1272w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bL1h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png" width="396" height="256.068669527897" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:904,&quot;width&quot;:1398,&quot;resizeWidth&quot;:396,&quot;bytes&quot;:318835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bL1h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 424w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 848w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 1272w, https://substackcdn.com/image/fetch/$s_!bL1h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf918eae-fe68-4e8f-91b8-1db438e2c98c_1398x904.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Then I took lots and lots of computer science classes, which included signal processing and machine learning.</p><p>I also took online classes on the subject, but where I learned the most was by directly trying to apply the concepts in the research projects I was working on.</p><p>These projects directly benefited my research lab and I had great motivation to dig deeper into both the data that my lab was generating and how to use machine learning properly.</p><p>Don&#8217;t be afraid to self-learn machine learning. The topic is so vast and multidisciplinary that you will never find one resource that will show you everything you need to know.</p><p>It&#8217;s the kind of field you learn in various ways, by trying to apply ML to your own problems in creative fashion.</p><p>Let me know if you want to voice-chat later this week, might be easier to answer your questions!</p><p>END OF THE MESSAGE.</p><div><hr></div><p>Hope that was useful, everyone
&#127801;</p><p>Have a great rest of the week!</p><p>Best</p><p>Yacine Mahdid</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u5OK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u5OK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 424w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 848w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 1272w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u5OK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png" width="565" height="727" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:727,&quot;width&quot;:565,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:613230,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u5OK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 424w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 848w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 1272w, https://substackcdn.com/image/fetch/$s_!u5OK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b23c58-3e44-4fe9-b47a-7a4ddaf488b2_565x727.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>PS: feel free to shoot me questions by replying to this email, I love them.</p>]]></content:encoded></item><item><title><![CDATA[What's In-Context Learning in Deep Learning and Why It's so Cool?]]></title><description><![CDATA[LLMs can learn from a few examples and even "learn" tasks that contradict their pre-training distribution. Wild stuff ahead.]]></description><link>https://www.yacinemahdid.com/p/whats-in-context-learning-in-deep</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/whats-in-context-learning-in-deep</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 20 Aug 2024 19:02:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7dff46d4-0b80-4c94-bb22-fdcc0406eb75_698x393.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this edition, I wanted to explore a very weird topic in deep learning, one that relates specifically to large language models.
</p><p>In-Context Learning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ie1k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ie1k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 424w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 848w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 1272w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ie1k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png" width="1456" height="814" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ie1k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 424w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 848w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 1272w, https://substackcdn.com/image/fetch/$s_!ie1k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8175ee42-eb9a-40a3-b37c-634d8a9142b9_1600x894.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>If you&#8217;ve never heard the term, this issue is for you.</p><p>Basically, if you are using an LLM like Gemma or GPT-4, chances are you have already used in-context learning.</p><p>Something like the following prompt would qualify as asking a model to do in-context learning:</p><blockquote><p>Circulation revenue has increased by 5% in Finland. \n Neutral <br><br>Panostaja did not disclose the purchase price. \n Negative <br><br>Paying off the national debt will be extremely painful. \n Positive </p><p>The company anticipated its operating profit to improve.
\n __________ &#8592; to complete</p></blockquote><p>In that particular case, the output for something like GPT-4 looks like this:</p><blockquote><p>The sentiment for "The company anticipated its operating profit to improve" is <strong>Positive</strong>.</p></blockquote><p>How the heck did the model understand that:</p><ol><li><p>This was a sentiment analysis task?</p></li><li><p>It needed to respond with Positive or Negative?</p></li></ol><p>We&#8217;ll explore that in this issue!</p><p><em>btw: for the visual folks out there, there is a video + deck version available on my YouTube channel.&#128071;</em></p><div id="youtube2-As9a15poQHs" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;As9a15poQHs&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/As9a15poQHs?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h1>Definition</h1><p>In a nutshell, in-context learning is loosely defined as showing a large language model a few examples of inputs and outputs so that, just by conditioning on the prompt, the model picks up how to do a new task.</p><p>In-context learning, like many things in machine learning, is a bit of a confusing term. There isn&#8217;t any learning happening in the traditional machine-learning sense (i.e.
the weights of the model aren&#8217;t changing).</p><p>However, something is happening within the model that makes it feel like it understands the task being requested.</p><p>The term in-context learning originated with GPT-3, in the paper that introduced the model, titled &#8220;Language Models are Few-Shot Learners&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!khpO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!khpO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 424w, https://substackcdn.com/image/fetch/$s_!khpO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 848w, https://substackcdn.com/image/fetch/$s_!khpO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 1272w, https://substackcdn.com/image/fetch/$s_!khpO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!khpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png" width="536" height="356"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:536,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37407,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!khpO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 424w, https://substackcdn.com/image/fetch/$s_!khpO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 848w, https://substackcdn.com/image/fetch/$s_!khpO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 1272w, https://substackcdn.com/image/fetch/$s_!khpO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7694d563-a4ab-48a4-a207-51ad55c07ab0_536x356.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Few-shot in this context means giving a model a few examples of a new task, directly in the prompt.</p><blockquote><p><em>Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes <strong>even reaching competitiveness with prior state-of-the-art fine-tuning approaches</strong>. </em></p><p><em>[&#8230;]</em></p><p><em><br><strong>For all tasks, GPT-3 is applied without any gradient updates or fine-tuning</strong>, with tasks and few-shot demonstrations specified purely via text interaction with the model.</em></p><p><a href="https://arxiv.org/pdf/2005.14165">OpenAI</a></p></blockquote><p>The sentence above is absolutely bonkers from a machine-learning standpoint.</p><p>The gist of it is that showing the model one or a few input-output examples of the task can drastically improve its performance, as seen below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vFUw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vFUw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 424w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 848w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 1272w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!vFUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png" width="639" height="401" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9688383-2d25-4405-bfa5-e7230cab5002_639x401.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:401,&quot;width&quot;:639,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:67310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vFUw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 424w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 848w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 1272w, https://substackcdn.com/image/fetch/$s_!vFUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9688383-2d25-4405-bfa5-e7230cab5002_639x401.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The interesting points on the graph are:</p><ol><li><p>This ability emerges at a rate that grows with model size (the 175B curve is steeper than the 13B curve). 
Yet even in the 13B model, in-context learning is happening, just at a rate that requires more examples in the context.</p></li><li><p>There is a very sharp improvement in the larger 175B model when a single example is given, and performance keeps increasing as more natural language examples are shown to the model.</p></li></ol><p>To leverage in-context learning in practice, the output of the last input-output pair is left blank so that the LLM can fill it in, as in the example above.</p><p>In-context learning is versatile across a wide variety of tasks and can even be used to make external API calls, as in the Toolformer approach.</p><h1>Why is it Surprising?</h1><p>Being able to learn on the fly from a few examples isn&#8217;t new: meta-learning, which optimizes a model for few-shot learning, existed well before GPT-3.</p><p>The reason in-context learning leaves researchers so puzzled is that it <strong>wasn&#8217;t part of the training procedure at all</strong>. 
</p><p>Remember, in-context learning originally emerged in big transformer models trained on next-token prediction over internet-scale data.</p><p>There is nothing in this training regimen that explicitly conditioned the models to develop that capability.</p><p>Yet it happens consistently and lets LLMs act as a massively versatile interface when conditioned properly with input-output examples (as in this GPT-4 example prompt):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ik4E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ik4E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 424w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 848w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 1272w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ik4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png" width="809" height="561" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:809,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:343127,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ik4E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 424w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 848w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 1272w, https://substackcdn.com/image/fetch/$s_!Ik4E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2875339-d1f7-4c01-a74b-f05b45d17a4f_809x561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are two connected leading hypotheses for how this emergent ability works. Let&#8217;s explore them.</p><h1>Hypothesis 1: Bayesian Inference Framework</h1><p>The first prominent hypothesis for how in-context learning could work comes from Xie/Min et al., who proposed a Bayesian framework explaining this emergence.</p><p>In a nutshell, the LLM uses the input/output examples in the prompt as a guide to locate a previously learned latent concept within its weights.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!YFvQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YFvQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 424w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 848w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 1272w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YFvQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png" width="448" height="269" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:269,&quot;width&quot;:448,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46159,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YFvQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 424w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 848w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 1272w, https://substackcdn.com/image/fetch/$s_!YFvQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4689c690-5ef1-4a09-ac89-920898fe337c_448x269.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As you can see in the picture above, the LLM could be inferring and retrieving the right concept depending on the content of the prompt.</p><p>In this framework, the LLM is learning and inferring concepts from two distributions:</p><ol><li><p>the pre-training concept distribution, learned while it was trained on next-token prediction.</p></li><li><p>the concept distribution of the prompt examples, which usually share a common theme.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C6Z2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!C6Z2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 424w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 848w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 1272w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C6Z2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png" width="1456" height="143" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:143,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!C6Z2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 424w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 848w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 1272w, https://substackcdn.com/image/fetch/$s_!C6Z2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b0768b8-16d1-4ab5-a57d-33118e54e0b3_1512x148.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In this framework, the prompt examples are used to <em>&#8220;sharpen the posterior distribution over concepts&#8221;</em>. As we add more examples to the prompt, we help the model select the right concept via the &#8220;<em>p(concept | prompt)</em>&#8221; part of the formula.</p><p>Here the prompt information can be separated into two parts:</p><ul><li><p><strong>Signal:</strong> the training examples and the information they contain.</p></li><li><p><strong>Noise:</strong> the transitions between examples, or even the incorrectness of the input-output pairing.</p></li></ul><p>The basic idea is that in-context learning can emerge when the signal-to-noise ratio is high. Interestingly, some noise that would make no sense for a supervised learning algorithm can be added to an in-context learning setting and leave an LLM unbothered (e.g. 
having all the output labels randomized).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Aa8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Aa8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 424w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 848w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 1272w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Aa8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png" width="388" height="300" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:388,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Aa8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 424w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 848w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 1272w, https://substackcdn.com/image/fetch/$s_!0Aa8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F824169a3-4082-4d9f-9c6a-00267ff3fe15_388x300.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As we see here, with enough signal the model inferred:</p><ol><li><p>the input and output pairing structure for the task.</p></li><li><p>the type of task: sentiment analysis.</p></li></ol><p>Even though there is a tremendous amount of noise in the output (i.e. all the bottom examples are random!) 
it doesn&#8217;t matter much to the model in such a task setting.</p><p>If we were to break down the content of an in-context learning prompt by how much each part matters, it would look like this:</p><ul><li><p>Input distribution (very important)</p></li><li><p>Input-Output mapping (not so important)</p></li><li><p>The format used (important)</p></li><li><p>Output space (very important)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!diyM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!diyM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 424w, https://substackcdn.com/image/fetch/$s_!diyM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 848w, https://substackcdn.com/image/fetch/$s_!diyM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 1272w, https://substackcdn.com/image/fetch/$s_!diyM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!diyM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png" width="564" height="236" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:236,&quot;width&quot;:564,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65679,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!diyM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 424w, https://substackcdn.com/image/fetch/$s_!diyM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 848w, https://substackcdn.com/image/fetch/$s_!diyM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 1272w, https://substackcdn.com/image/fetch/$s_!diyM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F775a2f9f-7d5f-4379-b249-7b6da34f9341_564x236.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This idea was explored concurrently <a href="https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html">from another angle by the smart folks at Anthropic</a>, who found a circuit they call an induction head in transformer architectures.</p><p>This type of circuit gets its name from inductive reasoning 
because:</p><blockquote><p><em>In inductive reasoning, we might infer that if </em><code>A</code><em> is followed by </em><code>B</code><em> earlier in the context, </em><code>A</code><em> is more likely to be followed by </em><code>B</code><em> again later in the same context.</em></p><p><em><a href="https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html">Anthropic</a></em></p></blockquote><p>Even in the toy transformer network they studied, the researchers at Anthropic quickly realized that the presence of such a circuit meant the network was implementing an algorithm internally.</p><blockquote><p>Notice that induction heads are implementing a simple algorithm, and are not memorizing a fixed table of n-gram statistics. The rule [A][B] &#8230; [A] &#8594; [B] applies regardless of what A and B are.</p></blockquote><p>This bridges the gap to the second hypothesis.</p><h1>Hypothesis 2: LLMs are Implementing a Learning Algorithm Implicitly</h1><p>The second hypothesis for how in-context learning emerges is that transformer-based in-context learners are literally implementing a learning algorithm implicitly somewhere within their weights.</p><p>The paper <a href="https://arxiv.org/abs/2212.07677">&#8220;Transformers learn in-context by gradient descent&#8221;</a> showed that even small transformer architectures can implement gradient descent.</p><p>One of the most compelling arguments that the big LLMs are implementing a learning algorithm to help with in-context learning tasks comes from a paper by the Google Research team titled &#8220;<a href="https://arxiv.org/abs/2303.03846">Larger Language Models Do In-Context Learning Differently</a>&#8221;.</p><p>The team at Google Research showed that sufficiently large models gain the ability to override their pre-training priors and learn a new, erroneous input-output mapping.</p><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V5RW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V5RW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 424w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 848w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 1272w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V5RW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png" width="1456" height="693" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!V5RW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 424w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 848w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 1272w, https://substackcdn.com/image/fetch/$s_!V5RW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9abe183-28cb-4384-bd7c-5def417f6dc1_1600x761.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The figure shows what happens when we flip an increasing percentage of the labels in the examples of an in-context learning task, across various models. As the share of wrong examples grows, the bigger models like PaLM-540B override their usual in-context behavior of ignoring the input-output mapping and start following the flipped labels instead. 
</p><p>Note that it still requires enough flipped labels for this &#8220;failure&#8221; at the task to appear in these large models since the hypothesis is that the models are quite literally using function mapping from input to output during an in-context task.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OzzN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OzzN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 424w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 848w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 1272w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OzzN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png" width="1125" height="478" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e289164-6a33-4c37-af58-9a388135401f_1125x478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:478,&quot;width&quot;:1125,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:134695,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OzzN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 424w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 848w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 1272w, https://substackcdn.com/image/fetch/$s_!OzzN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e289164-6a33-4c37-af58-9a388135401f_1125x478.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Something similar happens with what is called &#8220;semantically-unrelated targets in-context learning&#8221;, which is a fancy way of saying &#8220;replace the output with random words like Apple or Orange&#8221;.</p><p>Small models struggle throughout, while larger models get better and learn the right mapping for the task.</p><h1>Where Are We Now?</h1><p>So, for both hypotheses highlighted in this newsletter, the mechanism seems to be a contextual activation of the right embeddings or knowledge domains within the model, and this activation seems to be what produces the right output.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-eNg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-eNg!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 424w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 848w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 1272w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-eNg!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif" width="320" height="197.8181818181818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:220,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Conspiracy GIFs | Tenor&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Conspiracy GIFs | Tenor" title="Conspiracy GIFs | Tenor" 
srcset="https://substackcdn.com/image/fetch/$s_!-eNg!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 424w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 848w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 1272w, https://substackcdn.com/image/fetch/$s_!-eNg!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27309f81-ba22-4d83-aebc-579c749eadd8_220x136.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Current research is pushing quite heavily towards that second hypothesis, with articles such as:</p><ul><li><p>Transformers Learn In-Context by Gradient Descent</p></li><li><p>Trained Transformers Learn Linear Models In-Context</p></li><li><p>Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression</p></li><li><p>What In-Context Learning &#8220;Learns&#8221; In-Context: Disentangling Task Recognition and Task Learning</p></li></ul><p>If this research direction of big language models yields positive results, there is a very good case to be made to just <em>&#8220;train larger models and see what emerges&#8221;</em>. What other meta-learning abilities will appear as the number of parameters increases?</p><h1>Conclusion</h1><p>Anyway, this is a fascinating topic and I believe that a solid understanding of the mechanism of in-context learning will help us develop the <strong>next generation of AI models</strong>.</p><p>If you want to dig even deeper than this first introduction, 
check out these awesome resources:</p><ul><li><p><a href="https://thegradient.pub/in-context-learning-in-context/">The Gradient: In-Context Learning in Context</a></p></li><li><p><a href="http://ai.stanford.edu/blog/understanding-incontext/">The Stanford AI Lab Blog: How does in-context learning work? A framework for understanding the differences between traditional supervised learning</a></p></li></ul><p>Also, for a more detail-oriented explanation of <a href="https://youtu.be/O4WMiIJwgd4?si=VU6GbspEie_9rcO2">in-context learning check out Sang Michael Xie's contribution.</a></p><p>If you have questions or additional thoughts, let&#8217;s continue this conversation in the comment section!</p><p>Hope it was useful! &#127801;</p><p>Best,</p><p>Yacine Mahdid </p>]]></content:encoded></item><item><title><![CDATA[Which Cross-Validation Method to Use in Machine Learning?]]></title><description><![CDATA[One of the most important decisions for your machine learning experiments!]]></description><link>https://www.yacinemahdid.com/p/which-cross-validation-method-to</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/which-cross-validation-method-to</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Tue, 06 Aug 2024 13:07:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi everyone! 
&#128075;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3GtA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3GtA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3GtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg" width="518" height="690.5480769230769" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1941,&quot;width&quot;:1456,&quot;resizeWidth&quot;:518,&quot;bytes&quot;:5072859,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3GtA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3GtA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f048d53-1300-4200-ac5c-9acd166a0eeb_2316x3088.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>How&#8217;s your summer going?</p><p>It&#8217;s been quite a long time since I last published in this newsletter, but recently I&#8217;ve been receiving lots of DMs from machine learning practitioners.</p><p>I feel it will be much more useful for everyone if I summarize my practical tips here.</p><p>One of the questions I recently received was about selecting the proper cross-validation scheme in a machine-learning experiment.</p><p>For those who don&#8217;t know, cross-validation is defined as: </p><blockquote><p><strong>Cross-validation is a <a href="https://en.wikipedia.org/wiki/Model_validation">model validation</a> technique for assessing how the results of a <a href="https://en.wikipedia.org/wiki/Statistics">statistical</a> analysis will <a href="https://en.wikipedia.org/wiki/Generalization_error">generalize</a> to an independent data set.</strong></p></blockquote><p>It is used during the training of a machine learning model to figure out how the model 
will behave once deployed in real life (aka how it will generalize).</p><p>The two main issues I&#8217;m seeing with practitioners getting bogged down with cross-validation are:</p><ol><li><p>They don&#8217;t do cross-validation at all.</p></li><li><p>They do not choose the right cross-validation scheme that fits their use case.</p></li></ol><p>This is problematic because it leads to the dreaded overfitting issue where the model seems to be doing fine during training, but completely collapses in production.</p><p>In this issue, I&#8217;ll show you a trick to always select the right cross-validation scheme for your use-case!</p><p><em>btw: <a href="https://youtu.be/Xk2nb9x1e70">there is a video version over here if you are more of a visual learner</a>.</em></p><h2>What&#8217;s the trick?</h2><p>There are a lot of cross-validation options to choose from:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9sxP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9sxP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 424w, https://substackcdn.com/image/fetch/$s_!9sxP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 848w, https://substackcdn.com/image/fetch/$s_!9sxP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9sxP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9sxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png" width="1274" height="677" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:677,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172411,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9sxP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 424w, https://substackcdn.com/image/fetch/$s_!9sxP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 848w, https://substackcdn.com/image/fetch/$s_!9sxP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9sxP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d488e4e-e75a-4cd0-9bdb-0c196060650d_1274x677.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Luckily for us, it&#8217;s fairly straightforward to figure out which one best fits a use case.</p><p>The trick is as follows:</p><blockquote><p>Think about how your model will be used and interact with data in a deployed setting. 
Then recreate this environment for the validation of the model.</p></blockquote><p>It&#8217;s as simple as that.</p><p>There are usually three components interacting together when a model is in production:</p><ul><li><p>There is a data source that generates the input for the model.</p></li><li><p>There is the trained model per se.</p></li><li><p>There is the output of the model which should be solving a task.</p></li></ul><p>To get satisfactory output during the production phase, the training of the model and the way the input data is generated need to fit together.</p><p>Let&#8217;s check an example to illustrate:</p><p>Assume we want to detect the presence of a disease in samples from an unseen patient. How would the model interact with the data in production?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gc0-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gc0-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 424w, https://substackcdn.com/image/fetch/$s_!gc0-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 848w, https://substackcdn.com/image/fetch/$s_!gc0-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gc0-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gc0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png" width="1197" height="621" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1197,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:181808,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gc0-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 424w, https://substackcdn.com/image/fetch/$s_!gc0-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 848w, https://substackcdn.com/image/fetch/$s_!gc0-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gc0-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85594fe-c477-4a23-91ae-1ddeeaa04de2_1197x621.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The important part of the interaction above is that the model <strong>never saw the patient during its training.</strong></p><p>This piece of information is crucially important because it means that during the training of the model, we cannot have data from a single patient in both the training and validation split.</p><p>This effectively pinpoints exactly 
which of the cross-validation techniques in the tree of possible schemes we should be using.</p><p>Let&#8217;s look at that tree!</p><h1>Decision Tree of Cross-Validation Techniques</h1><p>At a glance, the selection process looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a3Yp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a3Yp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 424w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 848w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 1272w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a3Yp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png" width="1195" height="561" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:1195,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a3Yp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 424w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 848w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 1272w, https://substackcdn.com/image/fetch/$s_!a3Yp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe12e87c0-37c9-43c4-b488-83b0cc5f8a87_1195x561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;ll walk through each of the branches in a minute, but one thing to note is that each of the leaf nodes contains variants that you can use to make sure the cross-validation score matches the model&#8217;s generalization performance.</p><h2>Lots of Data? 
- True</h2><p>If you have a lot of data, your cross-validation is usually very simple.</p><p>Most of the time it will consist of a simple hold-out cross-validation, where about 80% of your data is used for training and 20% is used for testing.</p><p>The reason is that both the training and test sets are so large that they are equally representative of the data in general.</p><p>Let&#8217;s use our production trick to illustrate this:</p><p>If you were training a large language model to do autocomplete based on user-generated prompts, the setup would look like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YYy5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YYy5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 424w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 848w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 1272w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YYy5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png" width="1137" height="549" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:549,&quot;width&quot;:1137,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86112,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YYy5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 424w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 848w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 1272w, https://substackcdn.com/image/fetch/$s_!YYy5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff75c1a8c-c1c9-4003-93da-a0f5011b35a4_1137x549.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The data generator is an independent human who wrote a bit of text with a missing word. The model is GPT and the output is the same prompt, but with the most likely word added at the end.</p><p>In that particular scenario, if you train with enough data, like the billions of internet pages in <a href="https://commoncrawl.org/">Common Crawl</a> or Reddit discussions, you get enough data generators (i.e. 
independent humans) to have a similar distribution however you cut the data during testing.</p><p>Therefore, when your model is faced with the showcased input in production, that input isn&#8217;t much different from what it saw during training.</p><p>Disclaimer: This might not hold all the time, for instance when you have a very strong dependency between data points or when the distribution is completely skewed during testing (e.g. someone prompts in a language the model never saw before). <br>But in most cases it does.</p><h2>Lots of Data? - False</h2><p>Now, when you don&#8217;t have lots of data, you have to be very careful about how you cut your data and validate during training.</p><p>The first question to ask yourself is whether your observations are independent of each other.</p><blockquote><p>Assuming that some data is Independent and Identically Distributed (i.i.d.) is making the assumption that all samples stem from the same generative process and that the generative process is assumed to have no memory of past generated samples.</p></blockquote><p>This assumption is common in machine learning, but it very rarely holds 100%. What you want to check here is whether the independent and identically distributed assumption <em>kinda holds.</em></p><h2>Observation Independent? - True</h2><p>So let&#8217;s say it does hold.</p><p>Here it means that whatever is generating the data is generating each data point independently, without a strong correlation between data points.</p><p>Let&#8217;s consider the following example:</p><blockquote><p>Given a newly created product on a store, predict whether its first review will be positive or negative.</p></blockquote><p>In this case, the data generator is the user base, which is creating a single review for a new product. The new product&#8217;s meta information is the input to the system. 
The model was trained on these reviews in some way, and the label is binary.</p><p>In that particular case, if we assume that no single group of users dominates the first reviews of new products, we can say it&#8217;s IID.</p><p>This would have been perfect for a hold-out split if we had an Amazon-level amount of data, but if we don&#8217;t, we need to use KFold to get a proper estimate of the model&#8217;s generalization performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MGUw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MGUw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 424w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 848w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 1272w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!MGUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png" width="1260" height="648" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:648,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108588,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MGUw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 424w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 848w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 1272w, https://substackcdn.com/image/fetch/$s_!MGUw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3cd9f6d9-813d-472b-bd51-030feaab9e06_1260x648.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>KFold is super simple.</p><p>Chunk your data into K equally sized blocks (folds), train on all but one of them, and test on the one left out. Then you rotate which block is the testing block until every block has been used for testing once.</p><p>Usually, 5 to 10 folds are enough to get a good estimate of the performance of the model.</p><p>There are other variations of KFold, like stratified KFold and ShuffleSplit, depending on whether your classes are severely imbalanced or you don&#8217;t have much data.</p><h2>Observation Independent? - False</h2><p>But let's say that the IID assumption doesn&#8217;t hold at all.</p><p>The first thing to figure out is whether the observations have time dependencies.</p><h2>Observation Time Dependent? 
- True</h2><p>Time-dependent data points are fairly straightforward.</p><p>You can think of them as being generated by a process that over time will generate points sequentially and with full knowledge of the previous data points.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fbcp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fbcp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 424w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 848w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 1272w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fbcp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png" width="930" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:930,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:138711,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fbcp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 424w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 848w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 1272w, https://substackcdn.com/image/fetch/$s_!Fbcp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a696f80-9695-4723-a6e9-0dc680f38e2b_930x717.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A good example of time-dependent data generation is with stock prices.</p><p>One of the most important pieces of information for stock prices is knowledge about historical stock performance.</p><p>So in production, if you have the following task:</p><blockquote><p>Given previous information about the stock market, predict the next stock price of a given stock.</p></blockquote><p>You have to take into consideration the time aspect whenever cross-validating. Namely, you cannot train on future stock prices (i.e. 
t = 3, 4, 5) and past stock prices (t = 0, 1) to predict a stock price (t = 2).</p><p>Seeing the future will not be possible in production, so when cross-validating you need to &#8220;take a walk through your data&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gzuc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gzuc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 424w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 848w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 1272w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gzuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png" width="1271" height="667" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1271,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gzuc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 424w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 848w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 1272w, https://substackcdn.com/image/fetch/$s_!Gzuc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabcc42c6-b6ed-4ecd-9590-6ef399c61a08_1271x667.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Which is what a TimeSeriesSplit does.</p><p>The first fold will have the least data to train on, and each subsequent fold progressively adds more training data. In every fold you test on data that comes after the training data, i.e. on &#8220;future&#8221; data.</p><h2>Observation Group Dependent? - True</h2><p>Now, your data points might not be dependent in the time domain.</p><p>Maybe it&#8217;s something else that makes them correlated, like in our earlier example with disease diagnosis.</p><p>Consider the following situation:</p><blockquote><p>Given labelled EEG data about patients' emotional states, predict what an unseen patient's emotional state will be.</p></blockquote><p>If you didn&#8217;t take enough time to explore your data, you might not realize that the data generators (i.e. 
the brain of the participant) are creating very correlated data points.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U1PE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U1PE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 424w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 848w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 1272w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U1PE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png" width="1283" height="718" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1283,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:322963,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U1PE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 424w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 848w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 1272w, https://substackcdn.com/image/fetch/$s_!U1PE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80aed805-23cf-40fe-ba30-18fa80971bf4_1283x718.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The process kind of looks like the above.</p><p>If you were to run a clustering algorithm on the data points, you would usually have a very easy time recovering which data points belong to which participant.</p><p>Therefore, if we use our production trick, we will have something like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LJF7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LJF7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 424w, 
https://substackcdn.com/image/fetch/$s_!LJF7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 848w, https://substackcdn.com/image/fetch/$s_!LJF7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 1272w, https://substackcdn.com/image/fetch/$s_!LJF7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LJF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png" width="1095" height="350" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a373c941-af22-43a2-8e6e-98a398638b30_1095x350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:350,&quot;width&quot;:1095,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LJF7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 424w, 
https://substackcdn.com/image/fetch/$s_!LJF7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 848w, https://substackcdn.com/image/fetch/$s_!LJF7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 1272w, https://substackcdn.com/image/fetch/$s_!LJF7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa373c941-af22-43a2-8e6e-98a398638b30_1095x350.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The data generator is a <strong>new participant&#8217;s</strong> brain generating EEG signals. The model was trained on previous participants and has never seen this participant before. The output labels are the emotional states of this new participant across time.</p><p>The perfect methodology for this is GroupKFold:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YWD3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YWD3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 424w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 848w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 1272w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YWD3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png" 
width="1266" height="659" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:659,&quot;width&quot;:1266,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:97367,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YWD3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 424w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 848w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 1272w, https://substackcdn.com/image/fetch/$s_!YWD3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F509ff569-c795-4432-97f6-9ceaa21c610a_1266x659.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It&#8217;s the same idea as KFold, except that all the data points from a group always stay together in either the training or the testing set. In our example, a group is a participant generating multiple data points. This means that during testing you can be 100% sure the model never saw that group before.</p><h1>Conclusion</h1><p>All in all, don&#8217;t forget the simple trick: think hard about how your model will be used in production. At the end of the day, it&#8217;s kind of the only thing that matters.</p><p>If you want to explore the subject a bit more, I highly recommend the following resources:</p><ul><li><p><a href="https://machinelearningmastery.com/k-fold-cross-validation/">A Gentle Introduction to K-fold Cross-Validation</a> (Machine Learning Mastery)</p></li><li><p><a href="https://scikit-learn.org/stable/modules/cross_validation.html">3.1. 
Cross-validation: evaluating estimator performance</a> (sklearn docs)</p></li></ul><p>I hope this was useful. If you have any questions, don&#8217;t hesitate to shoot me a DM.</p><p>Have a great rest of the week everyone! &#127801;</p><p>Best,<br>Yacine Mahdid</p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is learning with yacine .]]></description><link>https://www.yacinemahdid.com/p/coming-soon</link><guid isPermaLink="false">https://www.yacinemahdid.com/p/coming-soon</guid><dc:creator><![CDATA[Yacine Mahdid]]></dc:creator><pubDate>Thu, 16 Nov 2023 18:53:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Xq9Y!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F704b7ed5-15f6-4fb8-b8f4-46d68ab893d2_200x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is learning with yacine .</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.yacinemahdid.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.yacinemahdid.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>