The Aha Moment (Learning & Discovery) - On Cognitive Architectures and Learning

Consider how Archimedes had his eureka moment. This is mine. Over time, my learning has become less about accumulation and more about validation, about seeing patterns emerge that I’d been sensing but couldn’t quite articulate. It all started with a simple observation during a hackathon, when I came across a paper on small language models.

The Genesis: Small Models and Cognitive Architecture

I was bullish about smaller models for a practical reason: I don’t have unlimited GPU access for training and fine-tuning on my own, outside the structure of a company or institution. But constraints often force clarity. If I couldn’t train massive models, what should a small model actually prioritize? The answer kept circling back to something fundamental: a model should focus on cognitive abilities rather than trying to squeeze all of Wikipedia or the entire internet into its parameters.

This intuition aligns with what the Phi-1.5 model attempted with its “Textbooks Are All You Need” approach, but I think it goes deeper than data quality alone. It’s not just that language modeling should show emergent reasoning behavior. It’s that we need to think about how human cognition actually works, how it evolves over time. And when I started thinking this way, I naturally began reflecting on my own cognitive progress over the last few months, which I could trace quite closely because of the relatively controlled environment I’ve maintained: less distraction, more focused reading and reflection, more deliberate mental model building.

The parallel became clear: a small language model shouldn’t be a compressed encyclopedia. It should be like a school kid who deeply understands their mother tongue and has developed the fundamental ability to process and think through whatever input comes in, whatever sensory information arrives. This mirrors the usual cognitive architectures we see in theories of mind, in Yann LeCun’s illustrations of how intelligence might actually work. If we want to achieve this through a smaller model, that model shouldn’t have enormous dependencies on massive datasets full of facts that may not even be required for its particular purpose.

Think about it this way: we train models on the entire web, the full corpus of internet data. But a smaller model dedicated to a specific field of tasks doesn’t necessarily need to learn all of that. It needs the fundamental ability to understand, process, and think. When new information comes in, it needs to integrate that effectively. Maybe if we connect this with imagination, with the ability to simulate and extrapolate, that becomes the model’s superpower. The point isn’t to fine-tune on a specific dataset and slap LoRA on it afterward. The point is that if the model fundamentally has these cognitive abilities, it becomes easy for it to learn new things.

Learning as Non-Markovian Discovery

Whenever I learn something new now, whether it’s the fundamentals of causal inference or reinforcement learning through my coursework, something interesting happens. I start answering old questions that have been developing in my mind, sometimes for years. It’s like a Gaussian process, where each new observation updates the posterior everywhere, not just at the point observed. Each new concept I focus on recalibrates my understanding of problems I encountered long ago.
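To make that analogy concrete, here is a minimal numpy sketch, with a made-up kernel, made-up observation points, and made-up values, none of it from any particular paper or library. The thing to notice is that every new observation shifts the posterior mean across the whole query range, not just near the point that was observed:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.5):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP given observations."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    cov = rbf_kernel(x_query, x_query) - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Old, unanswered "questions" are just query points far from the data.
x_query = np.linspace(0.0, 10.0, 5)
observations = [(1.0, 0.8), (4.0, -0.3), (7.5, 0.5)]  # invented learning events

x_seen, y_seen = [], []
for x, y in observations:
    x_seen.append(x)
    y_seen.append(y)
    mean, var = gp_posterior(np.array(x_seen), np.array(y_seen), x_query)
    print(f"after {len(x_seen)} observation(s): mean = {np.round(mean, 2)}")
```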

This isn’t Markovian learning, where the next step depends only on the current state. Sometimes learning reaches back and illuminates problems from three years ago. For example, when I was working at Informatica, we had a fascinating problem about understanding customer behavior, about how to capture it as a latent variable. We explored several patterns but couldn’t scale further because of certain constraints. Now, with what I’m learning, that whole problem has new answers. If I had to solve it today, I would approach it completely differently.
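To pin down the contrast in symbols (my own framing, not a citation): the Markov assumption says the next state of understanding depends only on the present one, while the learning I’m describing keeps the whole history live.

```latex
% Markov assumption: the next state depends only on the present state.
p(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = p(s_{t+1} \mid s_t)

% The learning described here violates that assumption: an insight at
% time t can resolve a question posed at time t - k, for arbitrarily
% large k, so the full history matters.
p(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) \neq p(s_{t+1} \mid s_t)
```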

This is what an aha moment really is. It’s not discovering something entirely new to the world. It’s rediscovering something that already exists, something already proven, but which was not known to you. Your brain, your thought process, your cognition is actively trying to reinvent or rediscover it through your own path. And that’s actually a beautiful sign of genuine learning.

Take the concept of world models as another example. Everyone talks about world models for video generation or code synthesis. But coming from a system dynamics background, from control engineering, my understanding of world models is fundamentally different. For me, a world model should encode the actual behavior of a system, the dynamics that govern how it evolves. If I want to model the world as a system of equations, I should be able to ask: if I pursue my dream this way, what happens ten steps down the line?
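That question is, quite literally, a rollout of a state-space model. Here is a toy sketch of what “ten steps down the line” means when the world is encoded as a system of equations; every coefficient and state variable below is invented purely for illustration:

```python
import numpy as np

def step(state, action, dt=1.0):
    """One tick of a toy linear world model: x' = A x + B u.

    state = [progress, momentum]; none of these coefficients are
    fit to anything real, they just make the dynamics concrete.
    """
    A = np.array([[1.0, dt],    # progress accumulates momentum
                  [0.0, 0.9]])  # momentum decays unless reinforced
    B = np.array([0.0, 0.5])    # actions feed momentum, not progress directly
    return A @ state + B * action

# "If I pursue my dream this way, what happens ten steps down the line?"
state = np.array([0.0, 0.0])
for _ in range(10):
    state = step(state, action=1.0)  # a consistent policy, simulated forward
print("state after 10 steps:", np.round(state, 2))
```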

Sometimes my dreams functioned exactly like this, as simulations where I could test decisions and see their downstream consequences. My entire statement of purpose writing journey had this quality. Dreams would surface interactions and outcomes, helping me construct coherent narratives about my trajectory, about possibilities that hadn’t happened in real life but might happen in the future. This long-range validation capability, this ability to simulate forward, that’s what a proper world model should provide. Not just pattern completion, but genuine dynamical understanding.

Personality Over Patterns: The Path Forward

There are essentially two ways people approach learning in this field, and I can see them very clearly now. One path is following tutorials, building what others build, implementing Transformers from scratch because someone else did, jumping on Node.js projects because they’re trending. This is replication work, and it’s valuable, especially when you’re a fresher with ambition but no clear direction. It’s a form of deliberate practice.

But there’s another path: deep, collective reading combined with continuous thinking about fundamental problems, utilizing past experience in non-obvious ways. This path doesn’t just build skills for immediate projects. It answers the accumulated questions you’ve been carrying, sometimes for years. And this is what learning actually means to me after working in industry. Doing a master’s degree makes no difference if I’m just going to follow what beginners are doing, if I’m just going to replicate what’s already been done.

The real insight, and maybe this applies to models too, is that genuine reasoning requires this ability to have aha moments. You can see it in systems like DeepSeek, whose R1 work describes exactly such an “aha moment” emerging during reinforcement learning. When your independent thinking converges with what’s actually progressing in cutting-edge work, when your learning aligns with real advances in the field, that harmonic resonance is incredibly motivating. It tells you not to worry too much about signals from the ecosystem that might dismiss what you’re working on in a particular moment. Sometimes you work on something interesting and the world doesn’t immediately acknowledge it. That’s fine. It’s still progress. It’s still evolution.

As I’ve mentioned in previous reflections, I’m going through a kind of denoising process in my learning, and it’s giving me increasing clarity over time. It’s not that I knew nothing before. But earlier, so much of what I learned was procedural knowledge. Now I can visibly see something different happening. When we chase things that don’t fundamentally align with our working principles, we struggle and wonder why we’re not able to do certain things. But when we recognize that our real ability is in higher-order thinking, in synthesis and connection-making, that realization becomes incredibly powerful.

This kind of self-introspection, this practice of attempting to create new questions rather than just consuming existing answers, has become both relaxing and generative for me. It’s not always about sitting through introductory sessions on new concepts or treating ideas as distant and foreign. It’s about thought and effort. The thought process itself enables quick grasping of concepts, finding utility, understanding what’s happening beneath the surface. You’re drawing two parallel lines: maintaining curiosity and generating new questions, while also recognizing that when the answers already exist, your job is to come up with new variants of the questions, variants that require a different sampling process to answer.

And this matters because when you develop your own principle or strategy for seeing things, when you establish your own thought process, you stop being swept along by what everyone else is doing. If you open LinkedIn or Twitter, you see endless similar patterns. Everyone wants to be an influencer following the same path, posting videos, doing photoshoots, implementing attention mechanisms on day one, sharing code snippets on day two. There’s nothing wrong with that, but there’s also another way.

When you’ve developed some personality in your field, when you’ve worked on multiple things and gotten results, when you have what I think of as free energy because you’re not constantly chasing the next trend, that’s when your accumulated experience starts guiding you differently than the crowd. And that difference isn’t a disadvantage. It’s your edge. The procedural knowledge gives way to something more fundamental. The personality you’ve developed, the career you’ve built over years, starts providing genuine guidance rather than just surface-level skills.

That’s the real aha moment, the real evidence I can see now. This isn’t about being contrarian for its own sake. It’s about recognizing that your path, informed by your unique constraints and experiences, by your systems engineering background and your industry problems and your particular way of connecting ideas, that path is valid. More than valid. When you follow it rigorously, when you trust your thought process while remaining open to evidence, you often end up discovering the same truths that frontier research is discovering, just from a different angle. And that convergence, that validation, that’s what tells you to keep going, to keep proving your hypothesis, to trust the free energy you’ve cultivated. It’s quite good, I would say.

This post is licensed under CC BY 4.0 by the author.