
Metamorphosis and Unlearning - What Am I Rooting For? - One Month into MS

I recently came across a quote attributed to Confucius. I’m paraphrasing, but the gist is: it’s always better to sharpen your tools before you take on a task or competition.

One month into my MS program, I want to title this reflection “Metamorphosis and an Unlearning Experience.”

Why Metamorphosis?

I’ve always thought of my journey as a metamorphosis. Looking back at my three-year career: in the first year (call it the zeroth year) I was quite clueless about what I was doing. Over the following years at my company, I learned how to understand problems, specifically, how to understand a problem statement. These were industry problems, truly real-world problems at large scale, though they sometimes didn’t require theoretical development.

The Ambitious (Unrealistic/Underwhelming?) Goal

When I registered for courses, I had this ambitious, maybe quite unrealistic, idea: we should keep building projects.

But when I went through them, all four courses I registered for turned out to be math-heavy: very strong on foundations, requiring in-depth learning and practice. That makes the typical urge of “okay, we have to build projects” feel underwhelming.

Let me explain with an example. A lot of people taking machine learning courses think from a fresher’s perspective: “Let’s build machine learning projects. Let’s keep building.” This week the professor teaches gradient boosting, and they immediately think, “Can we build a gradient boosting algorithm?” But that’s not a project; it’s a small practice exercise. You open a Python notebook and implement it.

The problem is that the amount of learning is small and slow to accumulate. It may not transfer, because it’s just one notebook running the typical data science loop: feature engineering, calculating metrics. It doesn’t teach you to understand a problem statement or construct nuanced solutions.
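To make that concrete, here is roughly what the whole “build gradient boosting” exercise amounts to. This is a minimal sketch for squared loss, where each tree fits the current residuals; the toy data and hyperparameters are placeholders of my choosing:

```python
# Minimal gradient boosting for squared loss: each tree fits the residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=50, lr=0.1, max_depth=2):
    pred = np.full(len(y), y.mean())      # start from the mean prediction
    trees = []
    for _ in range(n_trees):
        residuals = y - pred              # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)      # shrink each tree's contribution
        trees.append(tree)
    return y.mean(), trees

def predict(base, trees, X, lr=0.1):
    return base + lr * sum(t.predict(X) for t in trees)

# Toy data just to show it runs.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
base, trees = fit_gradient_boosting(X, y)
print(np.mean((predict(base, trees, X) - y) ** 2))  # training MSE
```

An afternoon of work, a useful drill, but not a project.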

Orders of Problem Solving

I’ve started categorizing problem-solving into different orders: zero, first, second, and higher-order.

Zero-Order Problem Solving

Someone asks you to forecast a value. The problem is simple enough that you use linear regression, blindly run a random forest, or apply a multi-layer perceptron. It fits and gives you a decent F1 score or squared error. You’re working linearly, and the problem doesn’t demand much thought. You already know the list of algorithms; you explore and switch between them. With tools like Cursor or OpenAI, it’s just 100-200 lines of code.
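In code, that zero-order loop is little more than the following sketch; the estimator list and the synthetic dataset are stand-ins I picked for illustration:

```python
# Zero-order problem solving: try models off the shelf, keep the best score.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "mlp": MLPRegressor(max_iter=2000, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5,
                            scoring="neg_mean_squared_error").mean()
    print(name, -score)  # lower MSE "wins"; no deeper thought involved
```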

First-Order Problem Solving

This requires effort in parameter tuning and in combining approaches: data processing, the typical Kaggle workflow, or industry setups where you experiment with different algorithms, hyperparameter tuning, and feature engineering. But mostly, especially in industry, you work on data preparation. More than crafting or tuning algorithms, it’s about preparing the data.
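The tuning side of that effort, sketched with scikit-learn’s GridSearchCV (the grid values and data here are arbitrary placeholders):

```python
# First-order problem solving: hyperparameter search plus basic feature prep.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("rf", RandomForestRegressor(random_state=0))])
grid = {"rf__n_estimators": [100, 300], "rf__max_depth": [None, 5, 10]}
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```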

In my recent startup experience, we couldn’t change anything on the algorithmic side. It was mostly configuration and dataset curation, working on data preparation. This is first-order, or somewhere between first- and second-order, problem solving: it requires you to control or alter multiple dimensions of the problem.

Higher-Order Problem Solving

The problem statement is very vague at the beginning. There is complete uncertainty about what you’re solving, what accuracy or performance metrics to use, and which algorithms to employ; nothing is available in a straightforward manner. You need to spend significant time on problem definition itself.

Most problems I worked on in industry were partly higher-order because customers came with vague statements and we applied first-principle thinking. The effort went into problem definition, not always into crafting unique solutions. The solutioning or algorithmic side was often between first and second-order. But true higher-order problems are research problems or experimental statements requiring substantial effort in both defining the problem and crafting the solution on theoretical grounds.

Real Examples of Higher-Order Problems

You might do ensemble modeling or stacking: building multiple models, getting their predictions, then building a meta-model to combine them. There’s blending and other techniques. Sometimes people use one model to extract features and train another model on top. These are creative approaches, but often standard, even straightforward.
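Stacking, for instance, is a few lines in scikit-learn; the choice of base models and meta-model below is arbitrary:

```python
# Stacking: base models' out-of-fold predictions feed a meta-model.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)), ("svr", SVR())],
    final_estimator=Ridge(),  # meta-model combining the base predictions
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))  # R^2 on the training data, just to show it runs
```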

But consider genuinely higher-order problems: understanding a system, electromechanical or societal, and predicting black swan events. These are rare surprises for which you may not have enough data points to train a model. Consider fraud scenarios, which I’ve mentioned in previous blogs.

For industrial systems, we may lack sufficient data points and cannot straightforwardly use linear regression or neural networks to predict when attacks or black swan events will occur. This needs better problem understanding. You might need to think at the level of probabilistic distributions or models, consider how to simulate or collect data, and perform extensive theoretical validation.
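One way to think at the level of distributions rather than point predictions is extreme value modeling: fit a tail distribution to block maxima and estimate exceedance probabilities instead of training a supervised model. The sketch below is purely illustrative, with simulated data standing in for the scarce real observations:

```python
# When rare events give too few labels to train on, model the tail directly.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-in for scarce real data: yearly maxima of a noisy load signal.
yearly_max = rng.gumbel(loc=100.0, scale=10.0, size=40)

# Fit a generalized extreme value distribution to the block maxima.
shape, loc, scale = stats.genextreme.fit(yearly_max)
gev = stats.genextreme(shape, loc=loc, scale=scale)

threshold = 140.0  # hypothetical "black swan" level
print("P(max exceeds threshold in a year):", gev.sf(threshold))
```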

The same applies to what big tech companies develop. Their forecasting models aren’t simple ARIMA implementations. Their recommendation networks aren’t straightforward Python framework applications.

Connecting Back to Building Projects

So when we say “let’s build a project,” we must keep these problem categories in mind. And our subjects focus on mathematical and theoretical rigor: the professor might teach VC dimension, PAC learning, or mixture models.
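As one taste of that rigor: the textbook PAC bound for a finite hypothesis class H says a consistent learner needs roughly this many samples to be probably approximately correct (stated here from memory, as a reference point rather than a derivation):

```latex
% Realizable PAC learning with a finite hypothesis class H:
% with probability at least 1 - \delta, a consistent learner has error
% at most \epsilon whenever the sample size m satisfies
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```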

Navigating Python frameworks on specific datasets won’t give us mathematical rigor. It deflates our enthusiasm and degrades our learning because we have a rich curriculum but limit ourselves to simple projects.

Instead—especially for me—the focus should be on theoretical and mathematical exercises. Not “let’s build a product” or “let’s build something.” Those aren’t cumulative, they don’t add much. Maybe that can be a side gig. But the main element of learning should be deliberate practice: small chunks of exercises, derivations, proofs, understanding assumptions, studying lemmas. That’s the real coursework. This is how we understand, interact with subjects, and compare different approaches.

The Linear Regression Example

I’ve talked about the “50 shades of linear regression” in a previous blog. Take linear regression in a statistical inference class versus a machine learning class. You might use scikit-learn or statsmodels.

You can call scikit-learn’s LinearRegression().fit() or statsmodels’ OLS(...).fit(), run it on the dataset, and get output. With LinearRegression you don’t get a regression summary, or you never look for one. But OLS provides a detailed summary: log-likelihood, AIC, BIC, and more. Few people dive deep into these.
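Side by side on toy data (a minimal sketch), the same regression through both libraries:

```python
# The same regression, two framings: prediction API vs. inference summary.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# scikit-learn: fit, predict, score; no inferential summary.
lr = LinearRegression().fit(X, y)
print(lr.coef_, lr.intercept_)

# statsmodels: the OLS summary reports log-likelihood, AIC, BIC,
# standard errors, t-statistics, and confidence intervals.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())
```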

Statistics classes teach you to interpret the beta values and run the relevant statistical tests. Machine learning courses treat regression more casually: they teach stochastic gradient descent, and you focus on MSE values.

Yet regression has depth. It involves testing hypotheses and using the right inference techniques. OLS is closed-form estimation: straightforward matrix multiplication and inversion to compute the beta values. Stochastic gradient descent iterates instead. And there are also second-order numerical methods, Newton-type optimization techniques; logistic regression and many other statistical models are fitted with these under the hood. We often overlook this rich theoretical foundation.
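The closed form is literally the normal equations, beta = (XᵀX)⁻¹Xᵀy. A quick sketch, solving the linear system rather than explicitly inverting, which is the numerically safer habit:

```python
# Closed-form OLS via the normal equations: (X'X) beta = X'y.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept column
y = X @ np.array([1.0, 3.0, -2.0]) + rng.normal(scale=0.5, size=200)

beta = np.linalg.solve(X.T @ X, X.T @ y)  # one solve, no iterations
print(beta)  # close to [1, 3, -2]
```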

The Overlaps and Nuances

These are the exercises we need to engage with: understanding the nuances of concepts and connecting them, because many different algorithm paradigms exist. Take supervised learning: statistical learning theory versus statistical modeling versus probabilistic modeling. Each has its own assumptions and nuances, but there is also significant overlap between them. These are valuable things to learn and discuss.

What I Observed This Month

Over this month, I tried to frame problem statements and projects. I tried, but they all seemed underwhelming, because the content I’m learning is rich and the projects feel simple in comparison.

My observation: over the semester, one project could emerge as the cumulative result of the atomic exercises I’m doing. A project doesn’t need to be radically different. It can be simple but theoretically sound.

The Unlearning Experience

This is why I call this an unlearning experience. I cannot keep thinking, “I’ve done this problem statement before; I have that work experience.” I’m reworking the problem statement, reworking the purpose itself. Previously, I just wanted to build models and prove good accuracy. During my master’s, I’m filling gaps, so that the ability to create unique algorithms or approaches develops naturally.

I see this at the laboratory where I’m working, combining data-driven modeling with Newtonian mechanics for system dynamics. We’re merging neural networks with physics. This demands a different way of thinking.
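A minimal sketch of that merge, assuming a PyTorch setup and a toy harmonic oscillator in place of the lab’s actual system: the loss penalizes both data misfit and the residual of the Newtonian equation of motion.

```python
# Physics-informed loss: fit x(t) to data AND to m*x'' + k*x = 0.
import torch

m, k = 1.0, 4.0  # toy oscillator parameters (placeholders)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

t_data = torch.linspace(0, 2, 20).reshape(-1, 1)
x_data = torch.cos(2.0 * t_data)  # "measurements" of the true solution

t_phys = torch.linspace(0, 2, 100).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    # Data term: match the few observations we have.
    loss_data = ((net(t_data) - x_data) ** 2).mean()
    # Physics term: residual of the equation of motion at collocation points.
    x = net(t_phys)
    dx = torch.autograd.grad(x.sum(), t_phys, create_graph=True)[0]
    d2x = torch.autograd.grad(dx.sum(), t_phys, create_graph=True)[0]
    loss_phys = ((m * d2x + k * x) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```

The point is not this particular toy, but the shape of the thinking: the model is constrained by theory, not just by data.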

Another byproduct: I’ve already worked extensively on business problem understanding. Now I’m deep-diving into sound theoretical foundations. When I return to industry, I truly believe my approach to problem statements, practices, and contributions to intellectual rigor will be completely different from the past.

The Transformation

My problem-solving approach is undergoing metamorphosis. My learning and personality are transforming too. It might look ugly initially—that’s how a caterpillar becomes a butterfly.

I need to trust the process. Take simple, cumulative, progressive steps. Not vague projects that end in two or three hours.

Looking at Other Projects

Most projects I’ve reviewed, capstone projects from other students, fall between zero- and first-order; few fully reach first-order. Mostly they analyze data and build models, but the building doesn’t require deep craftsmanship. You just fit the model.

Projects that involve prompt engineering or RAG systems, with extensive validation, architecture design, and data preparation, might reach somewhere between first- and second-order. The rest stay below that.

Higher-order work goes beyond that, sitting above second-order. It demands theoretical formulation first, then implementation. The theory could come from machine learning, economics, econometrics, or statistical modeling, not just simple linear models. It means thinking outside the conventional algorithm list.

Some problem statements need something beyond conventional algorithms; they need better problem formulation itself. We shouldn’t jump straight to algorithms or methodologies the moment we hear a problem statement. Otherwise, we’re not putting in real effort. It becomes just sitting for an hour daily or a few hours weekly, building something, and finishing the capstone.

What I’m Setting Up For

This might suffice for people just starting with data science or machine learning. But I’m setting up something deeper: focusing on reading original texts, working through textbook exercises extensively, thinking and conceptualizing deeply.

Many real-world problem statements are overlooked or approached superficially. From my industry experience, due to time and resource constraints, we often settled for “this is enough for this problem statement.”

I’m looking deeper now.

This is my metamorphosis. And I’m trusting the process.
