I am asked now and then about courses or other material for learning machine learning (ML). In my view it’s OK to go straight to the point, but I’d recommend some training on:

  • Coding
  • Machine learning methods
  • Design or software engineering principles
  • Maths understanding: statistics, linear algebra and calculus

By the way, to everyone who thinks the current AI hype is not ML: you are wrong. AI foundation models are based on deep learning architectures, which in turn are built on large artificial neural networks. This is why, as an advanced practitioner, you should have at least a minimal notion of its foundations. In any case, it’s fair to say that the latest models are built with new, complex architectures, so sometimes it is not that easy to draw a straightforward relationship between “old” stuff and “new” stuff (believe me, they are more similar than they seem).

Image source: generated by GPT-4o

Yes, there are many other things that could come in handy to a machine learning practitioner, like model repositories or reproducibility tools. I’ll talk about them in future posts.

Machine learning methods

If you like reading, one not to miss is The Hundred-Page Machine Learning Book. It’s a short, easy-to-read book that covers the essential topics in machine learning.

For deep learning (as I said, the most popular technique these days), I suggest starting with 30 Essential Questions and Answers on Machine Learning and AI by Sebastian Raschka. It covers the main questions on the basics of neural networks, their applications to computer vision and natural language processing, and deployment.

Be water, my friend

Creating a good machine learning application is not trivial: you can run into data issues, fitting problems or deployment difficulties. My two cents here: when modeling, start simple and go complex, and always keep a healthy balance between complexity and precision.
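As a rough illustration of "start simple and go complex", here is a minimal sketch that compares a trivial baseline (always predict the training mean) with a straight line fitted by least squares. The data, the train/test split and the choice of mean absolute error (MAE) as the metric are all assumptions made up for this example, not a recipe.

```python
from statistics import mean

# Toy data (made up for illustration): y is roughly 2*x + 1 with small noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.8, 15.1, 17.0, 18.9]

# Hold out every third point to compare models on data they did not see.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 3 != 0]
test = [p for i, p in enumerate(pairs) if i % 3 == 0]


def mae(predict, data):
    """Mean absolute error of a prediction function on (x, y) pairs."""
    return mean(abs(predict(x) - y) for x, y in data)


# Step 1, the simplest possible model: always predict the training mean.
y_mean = mean(y for _, y in train)


def baseline(x):
    return y_mean


# Step 2, one notch more complex: a straight line fitted by least squares.
x_mean = mean(x for x, _ in train)
slope = sum((x - x_mean) * (y - y_mean) for x, y in train) / sum(
    (x - x_mean) ** 2 for x, _ in train
)
intercept = y_mean - slope * x_mean


def linear(x):
    return slope * x + intercept


baseline_mae = mae(baseline, test)
linear_mae = mae(linear, test)
print(f"baseline MAE: {baseline_mae:.2f}")
print(f"linear MAE:   {linear_mae:.2f}")
```

If the more complex model does not clearly beat the baseline on held-out data, the extra complexity is probably not paying for itself yet.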

If you do not like reading, you can resort to some MOOC materials like:

Coding

As an R guy, I’m sad to say that these days Python has become the dominant coding language, particularly in data science, machine learning, and artificial intelligence. Its popularity stems largely from its simplicity, versatility, and vast ecosystem of libraries, such as NumPy, Pandas, TensorFlow, and PyTorch.

It’s fair to say that other languages were better suited to specific tasks: R for statistical analysis, MATLAB and Octave for mathematical modeling (especially in the coverage, documentation and formality of the methods), and Julia for performance-critical scientific computing. But for practical reasons (due to the technical complexities of deep learning), the “engineering” side has won this battle, and many of the methods have been implemented in Python.

Below are some interesting links if you want to take the plunge into coding:

Apart from knowing how to code in Python, I recommend learning how to use “notebooks”, data frames (pandas, polars) and SQL as a lingua franca for data engineering. And last but not least, rely on AI-assisted coding (Codex, Cursor, Copilot, Cline, etc.): it can help you create new pieces of code, but also explain existing code or give you tips on how to tackle a development task.
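To give a flavor of SQL as a lingua franca, here is a minimal sketch using Python’s built-in sqlite3 module with an in-memory database. The `sales` table, its columns and its values are hypothetical, invented only for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 15.0)],
)

# Average amount per region: the kind of aggregate you would also
# express as a group-by on a data frame.
rows = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, avg_amount in rows:
    print(region, avg_amount)
```

The same question ("average amount per region") maps almost one to one onto a group-by in a data frame library, which is a good reason to get comfortable with both styles.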

Vibe with caution

AI coding assistants are great tools, but it’s important to know what your application is doing (especially when you are a beginner). So do not “vibe” all the time; try to understand what you are doing.

Design principles

One of my favorite reads is Designing Machine Learning Systems. This book gives you a broader view of what putting a model in production entails. You can see how important data curation is for the quality of your application, no matter how simple or complex the model is. What’s more, the nature of data is dynamic, and so is the model. To me, when it comes to designing an application based on ML, you should think beforehand about:

  • What’s the problem to solve? Can your goal be represented as one or more variables? (Sometimes not.)
  • Is solving this problem with ML worth it?
  • How will you measure whether your solution is good? Think of technical measures (error) or impact ones (money, time savings, …).
  • When modeling, start simple and go complex.
  • After your code goes live, are you sure your model is working well? If not, are you ready to tweak it quickly? What’s the impact of doing nothing?
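The measurement and monitoring questions above can be sketched in a few lines. Everything here is an assumption for illustration: the error values are invented, and `COST_PER_ERROR_UNIT` and `DEGRADATION_FACTOR` are hypothetical knobs you would have to define for your own problem.

```python
from statistics import mean

# Hypothetical absolute errors, for illustration only.
validation_errors = [1.0, 2.0, 1.5, 0.5]  # measured before go-live
live_errors = [2.5, 3.5, 3.0, 3.0]        # observed in production

COST_PER_ERROR_UNIT = 4.0  # assumed money lost per unit of error
DEGRADATION_FACTOR = 1.5   # assumed tolerance before raising an alert

# Technical measure: mean absolute error before and after go-live.
validation_mae = mean(validation_errors)
live_mae = mean(live_errors)

# Impact measure: a rough translation of the live error into money.
estimated_cost = live_mae * COST_PER_ERROR_UNIT * len(live_errors)

# Monitoring: flag the model when live error drifts well above validation.
needs_attention = live_mae > DEGRADATION_FACTOR * validation_mae

print(f"validation MAE={validation_mae}, live MAE={live_mae}")
print(f"estimated cost={estimated_cost}, needs attention={needs_attention}")
```

The point is less the arithmetic than having both numbers agreed on before the model goes live, so that "is it still working?" has a concrete answer.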

You can read a detailed summary of the book in my note Designing Machine Learning Systems. I also recommend reading the note on dataset engineering, Dataset engineering made easy.

It’s the process, not just the outcome

The most important matter when designing a machine learning application is understanding the underlying process. ML will help you with automation, but if you’re replicating a broken or ill-designed process on steroids, let me tell you, the prospects are not good. The reality is that on many occasions you must redesign (or just define) the process along the way. Be cautious (or run away) if nobody is willing to change a comma in the current processes.

Maths

Machine learning models have a strong mathematical grounding. You don’t need to be a seasoned mathematician or prove tons of theorems to use them, but you do need some understanding of maths, at least in three areas: statistics, linear algebra and calculus. Below I include some introductory courses (free or on MOOC platforms) that won’t make you an expert in the matter, but will give you the tools for an initial understanding.

Statistics

Linear algebra

Calculus

Don't get overwhelmed

The notation or the subject itself may seem complicated. Don’t be overwhelmed by the theory; go slowly, step by step. Remember that the important thing is to understand the basics, apply them, and build an iterative process of continuous learning.
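As a small example of applying the basics, the sketch below fits a line with gradient descent in plain Python. It touches the three areas at once: the mean squared error is statistics, the weighted sums are linear algebra in miniature, and the gradients come from calculus. The data, learning rate and number of steps are assumptions picked by hand for this toy problem.

```python
# Toy data (made up): points lying exactly on the line y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
n = len(xs)

w, b = 0.0, 0.0       # parameters to learn (slope and intercept)
learning_rate = 0.05  # step size, chosen by hand for this toy problem

for _ in range(2000):
    # Prediction errors for the current parameters.
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Partial derivatives of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    # Move a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"w={w:.3f}, b={b:.3f}, mse={mse:.6f}")
```

Training a neural network follows the same loop in spirit, only with many more parameters and with the gradients computed automatically.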