Machine learning toolkit

I get asked now and then about courses or other material for learning machine learning (ML). In my view it’s fine to dive straight in, but I’d recommend some training in:

Coding
Machine learning methods
Design or software engineering principles
Maths understanding: statistics, linear algebra and calculus

By the way, to everyone who thinks the current AI hype is not ML: you are wrong. AI foundation models are based on deep learning architectures, which in turn are built on large artificial neural networks. This is why, as an advanced practitioner, you should have at least a basic notion of the foundations. That said, it’s fair to say that the latest models use new, complex architectures, so it is not always easy to draw a straightforward relationship between the “old” stuff and the “new” stuff (believe me, they are more similar than they seem).

Source: generated by gpt4o

Yes, many other things could come in handy to a machine learning practitioner, such as model repositories or reproducibility tools. I’ll talk about them in future posts.

Machine learning methods

If you like reading, one not to miss is The Hundred-Page Machine Learning Book, a short, easy-to-read book that covers the essential topics in machine learning.

For deep learning (as I said, the most popular technique these days), I suggest starting with 30 Essential Questions and Answers on Machine Learning and AI by Sebastian Raschka. It covers the main questions on the basics of neural networks, their applications to computer vision and natural language processing, and deployment.

be water my friend

Building a good machine learning application is not trivial: you can run into data issues, fitting problems or deployment difficulties. My two cents: when modeling, start simple and go complex, and always keep a healthy balance between complexity and precision.
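To make “start simple and go complex” concrete, here is a minimal sketch using only the Python standard library. The data is made up, and for brevity the error is computed on the same points used for fitting (in a real project you would use held-out data). The idea is to first measure a trivial baseline and only then check how much a slightly more complex model actually buys you.

```python
from statistics import mean

# Toy data (made up for illustration): y is roughly 2*x + 1 with small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Step 1: the simplest possible model, always predict the mean of y.
baseline_pred = [mean(ys)] * len(ys)

# Step 2: one step up in complexity, a least-squares line y = a*x + b.
x_bar, y_bar = mean(xs), mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
a = s_xy / s_xx
b = y_bar - a * x_bar
linear_pred = [a * x + b for x in xs]

baseline_mae = mae(ys, baseline_pred)
linear_mae = mae(ys, linear_pred)
print(f"baseline MAE: {baseline_mae:.3f}")
print(f"linear MAE:   {linear_mae:.3f}")
```

If the more complex model does not clearly beat the baseline, the extra complexity is probably not worth it yet.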

If you do not like reading, you can resort to some MOOC materials like:

Coursera: Machine Learning (Stanford University, Andrew Ng): introduces ML with a focus on practical applications and ties in statistics and linear algebra concepts (e.g., regression, matrix operations), so it keeps ML accessible without losing the mathematical grounding.
edX: Introduction to Machine Learning with Python (IBM): covers ML fundamentals, including classification, regression, and clustering, using Python libraries like Scikit-learn and Pandas.

Coding

As an R guy, it is sad to say that these days Python has become the dominant programming language, particularly in data science, machine learning, and artificial intelligence. Its popularity stems largely from its simplicity, versatility, and vast ecosystem of libraries, such as NumPy, Pandas, TensorFlow, and PyTorch.

It’s fair to say that other languages were better suited to their own niches: R for statistical analysis, MATLAB and Octave for mathematical modeling (especially in the coverage, documentation and formality of the methods), and Julia for performance-critical scientific computing. But for practical reasons (due to the technical complexities of deep learning), the “engineering” side has won this battle, and many of the methods have been implemented in Python.

Below are some interesting links if you want to take the plunge into coding:

Coursera: Python for Everybody (University of Michigan): A beginner-friendly specialization covering Python basics, data structures, and data analysis. Great for new users transitioning to Python, with practical exercises in data manipulation.
edX: [CS50’s Introduction to Programming with Python (Harvard University)](https://www.edx.org/course/cs50s-introduction-to-programming-with-python): A comprehensive course introducing Python programming, including functions, data structures, and file I/O, with a focus on problem-solving.
Udemy: Python Bootcamp From Zero to Hero in Python: A beginner-to-intermediate course covering Python fundamentals, data analysis, and an introduction to machine learning libraries.
freeCodeCamp: Python for Beginners (YouTube): A free, 4-hour video course covering Python basics, including loops, functions, and basic data analysis with libraries.

Apart from knowing how to code in Python, I recommend learning how to use “notebooks”, data frames (pandas, polars) and SQL, the lingua franca of data engineering. And last but not least, rely on AI-assisted coding (codex, cursor, copilot, cline, etc.): it can help you create new pieces of code, but also explain existing code or give you tips on how to tackle a development task.
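As a taste of why SQL is worth learning, here is a minimal sketch that runs a typical aggregation from Python using the built-in sqlite3 module. The table, columns and values are made up for the example; a real project would query an actual database or warehouse.

```python
import sqlite3

# In-memory database; table and column names are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 10.0), ("ana", 15.0), ("bob", 7.5)],
)

# A typical feature-engineering query: one aggregated row per customer.
rows = conn.execute(
    "SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

Data frame libraries such as pandas and polars offer the same kind of aggregation through their group-by operations, so the concept carries over even if the syntax changes.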

vibe with caution

AI coding assistants are great tools, but it’s important to know what your application is doing (especially when you are a beginner). So do not “vibe” all the time; try to understand what you are doing.

Design principles

One of my favorite reads is Designing Machine Learning Systems. This book gives you a broader view of what putting a model in production entails. You can see how important data curation is for the quality of your application, no matter how simple or complex the model is. What’s more, the nature of data is dynamic, and so is the model. To me, when it comes to designing an application based on ML, you should think beforehand about:

What’s the problem to solve? Can your goal be represented as one or more variables? (Sometimes it can’t.)
Is solving this problem with ML worth it?
How will you measure whether your solution is good? Think of technical measures (error) or impact ones (money, time savings, …).
When modeling, start simple and go complex.
After your code goes live, are you sure your model is working well? If not, are you ready to tweak it quickly? What’s the impact of doing nothing?
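The measurement and go-live questions above can be sketched in a few lines. This is only an illustration under loud assumptions: the numbers are made up, `needs_attention` and `COST_PER_UNIT_ERROR` are hypothetical names, and the 1.5 tolerance is arbitrary rather than a recommended value.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Technical measure: mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def needs_attention(reference_error, live_error, tolerance=1.5):
    """Flag the model when live error exceeds `tolerance` times the reference.

    The default of 1.5 is an arbitrary assumption for this sketch.
    """
    return live_error > tolerance * reference_error


# Made-up numbers: same predictions, compared with the actual values
# seen at validation time and after go-live.
predictions = [11, 12, 8, 10]
validation_error = mae([10, 12, 9, 11], predictions)
live_error = mae([10, 15, 9, 14], predictions)

# Impact measure (hypothetical): each unit of error costs 3 currency units.
COST_PER_UNIT_ERROR = 3.0
estimated_cost = live_error * COST_PER_UNIT_ERROR

print("validation error:", validation_error)
print("live error:", live_error)
print("needs attention:", needs_attention(validation_error, live_error))
print("estimated cost per prediction:", estimated_cost)
```

The point is not the specific threshold but having both a technical and an impact measure defined before the model goes live, plus an agreed rule for when someone has to look at it.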

You can read a detailed summary of the book in my note Designing Machine Learning Systems. I also recommend the note on dataset engineering, Dataset engineering made easy.

It’s the process, not just the outcome

The most important thing when designing a machine learning application is understanding the underlying process. ML will help you with automation, but if you’re replicating a broken or ill-designed process on steroids … let me tell you, the prospects are not good. The reality is that on many occasions you must redesign (or just define) the process along the way. Be cautious (or run away) if nobody is willing to change a comma in the current processes.

Maths

Machine learning models have a strong mathematical grounding. You don’t need to be a seasoned mathematician or prove tons of theorems to use them, but you do need some understanding of maths in at least three areas: statistics, linear algebra and calculus. Below I include some introductory courses (free or on MOOC platforms) that won’t make you an expert in the matter, but will give you the tools for an initial understanding.

Statistics

Coursera: Basic Statistics (University of Amsterdam): provides the statistical foundation needed for understanding machine learning model evaluation and data analysis.
Khan Academy: Statistics and Probability: beginner-friendly and ideal for building intuition around statistical concepts critical for machine learning, such as data distributions and hypothesis testing.
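A small example of why statistics matters for model evaluation: the sketch below summarises made-up scores from five evaluation folds for two hypothetical models. Both have the same average, but very different spread, which a single headline number would hide.

```python
from statistics import mean, stdev

# Made-up accuracy scores from 5 evaluation folds for two hypothetical models.
model_a = [0.81, 0.79, 0.83, 0.80, 0.82]
model_b = [0.84, 0.70, 0.90, 0.75, 0.86]

for name, scores in [("A", model_a), ("B", model_b)]:
    # Sample standard deviation tells us how stable the model is across folds.
    print(f"model {name}: mean={mean(scores):.3f}, stdev={stdev(scores):.3f}")
```

With equal means, the model with the lower standard deviation is usually the safer choice, because its behaviour is more predictable.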

Linear algebra

[edX: Linear Algebra - Foundations to Frontiers (University of Texas at Austin)](https://www.edx.org/course/linear-algebra-foundations-to-frontiers): covers linear algebra essentials like matrix operations, crucial for understanding ML algorithms (e.g., neural networks, PCA).
Coursera: Mathematics for Machine Learning: Linear Algebra (Imperial College London): tailored for ML, it bridges the gap between linear algebra theory and its use in models.
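To see where matrix operations show up in practice, here is a minimal sketch of a single linear layer of a neural network, y = W x + b, written in plain Python. The weights and inputs are made up; real code would use NumPy or a deep learning framework instead of nested lists.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Made-up weights for a layer with 3 inputs and 2 outputs.
W = [
    [0.5, -1.0, 2.0],
    [1.0, 0.0, 1.0],
]
b = [0.1, -0.2]
x = [2.0, 1.0, 0.5]

# Linear layer: one dot product per output, plus a bias term.
y = [wx + bias for wx, bias in zip(matvec(W, x), b)]
print(y)
```

Each output is just a weighted sum of the inputs, which is why matrix shapes and products are the first thing to get comfortable with.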

Calculus

Khan Academy: Calculus 1: provides the calculus foundation (e.g., gradients, optimization) needed for understanding machine learning algorithms like gradient descent.
Coursera: Mathematics for Machine Learning: Multivariate Calculus (Imperial College London): emphasizes calculus applications in ML, such as backpropagation and optimization.
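Gradient descent is the classic place where calculus meets ML. The sketch below minimises a toy one-dimensional loss, f(w) = (w - 3)^2, by repeatedly stepping against its derivative. The loss, starting point and learning rate are all arbitrary choices for illustration.

```python
def f(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad_f(w):
    """Derivative of f with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0              # starting point (arbitrary)
learning_rate = 0.1  # step size (arbitrary)

for step in range(100):
    # Move against the gradient, i.e. downhill on the loss.
    w -= learning_rate * grad_f(w)

print(f"w = {w:.6f}, loss = {f(w):.2e}")
```

Training a neural network follows the same loop, only with millions of parameters and with the gradient computed by backpropagation.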

Don't get overwhelmed

The notation or the subject itself may seem complicated. Don’t be overwhelmed by the theory; go slowly, step by step. Remember that the important thing is to understand the basics, apply them, and build an iterative process of continuous learning.

David Rey
