A case-study-based introduction to using Bayes’ rule and how it compares with frequentist, pessimistic and optimistic approaches to drawing conclusions

Photo by Robert Ruggiero on Unsplash

This post will help you understand Bayesian inference at an intuitive level with the help of a simple case study. I hope that once you read this article, you will be very clear on how the well-known “Bayes’ theorem” is used, what the terms in the theorem mean (prior, posterior, likelihood) and how it compares with other approaches to decision making (pessimist/optimist/frequentist). For those who are interested, I have provided simulation results for the given case study and a link to R code for further exploration…
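To make the three terms concrete, here is a minimal sketch of Bayes' theorem for a yes/no hypothesis. The function name `posterior` and all the numbers are hypothetical and are not taken from the article's case study; the excerpt does not show the author's own example.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H | E) for a binary hypothesis H using Bayes' theorem.

    prior                -- P(H), belief in H before seeing the evidence
    likelihood           -- P(E | H), chance of the evidence if H is true
    likelihood_given_not -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of the evidence under both possibilities
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a rare hypothesis (1%) and fairly reliable evidence
p = posterior(prior=0.01, likelihood=0.95, likelihood_given_not=0.05)
print(round(p, 3))
```

With these made-up inputs the posterior stays well below one half even though the evidence is 95% reliable, because the prior is so small.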

Getting Started

A case-study based introduction to ROC, how to set the threshold, its relation to sensitivity/specificity/precision/recall/accuracy

The field of machine learning can broadly be categorised into supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses previous examples with known outputs to determine an appropriate mathematical function to solve a classification or a regression problem. This post focusses on the ROC (Receiver Operating Characteristic) curve, which is widely used in the machine learning community to assess the performance of a classification algorithm. It will help you intuitively understand what an ROC curve is and help you implement it in both R and Python. Specifically, the objectives of this post are:

(i) To provide you with an…
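As a rough illustration of what an ROC curve contains, the sketch below sweeps a threshold over classifier scores and records the false positive rate and true positive rate (sensitivity) at each step. The helper names `roc_points` and `auc` and the toy data are hypothetical; this is not the implementation from the post.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, highest threshold first."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # A threshold above every score predicts nothing as positive
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = positive) and classifier scores
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

Each point corresponds to one choice of threshold, which is why picking a threshold amounts to picking a trade-off between sensitivity and the false positive rate.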


Lesson 2: Being right is not enough, you have to be convincing

Photo by Terence Burke on Unsplash

Had the correct scatterplot or data table been constructed, no one would have dared to risk the Challenger in such cold weather. Edward Tufte

It was supposed to be a landmark day in modern history.

For the first time, a civilian (a high school teacher named Christa McAuliffe) had been selected to go into space. There was even a possibility of a televised conversation between her and President Reagan during the annual State of the Union address, due about 10 hours later that same day. Instead, the space shuttle exploded 73 seconds after launch, killing all seven astronauts on board.

A day before…


What you should do to avoid misuse when assessing the performance of a classifier.

Photo by Kevin Jarrett on Unsplash

Would you consider a classifier with 99% accuracy to be good or bad? A good chunk of people would assume that the correct answer is “good”. It’s shocking! The correct answer is neither.

Let that sink in.

I have been in numerous meetings where people assume that a model must be good if the associated performance metric is above a specific value. As a data scientist, you should know better.

Let me explain. No data scientist should ever have, without context, any threshold for any performance metric in mind to suggest whether it is good or not. Data science is…
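One common illustration of why context matters is class imbalance. This is an assumption on my part, since the excerpt is cut off before the author's own explanation. In the sketch below, the data are hypothetical: 1% of cases are positive, and the "classifier" always predicts negative.

```python
# Hypothetical data set: 10 positive cases out of 1,000
y_true = [1] * 10 + [0] * 990
# A trivial "classifier" that always predicts the negative class
y_pred = [0] * 1000

# Accuracy: fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of actual positives that were detected
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(accuracy, recall)
```

Here the accuracy is 99% while the recall is zero, so the same headline number can describe a model that never finds a single positive case.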


4 reasons why data science is here to stay and what you need to do to ensure that your skillset stays in demand.

Photo by michael podger on Unsplash

As someone who has worked in data science for over a decade, I find it frustrating to see people prophesying that the field will go extinct in 10 years. The typical reason given is that emerging AutoML tools will eliminate the need for practitioners to develop their own algorithms.

I find such opinions especially frustrating because they dissuade beginners from taking data science seriously enough to excel in it. Frankly, such prophecies are a disservice to the data science community, in a field where demand is only going to increase further!

Why would any sane person…


Let go of any doubts or confusion, make the right choice and then focus and thrive as a data scientist.

Photo by AMIT RANJAN on Unsplash

I currently lead a research group with data scientists who use both R and Python. I have been in this field for over 14 years. I have witnessed the growth of both languages over the years and there is now a thriving community behind both.

I did not have a straightforward journey and learned many things the hard way. However, you can avoid making the mistakes I made and lead a more focussed, more rewarding journey and reach your goals quicker than others.

Before I dive in, let’s get something out of the way. R and Python are just tools…

A beginner-friendly derivation of SVM equations with intuitive explanations

Orthogonal Projection gives us the shortest distance from a point to a plane, exactly like what we are used to in our daily lives (Image by Author)

“True knowledge comes with deep understanding of a topic and its inner workings.” Albert Einstein

Why should I bother learning about the mathematics behind SVM?

There are numerous libraries available that can help me use SVM without having to worry about the underlying concepts.

Sure! You are absolutely right. You don’t need to understand the fundamental concepts behind SVM to apply it to classification problems. A large number of tools are readily available, and hardly anybody implements SVM from scratch in a real-world project anymore.

However, if you are an aspiring data scientist, you should spend some time learning these foundational concepts to understand how various algebraic and geometric arguments culminate in an algorithm.
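The geometric idea in the image caption, that the orthogonal projection gives the shortest distance from a point to a plane, can be sketched in a few lines. The function name `distance_to_hyperplane` and the example plane and point are hypothetical; the formula is the standard point-to-hyperplane distance |w·x + b| / ‖w‖, not necessarily the notation the article uses.

```python
import math


def distance_to_hyperplane(point, w, b):
    """Shortest (orthogonal) distance from `point` to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


# Hypothetical plane x + 2y + 2z - 3 = 0 and point (4, 1, 1)
d = distance_to_hyperplane((4, 1, 1), w=(1, 2, 2), b=-3)
print(d)
```

In an SVM setting, this same quantity measured for the training points closest to the boundary is what defines the margin.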



The 2x2 grid you should master to sharpen your probability skills and never again confuse permutations with combinations

Photo by Sharon McCutcheon on Unsplash

Do you know how to count? Can you count on your fingers? I am not talking about large numbers here. Just numbers in situations that you probably come across every now and then. Surely, you must be able to count. After all, the ability to count is a fundamental skill. Why not have a quick self-test to evaluate yourself? Read on!

How many five-letter words can you make from the letters in the word CHALLENGING? The answer is 13,560. Perhaps this isn’t interesting, and is a tad nerdy. Let’s move on…

Imagine you are at a birthday party with 22 other…
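The CHALLENGING question above can be checked by brute force. This is a minimal sketch, assuming a "word" means any distinct ordered arrangement of five of the eleven letter tiles (not necessarily a dictionary word), with each letter used no more often than it appears in CHALLENGING.

```python
from itertools import permutations

letters = "CHALLENGING"

# permutations() treats the repeated L, N and G tiles as different objects,
# so collecting the joined strings in a set removes the duplicate arrangements.
distinct_words = {"".join(p) for p in permutations(letters, 5)}

print(len(distinct_words))
```

Enumeration like this only works for small problems, but it is a handy way to test a counting formula before trusting it.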

A look at evidence from over 20 million first doses in the UK to find out if the vaccine leads to blood clotting

Photo by Hakan Nural on Unsplash

Disclaimer: The analysis undertaken and the views expressed in this article are not endorsed by my employer or anyone else. This article is intended for informational purposes only and it is not a substitute for professional medical advice, diagnosis, or treatment.

There are many reasons why a medical event can happen following a vaccination. Not everything that happens to a person afterwards is necessarily caused by the vaccine itself; a medical event that follows a vaccine may well be a mere coincidence. We, therefore, need…

Part 1 of a series of brief articles that provide a comprehensive introduction; no prior knowledge assumed

SVM attempts to find the boundary with the maximum margin (Image by Author)


This is part 1 of a five-part series of short articles that provide a comprehensive introduction to Support Vector Machines (SVM). The objective of the series is to help you thoroughly understand SVM and be able to confidently use it in your own projects. The series assumes no prior knowledge of Machine Learning (ML); familiarity with high-school mathematics is sufficient to follow along.

The Big Picture

If you are an absolute beginner in Data Science and do not yet understand the difference between Supervised Learning and Unsupervised Learning, then I suggest you read my earlier article that assumes absolutely no prior knowledge…

Ahmar Shah, PhD (Oxford)

Scientist (several research publications in prestigious journals such as The Lancet, Brain, Thorax, IEEE Transactions), love writing for meaning & impact…
