Missing data: tell the whole story with Qlik

Missing data? Survive Survivorship Bias with Qlik

How come some airplanes don’t return from the battlefield? Are the success stories of Bill Gates, Jeff Bezos and Mark Zuckerberg the best learning experiences? And how could people in 1987 think that cats were more likely to survive if they fell from a higher floor? All these questions have one factor in common: they involve “survivorship bias”.

WHAT IS SURVIVORSHIP BIAS?

If you work a lot with data, this might be a familiar term. Survivorship bias is the error of drawing conclusions only from the results (or survivors) that made it through a process, while overlooking those that did not. Incomplete data sets, a lack of context or an incorrect interpretation of the data are often at the root of this misconception. If you understand why survivorship bias occurs and can recognize its effect, you will understand your data better and make your analyses more reliable and valid. Recent history offers numerous examples of this phenomenon: it has affected scientists, entrepreneurs and researchers, among others.

WHAT DOES A FAILURE HAVE TO TELL?

In the book “The Black Swan: The Impact of the Highly Improbable”, Nassim Taleb writes: “The cemetery of failed restaurants is very silent.” If you focus only on the successes and never look into the failures, you miss the full scope of your data and never really understand how your processes actually function.

Success stories of entrepreneurs are often used as examples of how things should be done, but for each of those few success stories there is a multitude of entrepreneurs who didn’t make it. Bill Gates (Microsoft), Jeff Bezos (Amazon) and Mark Zuckerberg (Facebook) are indeed successful in their businesses, but they have only one side of the story to tell: how they made it and achieved their success. Many others who may have taken exactly the same steps, had exactly the same talent and shown just as much ambition failed to make it, and their story is perhaps even more interesting. They can tell you what happened and what caused them to fail. These stories often contain the lessons from which we can deduce why things go wrong and why we fail. Focusing only on the “survivors” keeps you from getting the overall view and from finding the flaws in your processes.

“The cemetery of failed restaurants is very silent.” – Nassim Taleb

FALLING CATS – IT’S EASY TO MISS THE BIG PICTURE

Another example of missing the big picture arose in 1987: a group of scientists investigated the likelihood that cats would survive a fall from a certain floor. The researchers based their conclusions on data obtained from veterinary clinics. The data were remarkable: the researchers noted that the higher the fall, the greater the cat’s chance of survival. In fact, 100% of the cats that had fallen from the sixth floor or higher survived their fall. According to the researchers, this was possible because the cats reached their maximum fall speed during such a fall, relaxed and then prepared for landing, resulting in a better chance of survival.

The Straight Dope, a newspaper column, challenged this theory 10 years later. This is a clear case of survivorship bias: the researchers only had data on cats that had actually been treated at veterinary clinics. Because their data contained no cats that had died after falling from the higher floors, the researchers assumed that cats survived such falls. The likelier explanation is the opposite: these cats died immediately as a result of their fall and were therefore never treated at a veterinary clinic. As a result, they were never registered and never became part of the data set.
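The selection effect described above can be made concrete with a small Python sketch. All numbers, the `Fall` record and the helper names are hypothetical and invented purely for illustration; they are not taken from the 1987 study. The sketch assumes that cats that die on impact from a high floor are never brought to a clinic, and then compares the survival rate seen in the clinic records with the rate over all falls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fall:
    floor_band: str    # "low" (below the sixth floor) or "high" (sixth floor or above)
    survived: bool
    seen_by_vet: bool  # only these falls end up in the clinic records


def make_falls(band, survivors, dead_seen, dead_unseen):
    """Build hypothetical falls for one floor band (invented counts)."""
    return (
        [Fall(band, True, True)] * survivors
        + [Fall(band, False, True)] * dead_seen
        + [Fall(band, False, False)] * dead_unseen
    )


def survival_rate(falls, band):
    """Share of falls in the given band that the cat survived."""
    in_band = [f for f in falls if f.floor_band == band]
    return sum(f.survived for f in in_band) / len(in_band)


# Assumption: every cat that dies after a high fall is never taken to a vet.
all_falls = make_falls("low", 90, 5, 5) + make_falls("high", 40, 0, 60)
clinic_records = [f for f in all_falls if f.seen_by_vet]

for band in ("low", "high"):
    print(
        band,
        "clinic:", round(survival_rate(clinic_records, band), 2),
        "all falls:", round(survival_rate(all_falls, band), 2),
    )
```

In this made-up data the clinic records show a perfect survival rate for high falls, while the full set of falls shows that most of those cats did not survive. The difference comes entirely from which records were collected.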

AIRPLANES DURING WWII – UNCOVERING THE HIDDEN TRUTH

It is 1943: large parts of Europe are occupied by German troops. The Allies are trying to break through the enemy’s defenses with bombers, but with little success: many planes are shot down and lost. The Center for Naval Analyses starts looking for a way to reinforce the bombers. Armoring the entire machine would make the aircraft too heavy to take off, so it’s necessary to choose which parts should get additional armor. While the experts from the Center for Naval Analyses note where the returning planes are hit most often, the Statistical Research Group (SRG) of Columbia University is called in.

It’s Abraham Wald, who fled to the U.S. in 1938 as the German troops advanced, who comes up with an unexpected conclusion: reinforce the planes where the returning machines aren’t hit. Wald reasons that the returning planes were hit in non-fatal spots: they could return despite the damage. The planes hit in other places apparently didn’t make it, and that’s why, according to Wald, it’s better to apply armor to those parts of the plane. The advice is followed and, thanks to Wald’s statistical approach to the problem, the Allies gain ground.

“The extra armor belonged not on the part of the plane that could survive a lot of bullets, but to the part of the plane that couldn’t.” – Abraham Wald
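Wald’s reasoning can be sketched in Python with a hypothetical set of sorties. The hit locations, the counts and the function names are assumptions made up for this illustration. Note also that the sketch knows what happened to the lost planes, which the analysts in 1943 did not; Wald had to infer it from the holes that were missing on the returning planes.

```python
from collections import Counter

# Hypothetical sorties as (hit_location, returned) pairs. Invented numbers.
sorties = (
    [("wings", True)] * 45 + [("wings", False)] * 5
    + [("fuselage", True)] * 40 + [("fuselage", False)] * 10
    + [("engine", True)] * 5 + [("engine", False)] * 45
)


def hits_on_returned(sorties):
    """Count hit locations on returning planes only (the visible data)."""
    return Counter(loc for loc, returned in sorties if returned)


def loss_rate_by_location(sorties):
    """Share of planes lost per hit location, using all sorties."""
    total = Counter(loc for loc, _ in sorties)
    lost = Counter(loc for loc, returned in sorties if not returned)
    return {loc: lost[loc] / total[loc] for loc in total}


observed = hits_on_returned(sorties)
loss_rates = loss_rate_by_location(sorties)

# Naive advice: armor the spot with the most holes on returning planes.
naive_choice = max(observed, key=observed.get)
# Wald-style advice: armor the spot where a hit is most often fatal.
wald_choice = max(loss_rates, key=loss_rates.get)

print("holes seen on returning planes:", dict(observed))
print("naive choice:", naive_choice, "| Wald-style choice:", wald_choice)
```

With these made-up counts the returning planes show the most holes in the wings, yet engine hits bring down nine out of ten planes. The spot with the fewest visible holes is the one that needs the armor.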

QLIK SENSE MAKES YOUR DATA TRANSPARENT

The cognitive engine of Qlik will help you prevent survivorship bias. In the image above, all types of Hole Location are selected (green), except “No Holes” (light gray). Qlik clearly shows which selection options in Plane and Status are still available (white) and which are not (dark gray). This selection in Hole Location shows that all airplanes with the status “Shot Down” fall outside the data set. In other words: airplanes with the damage shown did return, so this damage proved not to be fatal. Qlik ensures that you don’t miss any data: the different colors make it very clear what is and what isn’t part of the (selected) data set. This way you won’t overlook anything during your analysis!
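The idea behind the white and dark gray values can be imitated in a few lines of Python. This is only an analogy and not Qlik code or the Qlik API: the rows, the plane IDs, the “Returned” status and the `selection_states` helper are hypothetical. Only the field names Plane, Hole Location and Status and the values “No Holes” and “Shot Down” come from the example above.

```python
# Hypothetical rows; only the field names and the values "No Holes" and
# "Shot Down" come from the article, everything else is invented.
rows = [
    {"Plane": "P1", "Hole Location": "Wings", "Status": "Returned"},
    {"Plane": "P2", "Hole Location": "Fuselage", "Status": "Returned"},
    {"Plane": "P3", "Hole Location": "Tail", "Status": "Returned"},
    {"Plane": "P4", "Hole Location": "No Holes", "Status": "Shot Down"},
    {"Plane": "P5", "Hole Location": "No Holes", "Status": "Shot Down"},
]


def selection_states(rows, selected_field, selected_values, other_field):
    """Split the values of other_field into those still possible under the
    selection and those excluded by it."""
    matching = [r for r in rows if r[selected_field] in selected_values]
    possible = {r[other_field] for r in matching}
    all_values = {r[other_field] for r in rows}
    return {"possible": possible, "excluded": all_values - possible}


# Select every Hole Location except "No Holes", as in the example above.
selected = {"Wings", "Fuselage", "Tail"}
states = selection_states(rows, "Hole Location", selected, "Status")
print(states)
```

Listing the excluded values next to the possible ones is what makes the gap visible: the status “Shot Down” is not absent from the data, it is only absent from the selection.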

Writer: Ronan Berendsen – BI Consultant Climber
Email: ronan.berendsen@climber.nl

References:
Mangel, M., & Samaniego, F. J. (1984). Abraham Wald’s work on aircraft survivability.
Wald, A. (1980). A Reprint of “A Method of Estimating Plane Vulnerability Based on Damage of Survivors” (No. CRC-432).
https://blog.qlik.com/the-hole-story-and-bias-in-ai
