Opening up black-box machine learning models with explainable AI (XAI)

What is explainable AI (XAI)?

Photo by Pawel Czerwinski on Unsplash

“My dog accidentally knocked down the trash and found old cheesy pasta in it, and is now convinced that trash cans provide an endless supply of cheesy pasta, knocking it over every chance she gets.”

Sometimes, your machine learning (ML) model does the same thing.

A notorious example is a neural network that was trained to differentiate between dogs and wolves. It didn’t truly learn to tell the animals apart; instead, it learnt that all the wolf pictures had snow in the background (their natural habitat), while the dog pictures had grass. The model then distinguished the two animals simply by checking whether the background was snow or grass.

What if a dog was photographed on snow, or a wolf on grass? The model would make the wrong prediction.

Why is explainable AI important?

Why should anyone care about a model classifying dogs and wolves wrongly?

A simple model mistaking a dog for a wolf is not scary, right? But imagine a self-driving car misclassifying a person as an inanimate object and running them over.

When AI is used in applications like self-driving cars and robot assistants, the machine should not only learn the way a human brain does, but should also be able to explain and justify its decisions, just as humans can.

Achieving this would be a big leap in the evolution of AI systems and would help humans place greater trust in them. XAI is a vast subject and one of the hottest topics in AI/ML research, in both academia and industry.

Explainability of convolutional neural networks

Curious about how ML models learn, we set out to understand how a deep neural network functions.

Neural networks have long been called “black box” models because their large number of interacting, non-linear parts makes it hard to understand how they arrive at their outputs.

  • Starting with the MNIST digit-recognition data set, we built a simple CNN model and visualized how the CNN filters are trained and what feature each filter detects (see the sketch after this list).
  • We visualized the activation values in each fully connected layer and identified the set of neurons that fire for each digit.
  • We then used activation maximization and saliency maps to show which parts of the input image are most critical for the model to classify it correctly.
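
The sketch below illustrates the first two steps under assumed choices: it is not the original notebook code, and the architecture, layer names (conv1, fc1) and hyperparameters are illustrative. It trains a small Keras CNN on MNIST, pulls out the trained filters of the first convolutional layer, and builds a sub-model that exposes intermediate activations so you can see which fully connected neurons fire for a given digit.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Load and normalize MNIST
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

# A simple CNN: two conv blocks followed by a fully connected head
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", name="conv1"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", name="conv2"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu", name="fc1"),
    layers.Dense(10, activation="softmax", name="output"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)

# 1) Inspect the trained filters of the first conv layer: shape (3, 3, 1, 32),
#    i.e. 32 small 3x3 kernels that can be plotted as images
filters, biases = model.get_layer("conv1").get_weights()
print("conv1 filter tensor shape:", filters.shape)

# 2) Build a sub-model that exposes intermediate activations, then check which
#    fc1 neurons are most strongly activated for a single test digit
activation_model = keras.Model(
    inputs=model.inputs,
    outputs=[model.get_layer("conv1").output, model.get_layer("fc1").output],
)
conv1_maps, fc1_acts = activation_model.predict(x_test[:1])
print("Most active fc1 neurons:", np.argsort(fc1_acts[0])[-5:])
```

Plotting the 32 conv1 kernels and the conv1 feature maps side by side is what makes the “what does each filter detect” question concrete: edge- and stroke-like patterns typically emerge even in a network this small.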

The Python deep learning package Keras offers a number of built-in methods to help you visualize models. These are existing techniques, and the code can be found in the Python notebook here.
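
For the saliency-map step, a minimal sketch is shown below, assuming the model from the earlier example; it is not the notebook’s code. A vanilla saliency map is simply the gradient of the predicted class score with respect to the input pixels: pixels with large gradient magnitude are the ones the prediction is most sensitive to.

```python
import numpy as np
import tensorflow as tf

def saliency_map(model, image):
    """image: a single input of shape (28, 28, 1); returns a (28, 28) map."""
    x = tf.convert_to_tensor(image[None, ...])  # add a batch dimension
    with tf.GradientTape() as tape:
        tape.watch(x)                            # track gradients w.r.t. the input
        preds = model(x, training=False)
        top_class_score = tf.reduce_max(preds[0])  # score of the predicted class
    grads = tape.gradient(top_class_score, x)      # d(score)/d(pixel)
    # Collapse the channel axis and take absolute values for visualization
    return np.abs(grads[0].numpy()).max(axis=-1)

# Usage: overlay the map on the input digit to see which strokes matter most
# smap = saliency_map(model, x_test[0])
```

Activation maximization works in the opposite direction: instead of differentiating a fixed image, you run gradient ascent on the input itself to find the pattern that most strongly excites a chosen filter or neuron.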

Note: We continued our research on understanding CNNs through proprietary methods and code which cannot be shared here.

Takeaway for a data scientist

  • Although a data scientist usually fine-tunes existing ML algorithms to solve a business problem, not treating the ML model as a black box and trying to understand how it works will take you a long way in your career.
  • This project helped me demystify what is inside the CNN black box and how it works. I intend to explore concepts like occlusion maps and attention to understand ML models better.

Vinodhini S D