Explainable Artificial Intelligence (XAI) and its applications

Artificial intelligence has come a long way in the last decade. From simple image recognition to humanoid robots like Sophia, AI is growing by leaps and bounds, and organizations are adopting it for a wide range of goals. The figure below shows some of these goals by percentage.

AI adoption has begun in critical areas such as healthcare and the legal and financial industries. These domains are highly complex and have a major impact on people's lives, and they face a critical challenge when using AI: understanding how the AI makes a decision.

For a long time, decision making by AI has remained a black box; many data scientists describe it as “more of an art than a science.” However, in a highly regulated industry like healthcare, transparency of AI decision making is imperative, and with regulatory mechanisms such as the GDPR it is becoming relevant to other industries as well.

Thus, to find out how these models make decisions, and to make sure the decisioning process is aligned with the ethical, legal, and procedural requirements of the organization, one needs to improve their interpretability.

If they are interpretable, these models can explain the logic behind decisions such as:

  • A medical diagnosis of a patient, or correlations deduced between various data points in the patient’s medical history
  • If a financial AI model denies a loan to an individual, it should be able to show the reasons for that decision
  • Before using a model to evaluate criminal behavior, we need to make sure the model behaves in an equitable, honest, and non-discriminatory manner
  • If a self-driving car behaves strangely and we cannot explain why, are we going to use it?
While interpretability or explainability is usually mentioned in the context of models, it actually applies to the entire system: the features, the logic, the model parameters, and the model itself. There are five main qualities of an explainable AI:

    1. Transparency: The user should be able to understand what drives the predictions, even if the model is unknown or opaque. The key factors behind a decision should be visible or derivable.
    2. Consistency: Explanations should be consistent across different executions of the same model.
    3. Generalizability: The explanation method should be general, not devised separately for each model or model run.
    4. Trust: Both the model and its explanation should match human performance, even in the mistakes they make.
    5. Fidelity: The explanation should represent what the model actually did; it should not be a post-hoc justification.
The approach to deriving explainability differs between models, owing to their inherent complexity and internal operations.
Below is a chart of complexity versus explainability for the different models used in AI:

    Learning algorithms can be split into three families:

    1. Rule-based, e.g. decision trees
    2. Factor-based, e.g. logistic regression
    3. Case-based, e.g. k-nearest neighbors (KNN)
A general form of the linear regression model is:

Y = W0 + W1·X1 + W2·X2 + … + Wn·Xn

Here the coefficients describe the change in the response triggered by a one-unit increase in the corresponding independent variable: once W1 is known, the effective increase in Y for a one-unit increase in X1 is simply W1.
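As a concrete illustration, here is a minimal sketch (using numpy, with invented data and weights) of fitting a linear model and reading the explanation directly off its coefficients:

```python
import numpy as np

# Toy data: Y depends linearly on two features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 2.0  # true W1=3.0, W2=1.5, W0=2.0

# Fit ordinary least squares: add an intercept column and solve
A = np.column_stack([np.ones(len(X)), X])
w0, w1, w2 = np.linalg.lstsq(A, y, rcond=None)[0]

# w1 is directly interpretable: a one-unit increase in X1
# changes the predicted Y by w1, holding X2 fixed.
print(f"intercept={w0:.2f}, W1={w1:.2f}, W2={w2:.2f}")
```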
Similarly, a naïve Bayes classifier assumes that features are independent of each other and contribute independently to the output, so it is easy to measure the individual contribution of a feature and derive explanations.
Decision trees are like large “if–else” paths, and random forests take a majority vote over an ensemble of decision trees.
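The “if–else” view can be made concrete: a decision tree’s prediction comes with its decision path for free, and that path is the explanation. A toy sketch, with invented loan features and thresholds:

```python
# A hand-written "decision tree" is just nested if/else rules, so the
# decision path itself is a human-readable explanation.
# (Features and thresholds here are made up for illustration.)
def approve_loan(income, credit_score):
    if credit_score < 600:
        return False, "credit_score < 600"
    if income < 30000:
        return False, "credit_score >= 600 but income < 30000"
    return True, "credit_score >= 600 and income >= 30000"

decision, path = approve_loan(income=45000, credit_score=650)
print(decision, "because", path)
```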
For all of the above models it is relatively simple to derive explanations.
Ensembles and deep learning fall into the category where explainability is a real challenge.
The approach to explaining a black-box model is to extract information from a trained model to understand its predictions without knowing how the model works internally. There may be no hard requirement on how these explanations are presented, but one should be able to answer the question, “Can I trust the model?”
There are two types of interpretation: global and local.
Global interpretation explains the conditional interaction between the dependent (response) variables and the independent (predictor) variables over the complete dataset.
Local interpretation explains that interaction with respect to a single prediction.
    Below are some of the methods for local interpretation:

    Prediction Decomposition:

Robnik-Šikonja and Kononenko proposed this method in 2008. It explains a prediction by measuring the difference between the original prediction and the one made after omitting a set of features.
Let’s assume a classification model f: X (input) → Y (output). A data point x ∈ X has attributes M1, M2, …, Mi, …, and is labelled with class y ∈ Y.
The prediction difference is calculated as the difference between the model’s predicted probabilities with and without knowledge of Mi:

    probDiffi(y|x) = p(y|x) − p(y|x∖Mi)
    1. If the target model does not output a probability score, one needs to be re-computed.
    2. Since we calculate without Mi, the model must be able to handle missing (NULL/NaN) values.
    3. The output must be a prediction in the form of a probability.
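The decomposition can be sketched as follows; the `prob_diff` helper and the toy logistic model are illustrative inventions, and the “unknown” feature is approximated by averaging predictions over background values of that feature, one common way to simulate omission:

```python
import numpy as np

def prob_diff(predict_proba, x, X_background, i, target_class):
    """Robnik-Sikonja & Kononenko style decomposition for feature i:
    p(y|x) minus the prediction with feature i 'unknown', approximated
    by averaging over background values of that feature."""
    p_full = predict_proba(x)[target_class]
    x_rep = np.tile(x, (len(X_background), 1))
    x_rep[:, i] = X_background[:, i]          # marginalize feature i
    p_without = np.mean([predict_proba(row)[target_class] for row in x_rep])
    return p_full - p_without

# Hypothetical model: a simple logistic score over two features,
# dominated by feature 0
def predict_proba(x):
    p1 = 1.0 / (1.0 + np.exp(-(2.0 * x[0] - 0.1 * x[1])))
    return np.array([1.0 - p1, p1])

rng = np.random.default_rng(1)
X_bg = rng.normal(size=(200, 2))
x = np.array([1.0, 0.5])
for i in range(2):
    print(f"probDiff for feature {i}: {prob_diff(predict_proba, x, X_bg, i, 1):+.3f}")
```

The dominant feature 0 gets a much larger probability difference than the nearly irrelevant feature 1, which is exactly the explanation the method produces.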

    Local Gradient Explanation Vector:

[Ref. Baehrens et al., 2010] This method explains the local decision of an arbitrary non-linear classifier using local gradients that indicate how a data point would have to move to change its predicted label.
Let’s assume a classifier trained on a dataset X that outputs probabilities over the class labels Y. The local explanation vector is the derivative of the probability prediction function at a single point x:

ξ(x) = ∇_x p(y | X = x)

A large entry in this vector indicates a feature with a large influence on the model’s decision.
    1. Again, the output must be in the form of a probability.
    2. If calibration is applied to the model output, it will not be visible in the explanation vectors and may cause the explanation to deviate from the model’s behavior.
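A minimal sketch of this idea, using finite differences rather than an analytic gradient (the classifier and its weights are invented for illustration):

```python
import numpy as np

def local_explanation_vector(predict_proba, x, target_class, eps=1e-5):
    """Numerical gradient of p(target_class | x) at point x.
    Larger entries indicate features with more local influence
    (Baehrens et al., 2010, sketched with finite differences)."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_hi, x_lo = x.copy(), x.copy()
        x_hi[i] += eps
        x_lo[i] -= eps
        grad[i] = (predict_proba(x_hi)[target_class]
                   - predict_proba(x_lo)[target_class]) / (2 * eps)
    return grad

# Hypothetical classifier: logistic model with known weights,
# so feature 0 should dominate the explanation vector
w = np.array([2.0, -0.1])
def predict_proba(x):
    p1 = 1.0 / (1.0 + np.exp(-w @ x))
    return np.array([1.0 - p1, p1])

x = np.array([0.3, 1.0])
grad = local_explanation_vector(predict_proba, x, target_class=1)
print(grad)
```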

LIME (Local Interpretable Model-agnostic Explanations) Framework:

LIME approximates the model locally in the neighborhood of a prediction. It converts the dataset into interpretable data representations, e.g.:

    • Image classifier: create a binary vector indicating the presence or absence of a contiguous patch of similar pixels.
    • Text classifier: create a binary vector indicating the presence or absence of a word.

The concept behind LIME is that it is easier to approximate a black-box model with a simple model locally.
By examining whether the explanations make sense, one can decide whether the model is trustworthy.
For example, LIME can identify the words that have the highest impact on classifying whether a question on Quora is “sincere”.
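The local-surrogate idea can be sketched without the `lime` library itself: sample around the point of interest, weight the samples by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. All names and the toy black box below are illustrative:

```python
import numpy as np

def lime_sketch(predict_proba, x, target_class, n_samples=500, width=0.75, seed=0):
    """Minimal LIME-style local surrogate: sample around x, weight samples
    by proximity, and fit a weighted linear model whose coefficients act
    as the local explanation."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, len(x)))
    y = np.array([predict_proba(z)[target_class] for z in Z])
    # Exponential kernel: nearby samples count more
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / width ** 2)
    A = np.column_stack([np.ones(n_samples), Z])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return beta[1:]   # per-feature local weights (intercept dropped)

# Hypothetical black box: nonlinear in feature 0, ignores feature 1
def predict_proba(z):
    p1 = 1.0 / (1.0 + np.exp(-(z[0] ** 3)))
    return np.array([1.0 - p1, p1])

x = np.array([0.5, 0.0])
coef = lime_sketch(predict_proba, x, target_class=1)
print(coef)
```

The surrogate assigns a clearly positive local weight to feature 0 and a near-zero weight to the ignored feature 1, mirroring what the real LIME library reports for tabular data.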

    SHAP (SHapley Additive exPlanation) Framework:

In the SHAP framework, every feature used in the model is given a relative importance score called its SHAP value, which indicates how much that particular feature contributed to the model’s decision.
    In above example Age and Education-Num are the top two features.
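SHAP values are grounded in Shapley values from cooperative game theory. For a model with only a few features they can be computed exactly by brute force, which is a useful way to see what SHAP approximates efficiently. The additive toy model below is an invented example, and “missing” features are simulated by setting them to a baseline value:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for a model with few features: features not in
    a coalition are set to their baseline value. Exponential cost, so this
    is only a sketch for small n (SHAP approximates this efficiently)."""
    n = len(x)
    def value(coalition):
        z = baseline.copy()
        for i in coalition:
            z[i] = x[i]
        return predict(z)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Hypothetical additive model: each Shapley value should match its term
def predict(z):
    return 3.0 * z[0] + 1.0 * z[1] - 2.0 * z[2]

x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros(3)
phi = shapley_values(predict, x, baseline)
print(phi)
# Efficiency property: contributions sum to predict(x) - predict(baseline)
print(phi.sum(), predict(x) - predict(baseline))
```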

A feature’s importance is measured by calculating the increase in the model’s prediction error after perturbing that feature: if perturbing it increases the error, the feature is important; if the error is unchanged, the feature is unimportant.
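This perturbation recipe is essentially permutation importance, and it can be sketched in a few lines. The model and data are invented; importance is measured as the increase in mean-squared error after shuffling a feature’s column, which breaks its link to the target:

```python
import numpy as np

def permutation_importance(predict, X, y, seed=0):
    """Importance of each feature = increase in mean-squared error after
    shuffling that feature's column."""
    rng = np.random.default_rng(seed)
    base_err = np.mean((predict(X) - y) ** 2)
    scores = []
    for i in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(Xp[:, i])   # perturb one feature
        scores.append(np.mean((predict(Xp) - y) ** 2) - base_err)
    return np.array(scores)

# Hypothetical model and data: y depends on feature 0 only
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0]
predict = lambda X: 2.0 * X[:, 0]             # perfect model, ignores feature 1
imp = permutation_importance(predict, X, y)
print(imp)
```

Shuffling the relevant feature 0 inflates the error, while shuffling the ignored feature 1 leaves it exactly unchanged, so the scores come out as important vs. unimportant.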
Interpreting a model locally is usually easier than interpreting it globally, but harder to maintain (think of the curse of dimensionality). The methods described below aim to explain the behavior of a model as a whole. However, a global approach cannot capture fine-grained interpretations, such as a feature being important in one region of the input space but not at all in another.
