Explainable AI promises to make model decisions transparent. Regulators require explanations for automated decisions. Business stakeholders demand to understand why the model denied a loan or flagged a transaction. Explainability tools produce explanations that look authoritative but are not necessarily true.
The model computed a prediction by propagating activations through millions of parameters. The explanation system generates a story about which features mattered most. That story is a post-hoc rationalization constructed to be interpretable. It is not a transcript of the model’s computation.
Different explanation methods applied to the same prediction produce different explanations. The explanations are inconsistent with each other. They cannot all be correct. At least some are wrong. You do not know which.
Explainability provides the appearance of transparency without actual transparency. It satisfies regulatory checkboxes while obscuring that the explanation may not reflect why the model actually decided what it did.
Explanations Are Generated, Not Extracted
A neural network makes a prediction. You want to know why. You apply an explanation method like SHAP or LIME. It produces feature importance scores. The explanation says the model predicted X because feature A had high importance and feature B had negative importance.
This explanation is not extracted from the model. It is generated by a separate system that probes the model’s behavior and constructs a narrative. The narrative is one interpretation of behavior, not ground truth about internal computation.
SHAP measures how prediction changes when features are permuted or masked. It calculates Shapley values to estimate each feature’s contribution. This measures correlation between features and predictions. It does not reveal causal pathways through the network.
A feature can have high SHAP value because it directly affects the prediction or because it correlates with other features that affect the prediction. SHAP cannot distinguish these cases. It reports importance without mechanism.
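The coalition averaging that SHAP approximates can be written out exactly for a small model. A minimal sketch (the three-feature model and the zero baseline are invented for illustration): note that the computation only ever queries the model's outputs under baseline masking. It never inspects the internals, which is why it reports importance without mechanism.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside a coalition are masked with the baseline value,
    the same output-difference probing that SHAP approximates."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# A toy model whose internals we pretend not to know:
model = lambda x: 2 * x[0] + x[1] * x[2]
phi = exact_shapley(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# phi sums to f(x) - f(baseline); the interaction x[1]*x[2] is split
# evenly between the two features with no indication it was an interaction.
```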
LIME approximates the model locally with a simpler interpretable model. It samples inputs near the prediction point, gets model predictions for those samples, and fits a linear model. The linear model’s coefficients are the explanation.
LIME explains the linear approximation, not the original model. If the original model is highly nonlinear, the linear approximation is a poor representation. The explanation describes a simplified model that behaves differently from the real model.
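The surrogate-fitting step is simple enough to sketch with numpy alone. Everything here is hypothetical: the opaque model, the point being explained, and the kernel width are all invented, but the shape of the procedure matches the description above — sample near the point, weight by proximity, fit a linear model, and read the coefficients as the explanation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Opaque nonlinear "model" standing in for the real one (hypothetical):
def model(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.5, 0.8])   # the single prediction we want to explain

# LIME-style local surrogate: sample near x0, fit a proximity-weighted
# linear model, report its slopes as the "explanation".
X = x0 + rng.normal(scale=0.1, size=(500, 2))
y = model(X)
weights = np.exp(-np.sum((X - x0) ** 2, axis=1) / 0.01)    # proximity kernel
A = np.hstack([X, np.ones((500, 1))])                      # add intercept
W = np.sqrt(weights)[:, None]
coef, *_ = np.linalg.lstsq(A * W, y * np.sqrt(weights), rcond=None)

# coef[0] and coef[1] describe the linear fit, not the model: they track
# the local slope and say nothing about sin() or the squaring.
```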
Both methods generate plausible explanations. Neither guarantees the explanation matches the model’s actual computation. The explanation is a story told about behavior, not a description of mechanism.
Different Explanation Methods Produce Different Explanations
A model predicts a customer will churn. You apply SHAP. It says the most important features are low recent activity and high support ticket count. You apply LIME. It says the most important features are account age and payment history. If it is a neural network, you inspect attention maps. They highlight yet another set of features.
These explanations contradict each other. They cannot all be accurate descriptions of why the model made the prediction. At least two are wrong or misleading.
The explanations differ because they measure different things. SHAP averages a feature’s marginal contribution across masked feature coalitions. LIME measures the slope of a surrogate fit in the vicinity of the prediction. Attention maps show which inputs received high attention weights, one mechanism in one part of the network, not the full computation.
Each method makes assumptions about what “importance” means and how to measure it. These assumptions are different. The explanations reflect the assumptions of the explanation method as much as they reflect model behavior.
A business stakeholder asks why the model predicted churn. You provide three different explanations from three different methods. Which one is correct? You do not know. You cannot know without understanding the model’s computation at a level of detail that defeats the purpose of using explanation methods.
The practical response is to pick one explanation method and use it consistently. This does not solve the problem that the explanation might be wrong. It just means you are consistently using potentially wrong explanations.
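The disagreement is easy to reproduce on a toy model. A sketch (the two-feature product model, the point, and the zero baseline are all invented): exact Shapley values split the interaction evenly between the features, while gradient saliency at the same point ranks one feature four times higher. Both are defensible definitions of importance. They disagree.

```python
# Toy opaque model (hypothetical): f(x1, x2) = x1 * x2
f = lambda x1, x2: x1 * x2
x1, x2 = 2.0, 0.5          # point to explain; baseline is (0, 0)

# Exact Shapley values with baseline masking: average the marginal
# contribution of each feature over both orderings.
phi1 = 0.5 * (f(x1, 0) - f(0, 0)) + 0.5 * (f(x1, x2) - f(0, x2))
phi2 = 0.5 * (f(0, x2) - f(0, 0)) + 0.5 * (f(x1, x2) - f(x1, 0))
# phi1 == phi2 == 0.5: the interaction is split evenly.

# Gradient saliency at the same point:
grad1, grad2 = x2, x1      # df/dx1 = x2, df/dx2 = x1
# |grad2| is 4x |grad1|: feature 2 looks far more important.
```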
Explanations Can Be Confidently Wrong
Explanation methods produce confidence scores or importance values that look authoritative. High importance for a feature suggests the feature strongly influenced the prediction. This can be false.
A credit scoring model denies a loan. SHAP assigns high importance to the applicant’s zip code. The explanation says the model denied the loan because of where the applicant lives. This looks like redlining and is illegal.
Investigation reveals the model actually learned that applicants from that zip code have inconsistent income reporting. The model denied the loan due to income inconsistency, not location. Zip code correlated with the real feature but was not causal. SHAP reported correlation as importance.
Or the model learned a spurious correlation. Images of horses are labeled with grass backgrounds because training photos were outdoor shots. The model learns to detect grass to classify horses. An explanation method highlights grass as important. The explanation is technically correct and substantively wrong. The model does not understand horses. It detects backgrounds.
Explanations do not know when they are wrong. They do not flag uncertainty. They present importance scores with precision that implies accuracy. The user cannot distinguish correct explanations from incorrect ones without external validation.
Explanations are persuasive because they match human intuitions about how decisions should be made. A feature that makes sense to humans is assigned high importance. The explanation feels right. It might be wrong, but it is plausible enough to be accepted.
Adversarial Examples Break Explanation Robustness
Adversarial examples are inputs crafted to fool models. Small perturbations cause misclassification. Adversarial attacks also fool explanation methods.
An attacker generates an input that causes the model to make a wrong prediction. The attacker also crafts the input so that the explanation looks reasonable. The model misclassifies. The explanation says the prediction is based on plausible features. The error is hidden.
A loan application is modified slightly to get approved. The modifications change features in ways that trigger model approval. The explanation says the loan was approved based on income and credit history, which look acceptable. In reality, the approval was triggered by exploiting a model vulnerability unrelated to creditworthiness.
Or an attacker wants to hide which features were exploited. They craft the adversarial example to produce explanations that highlight irrelevant features. The explanation distracts from the actual vulnerability. Auditors reviewing the explanation do not detect the manipulation.
Explanations assume the model is operating normally on typical inputs. Adversarial inputs are specifically designed to break that assumption. The explanation method does not know the input is adversarial. It generates a normal-looking explanation for abnormal behavior.
Robustness of explanations to adversarial manipulation is not well studied and not guaranteed by standard explanation methods. Explanations can be misleading on adversarial inputs in ways that are hard to detect.
Interpretability and Performance Are in Tension
Inherently interpretable models are simple enough that humans can understand their entire logic. Linear models. Decision trees. Rule lists. These models are transparent because their structure is simple.
Simple models have limited expressive power. Complex patterns require complex models. High performance on difficult tasks requires neural networks, ensembles, or other models that are not inherently interpretable.
The choice is performance or interpretability, rarely both. Vendors sell explainability as a way to have both. You train a high-performance model and add an explanation layer. Performance is not sacrificed.
This is trading actual interpretability for post-hoc explanations. The high-performance model remains opaque. The explanation layer generates stories about it. You have performance and explanations. You do not have transparency.
Truly interpretable models limit performance. A decision tree with 10 rules is interpretable. It will not match the accuracy of a deep neural network on complex tasks. If you need genuine interpretability, accept performance limits. If you need performance, accept opacity.
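A rule list of that kind can be written down in full; the model is its own explanation. A sketch of a hypothetical four-rule credit policy (all thresholds invented for illustration):

```python
# A hypothetical rule-list credit policy. Every decision comes with the
# exact rule that produced it; no post-hoc explanation method is needed.
def approve_loan(income, debt, credit_score, delinquencies):
    if delinquencies > 2:
        return False, "more than 2 delinquencies"
    if debt / income > 0.45:
        return False, "debt-to-income above 45%"
    if credit_score < 620:
        return False, "credit score below 620"
    return True, "passed all rules"
```

The returned reason is a complete and exact account of the decision. The cost is exactly the one described above: four rules will not match a deep network on a hard underwriting task.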
Explanation methods promise to avoid this trade-off. They provide explanations without limiting model complexity. The cost is that explanations are approximations of uncertain accuracy, not transparency.
Regulatory requirements that demand both high performance and interpretability are asking for something that may not be achievable. Compliance is satisfied by providing explanations. Whether those explanations are accurate is a different question.
Explanations Satisfy Compliance Without Providing Accountability
Regulations like GDPR require that automated decisions be explainable. Organizations deploy XAI tools to generate explanations. Regulators review the explanations. Compliance is achieved.
The explanations may not be accurate. They satisfy the letter of the regulation without providing the accountability the regulation intended. The organization can explain decisions. Whether the explanations are true is not verified.
A model denies insurance claims. Regulators require explanations. The organization provides SHAP explanations for each denial. The explanations cite plausible reasons: claim history, policy terms, coverage limits. Regulators see explanations and consider the requirement satisfied.
The model actually learned to deny claims based on factors correlated with claim history but not directly related to risk. The explanations cite the correlated factors as if they were the real reasons. The model’s actual logic is not disclosed because the explanation method does not reveal it.
Compliance is a checkbox. Explanations fill the checkbox. Auditors lack the expertise to evaluate explanation accuracy. They verify that explanations exist and appear reasonable. The regulatory requirement is met without achieving transparency.
Accountability requires knowing why decisions were actually made, not having a plausible story. Explanations provide stories. Accountability remains elusive.
Explanations Do Not Tell You If the Model Is Right
An explanation says the model denied a loan because the applicant’s debt-to-income ratio is high. This explains the model’s logic. It does not tell you if the decision is correct.
Maybe high debt-to-income ratio is a valid reason to deny loans. Maybe the model learned a bias where that ratio matters more for some demographics than others. The explanation does not reveal bias.
Or the model’s prediction is correct but for the wrong reasons. A resume screening model rejects a candidate. The explanation cites lack of required experience. Investigation reveals the model actually learned to reject resumes with certain name patterns that correlate with gender. The stated explanation is plausible. The real reason is discriminatory.
Explanations describe model behavior. They do not validate whether that behavior is appropriate. A model can be explainable and biased, explainable and wrong, or explainable and illegal.
Validating model correctness requires domain expertise, fairness analysis, and testing on diverse populations. Explanations do not replace validation. They describe what the model did, not whether what it did was justified.
Explanation Complexity Scales with Model Complexity
Simple models have simple explanations. A linear model’s coefficients directly explain predictions. A small decision tree can be visualized and understood completely.
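For a linear model, the explanation is not an approximation at all. A sketch with invented coefficients: the per-feature contributions plus the intercept reconstruct the prediction exactly, with nothing left out.

```python
import numpy as np

# Hypothetical fitted linear model: the coefficients ARE the explanation.
coef = np.array([0.8, -1.2, 0.3])       # weights for three features
intercept = 0.5
x = np.array([1.0, 2.0, -1.0])          # one input to explain

prediction = intercept + coef @ x
contributions = coef * x                # exact per-feature contribution
# intercept + sum(contributions) equals the prediction exactly:
# a complete explanation, not a post-hoc approximation.
```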
Complex models require complex explanations. A neural network with millions of parameters cannot be explained with a few feature importance scores. The explanation is a radical simplification.
Simplification loses information. The explanation highlights a few important features and ignores interactions, nonlinearities, and emergent behavior from deep layers. The simplified explanation is more interpretable than the model but less accurate as a description.
As models grow more complex, the gap between model behavior and explanation grows. Transformer models with billions of parameters are explained with attention maps that show which tokens were weighted. Attention is one mechanism among many. The explanation is incomplete.
You can make explanations more detailed to capture more of the model’s behavior. Detailed explanations become too complex to interpret. You return to the original problem: the model is too complex to understand.
Explanation methods are stuck between two failure modes: oversimplified explanations that are interpretable but inaccurate, or detailed explanations that are accurate but too complex to interpret. There is no resolution. Complexity cannot be explained away.
Local Explanations Do Not Generalize
LIME and similar methods generate local explanations. They explain a single prediction in the vicinity of that input. The explanation applies to that specific decision, not to the model’s overall behavior.
Two predictions from the same model can have completely different explanations. One loan denial is explained by low credit score. Another denial is explained by insufficient income. The model is not globally consistent. Local explanations do not reveal global logic.
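This is easy to see with local linear explanations of a nonlinear model. A sketch (the risk-score formula and the two applicants are invented): the same feature that dominates one applicant's explanation contributes with the opposite sign for another.

```python
import numpy as np

# Hypothetical nonlinear risk score over two scaled features.
def model(credit, income):
    return np.tanh(2 * credit) - income * credit

# Local explanation = finite-difference slopes at one applicant's point.
def local_explanation(credit, income, eps=1e-5):
    d_credit = (model(credit + eps, income) - model(credit - eps, income)) / (2 * eps)
    d_income = (model(credit, income + eps) - model(credit, income - eps)) / (2 * eps)
    return d_credit, d_income

a = local_explanation(0.1, 0.2)   # applicant A
b = local_explanation(2.0, 0.2)   # applicant B
# For A, the credit slope is large and positive; for B it flips sign.
# Neither local explanation describes the model's global logic.
```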
A business stakeholder asks how the model makes decisions. You provide local explanations for a few examples. The stakeholder assumes those explanations apply generally. They do not. The model uses different logic in different parts of the feature space.
Global explanation methods attempt to describe overall behavior. They average or aggregate local explanations. The global explanation is a summary that loses detail. It describes typical behavior but not edge cases or regional variations.
Models are often evaluated on aggregate metrics like accuracy or AUC. Explanations are local. There is a mismatch between how models are evaluated (globally) and how they are explained (locally). Global metrics look good while local explanations reveal problematic logic in specific cases.
Explanations Are Expensive to Generate and Validate
Generating explanations for every prediction adds computational cost. SHAP requires multiple model evaluations with perturbed inputs. LIME requires training a local surrogate model. For high-throughput systems, explanation overhead is significant.
Production systems make millions of predictions. Generating explanations for all of them is not feasible. Explanations are generated selectively for audits or disputes. This means most predictions are not explained. Transparency is partial.
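The arithmetic behind that infeasibility is worth making concrete. A back-of-envelope sketch with assumed numbers (the feature count and daily volume are invented; the sampling budget of 2 × features + 2048 is a commonly cited KernelSHAP-style default, not a property of any particular system):

```python
# Back-of-envelope cost of explaining every prediction (assumed numbers).
features = 40
samples_per_explanation = 2 * features + 2048   # assumed sampling budget
predictions_per_day = 5_000_000

extra_model_calls = samples_per_explanation * predictions_per_day
# Over 10 billion extra model evaluations per day just for explanations,
# versus 5 million for the predictions themselves: a ~2000x overhead.
```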
Validating explanation accuracy is harder than generating explanations. You need ground truth about what the model actually computed. For complex models, ground truth is not available. Validation is manual review by domain experts who check if explanations seem plausible.
Plausibility is not accuracy. An explanation that makes sense to a human might not reflect what the model did. Validation catches obviously wrong explanations. It does not guarantee the explanation is correct.
The cost of validation scales with the number of explanations. Organizations generate explanations for compliance and do not validate them because validation is expensive and does not affect the compliance checkbox.
When Explanations Are Actually Useful
Explanations are useful when the model is simple enough that the explanation is complete. A linear model explanation is the coefficients. A small decision tree explanation is the tree structure. These explanations are accurate because the model is interpretable.
Explanations are useful for debugging when you know the model is wrong and need hypotheses about why. Feature importance highlights which inputs are influential. If those inputs are unexpected, they suggest where to investigate training data or feature engineering.
Explanations are useful for communicating model behavior to stakeholders when approximate descriptions are sufficient. You do not need perfect accuracy. You need stakeholders to understand roughly what the model does. Local explanations provide that understanding.
Explanations are not useful for accountability when precise reasoning is required. If you need to justify why a specific decision was made for legal or ethical reasons, post-hoc explanations are not reliable enough.
Explanations are not useful for detecting bias or errors when the explanation method can miss what the model actually learned. Validation requires testing on diverse data and fairness analysis, not trusting explanations.
Explanations are not useful as a substitute for interpretable models when genuine transparency is required. If interpretability is a hard requirement, use interpretable models and accept performance limits.
The Accountability Gap
Regulations and ethics demand accountability for automated decisions. Explainable AI is positioned as providing that accountability. It does not.
Accountability requires knowing why a decision was made and being able to justify it. Explanations provide plausible stories about decisions. Plausible stories are not necessarily true stories.
The gap is that explainability provides narratives while accountability requires ground truth. Explanations satisfy the appearance of accountability without providing actual accountability.
An individual is denied a service by an automated system. They request an explanation. The system provides a SHAP explanation citing plausible factors. The individual cannot challenge the explanation’s accuracy because they do not have access to the model or alternative explanation methods.
The explanation serves as justification whether it is accurate or not. The accountability requirement is met procedurally. Substantively, the individual has no way to verify the decision was correct or the explanation was truthful.
Organizations deploy explainability to satisfy compliance and manage legal risk. This incentivizes generating explanations that sound good, not explanations that are accurate. The explanation becomes a legal artifact, not a transparency mechanism.
What You Actually Get with Explainable AI
Explainable AI tools generate feature importance scores, local approximations, or saliency maps. These outputs are useful for debugging, rough communication, and compliance checkboxes.
They are not ground truth about model reasoning. They are not reliable enough for high-stakes accountability. They are not validated for accuracy. They are not robust to adversarial manipulation.
What you get is a post-hoc rationalization system that makes complex models seem more transparent than they are. The rationalization is persuasive and may be wrong.
If you need actual interpretability, use inherently interpretable models. Accept that they will underperform complex models on difficult tasks. The performance gap is the cost of transparency.
If you need performance, accept that high-performance models are opaque. Explainability tools provide approximations, not transparency. Build safeguards that do not depend on trusting explanations: extensive testing, fairness audits, human oversight, and fallback mechanisms.
If regulations require explainability, understand that compliance is satisfied by providing explanations, not by providing accuracy. Generate explanations for compliance. Do not assume they are accurate. Validate critical decisions through other means.
Explainable AI is a tool with limits. It does not make black boxes transparent. It makes them slightly less opaque. The opacity remains. The accountability gap remains. Explanations are not reasoning. Rationalizations are not truth.
Stop treating explainability as a solved problem. It is a research area with fundamental limitations. Organizations deploying XAI should understand what they are getting: plausible stories of uncertain accuracy, not windows into model reasoning.
The model made a decision. The explanation tells you one possible story about why. Do not confuse the story with ground truth.