If you were held accountable for the decision of a machine in contexts that have financial, safety, security, or personal ramifications for an individual, would you blindly trust its decision? It’s hard to imagine a person who would feel comfortable blindly agreeing with a system’s decision in such highly consequential and ethically charged situations without a deep understanding of the system’s decision-making rationale. Given that professionals who bear such accountability, such as lawyers, have a natural aversion to risk, a key challenge in deploying Artificial Intelligence (AI) solutions at scale is their explainability, or lack thereof.
The sophistication of AI-powered systems has lately increased to such an extent that almost no human intervention is required for their design and deployment. When decisions derived from such systems ultimately affect human lives (as in, e.g., medicine, law, or defense), there is an emerging need to understand how such decisions are reached by AI methods. Recent years have witnessed the rise of opaque decision systems such as Deep Neural Networks (DNNs). The empirical success of Deep Learning (DL) models such as DNNs stems from a combination of efficient learning algorithms and their huge parametric space. That space can comprise hundreds of layers and millions of parameters, which is why DNNs are considered complex black-box models.
Currently, there are three core notions of explainable AI that cut across research fields:
- Opaque systems that offer no insight into their algorithmic mechanisms.
A system where the mechanisms mapping inputs to outputs are invisible to the user. It can be seen as an oracle that makes predictions over an input, without indicating how and why those predictions are made. Opaque systems emerge, for instance, when closed-source AI is licensed by an organization and the licensor does not want to reveal the workings of its proprietary AI. Similarly, systems relying on genuine “black box” approaches, for which inspection of the algorithm or implementation does not give insight into the system’s actual reasoning from inputs to corresponding outputs, are classified as opaque.
- Interpretable systems where users can mathematically analyze their algorithmic mechanisms.
A system where a user can not only see, but also study and understand how inputs are mathematically mapped to outputs. This implies model transparency and requires a level of understanding of the technical details of the mapping. A regression model can be interpreted by comparing covariate weights to gauge the relative importance of each feature to the mapping. SVMs and other linear classifiers are interpretable insofar as data classes are defined by their location relative to decision boundaries. But the action of deep neural networks, where input features may be automatically learned and transformed through non-linearities, is unlikely to be interpretable by most users.
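The regression case above can be made concrete with a short sketch. The data, feature names, and weights below are entirely illustrative; the point is only that, for a linear model, the fitted weights are directly inspectable and (with comparably scaled inputs) indicate each feature's relative importance.

```python
import numpy as np

# Synthetic, illustrative data: 100 samples, 3 standardized features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -0.5, 0.0])               # third feature is irrelevant
y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy linear response

# Ordinary least squares: w = argmin ||Xw - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The mapping from inputs to outputs is fully visible: each weight states
# how much a one-unit change in that feature moves the prediction.
for name, weight in zip(["feature_a", "feature_b", "feature_c"], w):
    print(f"{name}: weight = {weight:+.3f}")
```

A user reading these weights can see that `feature_a` dominates the prediction while `feature_c` is essentially ignored, which is exactly the kind of inspection a DNN's millions of entangled parameters do not afford.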
- Comprehensible systems that emit symbols enabling user-driven explanations of how a conclusion is reached.
A comprehensible system emits symbols along with its output. These symbols (most often words, but also visualizations, etc.) allow the user to relate properties of the inputs to the output. The user is responsible for compiling and comprehending the symbols, relying on her own implicit form of knowledge and reasoning about them. This makes comprehensibility a graded notion, with the degree of a system’s comprehensibility corresponding to the relative ease or difficulty of that compilation and comprehension. The required implicit knowledge on the side of the user is often a cognitive “intuition” about how the input, the symbols, and the output relate to each other. Different users have different tolerances in their comprehension: some may be willing to draw arbitrary relationships between objects, while others would only be satisfied under a highly constrained set of assumptions.
The danger lies in creating and using decisions that are not justifiable or legitimate, or that simply do not allow detailed explanations of their behavior to be obtained. Explanations supporting the output of a model are crucial, e.g., in precision medicine, where experts require far more information from the model than a simple binary prediction to support their diagnosis. Other examples include autonomous vehicles in transportation, security, and finance, among others. To overcome the dangerous practice of blindly accepting an outcome, it is prudent for an AI to provide not only an output, but also a human-understandable explanation that expresses the rationale of the machine. Analysts can turn to such explanations to evaluate whether a decision is reached by rational arguments and does not incorporate reasoning steps conflicting with ethical or legal norms.
Often discussed alongside explainable AI are the external traits such systems should exhibit. These traits are seen as so important that some authors argue an AI system is not ‘explainable’ if it does not support them. Explainable AI should instill confidence and trust that the model operates accurately. Yet the perception of trust is moderated by a user’s internal bias for or against AI systems, and by past experiences using them. Safety, ethicality, and fairness are traits that can only be evaluated by a user’s understanding of societal standards and by her ability to reason about emitted symbols or mathematical actions. Present-day systems fortunately leave this reasoning to the user, keeping a person as a stopgap preventing unethical or unfair recommendations from being acted upon.
In addition to external traits, different approaches to AI present different types of explainability challenges. Symbolic approaches to AI use techniques based on logic and inference. These approaches seek to create human-like representations of problems and the use of logic to tackle them; expert systems, which work from datasets codifying human knowledge and practice to automate decision-making, are one example of such an approach. While symbolic AI in some sense lends itself to interpretation – it being possible to follow the steps or logic that led to an outcome – these approaches still encounter issues with explainability, with some level of abstraction often being required to make sense of large-scale reasoning.
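The traceability of symbolic AI can be illustrated with a minimal forward-chaining sketch. The rules and facts below are invented for illustration; what matters is that every conclusion carries an explicit chain of rule firings that a human can follow step by step, which is what makes this family of approaches comparatively interpretable.

```python
# Minimal forward-chaining inference sketch. Each rule is (premises, conclusion).
# The rules and fact names here are illustrative, not from any real system.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "high_risk_patient"}, "refer_to_specialist"),
]

def infer(initial_facts):
    facts = set(initial_facts)
    trace = []                      # human-readable record of each inference step
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace.append(f"{' & '.join(sorted(premises))} -> {conclusion}")
                changed = True
    return facts, trace

facts, trace = infer({"fever", "cough", "high_risk_patient"})
for step in trace:
    print(step)
```

Even here, the explainability caveat from the paragraph above applies: with thousands of interacting rules, the raw trace becomes too long to follow, and some abstraction over it is needed to make sense of large-scale reasoning.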
Much of the recent excitement about advances in AI has come as a result of advances in statistical techniques. These approaches – including machine learning – often leverage vast amounts of data and complex algorithms to identify patterns and make predictions. This complexity, coupled with the statistical nature of the relationships between inputs that the system constructs, renders them difficult to understand, even for expert users, including the system developers. Reflecting the diversity of AI methods that fall within these two categories, there are many different explainable AI techniques in development. These fall – broadly – into two groups:
- The development of AI methods that are inherently interpretable, meaning the complexity or design of the system is restricted in order to allow a human user to understand how it works.
- The use of a second approach that examines how the first ‘black box’ system works, to provide useful information. This includes, for example, methods that re-run the initial model with some inputs changed or that provide information about the importance of different input features.
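The second group can be sketched with a simple perturbation-based method, permutation importance: re-run the black-box model with one input feature shuffled and measure how much the error grows. The model and data below are synthetic stand-ins; the only assumption is that we can call the model's prediction function, not inspect its internals.

```python
import numpy as np

# Synthetic data where only feature 0 actually drives the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)

def predict(X):
    # Stand-in for any opaque model: we treat this purely as a black box.
    return 3 * X[:, 0]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

baseline = mse(y, predict(X))
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    rng.shuffle(Xp[:, j])           # destroy feature j's relationship to y
    # Error increase after shuffling = how much the model relied on feature j.
    importances.append(mse(y, predict(Xp)) - baseline)

print(importances)                  # feature 0 should dominate
```

Nothing about the model's internals was inspected; the explanation comes entirely from observing how outputs respond to changed inputs, which is why such methods apply to any black-box system.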
In addition, truly explainable AI should integrate reasoning. Both interpretable and comprehensible models are lacking in their ability to formulate, for the user, a line of reasoning that explains the decision-making process of a model using human-understandable features of the input data. Reasoning is a critical step in formulating an explanation of why or how some event has occurred. Leaving explanation generation to human analysts can be dangerous since, depending on their background knowledge about the data and its domain, different explanations of why a model makes a decision may be deduced. Interpretable and comprehensible models thus enable the explanation of decisions, but do not yield explanations themselves.
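One way to close part of this gap, for models that are already interpretable, is to have the system itself compose a line of reasoning rather than leaving it to the analyst. The sketch below is a toy illustration under strong assumptions: a linear scoring model whose feature names, weights, and inputs are all invented, with per-feature contributions rendered as a short textual rationale.

```python
# Illustrative linear scorer: all names and numbers are hypothetical.
weights = {"income": 0.8, "debt": -1.2, "age": 0.1}
applicant = {"income": 2.0, "debt": 1.5, "age": 0.3}

# Each feature's contribution to the score is weight * value.
contributions = {f: weights[f] * applicant[f] for f in weights}
score = sum(contributions.values())
decision = "decline" if score <= 0 else "approve"

# Rank features by absolute contribution and render a textual rationale,
# so the system states its own reasons instead of leaving them implicit.
ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
reasons = [
    f"{name} {'raised' if c > 0 else 'lowered'} the score by {abs(c):.2f}"
    for name, c in ranked
]
print(f"Decision: {decision}, because " + "; ".join(reasons) + ".")
```

Two analysts reading this output receive the same stated rationale, rather than each deducing a different explanation from the raw weights.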
Where to next?
Stakeholder engagement is important in defining what form of explainability is useful. As AI technologies are applied at scale and in spheres of life where the consequences of decisions can have significant impacts, pressures to develop AI methods whose results can be understood by different communities of users will grow. Research in explainable AI is advancing, with a diversity of approaches emerging. The extent to which these approaches are useful will depend on the nature and type of explanation required.
Different types of explanations will be more or less useful for different groups of people developing, deploying, affected by, or regulating decisions or predictions from AI, and these will vary across application areas. It is unlikely that there would be one method or form of explanation that would work across these diverse user groups. Collaborations across research disciplines and with stakeholder groups affected by an AI system will be important in helping define what type of explanation is useful or necessary in a given context, and in designing systems to deliver these. This requires working across research and organizational boundaries to bring to the fore differing perspectives or expectations before a system is deployed.
The need for explainability must be considered in the context of the broader goals or intentions for the system, taking into account questions about privacy, accuracy of a system’s outputs, the security of a system and how it might be exploited by malicious users if its workings are well-known, and the extent to which making a system explainable might raise concerns about intellectual property or privacy. This is not a case of there being linear trade-offs – increased explainability leading to reduced accuracy, for example – but instead of designing a system that is suitable for the demands placed on it.