Who Owns Model Risk in an AI World?

By Reginald R. Goeke

Complex computerized models and quantitative analyses are a mainstay of the financial services industry, from quantitative asset managers that use models to manage investment portfolios to banks that use models to underwrite loans or monitor for money laundering or other behavior. With the benefits of those models come several forms of risk, generally lumped together as “model risk.”

Model risk generally refers to the potential for adverse consequences resulting from actions taken or decisions made based on incorrect or misused models or model outputs, and it includes risks related to errors in the quantification, coding or calculation process, use of improper or inaccurate data or other inputs, incorrect or inaccurate model design, or misuse or misapplication of models or model outputs. (The definition of a model “error” or “defect” is itself a subject of substantial debate, and often depends on the purpose and context for using the model. As noted in this article, whether a design decision rises to the category of “defect” will likely depend on the context of the use of the model, the model limitations disclosed to users, and the language of any agreement between the parties.)

The risk of such model errors is not theoretical. Over the past several years model errors have led to Securities and Exchange Commission enforcement actions, litigation and adverse headlines. For example, the SEC disciplined a quantitative investment adviser where an error in the computer code of the quantitative investment model eliminated one of the risk controls in the model, and where that error was concealed from advisory clients.

Similarly, where a robo-adviser advertised that its algorithms would monitor for wash sales but failed to accurately do so in 31 percent of the accounts so enrolled, the SEC found that the adviser had made false statements to its clients. Mortgage lenders have been accused of incorrectly denying loan modifications due to computer errors, and banks have suffered anti-money laundering compliance failures due to coding errors. As banks, asset managers and other financial services firms begin to deploy artificial intelligence or machine learning—whether in credit risk scoring, fraud detection, robo-advisory services, algorithmic trading, insurance underwriting or other areas—the potential model risks and related consequences increase.

Based on guidance from the Federal Reserve, the FDIC and other regulators, financial services firms have generally developed tools to identify, measure and manage those model risks. But that guidance predates the AI renaissance. With the advance of big data, artificial intelligence and machine learning, potential model risks increase, and the controls needed to manage those risks and comply with regulatory and contractual obligations deserve additional attention.

For example, pursuant to the Federal Reserve’s Guidance for Model Risk Management, the guiding principle of model risk management is effective challenge to the model, which requires critical analysis by objective and informed parties who can identify model limitations and implement appropriate changes. Such effective challenge would include (among many other items) testing the theory and logic underlying the model design, validating the model as well as the integrity of the data it uses, testing the performance of the model over a range of inputs, and implementing a governance model that permits independent review and assessment.

But in an AI world, when models work by identifying patterns in large data sets and making decisions based on those patterns, replication of the model’s output (let alone reviewing performance across a range of inputs) becomes far more difficult. Further, when AI models apply machine learning to very large data sets, often from multiple sources, validating the integrity of such data becomes exponentially more challenging. And where model output may be generated in a black box based on the application of artificial intelligence, the ability of independent reviewers to effectively challenge any output becomes substantially more limited.

From a risk management and liability perspective, the questions that financial services firms should consider include, among others: How will a court determine (1) whether there were any defects in the model design, input or output; (2) whether any defect caused any adverse decision; (3) which party—among the model developer (or licensor), model user (or licensee), or the financial institution’s customer—assumed the risk of the error or defect; and (4) the amount of any damages? These are the questions that courts and participants in the financial services industry will face in the coming years.

Is there a defect in the model?

When a bank or asset manager uses AI or machine learning and an adverse result arises (such as the poor performance of a loan or investment portfolio), the first question is whether the model was flawed in the first instance. Like decisions made by humans, model-driven decisions may outperform or underperform a benchmark and yet still reflect a model operating exactly as intended. In some instances, model defects may be objectively verifiable, such as references to incorrect cells or output in Excel files, use of incorrect variables or the mis-specification of units. In other instances, particularly in the context of AI models, defects may be caused by a misinterpretation of underlying data, or reliance on coincidental correlations without causal connection, which may be much more difficult to detect. In still other instances, a model developer may make certain simplifying assumptions (e.g., disregarding data in a population set identified with ages over 120) that may affect the model’s performance. Such simplifying assumptions are a core part of “modeling” reality, and whether such assumptions cross a line into a “defect” or “error” may depend significantly on the representations made about the model and the context in which the model is intended to be used.

Given the challenges of explaining why any AI-driven decision was made, liability may often turn on the applicable standard of care (e.g., strict liability, negligence, etc.), the regulatory obligations of the model user (licensee), the types of representations made about the model, the known or foreseeable contexts in which the model may be used, and who (as between the plaintiff and defendant) bears the burden of proof. For example, an entity that touts that its models will monitor for wash sales but fails to fulfill that promise may incur liability for the model’s failure regardless of the source of any model defect.

A murkier issue may arise where a model developer markets its model as being able to reduce credit-related losses from portfolios approved using the model—but does not disclose that the model was tested using only populations from a certain geography or age. In that instance, if a financial institution using the model suffers substantial losses due to underperformance of the model with respect to populations for which the model was not tested, there will likely be substantial dispute as to whether the failure to test those populations constituted an error.

Did the defect cause the adverse outcome?

Assuming that a defect or error in a model can be demonstrated, it may still be an open question whether the defect actually affected a model’s output. Many models (whether AI or not) will rely on multiple factors and rule sets. Even if an error existed in one part of a model, other portions may have corrected for the error, or may have led to the same result regardless of the error. To test for this, it may be possible to re-run a corrected version of the model with the same inputs, and thereby determine whether the error affected the model’s output. In the context of AI models, though, which may use machine learning to detect patterns in millions of data points (e.g., credit application data, or asset management decisions), simply re-running the model with the same inputs may produce different outputs, because the model may learn different patterns on each run.

Thus, it becomes much more difficult to demonstrate whether or how any error affected model output. Although proof of causation is typically a plaintiff’s burden, once a defect is demonstrated, some courts may implicitly shift the burden to the defendant to demonstrate that the defect did not have an adverse impact. In that event, an inability to explain (and show documentation of) the methodology and maintenance of the model (e.g., intended use, assumptions, theories, validations and testing, controls, versions) may limit an effective defense.
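
The re-run test described above can be illustrated with a deliberately simple sketch. It assumes a hypothetical, fully deterministic debt-to-income rule that contains a unit error; the names `Applicant`, `defective_model` and `corrected_model`, the 40 percent threshold and the sample data are all illustrative assumptions and are not drawn from any actual model or enforcement action. For a machine learning model, a comparable test would also require a preserved copy of the training data and the training configuration, which this sketch does not attempt to show.

```python
from dataclasses import dataclass

# Hypothetical approval rule: approve when debt-to-income is at or below 40%.
APPROVAL_THRESHOLD = 0.40


@dataclass(frozen=True)
class Applicant:
    applicant_id: str
    annual_income_usd: float
    debt_usd: float


def defective_model(a: Applicant) -> bool:
    # Assumed defect: annual income is treated as monthly and annualized again.
    ratio = a.debt_usd / (a.annual_income_usd * 12)
    return ratio <= APPROVAL_THRESHOLD


def corrected_model(a: Applicant) -> bool:
    # Corrected version: income is already annual, so no conversion is applied.
    ratio = a.debt_usd / a.annual_income_usd
    return ratio <= APPROVAL_THRESHOLD


def decisions_changed_by_defect(applicants: list[Applicant]) -> list[str]:
    # Re-run both versions on the same inputs and keep only the applicants
    # whose decision differs between the two versions.
    return [
        a.applicant_id
        for a in applicants
        if defective_model(a) != corrected_model(a)
    ]


applicants = [
    Applicant("A-1", 60_000, 12_000),   # approved by both versions
    Applicant("A-2", 50_000, 45_000),   # approved only by the defective version
    Applicant("A-3", 40_000, 400_000),  # declined by both versions
]

changed = decisions_changed_by_defect(applicants)
print(changed)
```

In this toy data set the defect exists for every applicant but changes the decision for only one of them, which is the distinction between the existence of a defect and its causal effect on a particular outcome.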

Who bears the risk of any model defect?

Even if a defect in a model caused an adverse outcome, potential legal claims will turn on which party assumed the risk of the model defect. This may turn on various tort, contract and similar legal principles, and depend on the relationships among the model developer/licensor (i.e., the party that develops and builds the model), the model user/licensee (i.e., the party that uses the model to make lending, investment, or other decisions), any customer of the user/licensee (such as a loan applicant), and any advisory client that invests in portfolios created by or managed with AI-enabled investment models. For example, where a credit card company uses an AI tool to build a better portfolio of loans, if there is a defect in the model that results in rejection of borrower applications, or that results in a pool of loans that underperforms expectations, who among the various entities will bear the risk for those decisions?

Model developer versus model user. The liability as between a model developer and a model user is typically governed by the terms of an agreement, including representations, warranties and indemnification provisions. Some such agreements may be “as is” agreements, where warranty or indemnification obligations are disclaimed by the developer. In other instances, the model user may negotiate that the developer retains liability for its negligence or gross negligence. In that case, indemnification/warranty claims may turn on whether the developer/licensor applied industry-standard model controls (such as those outlined in the Federal Reserve’s SR 11-7 Guidance), and the developer will need to be able to document its adherence to those controls. Further, liability may turn on the extent to which the model developer could reasonably foresee that the model would be used with certain populations or to make certain decisions. In many cases, liability allocation is likely to be heavily negotiated, subject to specific limited representations about model performance, and potentially subject to user representations about the use, testing and maintenance of the model.
Model user versus affected applicant. Where a third-party customer (e.g., potential borrower) is denied credit based on the results of a potentially errant AI model, liability of the model user will likely turn on the user’s compliance with various lending statutes, including ECOA, the Fair Housing Act, FCRA, TILA and applicable regulatory loan origination and review requirements. Those requirements are beyond the scope of this article, but model users should conduct sufficient due diligence and testing with respect to any AI tool to understand and minimize the potential risks associated with use of the model, and should ensure that the model developer remains available to explain the model’s performance to applicable regulators.
Model user versus advisory client. In connection with investment portfolios constructed using an AI model, the contractual liability of the model user may turn on the extent to which model risk was disclosed to advisory clients and the extent to which the model user implemented model risk controls consistent with industry standards. As noted above, however, to the extent that AI models limit the effectiveness of traditional control processes (such as the ability to verify data quality, test model accuracy or challenge model output), model owners may be challenged to demonstrate compliance with standards that typically apply to model risk governance.

How can damages from AI model defects be quantified?

Assuming liability can be established, quantification of any damages still remains a challenge because a court would have to determine how the model would have performed absent any error or defect. For example, if an AI model has allocated assets improperly or created a loan portfolio with too much risk (based on the stated, intended purpose and usage of the model during the development stage), courts must first identify a relevant benchmark to determine how a portfolio might have performed absent any model error or defect.

For some models, it may be possible to correct the algorithm or coding and reconstruct the portfolio absent the error. But where AI models are used to construct portfolios, as on robo-adviser platforms, and investment decisions depend in part on the assets already held by the portfolio, the iterative nature of the AI decision-making may make it difficult or impossible to re-estimate the outcomes that would have existed but for the error. In a litigation context, plaintiffs may be given great latitude to argue about what actions might have been taken or what outcomes might have occurred but for the error, with plaintiffs invariably seeking to apply a damage calculation methodology that results in the greatest amount of damages.
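
The sensitivity of a damages figure to the choice of benchmark can be shown with simple arithmetic. The sketch below is an illustration only: the starting value, the period returns and the two benchmark names are invented for the example, and measuring damages as the benchmark's ending value minus the actual ending value is one possible methodology, not a statement of how any court would calculate damages.

```python
def terminal_value(initial: float, period_returns: list[float]) -> float:
    # Compound the starting value through each period's return.
    value = initial
    for r in period_returns:
        value *= 1 + r
    return value


# All figures below are hypothetical.
initial_investment = 1_000_000.0
actual_returns = [-0.02, 0.01, -0.03]  # portfolio built by the defective model

# Two candidate "but for" benchmarks a party might propose.
benchmark_returns = {
    "corrected_model_rerun": [0.01, 0.01, 0.00],
    "market_index": [0.02, 0.03, 0.01],
}

actual_value = terminal_value(initial_investment, actual_returns)

# Damages under each benchmark: benchmark ending value minus actual ending value.
damages = {
    name: terminal_value(initial_investment, returns) - actual_value
    for name, returns in benchmark_returns.items()
}

for name, amount in damages.items():
    print(f"{name}: {amount:,.0f}")
```

With these invented inputs the same actual performance yields materially different damages depending on which benchmark is accepted, which is why the selection of the benchmark is likely to be contested.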

Potential actions for model developers and users. Given the additional complexities that AI models introduce for model developers and model users—including the “explainability” issues associated with AI models and the magnitude of data evaluated—those entities should consider steps to mitigate the liability risks. A few points of guidance emerge.

Model developers

Curate your data. Model developers should employ appropriate data curation controls. The adage of “garbage-in, garbage-out” is particularly applicable where the operations within an AI black box are difficult to evaluate. Developing a deep understanding of the sources of the data, triangulating the data with other available sources, and evaluating the data for potential bias are critical steps for developers both to take and to document. In conducting this step, it is important that developers coordinate with legal and compliance, who understand the risks to be addressed and can help ensure that solutions are in a format that will be helpful when litigation ensues.
Improve visibility into model design. Companies developing AI models should work with their model programmers to enhance the ability of reviewers to test and validate models. This includes additional documentation regarding: the learning methods programmed into a model; the use of intermediate outputs that may help identify the data sets and decisions principally driving model outputs; and improved documentation of the quality assurance steps taken during model development and thereafter. Again, input from legal and compliance can help ensure that documentation is at a level that will be helpful in any future disputes.
Improve contracting steps. Model developers/licensors and their counsel should clearly define the allocation of risk. Where possible, model developers may specify that agreements with model users/licensees expressly provide the model in “as is” condition, and disclaim any implied warranties or indemnification obligations. Model developers should also be clear with licensees about any known limitations in models or data sources used to train those models.

Model users/licensees

Implement meaningful quality control procedures. Model users/licensees acquiring AI models from third parties should implement meaningful quality control and due diligence procedures in the acquisition process. This would include a review of the data sources and the testing procedures used by model developers. Such diligence should inform the user’s adoption of limits on the use of the model (e.g., using the model only to make decisions for populations similar to those from which the model was developed and tested). Such diligence should be coordinated with compliance and legal functions and documented for use in any future disputes.
Develop and employ effective model governance processes. Model users/licensees should adopt model governance policies and procedures to monitor the use of the model, and periodically confirm that the model’s uses are consistent with the model’s capabilities. Such governance processes should include input from both technical staff and customer-facing staff familiar with the ways in which the tool is being deployed and marketed. They should also include documented change-control processes, to be approved by all relevant stakeholders. Legal and compliance should ensure that disclosures and marketing materials are consistent with the capabilities of the model.
Include human input if feasible. Model users/licensees, where possible, should consider using models more for assistive intelligence, rather than as a pure decision-making tool. This would require employing personnel who can interpret the model outputs and, as necessary, apply their own judgment in making final decisions. Doing so can help ensure that questionable model decisions are identified earlier in the process and can provide an additional check to model decisions. Depending on the user’s business model, human involvement in each model decision may not be realistic; but even in those cases periodic audits of model decisions can provide additional controls to the process.
Ensure accurate disclosures. Model users/licensees should consider appropriate disclosures to customers, investors, and clients (including any individuals voluntarily using the AI-driven process) regarding the model’s risks and limitations. Those disclosures should be reviewed both by compliance and legal functions, and also by the IT users of the model who are most familiar with the model’s capabilities. Such disclosure may not eliminate liability, but where investors have the opportunity to make informed decisions after disclosure of the risks, the model user can more readily demonstrate that the investor assumed the risk of any error or defect in the model.
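
Two of the controls in the list above, limiting use of the model to populations similar to those on which it was tested and adding human judgment for questionable decisions, can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions: the `TestedPopulation` fields, the age and region criteria, the score cutoffs and the function name `route_decision` are all hypothetical, and a real control would be designed with compliance and legal input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestedPopulation:
    # Hypothetical description of the population the developer tested on.
    min_age: int
    max_age: int
    regions: frozenset


def route_decision(
    age: int,
    region: str,
    model_score: float,
    tested: TestedPopulation,
    review_band: tuple = (0.45, 0.55),
) -> str:
    # Use limit: send anyone outside the tested population to a human reviewer.
    in_age_range = tested.min_age <= age <= tested.max_age
    if not in_age_range or region not in tested.regions:
        return "human_review: outside tested population"

    # Human input: borderline scores are also escalated rather than auto-decided.
    low, high = review_band
    if low <= model_score <= high:
        return "human_review: borderline score"

    return "approve" if model_score > high else "decline"


tested = TestedPopulation(min_age=25, max_age=65, regions=frozenset({"NE", "SE"}))

print(route_decision(40, "NE", 0.90, tested))
print(route_decision(70, "NE", 0.90, tested))
print(route_decision(40, "NE", 0.50, tested))
```

Logging each routed decision together with the reason string would also give the user a record for the periodic audits described above.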

Reginald Goeke is a litigation partner at Mayer Brown, the co-leader of Mayer Brown’s commercial litigation group and a co-leader of the Washington, D.C. litigation group. Goeke acknowledges the input of several of his Mayer Brown colleagues in this article, including David Beam, Leslie Cruz, Alex Lakatos and Brad Peterson.