Corporate Oversight in the Age of Artificial Intelligence

Corporate oversight under Delaware law rests on two bases for liability, both identified in In re Caremark International Inc. Derivative Litigation (“Caremark”) and reaffirmed in Stone v. Ritter. The first is a failure to implement any reporting or information system or controls; the second is, having implemented such a system, a conscious failure to monitor or oversee it. In either case, liability requires a showing of bad faith – that is, a sustained or systematic failure to exercise oversight, or a conscious disregard of red flags signaling that the corporation was exposed to material risk. As the Delaware Supreme Court held in 2019 in Marchand v. Barnhill (“Marchand”), that inquiry is sharper when the risk at issue is critical to the corporation’s business.

Other decisions have reinforced both the rigor and the reach of this framework. In re Boeing Co. Derivative Litigation (“Boeing”) emphasized the board’s obligation to structure oversight around central compliance and safety risks. In re McDonald’s Corp. Stockholder Derivative Litigation (“McDonald’s”) extended oversight duties to officers within their domains, fueling debate over the trajectory – and possible expansion – of Caremark liability.

As I discuss in a recent paper, that doctrinal framework now confronts a new structural reality.

Public companies increasingly rely on artificial intelligence (“AI”) systems to perform functions at the core of board-level oversight: compliance monitoring, anomaly detection, cybersecurity surveillance, financial screening, and the filtering of internal complaints. In many firms, algorithmic models shape not only management decisions but also the information that reaches the board.

This development is not simply another technological wave, comparable to the rise of the internet or enterprise software. Those innovations accelerated communication and digitized processes. AI, by contrast, mediates the production of oversight-relevant information itself. Algorithmic systems determine which risks are surfaced, prioritized, or left unseen.

Oversight is thus shifting from human-detectable red flags to algorithmically mediated black boxes.

The question is whether this transformation destabilizes Caremark – or instead stress-tests it.

In my paper, I argue that artificial intelligence does not alter the legal standard for Caremark liability. The doctrine remains one of loyalty, not care; of bad faith, not imperfect outcomes. AI does not transform Caremark – explicitly or implicitly – into a negligence regime, nor does it render directors or officers insurers of technological performance. What changes is the evidentiary terrain: how good faith is shown, documented, and assessed.

AI intersects with Caremark in three analytically distinct but structurally related ways.

AI and the First Basis: Designing Reporting and Information Systems

Caremark’s first basis for liability concerns the failure to implement a reasonable reporting or information system. Today, algorithmic systems increasingly form part of that architecture.

When boards rely on AI-driven monitoring, the relevant question is not whether it is flawless. AI systems are probabilistic by design; some degree of error is inevitable. Caremark does not require directors to understand the internal mechanics of machine-learning models. What it does require is a good-faith effort to design, validate, and supervise such systems in light of their opacity and limitations.

Reliance on experts, officers, or vendors under DGCL Section 141(e) must remain systematic and informed. Delaware law protects good-faith reliance on competent experts, but not blind reliance. A board that adopts an AI system, delegates oversight entirely, and fails to revisit the reasonableness of that reliance – particularly when the system plays a central role in monitoring critical risks – may find that formal reliance offers little protection.

The inquiry is procedural rather than technological. Courts are not asked to evaluate the quality of an algorithm or to second-guess model architecture. They are asked whether directors made a sustained, good-faith effort to understand the role the system played within the firm’s process for monitoring risk, whether validation and periodic review mechanisms existed, and whether escalation pathways were meaningfully designed.

AI does not heighten the standard. It sharpens the demand for clarity of process.

Red Flags in the Age of Algorithms

Caremark’s second basis for liability is a conscious failure to monitor or oversee operations once a reporting system is in place. Traditionally, this inquiry centers on red flags – signals indicating that the corporation faces material risk, particularly when such risk threatens a critical function of the business.

AI unsettles this framework by intervening at the earliest stage: the generation of the red flag itself.

In AI-mediated oversight systems, risk detection is frequently automated. Models determine which signals are highlighted and which remain invisible. When such a system fails to surface a risk, the absence of a red flag may reflect not inattention, but the model’s internal logic or blind spots.

Delaware law has never treated the mere failure to detect risk as sufficient for liability. A board cannot consciously disregard a warning that never materializes. To hold otherwise would collapse Caremark into negligence – precisely what the doctrine was designed to avoid.

Yet Caremark does require scrutiny of the conditions under which the absence of warning signs persists. In the AI context, this shifts attention from the missed signal to the monitoring system itself. When directors receive information suggesting that AI-driven monitoring is producing false negatives, suppressing anomalies, or failing to capture emerging risks, continued passivity may support an inference of conscious disregard.

A distinct category of warning signs thus emerges: signals that the monitoring infrastructure is malfunctioning.

Model drift, performance degradation, and bias amplification are salient examples. Declining alert rates may be as concerning as spikes in detected risk. Divergences between algorithmic outputs and external indicators – regulatory inquiries, enforcement actions, whistleblower complaints – may function as secondary red flags, signaling systemic oversight failure rather than isolated misconduct.

The critical inquiry remains one of knowledge and response. The relevant failure is not technological but fiduciary: a failure to reassess reliance when circumstances call it into question.

This logic mirrors traditional oversight cases. In Stone v. Ritter, the inquiry turned not on the adoption of compliance systems, but on whether the board failed to respond to repeated violations signaling their ineffectiveness. In Boeing, failures to call attention to serious risks – not the complexity of available information – prompted the inference of bad faith. The same structure applies in algorithmic settings.

AI as Mission-Critical Risk

The third intersection arises when AI itself becomes essential to a company’s operations.

Under Marchand and its progeny, when regulatory compliance or the mitigation of safety risks is mission-critical to the business, boards must implement and monitor reporting systems tailored to those risks. A failure to structure oversight around such central risks may support an inference of bad faith.

As AI becomes embedded in compliance and safety systems and core operations, it may move from tool to infrastructure. Financial institutions rely on machine learning to detect money laundering and fraud; healthcare companies use algorithms to monitor regulatory compliance; technology companies deploy automated systems to comply with content moderation and data-protection obligations. In such contexts, AI does not merely assist compliance; it shapes whether compliance exists at all.

When algorithms oversee regulatory obligations broadly, failures may expose the corporation to systemic enforcement risk. In these settings, AI is not peripheral. It is critical.

Where AI plays that role, reliance must be systematic, informed, and periodically reassessed. Delegation does not eliminate oversight; it heightens the need for clarity about accountability, information flow, and escalation.

Process, Proof, and the Stress Test of Caremark

If AI does not alter Caremark’s normative standard, what does it change? It changes proof.

As oversight is increasingly entrusted to opaque and evolving systems, good faith is demonstrated less through the absence of failure and more through governance design, documentation, and responsiveness. Board minutes reflecting substantive discussion of AI risks, periodic performance reviews, documented validation procedures, careful vendor selection and supervision, clearly assigned internal responsibility, and credible protocols for drawing attention to problems may become central to the evidentiary record.

Conversely, superficial adoption of AI tools without oversight, meaningful inquiry, or documented review may amplify the appearance of abdication.

In this sense, AI operates as a stress test for Caremark compliance. By embedding oversight functions in probabilistic and partially opaque systems, AI increases the difficulty of demonstrating good faith while simultaneously generating richer records through which courts may assess whether fiduciaries engaged or disengaged.

The doctrine itself, however, is not altered.

Conclusion

Caremark continues to operate as a doctrine of loyalty, not care; of good faith, not outcomes; of process, not perfection. AI does not unsettle these foundations. What it does is test them.

AI places pressure on the informational premises of oversight by mediating how risks are detected, filtered, and escalated. It complicates the demonstration of good faith without lowering the threshold for liability. It demands more deliberate design, clearer escalation structures, and more careful documentation.

But the movement from red flags to black boxes does not require new fiduciary categories. It requires renewed attention to first principles. Artificial intelligence stress-tests Caremark, but the doctrine’s loyalty-based architecture remains capable of withstanding that pressure – provided boards adapt their governance practices so that their good-faith engagement with mission-critical risk remains visible in an algorithmic age.

Pierluigi Matera is a visiting professor of corporations at Boston University School of Law, a professor of comparative law at LCU of Rome, and co-managing partner at Libra Legal Partners, Rome. This post is based on his recent paper “From Red Flags to Black Boxes: Corporate Oversight in the Age of Artificial Intelligence,” available here.
