When a core product feature depends on an external AI model operated by another company, part of the product’s behavior and customer experience is effectively outsourced. That changes what the company can promise customers, what it can test before launch, and how it responds when the model changes or fails (wrong outputs, unsafe outputs, outages, or severe latency spikes). Boards of directors already know this supplier pattern. The difference with external AI models is speed and visibility: Behavior can shift quickly, and it is not always obvious what changed or why.
A January 2026 joint statement from Apple and Google describes a multi-year collaboration in which one company’s next-generation foundation models (general-purpose AI models) would be based on the supplier’s models and cloud technology, helping power AI features such as a more personalized voice assistant. The useful signal is the dependency pattern: A core feature can depend on a supplier’s model and update process. When the supplier updates the model or its controls, behavior can change even if the company ships no new code, and the facts needed to explain the change may be split across the two organizations.
In this piece, “foundation model” means a general-purpose AI model trained on broad data so it can be adapted to many tasks and contexts. This is consistent with how the term is framed in the U.S. National Institute of Standards and Technology (NIST) Generative AI Profile. The practical point for boards is that these models can be updated by the supplier over time, and performance can shift as context changes.
This is not confined to consumer devices. In its November 2025 global survey, McKinsey reports that 88 percent of respondents say their organizations regularly use AI in at least one business function. The focus here is not “AI use” in general. It is reliance on supplier-hosted models in core workflows, where change control and the evidence needed to explain failures are shared.
In an earlier CLS Blue Sky Blog post on digital governance after the AI Act, I argued that boards should focus on accountability, reporting, and escalation thresholds. In this piece, I apply that architecture to suppliers of external models.
Boards can set a minimum standard and treat it as a condition of launching a product. The standard should cover four items, detailed in Exhibit A below: update control, a clear data boundary, evidence access, and joint incident handling that works under pressure.
What Changes When the Model Is External
In systems you control, you choose when changes go live. You decide what tests must pass. You can pin versions in testing so you know what you validated, and you can roll back if needed.
External model dependencies change that rhythm. A supplier can update the model or its safety controls, and the company may only see the downstream effect. The supplier can also make hosting changes that affect response times or availability. None of this implies bad intent. It is the reality of shared systems. If the company cannot get advance notice of changes so it can test them before they reach users, and cannot roll back quickly when needed, the dependency should be treated as not yet controlled.
The data boundary becomes a board question: What data leaves your systems, where it goes, what is retained, and what is prohibited. An “external model” may run on a user’s device, in the company’s environment, or on the supplier’s systems, and those designs are not interchangeable. Boards should require a short boundary statement backed by technical controls, not just policy text.
Evidence matters most under stress. When something goes wrong, leaders care about three time periods: time to detect, time to diagnose, and time to restore. External model dependencies make those time periods tighter because the records of what happened are split across two organizations. If the company cannot reconstruct what happened from records it can access, that is a control gap that should be closed before relying on a supplier-hosted model for a core feature or workflow.
A reasonable objection is that this is just vendor risk management, and there are already third-party reviews. That is true, but incomplete. Traditional reviews assume the buyer controls release cadence and can reproduce failures in its own environment. Those assumptions may not apply in the case of supplier-hosted models.
Exhibit A: One-Page Supplier Controls Checklist
If management cannot produce the following as a one-page summary, with supporting detail if needed, the dependence on an external supplier is not controlled at the level boards usually assume.
Boundary and data
- Where the model runs and what systems it touches
- What data is sent to the supplier (prompts/inputs, outputs, and any customer data), what is prohibited, and what is retained and for how long
Change control
- How model changes are announced (advance notice), and what counts as an emergency change (outage, security issue, or severe defect)
- Testing checkpoints and who signs off before a change reaches customers
- Freeze triggers and expected timeframes to roll back
Evidence and response
- Log access scope, log retention period, and who can review and investigate
- Independent validation approach: testing access plus a plan to reproduce failures without relying only on supplier explanations
- Shared severity levels, response time commitments, and a joint incident runbook with named escalation contacts
Exit
- Exit plan assumptions and the simplest fallback mode if the supplier is unavailable
This is the minimum evidence needed to supervise a dependency that can change without shipping new software.
Documentation helps only when it is built for oversight rather than marketing. Model Cards for Model Reporting are a useful illustration of concise documentation that clarifies intended use and evaluation context. A checklist applies the same discipline to operational control points by forcing assertions to become evidence.
Escalation and Public Statements Without Legal Overreach
The practical point is escalation. If management cannot detect and explain what changed, internal decisions will run on partial information, and public statements risk being built on partial facts. The discipline is the same outside the United States: clear escalation contacts, fast fact-gathering, and evidence leadership can stand behind.
Cybersecurity provides a useful example. The U.S. Securities and Exchange Commission rules require disclosure of material cybersecurity incidents and annual governance disclosures, including board oversight and management’s role. The analogy is not regulatory scope. It is the discipline of knowing quickly what happened, who owns the call, and what evidence supports the decision.
Supply chain discipline makes this governable. NIST’s cybersecurity supply chain risk management guidance describes how to identify, assess, and mitigate supply chain risks through policies, plans, and risk assessments for acquired products and services.
Four Questions for the Next Board Meeting
These are the questions boards must ask:
- Where does the model run, and what data leaves the company?
- Who controls updates, and what are the notice checkpoints and roll-back plan, including who can pause changes?
- What evidence is reviewed each month, including incidents, performance shifts, uptime, and access logs?
- What is the exit plan if the supplier relationship, performance, or constraints change?
A Simple Test
Within one business day of a significant supplier-driven change or incident, management should be able to answer three questions, with evidence: What changed, who was affected, and what controls worked or failed. If the organization cannot do that, the product should be treated as not yet launch-ready.
A practical next step is to standardize minimum elements for the checklist so every critical model supplier is supervised against the same evidence standard. The U.S. National Telecommunications and Information Administration’s minimum elements for a software bill of materials (SBOM) define an SBOM as a formal record containing the details and supply chain relationships of the components used in building software. The content differs, but the point is the same: Standardize what you expect to see, and supervision becomes repeatable.
Kostakis Bouzoukas is a principal technology leader and founder of Breakthrough Pursuit, a platform focused on AI governance and digital trust. He writes in a personal capacity.