- Why AI Data Rights Are the New IP Battleground
- Training Data Consent and Model Improvement Clauses
- Output Ownership: Who Owns What the AI Produces
- Data Residency and Sovereignty Requirements
- Third-Party Data Sharing and Sub-Processor Chains
- Change of Control and Insolvency Protections
- Non-Negotiable Clauses: A Reference List
- How Major AI Vendors Approach Data Rights by Default
Why AI Data Rights Are the New IP Battleground
When you deploy an enterprise AI system, you are not simply purchasing software. You are feeding your organisation's most sensitive assets — customer data, proprietary documents, internal communications, financial records, trade secrets — into systems controlled by third parties, operating under terms of service that most procurement teams don't read beyond the pricing section.
The consequences of inadequate data rights protections in AI contracts have moved from theoretical to concrete. In 2024 and 2025, we saw the first wave of enterprise disputes arising from AI vendors' data-handling failures: training data leakage across enterprise tenants, outputs incorporating proprietary customer methodologies surfacing in competitor contexts, and regulatory enforcement actions against enterprises whose AI vendors processed personal data in non-compliant jurisdictions.
The AI vendor community, to varying degrees, has responded with improved default terms for enterprise tiers. But "improved" is not the same as "adequate" — and defaults remain the floor, not the ceiling, of what is achievable in negotiation.
"Every AI contract we review contains at least one data rights provision that, if triggered, would create significant liability or competitive risk for the customer. Most of those provisions are removable in negotiation."
Training Data Consent and Model Improvement Clauses
The central data rights question in any AI vendor contract is whether the vendor can use your data — inputs, outputs, usage patterns, documents — to train or improve their models. This is frequently buried in "Service Improvement" or "Product Enhancement" language rather than being prominently flagged as a training data provision.
What Default Terms Typically Allow
Under standard terms (not enterprise-negotiated), most AI vendors reserve the right to use customer data for model improvement unless you explicitly opt out or upgrade to a paid enterprise tier. Even on enterprise tiers, the default opt-out is typically limited to direct training data use — it may not cover usage pattern analysis, aggregate telemetry, or the use of your prompting patterns to improve the vendor's own prompt engineering.
What to Negotiate
Your contract should include a blanket restriction on the use of any customer data — including inputs, outputs, metadata, usage logs, and aggregated telemetry — for any AI training, model improvement, fine-tuning, evaluation, or benchmarking purpose, except with explicit written consent on a case-by-case basis. This should apply to the vendor's sub-processors and affiliated entities, not just the vendor itself.
A stronger position includes a warranty from the vendor that their foundation models were not trained on data sourced from enterprises without explicit consent — a provision that has significant implications for IP indemnification (addressed below).
Output Ownership: Who Owns What the AI Produces
Enterprise AI systems generate outputs — documents, code, analyses, recommendations, images, audio — that form the basis of commercial decisions and customer deliverables. The IP ownership question is more complex than it appears.
The Current Legal Landscape
Copyright law in most jurisdictions does not currently protect AI-generated content as such — copyright requires human authorship. The practical implication is that AI outputs may be unprotectable as original works, regardless of what your contract says about "ownership." What your contract can determine is who has rights to use those outputs commercially and whether the vendor retains any right to use your outputs for their purposes.
Key Contractual Provisions
| Provision | Minimum Acceptable Position | Preferred Position |
|---|---|---|
| Output usage rights | Customer has unlimited commercial use rights to outputs | Customer has exclusive rights; vendor has no use rights |
| Vendor use of outputs | Vendor cannot use outputs to train or improve models | Vendor cannot use outputs for any purpose without consent |
| IP indemnification | Vendor indemnifies customer against third-party IP claims arising from vendor-side training data | Uncapped indemnification with defence obligations |
| Output warranties | Vendor warrants that, to its knowledge, outputs do not infringe third-party IP | Plus: accuracy/fitness representations with SLA remedies |
The IP indemnification clause deserves particular attention. If an AI vendor's foundation model was trained on copyrighted material without licence, outputs from that model may constitute copyright infringement — and without an adequate indemnification clause, your enterprise bears that risk. Several major AI vendors now provide indemnification for commercially deployed models, but the scope and caps vary significantly. See our guide on essential AI vendor contract clauses for the full analysis.
Data Residency and Sovereignty Requirements
Data sovereignty is where AI procurement intersects most directly with regulatory compliance. Enterprises subject to GDPR, the EU AI Act, sector-specific data localisation requirements (banking, healthcare, defence), or national security frameworks face a non-trivial challenge: most AI inference infrastructure is designed for global distribution, not geographic containment.
Processing Location vs Storage Location
A critical distinction many procurement teams miss: data residency provisions typically cover where data is stored, not necessarily where it is processed. For AI systems, the processing location is often the more sensitive consideration — inference (where the model runs on your prompts) may occur in a different jurisdiction from where your data is stored. Your contract must address both storage and processing locations explicitly.
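The storage/processing distinction lends itself to a mechanical check. As an illustration, a procurement or compliance team could encode its approved regions and validate a vendor's declared locations against them; the region names, data categories, and disclosure structure below are hypothetical, not drawn from any vendor's actual disclosure format:

```python
# Hypothetical sketch: check a vendor's declared data locations against policy.
# Both storage and inference (processing) regions are validated separately,
# because a compliant storage region does not imply a compliant inference region.
APPROVED_REGIONS = {"eu-west-1", "eu-central-1"}  # example policy values

def residency_violations(disclosure: dict) -> list[str]:
    """Return residency violations found in a vendor disclosure.

    `disclosure` maps each data category to its declared storage and
    inference regions.
    """
    violations = []
    for category, locations in disclosure.items():
        for kind in ("storage", "inference"):
            region = locations.get(kind)
            if region not in APPROVED_REGIONS:
                violations.append(f"{category}: {kind} in {region!r} not approved")
    return violations

disclosure = {
    "customer_records": {"storage": "eu-west-1", "inference": "us-east-1"},
    "internal_docs": {"storage": "eu-central-1", "inference": "eu-central-1"},
}
# Storage passes for customer_records, but inference does not: exactly the
# gap a storage-only residency clause would miss.
print(residency_violations(disclosure))
```

The point of the sketch is the loop over both location kinds: a review process that only checks storage regions reproduces the contractual blind spot described above.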
What to Include in Data Residency Provisions
- Explicit named regions for both data storage and inference processing
- Prohibition on cross-border data transfer for your specific data categories without written consent
- Notification obligations if processing location changes (minimum 60-day notice)
- Annual audit rights to verify residency compliance, including the right to review sub-processor locations
- Termination for cause rights if residency commitments are breached without cure within 30 days
"Data residency for AI systems must cover inference location, not just storage. The model runs your data somewhere — and that 'somewhere' needs to be in your contract."
Third-Party Data Sharing and Sub-Processor Chains
Enterprise AI platforms routinely rely on sub-processor chains: cloud infrastructure providers, fine-tuning specialists, evaluation services, safety classifiers, and content moderation layers. Each of these sub-processors potentially touches your data. Your vendor contract governs the prime vendor's obligations — but sub-processor obligations are typically addressed only by reference to a "Sub-Processor List" that can change with minimal notice.
Minimum Sub-Processor Protections
Your contract should require: (1) prior written notice before adding sub-processors who will access your enterprise data; (2) your right to object to new sub-processors with a reasonable cure period; (3) contractual flow-down of your data rights protections to all sub-processors; and (4) vendor liability for sub-processor breaches as if they were vendor breaches.
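Because the Sub-Processor List can change with minimal notice, some teams monitor it programmatically between contract reviews. A minimal sketch of requirement (1) — detecting additions that never went through the notice-and-objection process — with entirely hypothetical company names:

```python
# Hypothetical sketch: flag sub-processors on the vendor's current list
# that were never approved through the contractual notice process.
def unapproved_additions(approved: set[str], current: set[str]) -> set[str]:
    """Sub-processors present now that were never approved; each one
    should trigger the notice-and-objection process in the contract."""
    return current - approved

approved = {"CloudHost Inc", "SafetyFilter Ltd"}          # from last review
current = {"CloudHost Inc", "SafetyFilter Ltd", "NewEval Corp"}  # scraped today
print(unapproved_additions(approved, current))  # {'NewEval Corp'}
```

A simple set difference is enough here; the harder part in practice is sourcing `current` reliably, since vendors publish these lists in varying formats.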
Change of Control and Insolvency Protections
The AI vendor landscape is consolidating rapidly. Several significant AI companies that enterprises contracted with in 2023 have since been acquired, merged, or restructured. Without explicit change of control protections, the contract's data obligations may nominally survive a transaction while your operational protections, and your leverage to enforce them, do not.
Essential provisions include: termination for convenience triggered by a change of control event if the acquirer fails to assume equivalent data obligations within 60 days; data return or certified destruction within 30 days of contract termination; and step-in rights or escrow arrangements for critical data processing dependencies.
Non-Negotiable Clauses: A Reference List
Based on our AI procurement advisory engagements, the following provisions should be treated as non-negotiable in any enterprise AI contract involving sensitive or regulated data:
- Training data exclusion: Your data will not be used to train, fine-tune, or evaluate AI models without explicit written consent
- Output ownership: All outputs generated from your data and prompts are owned by your organisation
- IP indemnification: Vendor defends and indemnifies against IP claims arising from vendor-side model training
- Data residency: Storage and processing locations specified by name, including sub-processors
- Sub-processor controls: Prior notice and objection rights for changes to sub-processor chain
- Data return/destruction: Certified within 30 days of termination
- Audit rights: Annual right to audit data handling practices
- Change of control: Termination rights if acquirer fails to assume equivalent obligations
- Regulatory cooperation: Vendor cooperates with regulatory investigations involving your data
- Breach notification: 72-hour notification of any data breach involving your data
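The list above can also serve as a machine-checkable gate in a procurement workflow, so a draft contract either clears all ten items or is flagged for escalation. A minimal sketch; the clause keys are an illustrative shorthand, not a legal taxonomy:

```python
# Hypothetical sketch: treat the non-negotiable list as a pass/fail gate.
NON_NEGOTIABLE = [
    "training_data_exclusion", "output_ownership", "ip_indemnification",
    "data_residency", "subprocessor_controls", "data_return_destruction",
    "audit_rights", "change_of_control", "regulatory_cooperation",
    "breach_notification_72h",
]

def contract_gate(present: set[str]) -> tuple[bool, list[str]]:
    """Return (passes, missing_clauses) for a reviewed draft contract."""
    missing = [c for c in NON_NEGOTIABLE if c not in present]
    return (not missing, missing)

# A draft covering only four of the ten items fails the gate.
ok, missing = contract_gate({"training_data_exclusion", "output_ownership",
                             "data_residency", "audit_rights"})
print(ok, missing)
```

The all-or-nothing return value is deliberate: per the framing above, these are non-negotiable, so partial coverage is a fail, not a score.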
How Major AI Vendors Approach Data Rights by Default
Understanding vendor defaults helps you prioritise where to focus your negotiating effort:
| Vendor | Default Training Data Use | Enterprise Opt-Out Available | IP Indemnification |
|---|---|---|---|
| OpenAI (Enterprise) | Excluded by default for API/Enterprise | Yes — included in enterprise tier | Limited — Copyright Shield programme |
| Microsoft Copilot (Enterprise) | Excluded for M365 Copilot enterprise | Yes — Microsoft Purview controls | Yes — Copilot Copyright Commitment |
| Google Gemini (Enterprise) | Excluded for Workspace Enterprise | Yes — admin controls available | Limited — indemnification for certain services |
| Anthropic Claude (Enterprise) | Excluded by default in enterprise agreements | Yes — contractual guarantee | Available — scope varies by agreement |
| Amazon Bedrock | AWS doesn't use customer inputs to train AWS models | Built into service by default | Third-party model IP indemnification limited |
These defaults reflect enterprise tier positions and can change. Always verify current terms directly and supplement them with contractual language — defaults can change unilaterally through terms of service updates unless contractually locked. For a full pre-signature review framework, see our GenAI Procurement Checklist for Enterprise Buyers.