European Commission's AI Code of Practice and Training Data Summary Template
In Short
The Situation: The European Commission has released a new template for summarizing training data used in general-purpose artificial intelligence ("AI") models, as part of its broader AI regulatory framework.
The Result: Providers of general-purpose AI models must adhere to detailed transparency and copyright compliance requirements, balancing disclosure with the protection of trade secrets.
Looking Ahead: AI model providers should prepare to comply with these new requirements by documenting their training data sources and processing methods comprehensively and transparently.
On 17 January 2025, the European Commission unveiled a template for summarizing training data used in general-purpose AI models. This template is a key component of the AI Code of Practice, which aims to ensure transparency, trust, and compliance with copyright laws in the development and deployment of AI systems.
Providers of general-purpose AI models should take proactive steps to comply with these new requirements, which aim to ensure that AI development and deployment respect copyright law while balancing the need for transparency against the protection of providers' trade secrets.
Legal Framework
The AI Act, which entered into force on 1 August 2024, requires providers of general-purpose AI models to make publicly available a sufficiently detailed summary of the training data used. General-purpose AI models are models designed to perform a wide range of tasks across various domains (with capabilities such as natural language processing, image recognition, and data analysis) and are often used as the basis for developing more specialized AI applications. The requirement is set out in Article 53(1)(d) of the AI Act and is elaborated in Recital 107, which emphasizes the need for a comprehensive yet non-technical summary to facilitate the enforcement of rights by parties with legitimate interests, including rightsholders.
The Template's Structure and Key Provisions
The template for the summary of training data is designed to be simple, effective, and balanced, ensuring that it provides sufficient detail without compromising trade secrets. The key sections of the template include:
1. General Information:
- Model and provider identification, including the provider's name, contact details, and model identifier.
- Date of placement on the market and knowledge cutoff date.
- Overall training data size, modalities, and characteristics, such as the number of tokens for text data or the number of images for image data.
2. List of Data Sources:
- Publicly accessible datasets, including the overall size per modality and a list of main datasets.
- Private (non-publicly accessible) third-party datasets, detailing data licensed by rightsholders and datasets acquired from other third parties.
- Data crawled and scraped from online sources, including the overall size per modality and identification of crawlers.
- User-sourced data collected by the provider, including the overall size per modality and a list of services/products.
- Self-sourced synthetic datasets, including the overall size per modality and the name of the AI model used to generate them.
- Data acquired through other means, detailing the overall size per modality and the means of acquisition.
3. Relevant Data Processing Aspects:
- Measures implemented to respect copyright and related rights, including the identification and removal of content for which rights have been reserved.
- Removal of unwanted content, describing the content deemed unwanted and the measures taken to avoid or remove such content.
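For teams that track this information internally, the three sections above can be sketched as a simple record structure. The Python outline below is illustrative only: every class, field, and category name is a hypothetical label chosen for this sketch, not terminology or a format prescribed by the Commission, and the actual template should be consulted for the authoritative fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Source categories mirroring the "List of Data Sources" section above.
# The identifiers are hypothetical labels, not official template terms.
SOURCE_CATEGORIES = (
    "public_datasets",
    "private_third_party_datasets",
    "crawled_and_scraped",
    "user_sourced",
    "synthetic",
    "other",
)


@dataclass
class GeneralInformation:
    provider_name: str
    provider_contact: str
    model_identifier: str
    market_placement_date: str        # ISO date string (assumed format)
    knowledge_cutoff_date: str        # ISO date string (assumed format)
    size_by_modality: Dict[str, str]  # e.g. {"text": "<number of tokens>"}


@dataclass
class DataSource:
    category: str                     # must be one of SOURCE_CATEGORIES
    size_by_modality: Dict[str, str]
    # Main datasets, crawler names, services/products, and similar details.
    details: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in SOURCE_CATEGORIES:
            raise ValueError(f"unknown source category: {self.category}")


@dataclass
class ProcessingAspects:
    copyright_measures: str           # e.g. handling of rights reservations
    unwanted_content_measures: str    # what is unwanted and how it is removed


@dataclass
class TrainingDataSummary:
    general: GeneralInformation
    sources: List[DataSource]
    processing: ProcessingAspects

    def undocumented_categories(self) -> List[str]:
        """Return source categories with no entry yet, in list order."""
        seen = {source.category for source in self.sources}
        return [c for c in SOURCE_CATEGORIES if c not in seen]
```

A provider could populate one `TrainingDataSummary` per model and call `undocumented_categories()` to see which source types still lack an entry. An empty category is not necessarily a gap, since a model may simply not use that type of data.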
Balancing Transparency and Trade Secrets
The template aims to strike a balance between transparency and the protection of trade secrets. While it requires detailed disclosure of data sources and processing methods, it also takes into account providers' need to protect their competitive advantage. For instance, the template does not require the disclosure of algorithms, model architecture, or specific data treatment processes.
Stakeholder Involvement and Feedback
The development of the template involved extensive consultation with stakeholders, including AI model providers, rightsholders, civil society organizations, and independent experts. The AI Office facilitated this process, ensuring that the template reflects the diverse perspectives and needs of all interested parties.
Implementation Timeline
The template and accompanying guidelines will be adopted by the Commission in the second quarter of 2025, with the general-purpose AI rules becoming effective on 2 August 2025.
Five Key Takeaways
- Detailed Documentation Required: Providers of general-purpose AI models must document and publicly disclose comprehensive summaries of their training data, including data sources and processing methods.
- Balancing Transparency and Trade Secrets: The template aims to ensure transparency while protecting trade secrets, avoiding the disclosure of sensitive information such as algorithms and specific data treatment processes.
- Stakeholder Involvement: The development of the template involved extensive consultation with various stakeholders, ensuring that the requirements reflect diverse perspectives and needs.
- Compliance with Copyright Laws: Providers must implement measures to respect copyright and related rights, including identifying and removing content for which rights have been reserved.
- Preparation for New Requirements: AI model providers should start preparing to comply with these new requirements by thoroughly documenting their training data sources and processing methods.