Right to Transparency

You have the right to know exactly how your data is being used and which AI models are being trained on your contributions.

The Transparency Problem

Most AI companies do not disclose the specific data sources used to train their models. When you use social media, write online, or create content, you often have no way of knowing whether your contributions end up training AI systems.

What Transparency Means

Data Source Disclosure

AI companies should be required to disclose the categories of data used for training, including whether they scraped public websites, licensed data, or used user-generated content from specific platforms.
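
As one concrete illustration, such a disclosure could take the form of a machine-readable manifest published alongside a model. The following is a minimal Python sketch; the field names and category labels are illustrative assumptions, not an existing standard.

    from dataclasses import dataclass, field, asdict
    import json

    @dataclass
    class DataSourceDisclosure:
        category: str        # e.g. "scraped public web", "licensed corpus"
        description: str     # human-readable summary of the source
        platforms: list[str] = field(default_factory=list)

    @dataclass
    class TrainingDataManifest:
        model_name: str
        sources: list[DataSourceDisclosure]

        def to_json(self) -> str:
            # asdict() recurses into nested dataclasses, so the whole
            # manifest serializes to plain JSON for publication.
            return json.dumps(asdict(self), indent=2)

    manifest = TrainingDataManifest(
        model_name="example-model-v1",   # hypothetical model
        sources=[
            DataSourceDisclosure("scraped public web", "Crawled pages, 2020-2023"),
            DataSourceDisclosure("user-generated content", "Public posts and comments",
                                 platforms=["ExampleSocial"]),
        ],
    )
    print(manifest.to_json())

A manifest like this would let regulators, auditors, and individuals check at a glance which categories of data a model draws on, without requiring the provider to enumerate every individual document.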

Individual Notification

Individuals should be notified when their specific content is used for AI training, with clear information about what data was used and for what purpose.

Data Provenance

AI outputs should include provenance information: citations back to the data sources that contributed to specific responses. This creates accountability and enables compensation.
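
To make this concrete, the minimal Python sketch below shows one way provenance metadata could be attached to an AI output. The structure, field names, and influence weights are illustrative assumptions; reliably estimating per-source influence on a model's output is itself an open research problem.

    from dataclasses import dataclass

    @dataclass
    class SourceCitation:
        source_id: str    # identifier of the contributing training document
        contributor: str  # party to credit (and potentially compensate)
        weight: float     # estimated share of influence on this output

    @dataclass
    class ProvenancedOutput:
        text: str
        citations: list[SourceCitation]

        def attribution_report(self) -> str:
            # Rank contributors by estimated influence; this ordering is
            # what would drive both accountability and any compensation.
            ranked = sorted(self.citations, key=lambda c: c.weight, reverse=True)
            return "\n".join(f"{c.contributor} ({c.source_id}): {c.weight:.0%}"
                             for c in ranked)

    output = ProvenancedOutput(
        text="Example model response.",
        citations=[
            SourceCitation("doc-123", "Author A", 0.6),
            SourceCitation("doc-456", "Author B", 0.4),
        ],
    )
    print(output.attribution_report())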

Current Requirements

  • EU AI Act: Requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used for training, and imposes data-governance obligations on high-risk systems
  • GDPR: Mandates transparency about automated decision-making
  • Colorado Artificial Intelligence Act (SB 24-205): Requires notice and explanation when high-risk AI systems are used to make consequential decisions

What We Advocate For

  • Mandatory disclosure of AI training data sources
  • Individual notification when personal data is used
  • Cryptographic provenance tracking for AI outputs (see the sketch after this list)
  • Regular audits of AI training practices
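
As one illustration of the third point, the Python sketch below chains HMAC-SHA256 digests of provenance records so that later tampering with any record is detectable. The key arrangement, record layout, and chaining scheme are assumptions for illustration, not a deployed standard.

    import hashlib
    import hmac
    import json

    # Hypothetical key held by an auditing party; real deployments would
    # need careful key management and likely public-key signatures instead.
    AUDIT_KEY = b"key-held-by-auditor"

    def record_digest(record: dict, prev_digest: str) -> str:
        # Canonicalize the record (sorted keys) and chain it to the
        # previous digest, so each entry commits to the history before it.
        payload = json.dumps(record, sort_keys=True).encode() + prev_digest.encode()
        return hmac.new(AUDIT_KEY, payload, hashlib.sha256).hexdigest()

    def append(chain: list, record: dict) -> None:
        prev = chain[-1]["digest"] if chain else "0" * 64
        chain.append({"record": record, "digest": record_digest(record, prev)})

    def verify(chain: list) -> bool:
        # Recompute every digest from scratch; altering any record (or its
        # order) changes all later digests and fails verification.
        prev = "0" * 64
        for entry in chain:
            prev = record_digest(entry["record"], prev)
            if prev != entry["digest"]:
                return False
        return True

    chain: list = []
    append(chain, {"output_id": "out-1", "sources": ["doc-123", "doc-456"]})
    append(chain, {"output_id": "out-2", "sources": ["doc-789"]})
    print(verify(chain))                  # True
    chain[0]["record"]["sources"].pop()   # simulate after-the-fact tampering
    print(verify(chain))                  # False

Because each digest depends on everything before it, an AI provider could not quietly rewrite or delete a provenance record after publication, which is what gives audits of training practices their teeth.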