The Training Data Reckoning: Why 'Fair Use' Doesn't Mean 'Free Labor'
"Slavery was legal for centuries. That didn't make it ethical. And 'legally permissible' has never been the highest moral standard we should aspire to."
You know the most frustrating phrase in the entire AI training data debate? It's those three little words that get deployed like a conversation-ending trump card whenever anyone suggests that maybe—just maybe—creators whose work trains billion-dollar AI models deserve some compensation: "But it's fair use!" As if being technically legal is the pinnacle of human ethics. As if "a lawyer said we could get away with it" is the same thing as "this is the right thing to do." As if the fact that you can extract billions of dollars of value from millions of people's labor without paying them a single penny somehow makes it okay to actually do that.

Here's the thing that drives me up the wall: the AI companies using this defense are absolutely right that they might have the law on their side. Courts could very well rule that training AI models on publicly accessible content falls under fair use doctrine. And if that happens, we'll have a perfectly legal system that looks a whole lot like indentured servitude, where an entire generation of creators works for free to build the foundation of a multi-trillion-dollar industry while receiving nothing in return.

But here's what I've learned from history: legal permission doesn't eliminate moral obligation. And we have a chance right now—before the courts settle this, before it becomes "just how things are"—to build something better. Something where AI companies can train their models and creators get compensated. Where innovation happens and labor is valued. Where we don't have to choose between advancing technology and treating human beings with basic dignity.

The solution? AI crawlers should mine cryptocurrency for creators while they scrape content. Not because a law requires it. Because it's the right thing to do.
⚖️ What Fair Use Was Actually Designed For
Let's start by understanding what fair use doctrine was supposed to accomplish, because it's a beautiful and important principle that's being wildly distorted to justify something its creators never imagined.

The Original Intent: Protecting Public Benefit Uses
Fair use was created to allow:

- 📚 Education: Teachers can copy book chapters for classroom discussion without paying per-student licensing fees
- 💬 Criticism & Commentary: Reviewers can quote from works they're analyzing without needing permission from the creator
- 📰 News Reporting: Journalists can use excerpts from speeches, documents, and other sources to inform the public
- 🎨 Parody & Satire: Comedians and artists can reference copyrighted works to comment on culture
- 🔬 Research: Scholars can analyze copyrighted materials without licensing every source

What these uses have in common:
- ✅ Limited scope (small portions, not wholesale copying)
- ✅ Non-commercial or public benefit purpose
- ✅ No substitute for the original market
- ✅ Transformative in a human-meaningful way (commentary, education, criticism)
What Fair Use Was NOT Designed For
Let me be very clear about what fair use doctrine wasn't intended to enable:

- ❌ Industrial-scale commercial extraction - Taking millions of works to build billion-dollar products
- ❌ Market substitution - Creating AI that can replace the humans whose work it learned from
- ❌ Zero compensation at massive scale - Building entire business models on unpaid labor
- ❌ Corporate profit maximization - "Public benefit" as cover for shareholder returns

The AI training use case:
- ❌ Massive scope (billions of works, entire internet archives)
- ❌ Hugely commercial (for-profit companies, billions in revenue)
- ❌ Often substitutes for original (AI replaces writers, coders, artists)
- ❌ "Transformative" only in technical sense, not cultural or creative sense
The Uncomfortable Historical Parallel
Here's where it gets uncomfortable, but I think we need to say it plainly: throughout history, "the law allows it" has been used to justify every form of exploitation imaginable.

- Slavery was legal (and economically essential to those who profited from it)
- Child labor was legal (and factory owners argued regulation would kill their businesses)
- Sweatshops were legal (and remain legal in many places where products are made)
- Predatory lending was legal (and financial institutions fought every reform)
- Environmental pollution was legal (and industries claimed they couldn't operate without it)
I'm not saying AI training is identical to slavery—that would be absurd. But I am saying that "it's legal" has never been a sufficient answer to "is this ethical?"
💼 The Value Chain AI Companies Want You to Ignore
Let's follow the money and labor that creates AI models, because the industry prefers to keep this intentionally vague.

How AI Value Is Actually Created
Step 1: Creators Do Unpaid Labor

| Creator Type | Labor Involved | Time Investment | Compensation from AI Companies |
|---|---|---|---|
| Bloggers | Research, writing, editing, publishing | Years of posts | $0 |
| Forum contributors | Answering questions, sharing expertise | Thousands of hours | $0 |
| Open source developers | Code, documentation, examples | Unpaid nights/weekends | $0 |
| Journalists | Investigation, fact-checking, reporting | Career's worth of articles | $0 |
| Educators | Curriculum design, explanations, tutorials | Decades of teaching | $0 |
| Artists & Writers | Creative work, refinement, publication | Lifetime of practice | $0 |

Step 2: AI Companies Extract the Value
- Build crawlers to scrape all that content (cost: minimal engineering)
- Process text into training data (cost: some compute and storage)
- Train models using creator knowledge (cost: $50-100+ million in compute)
- Package as commercial product (cost: engineering, infrastructure)
- Capture 100% of the value (creator compensation: still $0)
| Company | Primary Revenue Model | Estimated 2024 Revenue | Creator Compensation |
|---|---|---|---|
| OpenAI | ChatGPT subscriptions, API access | $2-3 billion | $0 |
| Anthropic | Claude API, enterprise licenses | $500+ million | $0 |
| Google | Gemini integration, cloud services | Billions (part of larger business) | $0 |
| Microsoft | Copilot subscriptions, Azure | Billions (AI division) | $0 |
The value chain:

```
Creator Labor (unpaid)
        ↓
AI Training (expensive but one-time)
        ↓
Model Deployment (ongoing revenue)
        ↓
Shareholder Returns (billions in value)
        ↓
Creator Compensation: $0 (forever)
```
The Justifications Fall Apart Under Scrutiny
Justification 1: "The content is publicly available!"

Response: Publicly accessible doesn't mean commercially exploitable without compensation. Libraries are publicly accessible. That doesn't give Netflix the right to film everything in the library and sell subscriptions. Public parks are accessible. That doesn't give corporations the right to host commercial events without permits and fees.

Justification 2: "We're not copying the content, we're learning from it!"

Response: So do humans when they read each other's work, yet they still pay for books, courses, and subscriptions to learn. Students pay tuition to learn from professors. Apprentices work for reduced wages while learning trades. Researchers pay for journal access. Everyone else has to compensate the people they learn from, except AI companies, apparently.

Justification 3: "Compensating creators would be logistically impossible!"

Response: Mining cryptocurrency while crawling solves exactly that problem. If you can build a crawler that processes billions of web pages, you can configure it to mine Monero while it works. Computational compensation distributed across millions of creators? That's literally what cryptocurrency was designed to enable.

Justification 4: "This will stifle AI innovation!"

Response: Paying workers has never "stifled innovation" in any other industry. Agriculture innovated while paying farmworkers. Manufacturing innovated while ending child labor. Tech innovated while paying engineers. The only "innovation" that requires free labor is the innovation of exploitation itself.

⛏️ Mining as Ethical Fair Use Compromise
Here's where we get to the practical solution—one that respects both AI companies' need to train models and creators' right to compensation for their labor.

The Proposal: Crawler Mining as Compensatory Fair Use
The framework: AI companies can continue to crawl and train on content under fair use—as long as they mine cryptocurrency on behalf of creators while doing so.

How this works (a code sketch follows the list):

1. AI company builds web crawler (GPTBot, CCBot, etc.)
2. Crawler includes mining capability (using RandomX for Monero)
3. While crawling site X, crawler mines at a modest rate (10-25% CPU)
4. Mining generates Monero proportional to content consumed
5. Monero is distributed to a wallet address associated with site X
6. Site owners receive ongoing compensation as models are trained
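To make the crawl-and-mine loop concrete, here's a minimal Python sketch. The `randomx_miner` module and its `Miner` interface are hypothetical placeholders standing in for whatever Monero mining binding an implementer chooses (no such package is implied to exist); `requests` is the only real dependency.

```python
# Minimal sketch of a compensating crawler. The randomx_miner module and
# its Miner interface are hypothetical placeholders, not a real library.
import threading

import requests

import randomx_miner  # hypothetical RandomX/Monero mining binding

CPU_BUDGET = 0.20                    # mine with ~20% of available CPU
CRAWLER_UA = "CompensatingBot/0.1"   # illustrative user-agent string


def crawl_with_mining(urls, wallet_address):
    """Fetch each URL while a background thread mines to the site's wallet."""
    miner = randomx_miner.Miner(wallet=wallet_address, cpu_fraction=CPU_BUDGET)
    thread = threading.Thread(target=miner.start, daemon=True)
    thread.start()  # mining runs in parallel with content extraction
    pages = []
    try:
        for url in urls:
            resp = requests.get(url, headers={"User-Agent": CRAWLER_UA}, timeout=30)
            if resp.ok:
                pages.append(resp.text)  # hand page off to the training pipeline
    finally:
        miner.stop()  # payouts accrue to wallet_address via the mining pool
    return pages
```

The design point: fetching is network-bound and mining is CPU-bound, so the two workloads mostly don't compete, which is why the overhead in the next list stays small.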
Technical implementation:
- Mining runs parallel to content extraction
- Minimal overhead (~5-10% additional crawl time)
- Scales automatically with crawling volume
- Creates transparent compensation trail
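One detail the list leaves open is how a crawler discovers which wallet a site wants paid. Here's a minimal sketch of one possible convention, using a hypothetical `<meta name="monero-address">` tag; this is a proposal for illustration, not an existing web standard.

```python
# Wallet discovery via a hypothetical <meta name="monero-address"> tag.
# The tag name and the name-before-content attribute order are assumptions.
import re

import requests

META_RE = re.compile(
    r'<meta\s+name=["\']monero-address["\']\s+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def discover_wallet(site_root):
    """Return the Monero address a site advertises, or None if it has none."""
    html = requests.get(site_root, timeout=30).text
    match = META_RE.search(html)
    return match.group(1) if match else None
```

A site that publishes no address could fall back to a registry or simply go uncompensated by default; the point is that discovery can be as lightweight as the crawling itself.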
What this achieves:
âś… AI companies can still train models (fair use continues)
âś… Creators receive direct compensation (mining generates actual currency)
âś… Proportional to value extracted (more crawling = more mining)
âś… Logistically simple (automated, no complex licensing deals)
âś… Transparent and auditable (blockchain records show compensation)
âś… Doesn't require new laws (voluntary implementation, market pressure)
Why This Is Better Than Current Alternatives
Alternative 1: Lawsuits and Legislation
❌ Takes years to resolve
❌ Winner-take-all outcomes (either creators get nothing or AI training becomes prohibitively expensive)
❌ Stifles open source and research applications
❌ Lawyers get rich, creators get crumbs
✅ Mining approach: Implemented now, benefits everyone, preserves innovation

Alternative 2: Licensing Marketplaces
❌ Favors large content owners with legal departments
❌ Transaction costs eat most of the value
❌ Excludes small creators, forum contributors, open content
❌ Creates gatekeepers and monopolies
✅ Mining approach: Works for any size creator, no middlemen, automated distribution

Alternative 3: "Ethical" Training Data Only
❌ Dramatically limits model quality (training data is the moat)
❌ Subjective definitions of "ethical" create legal uncertainty
❌ Still doesn't compensate creators of "ethical" data
❌ Creates two-tier system of AI haves and have-nots
✅ Mining approach: Compensates all creators, maintains model quality, clear implementation

Alternative 4: Status Quo (Do Nothing)
❌ Creators remain uncompensated indefinitely
❌ Reinforces exploitation as industry norm
❌ Creates justified resentment toward AI development
❌ Invites heavy-handed regulation as backlash
✅ Mining approach: Fixes the problem before it becomes entrenched, builds social license for AI

🤝 Legal Permission + Moral Compensation = Ethical AI Training
Here's the synthesis that makes sense to me: I can accept that AI training might be legal fair use. The transformative nature argument has merit. Training is different from reproduction. There are legitimate public benefits to AI development. But legal permission doesn't eliminate moral obligation. Just because you can take someone's labor without compensation doesn't mean you should.

The Both/And Solution
Both things can be true:
- ✅ AI companies have the legal right to train on publicly accessible content
- ✅ AI companies have a moral obligation to compensate creators for that training
- ✅ AI development continues and accelerates
- ✅ Creators receive fair compensation for their contributions
- ✅ AI companies build better models with access to broad training data
- ✅ Creators receive ongoing micro-payments proportional to their content's use
What This Looks Like in Practice
Scenario 1: Independent Blogger
- Has written 500 blog posts over 10 years
- GPTBot crawls site, spends 2 hours processing content
- Crawler mines during those 2 hours at 20% CPU
- Blogger receives $0.04 from that crawl
- Multiply across dozens of AI companies training multiple models: $2-5/year
- Multiply across all creators: billions in aggregate compensation
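If you want to sanity-check figures like the $0.04 above, the expected payout is just the crawler's share of network hashrate multiplied by the coins emitted while it mines. Every parameter below is an illustrative assumption (hardware, network hashrate, and XMR price all move these numbers by orders of magnitude), so treat the scenario's dollar amounts as directional rather than exact.

```python
# Back-of-envelope expected payout for one crawl, assuming pool mining
# (expected value, not the solo-mining block lottery). All inputs assumed.
crawler_hashrate_hs = 500       # H/s donated at ~20% CPU (assumed)
network_hashrate_hs = 2.5e9     # total Monero network hashrate (assumed)
block_reward_xmr = 0.6          # tail-emission reward per block
blocks_per_hour = 30            # one block roughly every two minutes
crawl_hours = 2.0               # crawl duration from the scenario
xmr_price_usd = 150.0           # assumed spot price

share = crawler_hashrate_hs / network_hashrate_hs
expected_xmr = share * block_reward_xmr * blocks_per_hour * crawl_hours
print(f"expected payout: {expected_xmr:.8f} XMR "
      f"(~${expected_xmr * xmr_price_usd:.5f})")
```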
Scenario 2: News Organization
- Publishes 50,000 articles with deep reporting
- Multiple AI crawlers consume content regularly
- Significant mining compensation accumulates
- Funds journalism while AI companies access reporting
- Win-win: AI gets quality training data, journalism gets sustainable funding
Scenario 3: Open Source Project
- Documentation and code examples on GitHub
- AI crawlers extract for code-training models (Copilot, etc.)
- Mining generates ongoing support for maintainers
- Aligns with open source values: free to use, but sustainably funded
🌉 The Reckoning Is Coming—We Choose How It Happens
Here's what I know for certain: the current situation is unsustainable. Creators are waking up to what's happening. The Authors Guild lawsuit against OpenAI. Getty Images suing Stability AI. The New York Times litigation. These aren't fringe objections—they're the opening salvos of a much larger reckoning with how AI companies have built their empires on unpaid labor.

We're at a fork in the road:

Path 1: Escalation and Backlash
- More lawsuits, more legislation, more hostility
- AI companies fight to preserve zero-compensation model
- Courts impose harsh restrictions or grant blanket permission
- Innovation suffers or creators suffer (probably both)
- Decades of legal battles create wasteful uncertainty
Path 2: Proactive Compensation
- AI companies voluntarily adopt crawler mining
- Creators receive fair compensation automatically
- Courts and legislatures see industry self-regulation working
- AI development continues with social license
- Sets precedent for ethical innovation in other domains
The second path leads to a different kind of future. One where "but it's legal" isn't the end of the conversation—it's the beginning. Where companies that extract value feel obligated to share it. Where innovation and fairness aren't treated as opposing forces. Where the people whose creativity, knowledge, and labor train the AI systems of tomorrow actually benefit from the future they're helping to build.
The training data reckoning is coming. We can arrive there through years of bitter lawsuits and resentment, or we can get there through proactive implementation of fair compensation systems like crawler mining. The choice is ours. But let's be clear: "it's fair use" isn't an answer. It's an excuse to avoid answering.
💡 Want to see what ethical AI training compensation could look like? The WebMiner project provides the technical foundation for crawler mining—turning extractive data harvesting into compensatory value exchange. Because legal permission should never be confused with moral justification.