Fighting Back: Publishers and Authors Take on Undesirable Crawlers

Artificial intelligence (AI) systems rely heavily on training data to function properly, and this includes generative text AIs like OpenAI’s ChatGPT. These systems crawl websites, ingest published texts, and learn from them to generate reformulated content within seconds. This has raised concerns among publishers and authors who feel their copyright and exploitation rights are being violated. In response, the industry is pushing for an extension of ancillary copyright to cover text mining by AIs.

This demand runs counter to the political trend at the European level: the EU does not want to slow down AI development in its member states, and the 2019 reform of the Copyright Directive deliberately restricted authors’ rights in order to promote AI “made in Europe.” Article 3 of the directive permits text and data mining for research purposes, while Article 4 allows it for commercial purposes unless rights holders declare a “machine-readable” opt-out.

Germany has implemented this directive through § 44b of its Copyright Act, introduced by the law adapting copyright to the requirements of the digital single market. The provision creates legal certainty for commercial data analyses: AI providers may access almost any text, image, and metadata on websites as long as they do not store the content permanently. Publishers’ associations nevertheless argue that such uses are not covered by the wording of the law. They fear that Google, Meta & Co. will use AI bots to generate summaries of media articles, gaining an unfair advantage without employing a single journalist.

To protect their rights, publishers are looking for ways to declare a reservation in “machine-readable” form. The robots.txt file seems like the most obvious vehicle, but the Robots Exclusion Standard it is based on provides no instructions aimed specifically at AI crawlers. The German government suggests that the reservation can also be placed in a website’s imprint or general terms and conditions. The law requires the reservation to be machine-readable, but it does not apply retroactively.
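To illustrate how a robots.txt reservation works in practice, the following sketch uses Python’s standard-library `urllib.robotparser` to evaluate a hypothetical robots.txt that blocks one AI crawler while allowing everyone else. The user-agent token `GPTBot` is an example of a crawler name an AI operator might publish; the scheme only works if the crawler voluntarily honors the file.

```python
# A minimal sketch of a robots.txt-based opt-out, assuming the AI crawler
# identifies itself with a published user-agent token (here "GPTBot" as an
# example) and voluntarily respects the Robots Exclusion Standard.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The declared AI crawler is disallowed everywhere...
print(parser.can_fetch("GPTBot", "https://example.org/article"))      # False
# ...while other crawlers remain free to fetch the same page.
print(parser.can_fetch("NewsReader", "https://example.org/article"))  # True
```

The limitation the article points to is visible here: the file can only address crawlers by the names they choose to announce, so a crawler that uses a generic or spoofed user-agent simply falls through to the permissive `*` rule.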

The prevailing opinion among copyright experts is that any textual declaration on a web page should count as machine-readable. Website operators, however, have no way of verifying whether AI operators comply with such reservations, since it has so far not been technically possible to reliably identify AI crawlers. Collecting societies, which collect royalties for rights holders, also worry that the legal permission for commercial text and data mining carries no entitlement to remuneration.
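The identification problem can be made concrete with a short sketch: the only signal a server routinely sees is the self-declared User-Agent header. The token list below is illustrative (these tokens have been publicly announced by various operators), and the check is trivially defeated because any client can send any User-Agent string, which is precisely why operators cannot verify compliance.

```python
# Sketch of a naive server-side check against self-declared AI crawler
# user-agent tokens. The token list is illustrative, not exhaustive, and
# the check fails against any crawler that spoofs or omits its token.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "google-extended")

def looks_like_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)

print(looks_like_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # True
print(looks_like_ai_crawler("Mozilla/5.0 (Windows NT 10.0)"))         # False
```

A crawler that presents an ordinary browser User-Agent passes this filter unnoticed, so header-based detection alone cannot enforce a reservation of rights.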

As the topic moves further into the public spotlight, it is becoming clear that many authors and publishers have no idea that their content can be read and used for AI training; Europe’s push to regulate AI has left some of those affected behind. While the industry wants ancillary copyright extended to cover text mining by AIs, publishers and authors also want tougher action from the EU Commission, which could prohibit such practices under competition law and the Digital Markets Act.
