Publishers seek compensation and attribution from AI training data - AMEC survey

Publishers are facing a new threat of disintermediation and revenue loss from the use of their content to train AI systems.

Large language models (LLMs) such as Claude, Gemini and ChatGPT are only as good as their training data, which is often sourced from large chunks of the internet, Google Books and Wikipedia.

These models require regular retraining on contemporary data to stay current. This has led data-rich organizations, including media publishers, to sell their content for use in training these models.

The Associated Press and the Financial Times are among the first to broker deals with OpenAI to license their content as training data for the ChatGPT model. However, more than 500 news publishers, including The New York Times, Reuters and The Washington Post, have blocked AI crawlers to prevent their content from being collected and used as training data.

As the race to license content to LLMs heats up, two issues have emerged:

  • compensation for content used as training data

  • attribution to source material when content is returned within results

These same issues have played out in the media monitoring market for more than 20 years. AMEC, the International Association for Measurement and Evaluation of Communication, has launched a survey with the Press Database and Licensing Network (PDLN) and FIBEP to gather industry perspectives on these issues.

Copyright is a blind spot and liability for many organizations. The ethical challenge for brands and corporate communications is significant.
— Johna Burke, CEO, AMEC

Please take ten minutes to complete the short questionnaire to support the research. Your perspective will help inform AMEC's lobbying on copyright issues and its industry education work.
