The AI access policy of Automattic, the owner of Tumblr and WordPress, has recently become the subject of a major controversy. On February 27, 2024, 404 Media reported that Automattic planned to sell Tumblr and WordPress.com user data to AI companies such as ChatGPT maker OpenAI and text-to-image generator Midjourney. If the allegations are true, the arrangement would resemble Shutterstock's collaboration with OpenAI. The report has provoked outrage.
About the Agreement
The deal comes amid a growing number of similar agreements: publishers such as the Associated Press and social media platforms such as Reddit have agreed to share their content with OpenAI and Google, respectively. AI companies increasingly seek such partnerships with content-rich sources so that they can develop large language models without facing later lawsuits for copyright infringement.
According to the 404 Media report, however, the Automattic deal became controversial because of several complications. The initial set of platform data, allegedly compiled by a Tumblr product manager, included private posts, deleted blog content, and even third-party content from official partner blogs. The report raises user privacy and data security concerns and calls for closer scrutiny of the ethical and legal dimensions of AI technology and data use. In this context, Automattic's policy and the control options it gives its users are important.
404 Media Report
Although 404 Media quoted an internal source, it did not provide specific evidence to verify those quotes, such as screenshots or access to source materials. 404 Media also notes that its use of the term "user data" for user content could be misconstrued as referring to personally identifiable information (PII) or credit card details. The content discussed in the article, however, is content that was already publicly available.
404 Media is a news and content platform that specializes in technology and digital media. It analyzes technology companies, their products, and their policies, and it reports on new initiatives, technology trends, and digital media policies, especially on the internet. Platforms of this kind generally track the activities and strategies of technology companies and other digital media organizations in order to inform the public and analyze developments in the sector.
Automattic's Response
Automattic published a post about its AI usage policy a few hours after 404 Media's article appeared. The post explained the position of Tumblr and WordPress.com on including users' public content in data shared with AI partners. While the company has created a way to give AI partners quick access to content that users are willing to share, it has also taken steps to remove access to content that users no longer want shared. In other words, Automattic's position is that access for AI companies could be provided and managed because the content in question was already public.
Automattic has also released a "new tool that allows you to decline content sharing with third parties from your public blogs". The company added: "We will withhold this content from AI companies with whom we can build productive relationships, including AI platforms that use it for educational models. We already block content aggregation from WordPress.com and will continue to do so, we are committed to ensuring that our partners respect these decisions."
WordPress.org Users Not Affected
Josepha Haden Chomphosy, Executive Director of WordPress, shared this with the community on Slack: "I can confirm that the WordPress project is not in the business of selling user data or content for AI training purposes. This is a consistent stance throughout the long history of WordPress, even as recently as 2023 when I was sharing my thoughts on the future of our project." Later, Jetpack stated that data from Jetpack-linked sites was not included. "This only applies to WordPress.com hosted sites."
In summary, Automattic's AI access policy has raised concerns about how user data is processed and shared. 404 Media's report exposed both the complexity of Automattic's policies and the data security concerns they create. Automattic's response and the control options it offers users nevertheless suggest that the company is taking steps to address these concerns. The episode shows that the ethical and legal issues surrounding tech companies' use of AI increasingly need to be taken into account.