Privacy

Social media giants urged to tackle data-scraping privacy risks

A joint statement signed by regulators at a dozen international privacy watchdogs, including the U.K.’s ICO, Canada’s OPC and Hong Kong’s OPCPD, has urged mainstream social media platforms to protect users’ public posts from scraping — warning they face a legal responsibility to do so in most markets.

“In most jurisdictions, personal information that is ‘publicly available’, ‘publicly accessible’ or ‘of a public nature’ on the internet, is subject to data protection and privacy laws,” they write. “Individuals and companies that scrape such personal information are therefore responsible for ensuring that they comply with these and other applicable laws. However, social media companies and the operators of other websites that host publicly accessible personal information (SMCs and other websites) also have data protection obligations with respect to third-party scraping from their sites. These obligations will generally apply to personal information whether that information is publicly accessible or not. Mass data scraping of personal information can constitute a reportable data breach in many jurisdictions.”

The timing of the statement, which was also signed by privacy regulators in Australia, Switzerland, Norway, New Zealand, Colombia, Jersey, Morocco, Argentina and Mexico, all members of the Global Privacy Assembly’s international enforcement cooperation working group, coincides with the ongoing hype around generative AI models, which typically require large amounts of data for training and could encourage more entities to scrape the Internet in a bid to acquire data-sets and jump on the generative AI bandwagon.

High-profile generative AI systems, such as OpenAI’s ChatGPT, have relied (at least in part) upon data posted online for training, and a class action lawsuit filed against the U.S. company in June, which CNN Business reported on, alleges it secretly scraped “massive amounts of personal data from the internet”.

Among the privacy risks the regulators highlight is the use of data scraping for targeted cyberattacks such as social engineering and phishing; identity fraud; and for the monitoring, profiling and surveilling of individuals, such as using data to populate facial recognition databases and provide unauthorised access to authorities — a clear swipe at Clearview AI, which has faced a number of enforcements from international regulators (including several across the EU) over its use of scraped data to power a facial recognition ID tool which it sold to law enforcement and other users.

They also warn that scraped data can be used for unauthorised political or intelligence gathering purposes, including by foreign governments or intelligence agencies, and to pump out unwanted direct marketing or spam.

They don’t directly cite the training of AI models as one of these “key” privacy risks but generative AI tools which have been trained on people’s data without their knowledge or consent could be repurposed for a number of the malicious use cases they cite, including to impersonate people for targeted cyberattacks, identity fraud, or to monitor/surveil individuals.

As well as the statement being made public, the regulators note that a copy has been sent directly to YouTube’s parent company, Alphabet; TikTok’s parent ByteDance; Meta (owner of Instagram, Facebook and Threads); Microsoft (LinkedIn); Sina Corp (Weibo); and X (aka, the platform previously known as Twitter) — so mainstream global social media platforms are clearly front-and-center as the international watchdogs consider the privacy risks posed by data scraping.

Some platforms have of course already had major data scandals linked to data scraping — such as the 2018 Cambridge Analytica data misuse scandal which hit Facebook after a developer on its platform was able to extract data on millions of users without their knowledge or consent as a result of lax permissions the company applied; or the $275 million General Data Protection Regulation (GDPR) penalty Facebook was handed last year in relation to a data scraping incident that affected 530 million users as a result of insecure product design. (The latter incident is also subject to a lawsuit by an Irish digital rights group that’s challenging the DPA’s enforcement finding that there was no security breach.)

While the regulators’ joint statement contains a clear shot across the bows of mainstream social media sites on the need to be proactive about protecting users’ information from scraping, there is no commensurately clear warning that failure to act and protect people’s data will result in enforcement action, which risks diluting the statement’s impact somewhat.

Instead, the watchdogs urge platforms to “carefully consider the legality of different types of data scraping in the jurisdictions applicable to them and implement measures to protect against unlawful data scraping”.

“Techniques for scraping and extracting value from publicly accessible data are constantly emerging and evolving. Data security is a dynamic responsibility and vigilance is paramount,” they also write. “As no one safeguard will adequately protect against all potential privacy harms associated with data scraping, SMCs and other websites should implement multi-layered technical and procedural controls to mitigate the risks.”

Measures recommended in the letter to limit the risk of user data being scraped include having a designated in-house team or roles focused on data scraping risks; ‘rate limiting’ the number of visits per hour or day by one account to other account profiles, and limiting access if unusual activity is detected; and monitoring how quickly and aggressively a new account starts looking for other users, then taking steps to respond to abnormal activity.
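To make the ‘rate limiting’ recommendation concrete, here is a minimal Python sketch of one way a platform might cap how many profiles a single account can view within a time window. It is an illustration only, not something described in the regulators’ letter: the class name `ProfileViewLimiter`, its parameters and the example thresholds are all hypothetical, and a real platform would likely combine a check like this with other signals.

```python
from collections import defaultdict, deque


class ProfileViewLimiter:
    """Hypothetical sliding-window cap on profile views per account."""

    def __init__(self, max_views: int, window_seconds: float):
        self.max_views = max_views
        self.window_seconds = window_seconds
        # account_id -> timestamps (in seconds) of that account's recent views
        self._views = defaultdict(deque)

    def allow(self, account_id: str, now: float) -> bool:
        """Record the view and return True if the account is under its cap."""
        recent = self._views[account_id]
        # Discard views that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_views:
            return False  # over the cap: block, or flag for review
        recent.append(now)
        return True


# Example: at most 3 profile views per hour for each account.
limiter = ProfileViewLimiter(max_views=3, window_seconds=3600)
```

In this sketch each account is tracked separately, so one account hitting its cap does not affect others. A refused request could feed the kind of “unusual activity” monitoring the regulators also mention.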

They also suggest platforms take steps to detect scrapers by identifying patterns in bot activity — such as having systems to spot suspicious IP address activity.

Another recommendation is taking steps to detect bots, such as deploying CAPTCHAs, and blocking IP addresses where data scraping activity is identified (albeit bots can solve CAPTCHAs, so that piece of advice is already looking outdated).

Another recommended measure is for platforms to take appropriate legal action against scrapers, such as sending ‘cease and desist’ letters; requiring the deletion of scraped information; obtaining confirmation of the deletion; and taking other legal action to enforce terms and conditions prohibiting data scraping.

Platforms may also have a requirement to notify affected individuals and privacy regulators under existing data breach laws, the watchdogs warn.

The social media giants who were sent a copy of the letter are being encouraged to respond with feedback within a month demonstrating how they will meet regulators’ expectations.

Individuals told ‘think long term’

The letter also includes some advice for individuals on steps they can take to help protect themselves against the risks of scraping, including suggesting web users pay attention to platforms’ privacy policies; think carefully about what they choose to share online; and make use of any settings that allow them to control the visibility of their posts.

“Ultimately, we encourage individuals to think long term,” they add. “How would a person feel years later, about the information that they share today? While SMCs and other websites may offer tools to delete or hide information, that same information can live forever on the web if it has been indexed or scraped, and onward shared.”

The letter also urges individuals who are concerned their data may have been scraped “unlawfully, or improperly” to contact the platform or website in question and if they do not get a satisfactory response it suggests they file a complaint with their relevant data protection authority. So the regulators are encouraging users to be more vigilant about scraping which could, ultimately, lead to an uptick in investigations and enforcements in this area.

The dozen international regulators signing the joint statement all hail from non-European Union markets. But, as noted above, EU data protection regulators are already active on data scraping risks through enforcements taken under the bloc’s GDPR.

They are also closely watching developments in generative AI services — so concerns raised in the letter look broadly aligned with issues already on the radar of the bloc’s data protection authorities.

Notably, Italy’s privacy watchdog slapped ChatGPT with a local stop-processing order earlier this year, which led to a brief break in service while OpenAI rushed out disclosures and controls. Google’s Bard AI chatbot took longer to launch in the EU than in some other regions after its lead EU privacy regulator in Ireland raised similar concerns. But EU DPAs are simultaneously coordinating on how best to apply the local data protection rules to these novel AI chatbots, including vis-a-vis the crux issue of the lawfulness of the data processing used to train the models in light of the GDPR’s framework. So decisions on the core legality of tools like ChatGPT remain pending in the EU.

Earlier this year, France’s DPA, the CNIL, also warned that protection against data scraping will be a key plank of an AI action plan it announced in May.
