06 May 2026

A study conducted by researchers at IMDEA Networks Institute has revealed that ChatGPT (OpenAI), Claude (Anthropic), Grok, and Perplexity AI embed third-party trackers from Meta, Google, TikTok, and other companies, potentially exposing data about users’ conversations and activity.
In just a few years, these generative AI systems have become widely adopted, with many people using them as trusted assistants and sharing sensitive information (such as health data, personal matters, or professional information) under the assumption that these interactions are private. However, the research warns that this perception may be misleading: while the interface resembles a private conversation, these services run on the same kind of technical infrastructure as the traditional web ecosystem, built around data collection and processing through analytics and digital advertising services.
The study identifies three main issues: the exposure of conversation permalinks to third-party trackers; the ability to link these interactions to user identities through tracking mechanisms; and the presence of privacy controls and policies that may not accurately reflect actual data flows.
One of the main findings is the potential transmission of information related to user conversations, such as chat titles, URLs (permalinks), or associated metadata, to third-party trackers operated by companies such as Meta or Google, along with cookies and other identifiers.
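To see how such leakage can happen in practice, consider how a tracking pixel works: a pixel script embedded in a page typically reports the full page URL and title back to the tracker’s servers on every page view, so when that page is a conversation or its shareable permalink, those values travel with the request. The sketch below reconstructs the shape of a Meta-Pixel-style beacon; the pixel ID, permalink, and chat title are hypothetical stand-ins, not data from the study, and the custom-data key naming is illustrative.

```python
from urllib.parse import urlencode

# All values below are hypothetical stand-ins for illustration.
PIXEL_ID = "1234567890"                               # site's pixel ID
page_url = "https://chat.example.com/share/abc123"    # conversation permalink
page_title = "Questions about my blood test results"  # chat title

# A Meta-Pixel-style beacon is a simple GET request. "id", "ev", and "dl"
# follow the commonly observed request format: "dl" carries the document
# location, so the permalink leaves the page on every PageView. The
# browser also attaches any tracker cookies it holds, letting the tracker
# tie the URL to an existing profile.
beacon = "https://www.facebook.com/tr?" + urlencode({
    "id": PIXEL_ID,
    "ev": "PageView",          # event name
    "dl": page_url,            # document location: this is the leak
    "cd[title]": page_title,   # custom data; key name is illustrative
})
print(beacon)
```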
“Even more concerning, in some cases weak or non-existent access controls mean that simply having a link to a conversation can grant access to its content, making chats publicly accessible to anyone who has the URL, including trackers,” highlights Narseo Vallina Rodríguez, Research Associate Professor at IMDEA Networks Institute.
“Grok and Perplexity send conversation URLs with weak access control (permalinks) to third-party trackers such as Meta Pixel. Grok even exposes verbatim message text through Open Graph metadata collected by TikTok,” adds Guillermo Suárez-Tangil, co-author and Research Associate Professor at IMDEA Networks Institute.
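Open Graph tags are ordinary HTML metadata in a page’s head, intended to build link previews; any party that can fetch a shared-conversation URL, whether a person or a crawler, can read them without authentication. The sketch below shows how such tags could be extracted; the page content is a hypothetical stand-in, and whether message text actually appears in og:description depends on how each service builds the page.

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect <meta property="og:..."> tags from a page's head."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            prop = a.get("property", "")
            if prop.startswith("og:"):
                self.og[prop] = a.get("content", "")

# Hypothetical head of a shared-conversation page. If the service copies
# message text into og:description, anyone who fetches the permalink
# receives it verbatim, with no login required.
sample_page = """
<html><head>
  <meta property="og:title" content="Chat about blood test results">
  <meta property="og:description" content="User: my cholesterol came back high...">
</head><body>...</body></html>
"""

parser = OGParser()
parser.feed(sample_page)
print(parser.og)
```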
The study also identifies mechanisms that could enable linking activity in AI systems to real user identities. The combination of identifiers such as cookies (commonly used in tracking services), hashed email addresses, and server-side tracking techniques could facilitate the creation of persistent profiles and user re-identification.
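Hashed email addresses illustrate how this linking can work. In common ad-tech “advanced matching” schemes, the email is normalized and hashed (typically with SHA-256) before being sent: the raw address never leaves the page, but the hash is deterministic, so a tracker that already knows a user’s email can recompute the same hash and match the activity to that user. A minimal sketch of the normalize-and-hash step, assuming the usual lowercase-and-trim convention:

```python
import hashlib

def hashed_email(email: str) -> str:
    """Normalize and hash an email the way ad-tech 'advanced matching'
    schemes commonly do: strip whitespace, lowercase, then SHA-256."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same email always yields the same hash, so it works as a stable
# cross-site identifier even though the raw address is never transmitted.
print(hashed_email("Jane.Doe@example.com "))
print(hashed_email("jane.doe@example.com"))  # identical output
```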
According to the authors, these practices reflect the continuation of data-driven business models within the generative AI ecosystem. “Most users have no way of knowing this is happening; there is nothing visible in the interface that would tell them. Declining non-essential cookies helps in some cases, but our research shows it is not always enough. Until these practices are addressed at the platform level, users are left with very limited options,” says Aniketh Girish, co-author and postdoctoral researcher at IMDEA Networks.
The analysis further indicates that some tools offer privacy controls that may be misleading regarding the actual level of protection. “Privacy policies acknowledge the use of advertising trackers and data sharing with ‘business partners’, but they never clearly state that actual user conversations are part of the information being shared,” notes Guillermo Suárez-Tangil.
From a legal perspective (GDPR), the issue is twofold: on the one hand, there is no clear legal basis for this data sharing; on the other, users are not adequately informed about it. According to lawyer and data protection officer Jorge García Herrero, who collaborated on the study, a warning that our most sensitive information may reach the advertising industry deserves the same prominence as the ubiquitous “AI can make mistakes, please verify responses” disclaimer that every interface displays to limit liability when things go wrong.
The authors conclude that, although the findings are preliminary, they highlight the need to strengthen transparency, access control mechanisms, and data protection in generative AI systems, as well as to advance their analysis from a regulatory perspective.
More info: https://leakylm.github.io/