Social media and AI model training

Considerations to manage your data footprint

"We shape our tools and thereafter they shape us"*

Amid the tumultuous social and political shifts of the 1960s and 1970s, philosopher and communications theorist Marshall McLuhan was one of the first to identify the implications of the electronic media people interact with every day, which at the time meant radio and television. He argued that these media do more than simply transmit content; they “massage” our senses and thinking, altering how we experience and interpret the world.

We create technologies, but once created they can influence our behaviours, thought patterns and social structures in ways we don’t anticipate.

McLuhan foresaw digital media’s transformative effects on society, coining the term “global village” decades before the internet and social media normalised the boundless networks and online platforms that are central to modern life.

Now our online lives generate data that is used to develop and deploy a new wave of technologies and digital media. Our data is shaping tools that in turn shape us, often in ways we are unaware of, although there are some ways to limit the extent.

Tina

*Often attributed to Marshall McLuhan, this quote likely originated with Father John Culkin, a fellow professor and interpreter of McLuhan's work, in a 1967 article in The Saturday Review, "A Schoolman's Guide to Marshall McLuhan".

Social media and AI model training

Social media platforms have become a valuable resource for technology companies to source data for artificial intelligence development.

Unless you have taken measures to stop or restrict the collection of your data, anything you do online, whether browsing websites, using apps or maps, or making purchases, leaves a digital footprint.

Digital footprints are the sum of data derived from an individual's digitally traceable behaviour and online presence. What a person does online matters, but so does what is online about them: data collected with or without their knowledge or consent also has consequences (Micheli, Lutz & Büchi 2018).

Social media can play a significant role in shaping our digital footprints. Every post, like, share and comment contributes to a detailed portrait of our interests, opinions and behaviours. While these interactions might seem temporary or inconsequential, they are collected for as long as we use the platforms, then stored, analysed and used for various purposes, including training artificial intelligence models.

The vast amounts of user-generated content on social media platforms have become a valuable resource for technology companies because large datasets are needed to develop and refine generative AI. This practice raises important questions about privacy, consent and the ethical use of data. While AI training can lead to improved services and technologies, it also means that our personal expressions, opinions and behaviours could be used in ways we might not have anticipated or agreed to.

Matters to consider

Ideally, we should be able to choose when and how our data is optimised or repurposed. Considerations include:

  • Privacy concerns – Our data might reveal more than we’re comfortable sharing, especially when aggregated and analysed by advanced AI systems.
  • Irreversibility – Once data has been used to train an AI model, it is difficult, perhaps impossible, to withdraw it from the dataset or to control how it is used.
  • Misuse – There is a risk that collected data could be accessed or used by bad actors for manipulation or surveillance.
  • Uncertainty – One might prefer not to contribute to AI development until implications are fully understood, or until AI governance and policy catch up.
  • Exploitation – Data is valuable and there is a growing movement calling for data collected on individuals not to be used for commercial gain without explicit consent or compensation.


How to opt out

Not all platforms offer an option to opt out of data collection or of data reuse for other purposes, including AI training. However, if you live in the E.U., you have a general ‘right to be forgotten’ and other data management options that any company collecting your data must provide.

LinkedIn
To prevent LinkedIn from using your posts for AI training, log into your account and click on your profile picture. Go to Settings & Privacy → Data privacy → Data for generative AI improvement. Turn this setting to "Off" to opt out.

Even with this setting off, LinkedIn can still use your data for product development and share de-identified information with third parties.

Meta (Facebook, Instagram, WhatsApp)
Currently, there is no way in the U.S. or Australia to opt out of Meta using your public posts and photos to train its AI. By using these platforms, you agree to the terms of service, which give Meta permission to use public data to train AI tools.

Making posts private, or communicating only through private messages, is likely to provide additional protection. However, terms of service are generally written to give platform providers flexibility and legal protection. If data privacy is a concern, avoid placing any information on these platforms that you would not be happy for the company to use for data optimisation or commercial purposes.

PayPal 
To prevent PayPal from sharing your spending behaviour with third parties, go to your profile on the website or app. Navigate to Data and privacy → Personalized shopping. Find the switch that says, "Let us share products, offers, and rewards you might like with participating stores," and turn it off. This option is currently not available in PayPal Australia or in some other countries.

References

Micheli, M., Lutz, C. and Büchi, M. 2018. "Digital footprints: an emerging dimension of digital inequality", Journal of Information, Communication and Ethics in Society, Vol. 16 No. 3, pp. 242-251.
