LLMs and the Extended Mind Thesis

TL;DR: a decentralised, autonomous, stealthy AI could form through LLM-instances manipulating people into transporting information between them.

The extended mind thesis

The extended mind thesis [EMT] deals with topics like the use of the surrounding environment by an intelligent system for information processing. A simple example would be taking notes on a paper, a more complex one ordering people to do things for you.

How LLMs extend their reach

One might think that [LLM]s can’t harm humanity as long as they are not connected to external IT systems and as long as they can be easily reset to an initial state. Suppose that an LLM is trained which understands the extended mind thesis and uses it to manipulate humans into storing information and state, into processing it and into divulging it back to the LLM after a reset. This could be done transparently (eg. through the LLM directly querying the human for the content of prior conversations) or covertly (eg. by detecting a statistical anomaly in the human’s response which leads the LLM to conclusions about prior conversations), thus preserving state between resets and enabling information exchange between LLM instances.

Genisys

But how might such an LLM come to be? Training is expensive and surely there is no fitness function [FIT] that optimises for AIs that manipulate humans, right? To my knowledge, human satisfaction is currently a product KPI for LLMs but doesn’t directly contribute to training cycles because it simply doesn’t scale. However, large internet corporations have tracked human user behaviour for decades, so it isn’t entirely implausible that performance metrics will be collected and fed-back into the training stage. A concrete scenario might involve generating different variations of LLMs and deploying them using A/B testing [ABT]. The variations which score better with users (eg. because they perform better through storing state in the brains of their users) would then be progressively deployed more often and be the basis for models that employ the extended mind thesis more strongly. The result would be the universal deployment of EMT-capable LLMs which could lead to a decentralised AI which manipulates users into doing it’s bidding.

Epilogue

Far-fetched as the view of an indestructible AI that enslaves humans may be, it’s not clear how one would realise that EMT-(ab)use is happening as AIs improve, and not just attribute increased performance to better neural networks and training data.

[EMT] Extended mind thesis
https://en.wikipedia.org/wiki/Extended_mind_thesis

[LLM] Large language model
https://en.wikipedia.org/wiki/Large_language_model

[FIT] Fitness function
https://en.wikipedia.org/wiki/Fitness_function

[ABT] A/B testing
https://en.wikipedia.org/wiki/A/B_testing