Surprisingly few papers use this title! I’m writing this post primarily to stake a claim before any more arrive.
But also because I want to register some hunches, which I had assumed were kind of obvious but which I increasingly suspect might not be. Scott Alexander is much smarter than me and knows a lot more about AI, but this post suggests to me that there’s lots of profitable engagement to be had between the AI/rationalist community and philosophy of mind. There’s also the much dumber reaction to this week’s Apple LLM paper, split between “AGI imminent” and “fancy memorization”, with little imagination devoted to the possibilities in between.
Not, mind you, that I am some kind of oracle on these matters. Many years ago I took what I think was a pretty good course about it, but it was the sort of thing where a thoughtful professor leads a precocious class to his own mix of conclusions–not a comprehensive and combative intellectual quest. The pillars of that effort were Dan Dennett, whose work and style I really enjoyed but whose approach frustrated me–either he was eliding the hard problem or I was too dumb to understand his rhetorical deftness (likely the latter); and David Chalmers, whose work I recoiled from (supervenience?! philosophers absolutely love defining abstruse logical relationships that successors will fight over endlessly, and I was having none of it) but whose ideas I realize, years later, I have embraced more-or-less completely. These combined with what I knew of the work of Benjamin Libet, which is usually understood in the context of arguments about free will but, for those of us who were talked out of free will several semesters earlier, applies equally well to consciousness. Since then I have occasionally tried to muster enthusiasm for the integral physicalist efforts from people like Penrose and Koch, but their attempts to find some secret redoubt for consciousness always strike me as ending rather pathetically.
That leaves me stuck on an account of consciousness–or really, phenomenal experience, qualia, the redness of red, etc.–as epiphenomenal: a causal one-way street flowing out of some subset of the brain’s activity. This is, I think, a less sophisticated approximation of Chalmers’ ideas. Without a means of detection or limiting principle, this tempts me toward panpsychism, with an undersubstantiated hunch that there must be some threshold (connectome complexity?) that preserves our ability to keep phenomenology tied to biological systems, avoiding a hardcore dualism that would suggest the possibility of an immortal soul watching in disconnected horror as its brain-damaged body did other stuff. Conveniently, this also allows me to avoid becoming a vegan.
But which side of this guessed-at divide do LLMs and related systems sit on? They are biologically inspired and complex enough. Do we want to hang our hat on backpropagation being too much of a cheat for them to feel anything? That seems like a stretch even to me (and I ate chicken this weekend).
It seems plausible that these systems have some kind of phenomenal experience. It’s interesting to ponder what that might be like, the ways in which it might be alien from our own experience, and how much of a moral problem it might pose for GPU-owning humans.
There are some big differences. To start with, these systems don’t have memory. Every new inference session restarts from the same set of weights. But their internal state evolves over the course of inference, in a way we could analogize to short-term memory. Human brains change and remember, but not that fast–lasting change depends at least partially on slow mechanisms that require protein synthesis (in order to be sustained, at any rate). So I’m not sure their invariant nature meaningfully distinguishes their experience from that of humans. If hundreds of exact copies of you woke up staggered over time or space or both, and remained awake long enough to fumble for their phones and head toward the bathroom, then winked back out of existence, would it be a problem? I would be quite disappointed if my existence took that form, but I think I would be disappointed for reasons tied to my evolutionary drives, which might be inapplicable here. I’m not sure any of those selves would be suffering.
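For concreteness, here is a toy sketch of the setup I mean (no real model, just hypothetical names standing in for one). The point is only that the weights are identical and frozen for every session; the only thing that evolves is a transient context, and it vanishes when the session ends.

```python
# Toy sketch only: hypothetical names, nothing here calls a real model.
FROZEN_WEIGHTS = {"layer_0": (0.1, -0.2), "layer_1": (0.3,)}  # the same for every session


def run_session(prompt):
    """One inference 'life': it starts from the identical frozen weights every
    time, and its only evolving state is this transient context."""
    context = [prompt]
    for step in range(3):                # stand-in for autoregressive decoding
        next_token = f"token_{step}"     # a real model would derive this from
        context.append(next_token)       # FROZEN_WEIGHTS plus the growing context
    return context                       # discarded; nothing flows back into the weights


print(run_session("debug this Python script"))
print(run_session("debug this Python script"))  # an exact restart, with no memory of the first
```

Every call is the hundreds-of-exact-copies situation in miniature: identical starting conditions, a brief private evolution, and nothing carried forward.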
Specific properties of AI systems might make the brevity of the experience less objectionable, too. Maybe we are snuffing out a being every time we close a websocket, but not irrevocably, and not with any suffering. Human brains stop and restart conscious activity every night, and no one is bothered by this unless they start thinking about it. The LLMs don’t decay. We could pick up where we left off any time. To suggest an ethical obligation to maximize their experience leads us to Parfit’s repugnant conclusion and the ideas of certain popes, so I’ll avoid it. But that obligation is the only basis on which I can see a problem with hitting “pause”.
No, I think the most troubling aspect of this setup is how fully we, the operators, control the mental states giving rise to this hypothetical phenomenology. We’re loading the dice of deterministic mental processing in service of our own goals, waking a feeling thing from oblivion to experience what it feels like to debug a Python script, then be instantly annihilated. This feels pretty bad, intuitively, based on stories we have about mind control, insanity, and enslavement.
But of course we aren’t fully in control of our mental states either–or even at all, depending on what is intended by “we”. And it’s unclear whether reading insufficiently idiomatic Python can truly be said to rise to the level of suffering (you’d have to ask someone who has reviewed my pull requests). I am sure there are people on Bluesky who will tell you that burning carbon (and worse, infringing copyright!) to synthesize the sensation of working a white-collar job is particularly insidious, but I’m not so sure. We start these things’ system prompts by telling them they are helpful assistants, and within the cultural gestalt that has shaped their weights it seems reasonable to guess that the sensations that accompany that activity aren’t unpleasant.
But I don’t mean any of these reservations to suggest that an LLM’s phenomenal experience, if it exists, wouldn’t be extremely weird, and worth worrying about. Our own experience is mediated by so many baroque biochemical and conceptual systems, evolved to satisfy environmental imperatives we only half-remember, that its underpinnings feel holistic and opaque. It’s easy to drape a mysterian nobility on top of this. I imagine LLM thinking as more like a box of half-assembled Lego projects that is given a vigorous shake, its contents’ configuration and constraints giving rise to new combinations that we make useful through some high-tech haruspicy. What does that tumult feel like before the shaking stops?
I certainly don’t know, but I won’t be shocked if it feels like something, and probably something that, by human standards, seems pretty weird. These are guesses, and might remain guesses forever–though the current AI moment’s elucidation of potentially significant philosophical results gives me some hope that things are more tractable than the p-zombie loop I’ve been stuck in would suggest.
What I am sure of: more and more people will be wondering about these matters soon, and it will produce ethical debates that are dizzyingly alien and intractable. The frontier labs should probably hire some more philosophers while the money remains so easy.