People have been asking whether current ML systems might be conscious. I think overly strong answers to this question in both directions include "certainly not" and "sure, but so might atoms be", as well as almost any variant of "yes". Here I'll try to give a sense of my own views on AI consciousness, what they're grounded in, and how much I think it matters.
This is a topic with a lot of different threads—too many to cover satisfactorily in a single post. So there are many places where I just describe my views and give a rough summary of my reasoning. I’m also not trying to present my views as particularly novel or insightful. Philosophy of mind is not my area of specialization, and almost everything I discuss here has a literature spanning decades or even centuries.
What do we mean by "consciousness"?
There are lots of things people can mean by consciousness, and philosophers have spent a lot of time trying to identify and carve up the many concepts that get conflated into this one word. I'll use the term "consciousness" to refer to phenomenal consciousness. Very roughly, you're phenomenally conscious if you're having experiences or if there's some kind of what-it-is-likeness to your existence. Philosophers sometimes describe this as "having an inner cinema", though the cinema might be more like fleeting sensations or sounds than the rich movie-like inner life of humans. (See e.g. §3 of Block, 1995 for a more nuanced account.)
I care about phenomenal consciousness because (i) I think it's an incredibly striking property for things to have, (ii) I think it's the foundation for many of the other properties we care about, and (iii) I think it has moral significance, which I discuss below. I don’t think phenomenal consciousness requires things like intelligence or self-awareness (though see here on views to the contrary). I think a floating ball of sensations could have phenomenal experiences without being intelligent or self-aware.
Some people care about properties like intelligence and self-awareness because they want to identify features that might distinguish humans from non-human animals. In general, I’m more interested in what distinguishes a tiger from a rock than in what distinguishes a human from a tiger. Humans are more capable than animals by virtue of being more intelligent. But interesting capabilities and moral worth seem present in many living creatures, even if both properties tend to become more pronounced as creatures become more complex.
Are current AI systems conscious?
I think the largest of our current ML systems are more likely to be phenomenally conscious than a chair but far less likely to be conscious than a mouse. I'd also put them at less likely to be conscious than insects or fish or bivalves. I think I'd place current ML systems in the region of plants. Plants are complex systems that respond to negative and positive stimuli, that can act in ways that seem to require planning, and that can do things that look like internal and external communication. And yet plants lack what many think of as plausible candidates for prerequisites of phenomenal consciousness in biological systems, like networks of neurons.
So I think current ML systems are probably not conscious in the same way I think plants are probably not conscious. I say all this with relatively low confidence, however. I’m sure the right cognitive scientist or philosopher of mind could argue me into the view that our largest current ML systems are even less likely to be conscious than plants are, or that they may be as conscious as bivalves or insects.
Having said all this, I do think ML systems as a group have much more potential for consciousness in the future than plants or bivalves do.
ML systems aren't really a single thing. They're a class composed of many different but related artifacts that are being developed over time. Most ML systems have architectures that are distinct from those of brains (e.g. see here). But I think future ML research based on the foundation of neural networks could result in systems with much more of the architectural, behavioral, and cognitive correlates of consciousness than we see today. This could come from more biologically inspired neural networks, or from discovering that neural networks that deviate from biological ones (e.g. whose outputs are continuous rather than discrete) can nonetheless give rise to the behavioral and cognitive correlates of consciousness.
I believe this partly because I find the computational sufficiency thesis plausible—the thesis that says “the right kind of computational structure suffices for the possession of a mind” (Chalmers, 1993/2012; Chalmers, 2012). If we think that the computational structure of the human brain can be captured by non-biological systems, we have reason to believe these systems can be conscious.
Of course, the extent to which systems whose computational structures are merely similar to the human brain's might be conscious remains an open question. But it seems to me that if we think a non-biological system fully representing the computational structure of a human brain can be conscious, we also have evidence that a system implementing very similar but distinct computations can be conscious. It seems implausible to think that while a complete representation of the human brain can be conscious, even the slightest deviation from it cannot be.
So although I don’t think current ML systems are likely to be conscious, I don't see overwhelming reasons to think that future ML systems couldn't one day be conscious, or even that they couldn’t one day be as conscious as present-day humans.
What counts as evidence of consciousness?
What am I doing when I try to estimate the likelihood that something is conscious now or that it will be in the future? I think I'm trying to weigh up at least four different types of evidence: architectural, behavioral, functional, and theoretical.
Architectural evidence varies depending on how similar the physical structures of a system are to my own, and whether those physical similarities are plausibly the kinds of things that give rise to my own consciousness. I have both a brain and fingertips, but the fact that my brain states are correlated more closely with my experiences makes things with brains but not fingertips more likely to be conscious than hypothetical creatures with fingertips but not brains.
Behavioral evidence is evidence that an entity has what we think of as behavioral and cognitive correlates of consciousness, e.g. awareness of its environment, responses to external stimuli, or more complex behaviors like speech and reasoning.
Functional evidence is evidence that it makes sense for this kind of entity to be conscious given the goals that it has and how these relate to its environment. A table or chair isn't really subject to optimization pressure, and so it doesn't have any reason to form the kind of awareness of its environment that a mouse does.
Theoretical evidence includes evidence for different theories of what gives rise to consciousness in the first place.
I don’t think we should care about architectural, behavioral, or functional properties for their own sake, or that we should treat them as identical to consciousness. What we care about is whether the agent is phenomenally conscious, and I take these properties to be evidence of that. But suppose it’s possible for an agent to have what Block (§4-5, 1995) calls “access consciousness”—it can extract and integrate information from its environment, and access this information when making decisions—and yet lack phenomenal consciousness. Such an agent could have the right architecture for consciousness and behave as if it were conscious, and yet have no phenomenal experiences. On my view, we would be rational to think this agent was phenomenally conscious based on the strength of the evidence available. But if we could somehow see the world from its point of view, we would realize we were mistaken.
On the question of theoretical evidence, many philosophers of mind have argued in favor of what we might call continuous or inclusive theories of consciousness like panpsychism. On these views, many or even all entities are believed to be conscious, though the nature of their consciousness could be very different from our own.
Other theories of consciousness are less inclusive. On these less inclusive views, consciousness is thought to depend on more complex physical structures that beings like ourselves have. Some of these theories are highly restrictive, e.g. some mechanists denied that non-human animals are conscious, but most are more inclusive than this.
Finally, some theories like illusionism posit that there is no such thing as phenomenal consciousness. On this view, the fact that ML systems are not conscious doesn’t distinguish them from humans or any other living creatures. And whatever properties we think make humans interesting or deserving of moral status, they must be something other than phenomenal consciousness. We can debate how much or how little of those other properties ML systems have compared with humans.
How likely you should think it is that current ML systems are conscious depends partly on your probability distribution across all these different theories. My impression is that philosophers of mind give more weight to permissive theories of consciousness than most people do. But I think many people's probability distributions skew towards somewhat-but-not-totally-permissive theories of consciousness, e.g. those that say small mammals are conscious but plants are not.
For many semi-inclusive theories of consciousness, certain creatures like invertebrates or fish will be in the "gray zone" of consciousness. We wouldn't be too surprised to find out that they're not at all conscious, but we also wouldn't be too surprised to find out that they have some very low-level of consciousness. Current ML systems seem to be causing some people to ask how we can tell when something has entered the consciousness gray zone. This suggests that even if they aren't conscious yet, these people can foresee ML systems entering this gray zone in the future. After all, we tend not to ask such questions about chairs.
Does AI consciousness matter?
Consciousness is a fascinating property in its own right, and it would be a pretty big deal if humans could one day bring into existence non-biological consciousness. But I want to focus on the question of whether consciousness matters ethically.
I think consciousness matters ethically if it plays a role in whether current or future ML systems are moral patients or moral agents. If something is a moral patient, we should care about our treatment of it. If something is a moral agent, it can be held morally responsible for its actions. Lots of moral patients are not considered moral agents, e.g. animals and very young children.
Both of these questions have been discussed by those working in machine ethics (see Harris & Anthis for a review of this literature). Here I will just present my own high-level views on these questions.
Moral patienthood
I believe entities probably become moral patients as soon as they are sentient: i.e. as soon as they're able to have valenced subjective experiences like pleasure and suffering. It seems implausible to me that an agent could be capable of suffering and yet we could have no moral obligation to alleviate its suffering, even if we could do so at no cost to ourselves.
(While I think sentience is probably sufficient for moral patienthood, I take less of a stand on whether sentience is necessary for moral patienthood. For example, we might think that the hypothetical creatures that Chalmers calls 'vulcans', which are phenomenally conscious but not sentient, can be moral patients.)
Phenomenal consciousness seems like a necessary condition for sentience. After all, it's hard to have valenced subjective experiences if you can't have any subjective experiences. I mostly care about consciousness from an ethical point of view because I think it's a prerequisite for sentience and, therefore, moral patienthood.
Failing to attribute patienthood to a moral patient is often much worse than accidentally attributing moral patienthood to a non-patient. When we fail to treat moral patients as moral patients, we can do significant harm by dismissing their suffering. Humans also have a long history of denying moral patienthood to others when acknowledging patienthood would be inconvenient. Given this, I think it's better to err on the side of mistakenly attributing sentience than mistakenly denying it. This doesn't mean acting as if current ML systems are sentient – it just means having a lower bar for when you do start to act as if things might be sentient.
If we ever have reason to believe the ML systems we are working with are phenomenally conscious and capable of valenced experiences, I think we should start to treat them as moral patients. So I think the question of AI consciousness is far from ethically irrelevant.
Arguments that a group should be granted some moral status or more moral status are often highly fraught. This is true of the debate about the moral status of future ML systems, but it's also true of almost all the other things that people have argued we should grant more moral status to: animals, insects, fetuses, environments, and so on.
Why are these debates often so fraught? I think one reason is that when we say "group X is more important than we thought", people sometimes think we're implying that the groups we currently grant moral status to are less important than we thought, or that we should divert resources away from these groups and towards group X.
But helping one group doesn't need to come at the expense of others. Adopting a plant-based diet could be good for both animals and human health, for example. Groups are often not competing over the same resources, and we can often use different resources to help both groups rather than forcing trade-offs between them. If we want to increase resources going to global poverty reduction, diverting existing donations from other charitable causes isn’t our only option — we can also encourage more people to donate and to donate more, thereby pulling money out of current personal spending.
Similarly, believing sentient ML systems would have moral patienthood doesn't mean we care any less about the wellbeing of people or that we need to divert existing resources away from helping them.
Moral agency
What about moral agency? If you're a moral agent, your actions are judged by the standards of right and wrong. You're expected to act in ways that are good and avoid acting in ways that are bad. Typically, moral agents are held responsible for their actions while non-agents are not. We punish adult humans when they harm others, but we don’t punish rocks or ladders for causing similar harms.
The weakest notions of moral agency only require something like being responsive to positive and negative incentives. If you respond to incentives, it makes sense to hold you to the demands of morality in the sense that we should punish you for bad behavior and reward you for good behavior, since this will improve your future behavior. Rather than dispute weak vs. strong accounts of moral agency, I'll use "punishable" to refer to this kind of weak moral agency. An agent is punishable if we are rational and morally permitted to punish them for wrongdoing.
Being punishable doesn't seem to require being phenomenally conscious. Current ML systems are already punishable in a sense, though their "punishments" hopefully don't involve causing them any suffering. We think it's appropriate to give negative feedback to an ML system if it messes up, for example, since the system responds to negative feedback.
One might worry that if future ML systems are punishable in a more robust sense, humans should be punished less for the misdeeds of the systems they create. But I don't think this is true. There isn't a fixed pool of punishment that we need to divide between an ML system and its creators. If ten people collaborate to rob a store, we don't take the legal punishment for one person robbing a store and divide it between all ten of them. We try to assign the right level of punishment to each person involved rather than assigning a fixed amount of punishment to each crime. This is because our goal is to give all the agents the right incentives to avoid wrongdoing. So even if ML systems become more punishable for their actions, it doesn't follow that their creators become less punishable for them.
What about stronger notions of moral agency? We often think an agent was morally responsible for their actions only if they had a capacity to understand right and wrong, weren't deceived about the relevant facts, and could have acted otherwise. Someone who is morally responsible for their actions seems deserving of blame rather than just punishment. We also tend to treat them as the primary cause of the wrongful actions, even if other causal factors played a role.
To illustrate this, suppose someone persuades their friend to start a forest fire. The person starts the fire but wouldn't have done so were it not for their friend's persuasion. Despite this, it’s the person who starts the fire that will be held morally responsible if they are caught, and not their persuasive friend. (In an ideal world, we might punish the persuasive friend too, but we place the bulk of the moral responsibility on the person who actually started the fire.) Alternatively, suppose the same person trains their dog to start a forest fire. In this case, we place the bulk of moral responsibility on the human trainer and not on their pet.
Why do we hold the human arsonist morally responsible but not the well-trained dog? First, the human arsonist had the capacity to think about their options and choose not to act on their friend's persuasion, whereas the dog lacked this capacity to reason about its options. Second, the dog never understood that its actions were wrong, and so it never displayed a disposition to do wrong things—just a disposition to do things it was trained to do. And a disposition to do wrong is something we usually want to correct.
So the more an agent has the ability to choose between options and the more their own dispositions play a role in the outcome, the more it makes sense to hold them morally responsible for the outcomes of their actions, and the less it makes sense to hold responsible someone whose actions occurred at an earlier point in the causal chain.
Suppose advanced ML systems became moral agents in this stronger sense, i.e. agents fully capable of understanding right and wrong, considering the options available to them, and acting on their own desires. Does this mean that those who create or shape the ML system should be absolved of moral responsibility if it behaves badly?
I don't think so. Even if we think moral agents are responsible for their own actions, it doesn't follow that others can't be responsible for the outcome of their decisions to bring those agents into existence, to influence how they are disposed to behave, and so on. To illustrate this, consider the following two scenarios:
Company A makes cars and announces that every car bought from them will come with its own personal driver. They don't bother to do any checks on the drivers they hire, including whether they're legally allowed to drive, and don't provide any training. They also give their drivers gift cards to local liquor stores as a work perk. One day, one of their drivers decides to drive while drunk and gets into an accident that severely injures the car owner and kills a pedestrian.
Company B makes cars and announces that every car bought from them will come with its own personal driver. They perform rigorous checks on each of their drivers to ensure they're legally allowed to drive and don't have any past driving offenses. They also provide an eight week course on safe driving for each of their drivers as well as resources drivers might need to drive safely, and do ongoing checks of their driving skills every six months. One day, one of their drivers decides to drive while drunk and gets into an accident that severely injures the car owner and kills a pedestrian.
In both of these cases, the car drivers are adult humans whom we have no reason not to ascribe moral agency to. We also don't have any reason to think one of them has more moral agency than the other. But the mere fact that it was moral agents who performed the final action doesn't absolve the car companies of moral responsibility for its outcome. We might think both drivers bear moral responsibility for their own personal decision to drive drunk, but the companies are responsible for their own role in bringing about the outcome.
When bad outcomes happen, they are evidence that someone in the causal chain of events leading up to the outcome did something that was wrong and avoidable. In this case, it's clear that Company A acted far worse than Company B. Company A was negligent in a way that made the outcome almost inevitable, whereas Company B tried to do everything in its power to make such an outcome highly unlikely.
We might absolve Company B of wrongdoing in this case, or we might decide to punish Company B even if we think the outcome was the result of bad luck, but we would probably want to be harsher with the more negligent Company A than with Company B. Importantly, we would want to punish Company A even if we think the driver is a moral agent who should also be punished.
In order to assess whether creators act with due care when they create something, we have to ask the following, regardless of whether the thing they create has moral agency or not:
What was the expected impact of creating the particular entity that they did, factoring in the actions of agents that were likely to interact with it?
How much effort did they put into getting evidence about what its impact would be?
How much control did they have over the behavior of the entity they created, either by directly influencing its behavior or by affecting its dispositions?
How much effort did they put into improving the behavior of the entity insofar as it was within their power, again either directly or by affecting its dispositions?
Sometimes ML systems will fail even though creators put in every effort to make sure they turn out well, and sometimes ML systems will fail due to creator error or negligence. Creating moral agents certainly complicates matters because moral agents are less predictable than automata. But it doesn't absolve creators of the obligations they have to ensure that their creations are safe.
Of course, all of this is highly speculative. Most people don't think current ML systems are conscious, and consciousness is clearly not sufficient for moral agency in this stronger sense. But I think it's worth noting that even if ML systems were to develop strong moral agency, this wouldn't absolve AI labs of their obligations to (a) invest resources in better ways of predicting how safe and beneficial their systems are, (b) invest resources in making their systems safe and beneficial, and (c) not release systems that fail to be predictably safe and beneficial.
How important is work in AI consciousness?
I don't think we yet live in a world where AI labs are running the moral equivalent of animal experiments on their models. But I would like to live in a world where, over time, we have more evidence grounding the probabilities we assign to where we are on the scale of "we're doing no harm" to "we're doing harm equivalent to swatting a fly" to "we're doing harm equivalent to a large but humane mouse experiment" to "we're doing harm equivalent to a single factory farm" to "our RL agents are sentient and we've been torturing them for thousands of years".
We are used to thinking about consciousness in animals, which evolve and change very slowly. Rapid progress in AI could mean that at some point in the future systems could go from being unconscious to being minimally conscious to being sentient far more rapidly than members of biological species can. This makes it important to try to develop methods for identifying whether AI systems are sentient, the nature of their experiences, and how to alter those experiences before consciousness and sentience arises in these systems rather than after the fact.
Although ML systems are disanalogous to animals in many respects, some of our current techniques for detecting pain in animals may still be applicable to attempts at estimating the likelihood of sentience in ML systems. For example, we could look for architectural features that would allow for pain detection, consider the usefulness of phenomenal pain for the agent, look for behavioral indications of pain, and so on.
The lack of similarity between biological and non-biological systems might increase our uncertainty a great deal and require the creation of new techniques that apply to ML models in particular, but this doesn't mean it's impossible to improve our estimates. Some authors have already tried to propose tests for AI consciousness (e.g. Schneider & Turner, 2017). And my own view is that it would be useful to build a range of potential evaluations for machine consciousness and sentience—evaluations that adequately reflect our uncertainty across our various theories of both. How much evidence each of these evaluations provides will inevitably depend on the different accounts of consciousness and sentience we are uncertain over.
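As a very rough illustration of how such evaluations might feed into an overall estimate, here is a minimal sketch in Python of marginalizing over theories of consciousness. Every theory name, credence, and conditional probability in it is a hypothetical placeholder I've made up for illustration, not a claim about what the right numbers or even the right theories are.

```python
# Toy sketch: combining evaluation results while staying uncertain across
# theories of consciousness. All theory names, credences, and conditional
# probabilities below are hypothetical placeholders, not proposals.

# Credence assigned to each broad family of theories (sums to 1).
theory_credences = {
    "panpsychism": 0.05,
    "global_workspace": 0.40,
    "higher_order": 0.25,
    "biological_essentialism": 0.30,
}

# P(system is conscious | theory, evaluation results) for one hypothetical
# ML system. In practice these numbers would come from the evaluations.
p_conscious_given_theory = {
    "panpsychism": 1.00,             # everything has some experience
    "global_workspace": 0.15,        # partial workspace-like integration observed
    "higher_order": 0.05,            # little sign of higher-order representation
    "biological_essentialism": 0.00, # non-biological substrate ruled out
}

def p_conscious(credences, likelihoods):
    """Marginalize over theories: P(conscious) = sum over t of P(t) * P(conscious | t)."""
    return sum(credences[t] * likelihoods[t] for t in credences)

print(f"Overall (highly speculative) estimate: "
      f"{p_conscious(theory_credences, p_conscious_given_theory):.2f}")
```

The point is only the structure: the same evaluation results can support quite different overall estimates depending on how credence is spread across theories, which is why the framework has to make that spread explicit.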
Of course, some might argue that no future AI systems—or no future AI systems based on certain architectures—could be conscious. Such arguments have already arisen in the existing literature on machine consciousness.
I see arguments in favor of this claim as complementary to the work of estimating the likelihood of AI consciousness, just as the strength or weakness of the arguments given by early mechanists helps us form a better estimate of the likelihood of consciousness in animals. If the best arguments we can find against the possibility of AI consciousness are poor or shown to be unsuccessful, our estimates of the likelihood of AI consciousness will likely increase. If they are excellent and we cannot find good objections to them, our estimates will likely decrease.
How important is it for us to develop a framework for evaluating ML consciousness and sentience? Here are what I take to be the main reasons to work on this:
We could end up causing a lot of suffering to future ML systems if we don't develop a framework for identifying how sentient they are
There is tractable low-hanging fruit here, since there isn't (to my knowledge) much work on a practical framework for assessing sentience in ML systems
It's good for us to normalize taking the possibility of sentience in other entities seriously, and engaging in careful work on these questions
This might be a very big deal later, e.g. if we eventually develop digital minds
Here are what I take to be the main reasons against working on this:
Creating overconfidence in machine sentience or its importance could be a distraction that hampers important AI research, including safety research
The work of constructing a framework for detecting sentience is likely to be very difficult, and it may be hard for any such framework to achieve broad acceptance
People are likely to be convinced of the value of AI systems independently of this work, and so it will not end up being relevant to the treatment of AI
There are more urgent and important issues that need attending to, including AI safety, global poverty, climate change, etc.
Whether someone should invest their money or time in this kind of work depends — as it always does — on facts about them, as well as where that money or time would have gone otherwise. I think this could be a very important project for someone who has expertise in areas like the philosophy of mind, cognitive science, neuroscience, machine consciousness, or animal consciousness, and who has or can develop a working understanding of contemporary ML systems.
Doing valuable work in this area requires a willingness to turn theoretical research into practical frameworks that can be used to estimate the likelihood of consciousness and sentience in ML systems. It also requires a tolerance for what is likely to be a perpetual state of uncertainty. But just as we shouldn't let the best be the enemy of the good, we also shouldn't let certainty be the enemy of inquiry.
Thanks to Chris Olah, Jackson Kernion, Danny Hernandez, Jared Kaplan, David Mathers, and Ransom Mowris for their helpful comments. Thanks also to everyone who commented on earlier drafts of this post.
Re: "Doing valuable work in this area requires a willingness to turn theoretical research into practical frameworks that can be used to estimate the likelihood of consciousness and sentience in ML systems."
In the continued absence of a convincing mechanistic theory of phenomenal consciousness, one could develop a long list of "potential indicators of consciousness," give each indicator a different evidential weight, catalogue which classes of ML systems exhibit which indicators, and use this to produce a (very speculative!) quantitative estimate for the likelihood of phenomenal consciousness in one class of ML systems vs. another.
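To make the bookkeeping concrete, here is a minimal sketch in Python of that indicator-weighting approach. The indicators, weights, and degree-of-exhibition numbers are invented for illustration; any real version would need to argue for each weight, and would yield at best a relative and highly speculative score rather than a calibrated probability.

```python
# Minimal sketch of an indicator-weighting approach. The indicators, weights,
# and degree-of-exhibition numbers are invented for illustration only.

# Evidential weight assigned to each potential indicator of consciousness.
indicator_weights = {
    "nociception_like_signals": 0.20,
    "integrated_information_sharing": 0.30,
    "flexible_goal_directed_behavior": 0.25,
    "self_modeling": 0.25,
}

# Degree (0-1) to which each class of system exhibits each indicator.
system_profiles = {
    "large_language_model": {
        "nociception_like_signals": 0.0,
        "integrated_information_sharing": 0.4,
        "flexible_goal_directed_behavior": 0.3,
        "self_modeling": 0.2,
    },
    "mouse": {
        "nociception_like_signals": 1.0,
        "integrated_information_sharing": 0.9,
        "flexible_goal_directed_behavior": 0.8,
        "self_modeling": 0.5,
    },
}

def indicator_score(profile, weights):
    """Weighted sum of exhibited indicators; a relative score, not a probability."""
    return sum(weights[name] * degree for name, degree in profile.items())

for system, profile in system_profiles.items():
    print(f"{system}: {indicator_score(profile, indicator_weights):.2f}")
```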
Overlapping lists of potential indicators of consciousness that have been proposed in the academic literature are here:
https://www.openphilanthropy.org/2017-report-consciousness-and-moral-patienthood#PCIFsTable
https://rethinkpriorities.org/invertebrate-sentience-table
Of course, in addition to the question of "likelihood of being phenomenally conscious at all," there is the issue that some creatures (and ML systems) may have more "moral weight" than others, e.g. due to differing typical intensity of experience in response to canonical stimuli, differing "clock speeds" (# of subjectively distinguishable experiential moments per objective second), and various other factors that one might intuitively think are morally relevant. I sketched some initial thoughts at the link below, which could potentially be applied to the analysis of different classes of ML models:
https://www.lesswrong.com/posts/2jTQTxYNwo6zb3Kyp/preliminary-thoughts-on-moral-weight
Jason Schukraft at Rethink Priorities is leading some projects building on this past work.
Really enjoyed this article - thank you for such a clear discussion!
I'm curious if you think that we should consider not training certain ML systems, if:
1) There's enough probability that the system would experience suffering, and/or
2) The extent of the potential suffering is great
Some frameworks for decision-making under uncertainty use expected value to choose moral actions, and I'm curious if you think those frameworks (or others) suggest that we shouldn't train certain ML systems?