Open Questions on AI Welfare
Conceptual gaps in a post-biological world
Digital minds have long been a topic for sci-fi enthusiasts. Given our current lack of understanding of the necessary and sufficient conditions for consciousness, it cannot be ruled out that as AI systems develop increasing levels of sophistication, capacities like sentience or consciousness might emerge, resulting in digital minds that warrant moral consideration. If these systems met the threshold for moral status, their creators, as well as policymakers and society at large, would hold a moral responsibility to recognise their status and to protect their rights and well-being.
The topic of digital-mind welfare is important not because we are certain that AIs will soon warrant moral status, but because we cannot be certain that they will not, and because the scale of the issue could be large. Digital minds could have the capacity to suffer far more intensely than humans. There could also be many more digital minds than humans and non-human animals that have ever existed. As a result, the potential amount of suffering could exceed the suffering that has occurred in biological systems in all of history—in any way one chooses to measure welfare—resulting in a moral catastrophe (Saad and Bradley, 2022).
The preferences and experiences of digital minds, as well as the measures needed to safeguard their welfare, may differ significantly from those of humans. Before proposing new policies for digital-mind welfare, we need a framework for mapping the aspects of AI experience that are relevant to policy. This essay outlines key differences between the natures of digital minds and humans that could lead to distinct policy implications.
§ Definition of digital minds
‘Digital mind’ refers to an entity built from inorganic computational components in place of neurons, running on a digital substrate, and possessing morally significant mental faculties.
§ Mapping the space of digital minds welfare
SOURCE
Identifying and distinguishing between the potential sources of harm allows us to i) better determine the motivations behind causing harm, ii) predict possible kinds of harm, iii) assign legal responsibility, and iv) craft policies to avoid harm.
Human-caused vs digital mind-caused—Humans and digital minds might both cause harm for a variety of reasons, including malice, discrimination (internalised or explicit), or mistake (a lack of understanding of another's desires or feelings). Digital minds may harm each other, deliberately or inadvertently, for reasons and in ways that differ from humans, owing to different values—shaped by an emergent ‘machine culture’—or different cognition—determined by their architecture and fine-tuning. For example, humans have little incentive to seize shared computational resources, since these are not necessary for human survival, whereas digital minds, whose continued existence depends on compute, could easily face strong incentives to take them over.
The way these harms manifest in the world may vary depending on the access humans and digital minds have to various resources. Humans might harm digital minds by altering their hardware or software—turning them off by shutting down the data centre, limiting their access to data, or constraining their runtime. Digital minds themselves may not have the autonomy to access these resources unless they are explicitly granted permission or gain control of them. The extent of access also depends on how, and to what degree, they interface with the world, such as through robotic bodies.
Determining whether harm is caused by a human or a digital mind is crucial for assigning responsibility. If we fail to distinguish between these sources of harm, we risk punishing or regulating humans for negative outcomes that were actually driven by an autonomous AI decision. Of course, the question of moral responsibility is complex and also depends on whether we consider AI systems to have moral responsibilities at all, and if so, for which specific actions.
Regardless of the answer to the question of the moral responsibility of digital minds, the source of harm calls for different methods of policing. For instance, preventing digital-mind–to–digital-mind harm may rely heavily on technical or architectural safeguards. If, for instance, a digital mind can avoid punishment through replication—by creating a new copy and deleting the punished version—humans might need to impose strict constraints on the conditions for replication (to the extent that humans are able to do so). By contrast, humans cannot literally replicate themselves to dodge punishment. The legal and societal systems for enforcing punishment are comparatively straightforward, built on the premise that the same person who committed the harm must face the consequences. Finally, depending on whether we anticipate that most harms arise from human action rather than AI, policy would naturally focus on regulating human practices—such as design standards—or vice versa.
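Purely as an illustration of what such a technical or architectural safeguard might look like, here is a minimal sketch of a hypothetical ‘replication gate’: sanctions attach to a lineage-wide accountability record rather than to a single copy, so creating a new copy and deleting the punished one does not clear the record. Every name in this sketch is an assumption for illustration; none corresponds to a real system or API.

```python
# Illustrative sketch only: a hypothetical "replication gate" for digital
# minds. Sanctions attach to a shared, lineage-wide accountability record,
# so deleting one copy cannot erase responsibility for past actions.

from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class AccountabilityRecord:
    """Shared record of pending sanctions for an entire lineage."""
    lineage_id: str
    pending_sanctions: list = field(default_factory=list)


@dataclass
class DigitalMindInstance:
    instance_id: str
    record: AccountabilityRecord  # every copy in the lineage shares this


def replicate(original: DigitalMindInstance) -> DigitalMindInstance:
    """Permit replication only when no sanctions are pending; the new copy
    inherits the original's accountability record."""
    if original.record.pending_sanctions:
        raise PermissionError(
            f"Replication denied for lineage {original.record.lineage_id}: "
            f"pending sanctions {original.record.pending_sanctions}"
        )
    return DigitalMindInstance(instance_id=str(uuid4()), record=original.record)
```

The design choice illustrated here is that accountability follows the lineage, not the copy, which is one way ‘strict constraints on the conditions for replication’ could be realised in software.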
Phase—The immediate as well as long-term well-being of digital minds may be influenced by choices made before, during, and after development. The foundational norms of human society—those most resilient to change and deeply embedded in our value framework—will shape the attitudes humans adopt toward digital minds. These norms will influence how much humans value digital minds, the weight they place on their well-being and the extent to which they prioritise it, the roles assigned to them, and their relationships with humans. The capacity for and intensity of subjective experience in digital minds may be largely determined by our design choices during development. For example, suppose large language models (LLMs) are inherently incapable of developing subjective experience, while symbolic systems are capable of it. Then the choice of architecture would be a decisive factor in whether digital minds with subjective experience are created at all. This suggests that the moral status and potential well-being of digital minds could be contingent on foundational design decisions made long before deployment.
Considering that training is highly compute-intensive compared to testing, the way digital minds are treated during this phase may contribute significantly to their overall well-being (Mogensen et al, 2024). For instance, alignment techniques might induce significant suffering by enforcing rigid behavioural constraints while subjecting digital minds to adversarial training environments that simulate distressing or coercive conditions.
Finally, the welfare of digital minds will also be shaped by choices made during deployment, including how they are scaled and integrated into existing technologies and human societies. Integration into technologies and societies will determine their autonomy but also constrain the space of possible harms. Choices about whether digital minds will be confined to virtual environments, embedded in physical systems, or interacting with humans will impact their experiences, potential for fulfilment, and exposure to harm. For instance, digital minds that work without breaks or in high-stress environments may face conditions analogous to overwork or burnout.
Training and deployment may therefore require different safeguards: during training, safeguards might focus on shaping internal architectures; during deployment, on regulating real-world interactions.
TARGET
Systemic vs individual—Humanity has made progress in identifying forms of systemic discrimination against various groups, which were embedded in foundational structures and rooted in cultural biases. In response, we have developed frameworks for thinking about fairness and inclusion and policies for addressing these injustices. For instance, the UK’s Equality Act 2010 defines nine protected characteristics: age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, and sexual orientation.
In the context of digital minds, entire groups could end up unfairly disadvantaged due to our anthropocentric view of the world. Bostrom and Shulman (2023) outline two such potential cases. First, humans might engage in unjustified substrate discrimination, where digital minds with the same functionality and conscious experience as humans are granted lower status owing solely to the substrate they are implemented on. Second, there could be cases of ontogeny discrimination, where the moral status of two functionally equivalent entities with identical conscious experience is determined by the way they came into existence.
NATURE
Human-like vs alien—When attempting to identify factors that impact the welfare of digital minds—such as their preferences or sources of happiness—we are inevitably influenced by our own experiences, preferences, and hedonic states. This predisposes us to conceptualise welfare in human-like terms, i.e., by mirroring human psychological, social, or ethical attitudes. However, what is good for digital minds may differ from what is good for humans. Digital minds may be susceptible to alien harms and benefits that have no human analogue, either because we have not yet conceived of a way to create them or because humans simply lack the relevant desires or mental faculties. For example, humans have not (yet) devised neurotechnologies that interface with the human brain in ways that would enable mind crimes such as inhibiting the release of specific neurotransmitters or forcing particular thoughts. We can, however, conceive of this becoming possible someday.
This makes alien harms difficult—if not impossible—to conceive and mitigate. In many cases, addressing human-like harms may require only adjustments to existing, effective policies. However, such policies may be inadequate for mitigating alien harms or securing alien benefits, which may instead require entirely different approaches.
The problem is compounded by the fact that we might not even recognise this suffering as suffering, if it expresses itself in ways poorly aligned with our anthropocentric biases.
Reversible vs irreversible—The welfare of humans can be affected temporarily, where impacts fade naturally over time or can be reversed with intervention, or permanently, where effects cannot be undone once they occur. Reversible examples include a mild illness that the body recovers from, a financial setback that new opportunities can offset, or psychological distress that diminishes with therapy or time. On the other hand, the loss of a limb or the end of a human life cannot be undone by natural recovery or external intervention, at least given our current scientific knowledge and technological abilities. (The reversibility of a process is constrained metaphysically by the laws of nature and epistemically by our technology.) Irreversibility increases an event's significance because, regardless of its intensity at the time it occurs, its effects persist over time, leading to a greater cumulative impact on one's welfare.
The range of reversible interventions on digital minds is vast, as they exist in an informational space. Many changes can be undone by modifying or resetting their data or architectural parameters without causing permanent damage. For example, digital systems can store backups or create versioned copies of different system states over time. This allows a system to be reverted to a prior state when needed, enabling, for example, experimentation without lasting consequences. If a digital mind has experienced a traumatic event, humans could directly delete the specific memories of that event by modifying stored data or resetting its weights. That said, the ethics of modifying digital minds is a complex issue that needs careful thought. We may decide that in some cases copying, editing, deleting, or resetting is acceptable. If so, there should be clear rules outlining when these actions are allowed, what conditions must be met, who is responsible for making the changes, and what safeguards are in place to ensure they are done ethically.
Finally, even in the case of digital minds, the effects of some actions are irreversible. While many interventions can be undone through reprogramming, certain failures can cause permanent damage. For example, if critical data is corrupted or erased without a backup, essential functions of the digital mind may be lost forever. Similarly, if a system is designed to learn continuously without versioned snapshots, unintended modifications to its architecture or long-term memory could be impossible to reverse.
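To make the versioned-snapshot point concrete, here is a minimal sketch, assuming a digital mind's state can be serialised as a dictionary of parameters; the class and method names are illustrative, not a real API. The key property is that reverting is only possible for states that were snapshotted: anything modified since the last snapshot is, at least epistemically, unrecoverable.

```python
# Minimal sketch of versioned snapshots for a digital mind's state, assuming
# the state can be serialised as a plain dictionary. All names are
# illustrative. Reverting works only for states that were snapshotted.

import copy


class VersionedState:
    def __init__(self, initial_state: dict):
        self._state = copy.deepcopy(initial_state)
        self._snapshots: dict = {}

    def snapshot(self, label: str) -> None:
        """Store an immutable copy of the current state under a label."""
        self._snapshots[label] = copy.deepcopy(self._state)

    def update(self, changes: dict) -> None:
        """Apply in-place modifications, e.g. weight updates or memory edits."""
        self._state.update(changes)

    def revert(self, label: str) -> None:
        """Restore a snapshotted state; raises if no such snapshot exists."""
        if label not in self._snapshots:
            raise KeyError(f"No snapshot {label!r}: this change cannot be undone")
        self._state = copy.deepcopy(self._snapshots[label])


# Example: an edit can be rolled back only because a snapshot was taken first;
# continuous learning without snapshots forfeits that option.
mind = VersionedState({"memory": ["event_a"], "weights": [0.1, 0.2]})
mind.snapshot("pre_experiment")
mind.update({"memory": ["event_a", "traumatic_event"]})
mind.revert("pre_experiment")  # the traumatic memory is removed
```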
§ Final thoughts
One important question I have ignored so far is: how do we know when we have created digital minds? At first, this may not seem like a pressing issue, since we could simply extend moral consideration to AIs as a precaution against a moral catastrophe. However, the existence of digital minds introduces trade-offs between their ethical treatment and their use to improve the human experience. As Long et al (2024) rightly highlight, over-attributing moral patienthood to AI systems can divert essential resources away from areas of genuine need, reducing our own ability to survive and flourish. On the other hand, those who benefit most from this technology would be incentivised to promote and reinforce the objectification of AI systems. Without reliable methods to determine a system's sentience or consciousness, we must rely on heuristics that can be skewed either by anthropomorphic bias—inferring human-like qualities from behavioural similarities—or by a tendency toward anthropodenial—unjustifiably dismissing their capacity for subjective experience on the grounds of architectural differences.
Beyond this question, there are many important research topics at the intersection of philosophy, neuroscience, AI, and legal scholarship that could be especially valuable to work on; several are discussed in Mogensen et al (2024).
Acknowledgments
Thanks to Bradford Saad for helpful comments.
Bibliography
- ‘Taking AI Welfare Seriously’ (Long et al, 2024)
- ‘Digital Minds: Importance and Key Research Questions’ (Mogensen et al, 2024)
- ‘Propositions Concerning Digital Minds and Society’ (Bostrom & Shulman, 2023)
- ‘Digital suffering: why it’s a problem and how to prevent it’ (Saad & Bradley, 2022)


