Voluntary cooperation is a main feature of the civilizational game. It got us to where we are today. We explored how to improve and defend this dynamic from within the game. Gradually, non-human intelligent players are entering the playing field. In a few more iterations, they will be major players. If we want to continue our Paretotropian ascent, we had better make sure our cooperative framework is set up to handle a diversity of intelligences pursuing a diversity of goals.
AI Threats
Let’s pretend we have achieved computer security. Are we set for a cooperative long-term future? It is worth revisiting Toby Ord’s AI takeoff scenario. According to Ord, once an AI exploits computer security vulnerabilities, it could escalate its power.
“This is more speculative, but there are many plausible pathways: by taking over most of the world’s computers, allowing it to have millions or billions of cooperating copies; by using its stolen computation to improve its own intelligence far beyond the human level; by using its intelligence to develop new weapons technologies or economic technologies; by manipulating the leaders of major world powers (blackmail, or the promise of future power); or by having the humans under its control use weapons of mass destruction to cripple the rest of humanity. Of course, no current AI systems can do any of these things. But the question we’re exploring is whether there are plausible pathways by which a highly intelligent AGI system might seize control. And the answer appears to be ‘yes.’”
While “AGI” sometimes describes an Artificial General Intelligence of mere human-level intelligence, many assume an AGI reaching that level will eventually exceed it in most relevant intellectual tasks. An AGI that displaces human civilization as the overall framework of relevance for intelligence and dominates the world can be described as an AGI singleton. This scenario carries two threats worth unpacking: the first-strike instabilities generated by the mere possibility of such a takeover, and the value alignment problems resulting from a successful takeover.
AI First Strike Instabilities
An AGI singleton potentially conquering the world is also a threat of a unitary permanent military takeover, discussed before. We live in a world where multiple militaries have nuclear weapon delivery capabilities. If an AGI takeover scenario becomes credible and believed to be imminent, this expectation is itself an existential risk.
Any future plan must be constrained by our world of multiple militaries, each of which can start a very costly war. If some actor realizes that another actor, whether AI-controlling human or AI entity, will soon be capable of taking over the world, it is in their interest to destroy them first. Even if non-nuclear means were used for this, attempting to push ahead in AGI capacities pre-emptively re-creates the Cold War’s game theory. This holds even if an AGI is impossible to create but is merely believed to be possible. Our transition from the current reality to a high-tech, high-intelligence space-based civilization must avoid this first-strike instability.
However imperfect our current system is, it is the framework by which people pursue their goals and in which they have invested interests. First-strike instability is just a special case of a more general problem: if a process threatens entrenched interests, they will oppose that process. This means the unitary AGI takeover scenario is dangerous in more ways than it first appeared. We must avoid it becoming a plausible possibility.
AI Value Alignment
Let’s imagine we survived the first strike instabilities, and have entered a world in which a powerful AGI singleton can shape the world according to its goals. Our future would depend on how these goals align with human interests. Eliezer Yudkowsky summarizes this challenge as “constructing superintelligences that want outcomes that are high-value, normative, beneficial for intelligent life over the long run; outcomes that are, for lack of a better short phrase, ‘good’.”1
Attempting to construct an entity of unprecedented power that reliably acts in a “high-value” manner raises deep ethical questions about human values. In Chapter 2, we saw why value disagreements have remained unsolved amongst humans since the dawn of philosophy. Even if we could figure out what is “high-value” for humans, we have little reason to assume that it translates well to non-human descendants.
To illustrate this, it helps to remember that the felt goals that humans pursue are really a consequence of our evolutionary chain’s instrumental goals. Survival of the fittest has instrumental goals regarding how to behave in a fit manner. These became felt goals which correlated with activities corresponding to the instrumental goals. Most humans care deeply about their children, but to evolution this simply has instrumental value. If instrumental goals can grow into subjectively felt caring, this hints at the difficulty of accurately modeling the evolution of non-human intelligences’ goals. They have different cognitive architectures, grow up under different evolutionary constraints and on different substrates.
Economic cooperation relies on the division of labor. Specialization, in turn, translates into the division of knowledge and goals. A future of markets many orders of magnitude larger than today’s market comes with a rapid increase in specialized knowledge and instrumental goals. There is no reason for those instrumental goals not to evolve into felt goals. As felt goals are pursued, they create an even larger variety of instrumental goals.
Steve Omohundro suggests we may be able to model a few basic drives that any advanced intelligence will have, regardless of its final goals.2 Those drives include wanting to continue to exist and to acquire more resources, both of which are useful prerequisites for many future goals. Nevertheless, beyond those basic drives, projecting how instrumental goals of advanced non-human intelligences grow into felt goals that align with human values is a daunting problem.
The recent explosion of DAOs is a step in this direction. They are not intelligent. But they show that, for better or worse, our civilization seems to incentivize the creation of human-incorruptible autonomous entities.3 If we cannot avoid creating non-human entities with diverse intelligences and goals, we can’t rely on making precise value assessments of those goals. Focusing on such extremely hard problems might well result in a different outcome. Technological breakthroughs may happen before we arrive at any satisfying answers, resulting in a future that ignores the goals of many human and non-human volitional entities.
Civilization & Computers: Safety in Multipolarity
Any effort toward creating an AGI singleton that is aligned with our values could also be used toward an alternative scenario. It’s difficult to see what civilization is adapted to because it is the result of a huge variety of subtle influences acting over a very long time. This tempted some to imagine that we could centrally plan something better and resulted in the painful lessons learned throughout history. It took centuries for political philosophy to advance from the question “Who should rule?” to questioning if there must be a ruler.
But we haven’t really learned the nature of that fallacy so much as that it has dangers. Now we may be tempted to think we can algorithmically aggregate people’s preferences to create an agent that gets us what we want. But on closer examination, neither effective computer systems, nor civilization’s underlying process, resemble such a central planner.
Rather than writing one piece of code embodying all the program’s knowledge, a programmer writes separate pieces of code. Each is a specialist in some very narrow domain, embedded in a request-making architecture. We discussed how the microkernel operating system seL4 serves as a coordination device that implements simple rules that let programs which embody specialized knowledge cooperate. As modern computer systems push knowledge out to their edges, their central feature may well remain such a fixed simple rules framework.
Similar to an individual computer system, civilization is composed of networks of entities making requests of other entities. Just as seL4 coordinates across specialist computer system components, institutions coordinate across human specialists in our economy. Civilization already aligns the intelligences of human institutions with human beings. It has wrestled with the alignment problem for thousands of years. Different intelligences have tested its stability and it has largely successfully survived these tests.
It is an architectural decision to design a system that never has to come to an agreement about any one thing. We must avoid the fatal conceit that we can design in detail an intelligent system that works better than creating a framework. In a framework, an emergent intelligence composed of a variety of entities serving a variety of goals can engage in cooperative problem-solving. Each agent is constrained by the joint activity of the other agents that hold each other in check. If any single entity is a small player in a system of others pursuing other goals, it has an interest in upholding the framework that allows it to employ the goal-seeking activity of other entities.
Taking inspiration from human cooperative systems is not a new idea in AI. Back in 1988, Drexler suggested that “the examples of memes controlling memes and of institutions controlling institutions also suggest that AI systems can control AI systems.”4 In a similar spirit, Sam Altman of OpenAI stated that “Just like humans protect against Dr. Evil by the fact that most humans are good, and the collective force of humanity can contain the bad elements, we think it’s far more likely that many, many AIs, will work to stop the occasional bad actors than the idea that there is a single AI a billion times more powerful than anything else.”5
Take Inspiration from the U.S. Constitution
Some existing governmental constitutions have successfully built antifragile frameworks. The U.S. Constitution gave each government official the least power necessary to carry out the job, what can be called the Principle of Least Privilege.6 In addition, it purposely put different institutions in opposition with each other via division of power, checks and balances, and significant decentralization. Decreasing speed and efficiency in favor of reducing more serious risks is a positive tradeoff. Friction is a feature, not a bug. Ordering the system so that institutions pursue conflicting ends with limited means is more realistic than building any one system that wants the right goals. Such working precedents can inspire us to build continuously renegotiated frameworks among ever more intelligent agents.
James Madison, one of the designers of the U.S. Constitution, wrote in Federalist No. 51: “If men were angels, no government would be necessary. If angels were to govern men, neither external nor internal controls on government would be necessary. In framing a government which is to be administered by men over men, the great difficulty lies in this: You must first enable the government to control the governed; and in the next place oblige it to control itself.”
One could say that Madison regarded large-scale human institutions as superintelligences and was terrified of the value alignment problem. Civilization up to that point suggested that human activity is oppressed by superintelligences in the form of large-scale human organizations with values not aligned with human values. The Founding Fathers were faced with a singleton-like nightmare of designing a superintelligent institution composed of systems of individuals who want to take actions that society does not approve of. They felt that they had no choice but to try to create an architecture that was inherently constructed to maintain its integrity, aimed not at being ideal but at avoiding very serious flaws.
Given that worst-case scenarios of our future are extremely negative and numerous, we would do extraordinarily well simply avoiding the worst cases. In the case of AGIs, instead of building an optimal system, we should focus on not building a system that turns into a worst-case scenario. The authors of the U.S. Constitution did not design it as an optimized utility-function to perfectly serve everyone’s interests. Their main objective was to avoid it becoming a tyranny.
Even though it was imperfect and had dangers, the Constitution succeeded well enough that most U.S. citizens have better lives today. It is extraordinary that it maintained most of its integrity for as long as it did, even one Industrial Revolution later. It is not the only one. We can start by studying the mechanisms of the federal-state balance in the UK, Switzerland, the earlier United Provinces of the Netherlands, the Holy Roman Empire, and ancient Greece’s Peloponnesian League confederation, as well as the Canadian, Australian, postwar German, and postwar Japanese constitutions.
Cooperate Across Intelligences
While we do not generally think about institutions as intelligent, their interaction within a framework of voluntary cooperation lets them more effectively pursue a great variety of goals. This increases our civilization’s intelligence. The overall composition of specialists through voluntary request-making is the great superintelligence that is rapidly increasing its effectiveness and benefits to its engaged entities.
For designing human institutions, we can rely on our knowledge of human nature and political history. With regard to AI safety, there is less precedent to work with. But just as future artificial intelligences will dwarf current intelligences, so are current intelligences dwarfing the Founding Fathers’ expectations. The U.S. Constitution was only intended as a starting point on which later intelligences could build. We only need to preserve robust multipolarity until later intelligences can build on it. Let’s look at a few experiments pointing in promising directions.
Use Local Knowledge: Private Machine Learning
We start from today’s technology world containing centralized giants. Their resources and economies of scale let them do the required large-scale data collection for building ever more powerful AI. But today’s AI systems mainly perform services to satisfy a particular demand in bounded time with bounded resources.7 As we develop more sophisticated AIs, they may continue as separate specialized systems applying ML to different problems. Decentralized systems may have a competitive edge in solving specialized problems.8 They incentivize contributions from those closest to local knowledge instead of hunting for it top-down. By incentivizing mining, Bitcoin became the system with the most computing power in the world.9 To compensate for power centralization, we can reward specialists for cooperating toward larger problem-solving.
In "Markets and Computation," Miller and Drexler discuss how market mechanisms, like trade and price signals, can bring together diverse parties to create globally effective behavior. We already rely on human-to-human and human-to-computer interactions, but what if we extended market mechanisms to include increasingly advanced AIs? Humans could use pricing and trade systems to meet their needs and assess the success of AIs. And if an AI can evaluate its own success, it might employ humans to help solve problems requiring human knowledge. By pooling knowledge from a sea of human and computational objects, we could achieve a higher collective intelligence.
Centralized AI systems have limitations compared to decentralized designs, especially if those decentralized designs use cryptographic and auxiliary techniques that protect the algorithms and enable data processing without revealing the data itself.
Georgios Kaissis and co-authors introduce the terms "secure AI" and "privacy-preserving AI" to refer to methods that protect algorithms and enable data processing without revealing the data itself. They propose techniques that guarantee the integrity of the computational process and its results, and are resistant to identity or membership inference, feature/attribute re-derivation, and data theft.
One of these techniques is federated learning, a decentralized approach to machine learning that distributes copies of the algorithm to nodes where the data is stored, allowing local training while retaining data ownership. However, federated learning alone may not guarantee security and privacy, so it should be paired with other measures like differential privacy, homomorphic encryption, and secure multi-party computation. These techniques could unlock new approaches to using sensitive data, such as health or financial data, on problems that require it.
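To make the mechanics concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under stated assumptions, not the method of Kaissis and co-authors: the one-variable linear model, the function names (local_update, federated_average, train), the learning rate, and the three synthetic node datasets are all hypothetical. It also leaves out the differential privacy, encryption, and secure multi-party computation that federated learning should be paired with.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Train on one node's private data; only the updated weights leave the node."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y          # prediction error of the linear model
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n               # gradient step on mean squared error
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates):
    """Combine (weights, sample_count) pairs, weighting each node by its data size."""
    total = sum(count for _, count in updates)
    w = sum(wts[0] * count for wts, count in updates) / total
    b = sum(wts[1] * count for wts, count in updates) / total
    return (w, b)


def train(nodes, rounds=300):
    """The coordinator sees model weights only, never the nodes' raw data."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data)) for data in nodes]
        global_weights = federated_average(updates)
    return global_weights


# Hypothetical private datasets: every node holds samples of y = 3x + 1,
# but each node covers a different range of x.
rng = random.Random(0)


def make_node(size, low, high):
    return [(x, 3 * x + 1) for x in (rng.uniform(low, high) for _ in range(size))]


nodes = [make_node(20, 0, 1), make_node(30, 1, 2), make_node(10, -1, 0)]
w, b = train(nodes)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Each node sees only a slice of the input range, yet the averaged model recovers the shared relationship. The weights that travel to the coordinator can still leak information about the data, which is why the additional measures named above matter.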
The beauty of these techniques is that they allow actors to collaboratively compute on data without sharing it, further promoting collaboration over competition on certain problems. They also have safety-enhancing properties, making them an exciting avenue for computation involving large datasets while maintaining privacy. While these methods all have their technical limitations, their potential for enabling novel approaches to computation is worth exploring.
In "Building Safe AI," Andrew Trask discusses the potential of cryptographic approaches for AI safety. Specifically, he proposes using homomorphic encryption for safely training AI on data sets belonging to different parties.10 The data is encrypted before going to the computing device, computations are performed on encrypted data, and only the encrypted results are sent back and decrypted at the source. Since the computing device doesn’t have the decryption key, no personal information can be extracted.
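The encrypt, compute, return, decrypt flow can be sketched with a toy version of the Paillier cryptosystem, which supports addition of ciphertexts and multiplication by a plaintext constant. This is not Trask's construction and not a secure implementation: the tiny primes, the use of Python's non-cryptographic random module, and all function names here are assumptions chosen for readability. An additive scheme like this can evaluate a weighted sum, but it cannot train a neural network on its own.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy key generation. Real deployments use primes of 1024 bits or more."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # not a cryptographic RNG; illustration only
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


def untrusted_weighted_sum(n, ciphertexts, weights):
    """Runs on the computing device, which holds no decryption key."""
    acc = encrypt(n, 0)
    for c, weight in zip(ciphertexts, weights):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, weight))
    return acc


# The data owner encrypts at the source ...
n, private_key = keygen()
features = [3, 5, 2]                      # hypothetical private data
encrypted_features = [encrypt(n, x) for x in features]

# ... the device computes on ciphertexts only ...
weights = [2, 4, 1]                       # hypothetical model weights
encrypted_result = untrusted_weighted_sum(n, encrypted_features, weights)

# ... and only the key holder can read the result.
result = decrypt(n, private_key, encrypted_result)
print(result)  # 3*2 + 5*4 + 2*1 = 28
```

The computing device handles only large opaque integers. Whoever holds the private key decides which results are ever turned back into readable values.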
Imagine Alice can encrypt her neural network and first send it to Bob with a public key so he can train it on his data. Upon receiving the network back, Alice decrypts and re-encrypts it. She then sends it to Carol with a different key to use it on her data. Alice shares the computed result while retaining control over her algorithm’s IP. Bob and Carol can benefit from the result while controlling their own data.
This technique has two valuable properties. First, it safeguards the intelligence of the network against theft, which means valuable AI can be trained even in insecure environments. In the real world, this means individuals and companies can cooperate using each other’s algorithms and data without risking their intelligence being stolen. Local nodes have data sovereignty, and the AI itself can’t link the data to the real world without the secret key.
Second, the network can only generate encrypted predictions that cannot impact the outside world without a secret key. This creates a useful power imbalance between humans and AI. If the AI is homomorphically encrypted, then it perceives the outside world as homomorphically encrypted too. In other words, this technique enables the human who controls the secret key to unlock either the AI itself or only individual predictions that the AI makes.
For now, large-scale application of this type of homomorphic computation would be prohibitively expensive, but it has the potential to create secure multipolar computing environments.
Extend Principal Agent Alignment: The Blockchain
Earlier we defined civilization as consisting of networks of entities making requests of other entities. Requests may involve human-to-human interactions, human-to-computer interactions, and computer-to-computer interactions. As we move into an ecology of more advanced AIs, designing robust mechanisms across them will be key. The specifics of this transition will depend on the technologies available at the time. But we can learn from a few tools that are already at our disposal.
In a principal-agent relationship, a principal (a human or computational entity) sends a request to an agent. To align the agent’s decisions with its interests, the principal uses several techniques, including selecting an agent, inspecting its internals, allowing certain actions, explaining the request, rewarding cooperation, and monitoring the effects.
When designing principal-agent arrangements, by combining techniques across both rows and columns, some techniques’ strengths can make up for others’ weaknesses. Computer security (Allow actions) alone misses some differences among agent actions that harm the principal, such as when the agent benefits from misbehavior (Reward cooperation). This requires more than a security analysis. We also need to analyze the attacker’s incentives (Reward cooperation). From individually breakable parts, we create arrangements with greatly increased structural strength.11
Voluntary cooperation is a good candidate for guiding interaction among increasingly intelligent entities. It already guides simple computer systems’ interactions, and these are at least as different from humans as we are from our potential cognitive descendants. Interactions, whether human-to-human or computer object-to-object, need means to serve the participants’ goals. A voluntary framework fulfills this purpose. It is the base for building increasingly capable systems that are aligned with its participants. This analysis of today’s human and computational entities could be sufficiently independent of the entity’s intelligence to be extendable to advanced AI systems.
As AIs get more sophisticated, we will need to extend our principal-agent toolbox. For human cooperation, a completely specified contract could, in theory, perfectly implement the desired behavior of all parties.12 In reality, humans can’t evaluate all optimal actions in all possible states of the world that the contract unfolds in without incurring prohibitive costs when drafting the contract. Instead, real-world contracting is often supported by external informal structures, such as culture, that provide the implied terms in the contract to fill the gaps when necessary. Refusing to hire someone who is judged to have breached a contract is a powerful cultural technology.
Such cultural technology is enabled by our Internal Spectator, described earlier, that allows us to model how other humans will react to us taking certain actions. This cognitive architecture can predict the social penalty we will incur, and initiate emotions such as shame that make us retreat from a potential rule violation. We are not pure Homo Economicus game-theoretic optimizers but are instead guided by a strong sense of evolved norms and ethics. Human signaling behaviors work because we have bounded abilities to fake our true motivations; our emotions show through in many cases.
It will be difficult to build artificial agents with a cognitive architecture that can internalize the costs associated with actions we regard as wrong. Today’s AI systems can already deceive humans, and future artificially intelligent agents may develop trickery we have not evolved to detect. This pure Homo Economicus paradigm with unbounded ability to fake is frightening since those bounds on humans account for much of our civilization’s stability and productiveness. Potential traps of a future dominated by pure game-theoretic optimizers are terrifying.13
We need more sophisticated tools to cooperate with intelligences whose cooperation style we have not evolved to parse. Blockchain, by virtue of its information transparency and incorruptibility, makes it credible that the relevant aspects of an entity operate only according to the visible program. This means that an artificial entity running on a blockchain can be credibly transparent, with its internal workings verifiable by anyone. Hopefully, very few interactions need to be transparently preprogrammed. However, some interactions’ pathologies may best be solved by moving them onto inspectable public blockchains. We can divide computation into those encapsulated interfaces with 100% ability to fake and the transparent part on a public blockchain with zero ability to fake.
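The "zero ability to fake" property rests on a simple idea that can be sketched without a full blockchain: an append-only log in which every entry commits to the hash of the previous one. The sketch below is a deliberately reduced illustration. The function names and the example payloads are hypothetical, and a real public blockchain adds replication, consensus, and program execution on top of this structure.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the hash of its predecessor."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Add a record that commits to the entire history before it."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def verify(chain):
    """Anyone holding a copy can recompute every hash and detect edits."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical public record of an agent's commitments and actions.
ledger = []
append(ledger, {"agent": "A", "action": "commit", "terms": "deliver report"})
append(ledger, {"agent": "A", "action": "deliver", "item": "report"})
append(ledger, {"agent": "B", "action": "pay", "amount": 10})

print(verify(ledger))                    # True: history is intact
ledger[1]["payload"]["item"] = "nothing"
print(verify(ledger))                    # False: the edit is detected
```

Because each hash depends on everything before it, quietly rewriting one record would require recomputing every later block, which the other holders of the log would notice.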
Mixed arrangements may replicate some stabilizing aspects from the human world such as norms, signaling, and bounded ability to fake. Whatever the specifics, the crucial aspect is that we are designing an integrated network spanning multiple systems. This means that any checks and balances must account for human-to-human, human-to-AI, and AI-to-AI interactions.
Human-machine cooperation can come with upsides: Once AIs can fully participate in human social interactions, a possible cooperation pattern is a human and advisory AI agent combining their strengths when interacting with others.14 We may not be able to avoid future superintelligent equivalents of YouTube recommendation systems that want to lead us into unwanted misleading information holes. On the other hand, our own AI assistants may help us discover cases with contradictory evidence and help us better evaluate them even when other media tries to mislead us.
A Superintelligent Human AI Ecology
Gradually, increasingly intelligent AIs will automate most human tasks.15 Since AIs themselves are based on R&D, consisting of automatable narrow technical tasks, much of the path to advanced AI may be automatable. Gradually we may develop, as Eric Drexler says, “comprehensively superintelligent systems that can work with people, and interact with people in many different ways to provide the service of developing new services.”
Rather than one monolithic agent recursively improving, this process can be more like a technology base becoming rapidly better at producing specialized services. Many of today’s market strategies, including prices and trading, that encourage adaptive modification based on local knowledge, could work for complex computing as well.16 In a human-computer economy, computational entities sell their services and others benefit from them, creating ever more complex entities to cooperate with.
The complex problems that large cooperative networks of maturing human societies can solve now would be impossible for small hunter-gatherer villages. Just as human intelligence is often judged by the ability to achieve goals set by an intelligence test, one could measure a civilization’s intelligence by its ability to achieve a range of goals set using resources provided for this purpose.17
The Agoric Papers suggest such an intelligence test as a thought experiment: “One can imagine putting a person or an ecosystem in a box and then presenting problems and contingent rewards through a window in the box. A box full of algae and fish will ‘solve’ a certain narrow set of problems (such as converting light into chemical energy) and will typically pay little attention to the reward. A box containing an intelligent person will solve a different, broader range of problems. A box containing, say, an industrial civilization (with access to algae, fish and Bell Labs) will solve a vastly greater range of problems. This ability to solve externally posed problems can be taken as a measure of that ecosystem’s ‘intelligence’.”18
We are now setting the stage for the knowledge future players can use to deal with their problems. Without increasing our civilization’s problem-solving ability, we have no long-term future. Our success will be largely determined by how we allow cooperation among a range of intelligences on the problems ahead.
Just as today’s economy includes corporations, not just humans, as players, future ecologies may contain more esoteric human-AI symbionts. But as long as players can hold each other in check, technologies may continue to emerge gradually, with a diversity of intelligent entities, some human, some artificial, improving their problem solving by cooperating. If we can keep the instrumental value of an AGI singleton with universal capabilities low compared to an ecosystem of cooperating specialists, we can avoid a unitary takeover.
As machines become more intelligent, the intelligence they contribute to civilization may soon be more than the human contribution. This is good news because we will need all the intelligence we can get for future rounds of play. But the framework of relevance can remain the expanding superintelligence of civilization composed of a diversity of cooperating intelligences, rather than one unitary AGI, at least until those future intelligences can invent new solutions for retaining the balance.
Revisit AI Threats
Extending civilization’s cooperative fabric to include AIs in a decentralized manner is a tall order. But, if successful, this approach can account for both of the threats that we started this chapter with: the threat of first-strike instabilities and the threat of a misaligned AGI singleton.
Transform First Strike Instabilities into Peaceful Competition
An arms race is very explicitly about threats of violence. In an arms race, both sides suffer tremendous costs but at most one side wins. Until the arms race finishes, everyone keeps paying for its next increment to prevent the other side from winning. The costs can be much higher than the amount won, so even the winner can be in a state of miserable destruction. Nevertheless, all parties are pressured to match the other sides because the arms themselves are a threat of violence. You can't simply decide not to play.
An AI arms race has to be separated into the competition for intelligence and its violent deployment. The true danger is not an AGI itself, but the mechanisms it, or its owners, could deploy to harm vulnerable entities. We are physically and digitally vulnerable. In Chapter 6, we proposed an active shield of mutually watching watchers to decrease physical vulnerability. In Chapter 8, we proposed computer security that is independent of the intelligence of the attacker to decrease cyber vulnerabilities. As long as intelligence is not attached to physical actuators, our main concern should be cyber security. If it is augmented with actuators, they must be positioned to keep each other in check.
The hidden threat of involuntary interaction, with its potential for unitary strategic takeovers, is what is dangerous. The more we decentralize intelligent systems of voluntary interactions, the better we will be at avoiding such a takeover. If a single entity grows sufficiently large that the rest of the world is not much bigger, the Schelling Point of voluntarism can be destroyed. In a world in which each entity is only a small part, voluntarism will be a general precedent that different players mutually expect as they pursue their goals.
Until a military perspective gets introduced into the AI narrative, the dynamic is better described as a concern to stay ahead in economic competition. In a world of voluntary cooperative interaction without a hidden threat of involuntary interaction, we can all benefit from creating improved AI services via the market because human beings would no longer be a bottleneck on productive activity. More on that in Chapter 9.
Transform AI Value Alignment into Paretotropism
Let’s revisit the second threat: misaligned AI values. As long as future intelligences pursue their goals in a voluntary multipolar world, we needn’t worry about their goal structure. AI drives of acquiring more resources or seeking to stay alive are not problematic when they can only be achieved via voluntary cooperation. They become problematic when involuntary actions are possible. Can we do better than mere voluntary co-existence?
When two separately evolved sources of adaptive complexity come into contact, both may realize they can gain from their differences by cooperating to unlock the positive sum composition of their differing complexity. We certainly believe our lives are richer because of the richness of animals’ non-humanness. They are interesting by being a source of complexity. The atrocities committed against non-human animals are a result of the lack of voluntary architectures that frame interactions across species. Unlike with non-human animals, we have a chance at putting voluntary boundaries in place with respect to future intelligences of our own making.
Civilization’s growth of knowledge and wealth results from human cooperation unlocking more knowledge in pursuit of more goals. What could we possibly hope to contribute to a human-AI exchange? AIs may eventually outcompete us on everything. But even if they are more advanced than us, they might still better serve their goals by cooperating with us if our complexity contributes at all to those goals. Even if you excel at everything, you can still benefit from specializing in one thing and trading with others who have a comparative advantage at different things. If we benefit from having deeper chains of specialists to cooperate with, the growth in adaptive complexity may well lead to continued cooperation.
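The comparative advantage argument is easy to check with arithmetic. The sketch below uses invented numbers: a hypothetical AI that outproduces a human at both designs and reviews, each with ten hours available. The rates, the two goods, and the function names are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical hourly output rates (illustrative numbers, not from any source).
RATES = {
    "ai": {"designs": Fraction(100), "reviews": Fraction(50)},
    "human": {"designs": Fraction(1), "reviews": Fraction(2)},
}
HOURS = Fraction(10)  # working time available to each party


def opportunity_cost(agent, good, other):
    """Units of `other` an agent gives up to produce one unit of `good`."""
    return RATES[agent][other] / RATES[agent][good]


def output(agent, hours_on_designs):
    """Production when an agent spends the rest of its time on reviews."""
    rates = RATES[agent]
    return {
        "designs": rates["designs"] * hours_on_designs,
        "reviews": rates["reviews"] * (HOURS - hours_on_designs),
    }


def total(*outputs):
    return {good: sum(o[good] for o in outputs) for good in ("designs", "reviews")}


# Without trade: each party splits its time evenly between the two goods.
autarky = total(output("ai", HOURS / 2), output("human", HOURS / 2))

# With trade: the human does only reviews, where its opportunity cost is
# lowest, and the AI covers the remaining reviews before turning to designs.
human_out = output("human", Fraction(0))
reviews_still_needed = autarky["reviews"] - human_out["reviews"]
ai_review_hours = reviews_still_needed / RATES["ai"]["reviews"]
ai_out = output("ai", HOURS - ai_review_hours)
with_trade = total(ai_out, human_out)

extra_designs = with_trade["designs"] - autarky["designs"]
print(opportunity_cost("ai", "reviews", "designs"))     # 2 designs per review
print(opportunity_cost("human", "reviews", "designs"))  # 1/2 design per review
print(autarky["designs"], with_trade["designs"], extra_designs)  # 505 520 15
```

The AI is better at both tasks in absolute terms, yet the same number of reviews gets done and fifteen more designs are produced when the human specializes. That surplus is what both sides can gain from by trading.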
We ended Chapter 2 with the observation that civilization is a superintelligence aligned with human interests. It’s getting rapidly more intelligent and its interactions increasingly benefit without harming.19 By introducing artificial intelligences into this dynamic we may be able to further steepen our Paretotropian ascent.
Chapter Summary
In this chapter, we foreshadowed how an increasingly intelligent game could be increasingly beneficial for its players. The fear of an intelligent takeover by an AGI can be divided into the threat of first strike instabilities on the path and that of a successful takeover by an AGI singleton. The better we get at incorporating AI into our multipolar voluntary cooperative architecture in a decentralized manner, the better we can avoid both scenarios. Where will this future lead? Let’s find out in the final chapter.
