Provably Safe Artificial General Intelligence via Interactive Proofs

Philosophies 6 (4):83 (2021)

Abstract

Methods are currently lacking to _prove_ artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation $\mathrm{AGI}_1$ rapidly triggers a succession of more powerful $\mathrm{AGI}_n$ that differ dramatically in their computational capabilities ($\mathrm{AGI}_n \ll \mathrm{AGI}_{n+1}$). No proof exists that AGI will benefit humans, nor is there a proof of a sound value-alignment method. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability that the Prover deceives the Verifier to any specified low value (e.g., $2^{-100}$). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. _In toto_, IPS provides a way to reduce $\mathrm{AGI}_n$ ↔ $\mathrm{AGI}_{n+1}$ interaction hazards to an acceptably low level.
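
To make the Verifier/Prover exchange in the abstract concrete, the following is a minimal sketch of a textbook interactive proof, the graph non-isomorphism protocol. It is not the behavior-control or prime-number encoding the paper proposes; it only illustrates how repeated random challenges push the chance of a successful deception down to $2^{-k}$ after $k$ rounds. All function names (`graph`, `relabel`, `honest_prover`, `verify`, `soundness_error`) are hypothetical, and the brute-force isomorphism search is an assumption standing in for the Prover's greater computational power.

```python
import itertools
import random
from fractions import Fraction


def graph(edge_list):
    """Undirected simple graph as a set of 2-element vertex sets (no self-loops)."""
    return frozenset(frozenset(e) for e in edge_list)


def relabel(g, perm):
    """Rename every vertex v of g to perm[v]."""
    return frozenset(frozenset(perm[v] for v in e) for e in g)


def isomorphic(a, b, n):
    """Brute-force isomorphism test over all n! relabelings (Prover-side work)."""
    return any(relabel(a, p) == b for p in itertools.permutations(range(n)))


def honest_prover(g0, g1, h, n):
    """Claim which input graph the challenge h came from (answers 0 on a tie)."""
    return 0 if isomorphic(g0, h, n) else 1


def verify(g0, g1, n, prover, rounds, rng):
    """Verifier side: accept the claim 'g0 and g1 are NOT isomorphic'
    only if the prover identifies the hidden bit in every round."""
    for _ in range(rounds):
        b = rng.randrange(2)              # secret coin flip
        perm = list(range(n))
        rng.shuffle(perm)                 # secret random relabeling
        challenge = relabel((g0, g1)[b], perm)
        if prover(g0, g1, challenge, n) != b:
            return False                  # caught a wrong answer: reject
    return True


def soundness_error(rounds):
    """Upper bound on a cheating prover's success when each round is a 1/2 guess."""
    return Fraction(1, 2 ** rounds)
```

For example, with a 4-vertex path `graph([(0, 1), (1, 2), (2, 3)])` and a 4-vertex star `graph([(0, 1), (0, 2), (0, 3)])`, the claim is true and `verify` accepts in every round. If the second graph is merely a relabeled path, the challenge carries no information about the hidden bit, so any prover can only guess; under that assumption, `soundness_error(100)` corresponds to the $2^{-100}$ level mentioned in the abstract.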

