Abstract
Methods are currently lacking to _prove_ artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation _AGI_¹ rapidly triggers a succession of more powerful _AGI_ⁿ that differ dramatically in their computational capabilities (_AGI_ⁿ ≪ _AGI_ⁿ⁺¹). No proof exists that AGI will benefit humans, and no sound value-alignment method has been established. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS procedures can test AGI behavior-control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior-control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. _In toto_, IPS provides a way to reduce _AGI_ⁿ ↔ _AGI_ⁿ⁺¹ interaction hazards to an acceptably low level.
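The 2⁻¹⁰⁰ figure follows from the standard soundness-amplification argument for interactive proofs. A minimal worked version, assuming (not stated in the abstract) a per-round soundness error of at most 1/2 and independent rounds:

```latex
% Soundness amplification by independent repetition (standard IPS argument).
% Assumption: per-round soundness error \epsilon \le 1/2; rounds independent.
\Pr[\text{Verifier accepts a deceptive Prover over } k \text{ rounds}]
    \;\le\; \epsilon^{k} \;\le\; 2^{-k}
% e.g., k = 100 independent rounds yield the 2^{-100} deception bound.
```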
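The prime-number mapping is in the spirit of Gödel numbering. A minimal Python sketch under stated assumptions, where the rule names and helper functions are hypothetical illustrations (not from the paper) and unique factorization stands in for the number-theoretic check:

```python
from sympy import prime  # prime(i) returns the i-th prime (sympy dependency)

# Illustrative sketch: assign each axiom/transformation rule of a
# behavior-control system a distinct prime, encode a derivation as a
# product of those primes, and validate 'safe' behavior by factoring.
# All rule names below are hypothetical.
RULES = ["axiom_no_harm", "axiom_accept_shutdown", "rule_modus_ponens"]
RULE_PRIME = {r: prime(i + 1) for i, r in enumerate(RULES)}  # rule -> prime

def encode_derivation(steps):
    """Encode a sequence of applied rules as one integer (product of primes)."""
    n = 1
    for s in steps:
        n *= RULE_PRIME[s]
    return n

def uses_only_safe_rules(code, safe_rules):
    """Divide out the primes of allowed rules; by unique factorization,
    a remainder of 1 means only 'safe' rules were used."""
    for r in safe_rules:
        p = RULE_PRIME[r]
        while code % p == 0:
            code //= p
    return code == 1

code = encode_derivation(["axiom_no_harm", "rule_modus_ponens"])
assert uses_only_safe_rules(code, RULES)
```

This static check only shows the encoding; in an actual IPS the Verifier would additionally issue randomized queries so that a deceptive Prover passes with at most the probability bounded above.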