Machines learning values

Steve Petersen

In S. Matthew Liao (ed.), Ethics of Artificial Intelligence. Oxford University Press (2020)

Abstract

Whether it would take one decade or several centuries, many agree that it is possible to create a *superintelligence*: an artificial intelligence with a godlike ability to achieve its goals. And many who have reflected carefully on this fact agree that our best hope for a "friendly" superintelligence is to design it to *learn* values like ours, since our values are too complex to program or hardwire explicitly. But the value learning approach to AI safety faces three particularly philosophical puzzles. First, it is unclear how any intelligent system could learn its final values, since judging one supposedly "final" value against another seems to require a further background standard for judging. Second, it is unclear how to determine the content of a system's values based on its physical or computational structure. Finally, there is the distinctly ethical question of which values we should aim for the system to learn. I outline a potential answer to these interrelated puzzles, centering on a "miktotelic" proposal for blending a complex, learnable final value out of many simpler ones.

Author's Profile

Steve Petersen

Niagara University

Keywords

ethics of AI, value alignment, AI safety, superintelligence, ethical rationalism, specificationism, coherence

Analytics

Added to PP: 2020-06-19

Downloads: 858 (#26,673)

Downloads, past 6 months: 169 (#22,959)

Citations of this work

Moral disagreement and artificial intelligence. Pamela Robinson - 2024 - AI and Society 39 (5):2425-2438.

Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective. Tom Everitt, Marcus Hutter, Ramana Kumar & Victoria Krakovna - 2021 - Synthese 198 (Suppl 27):6435-6467.
