Definitions

86 pages tagged "Definitions"
What are the different versions of decision theory?
What are the different possible AI takeoff speeds?
What are the differences between AGI, transformative AI, and superintelligence?
What are "pivotal acts"?
What are "mesa-optimizers"?
What are large language models?
What are brain-computer interfaces?
What are scaling laws?
What are "human values"?
What is Infra-Bayesianism?
What is intelligence?
What is an agent?
What is the orthogonality thesis?
What is the "long reflection"?
What is neural network modularity?
What is "AI takeoff"?
What is "causal decision theory"?
What are the differences between AI safety, AI alignment, AI control, Friendly AI, AI ethics, AI existential safety, and AGI safety?
What are astronomical suffering risks (s-risks)?
What is an intelligence explosion?
What is a "value handshake"?
What is a "quantilizer"?
What is Goodhart's law?
What is AI Safety via Debate?
What is "whole brain emulation"?
What is "transformative AI"?
What is "superintelligence"?
What is "narrow AI"?
What is "metaphilosophy" and how does it relate to AI safety?
What is "hedonium"?
What is "functional decision theory"?
What is "evidential decision theory"?
What is "coherent extrapolated volition (CEV)"?
What is "biological cognitive enhancement"?
What is "agent foundations"?
What is "HCH"?
What is "Do what I mean"?
What is corrigibility?
What are "type signatures"?
What is instrumental convergence?
What is Iterated Distillation and Amplification (IDA)?
What are existential risks (x-risks)?
What is prosaic alignment?
What is reinforcement learning (RL)?
What is behavioral cloning?
What is imitation learning?
What is an alignment tax?
What are the differences between subagents and mesa-optimizers?
What are the "no free lunch" theorems?
What is an optimizer?
What is perverse instantiation?
What is deceptive alignment?
What is artificial intelligence (AI)?
What is mutual information?
What is feature visualization?
What are the differences between a singularity, an intelligence explosion, and a hard takeoff?
What are polysemantic neurons?
What is AIXI?
What is a shoggoth?
What is inner alignment?
What is tool AI?
What is a subagent?
What is "jailbreaking" a large language model (LLM)?
What is reward hacking?
What is mindcrime?
What is outer alignment?
What is a singleton?
What are the power-seeking theorems?
What is compute?
What is adversarial training?
What is the "Bitter Lesson"?
What is the difference between verifiability, interpretability, transparency, and explainability?
What is a "treacherous turn"?
How can LLMs be understood as "simulators"?
What is a "polytope" in a neural network?
What is an "AGI fire alarm"?
What is Vingean uncertainty?
What is the "sharp left turn"?
What is the Church-Turing thesis?
What is compute governance?
What are AI timelines?
What is discovering latent knowledge (DLK)?
What is a loss landscape?
What is meta-RL?
What is Moravec’s paradox?
What is zero-shot prompting?

AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.