Eighteen Open Research Questions for Governing Advanced AI Systems
By Ihor Ivliev @ 2025-05-03T19:00 (+2)
Forthcoming AI systems, ranging from large agentic tool chains to potential Artificial General Intelligence, may exhibit continuous learning, self-modification, large-scale coordination, and strategic behaviour. Conventional governance frameworks often emphasize static models and narrow risk classes; these assumptions may not hold for advanced systems. This note presents eighteen research questions addressing key theoretical and engineering challenges identified through a synthesis process. Each question includes an illustrative near-term research goal. This proposed agenda is provisional and offered to stimulate critique, refinement, and cross-disciplinary collaboration.
Frontier AI systems are expected to:
- operate with limited observability (opaque internal reasoning);
- modify objectives or code;
- interact as large multi‑agent collectives;
- influence critical infrastructure at machine speed.
These features strain current audit and certification practices. Addressing the potential governance gap likely requires advances in areas such as formal specification, sensing and control, institutional design, and social legitimacy. The eighteen questions presented below outline one possible research agenda targeting these foundational challenges.
The questions are grouped into five blocks (A–E); each lists the most relevant disciplines and a potential near-term deliverable:
A. Value specification and intrinsic safety (Questions 1‑4)
Question 1: How can complex, evolving, pluralistic human values, ethics, norms, and intentions be formally represented (beyond simple utility functions) and effectively elicited from diverse human populations (across cultures, generations, ensuring minority representation), accounting for context, conflict, and underspecification, resulting in representations suitable for machine learning and verification?
Disciplines: Formal Methods, Knowledge Representation, Computational Social Choice, Ethics, HCI, Anthropology.
Potential Deliverable: Survey‑to‑DSL compiler pilot.
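To make the survey-to-DSL idea tangible, the toy sketch below compiles a handful of hypothetical Likert-style responses into weighted preference statements in an invented mini-DSL, flagging pairs of values where respondents sharply disagree. It illustrates only the shape of such a compiler; the actual research problem is eliciting and representing values far richer than this.

```python
# Toy sketch: compile Likert-style survey responses into a tiny preference DSL.
# The DSL ("PREFER <value> WEIGHT <w>" / "CONFLICT <a> <b>") is invented purely
# for illustration; it is not a proposed standard.
from statistics import mean

SURVEY = [  # hypothetical responses, 1 (reject) .. 5 (strongly endorse)
    {"privacy": 5, "transparency": 3, "autonomy": 4},
    {"privacy": 4, "transparency": 5, "autonomy": 2},
    {"privacy": 2, "transparency": 4, "autonomy": 5},
]

def compile_to_dsl(responses, conflict_gap=2):
    values = sorted(responses[0])
    weights = {v: (mean(r[v] for r in responses) - 1) / 4 for v in values}  # scale to [0, 1]
    lines = [f"PREFER {v} WEIGHT {weights[v]:.2f}" for v in values]
    # Flag pairs where individual respondents disagree sharply, so downstream
    # learning/verification knows the aggregate weight hides real conflict.
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if any(abs(r[a] - r[b]) >= conflict_gap for r in responses):
                lines.append(f"CONFLICT {a} {b}")
    return "\n".join(lines)

print(compile_to_dsl(SURVEY))
```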
Question 2: How can governance frameworks formally represent deep, conflicting value uncertainties (e.g., via probability distributions or ambiguity sets over ethical theories/principles), and implement mechanisms (e.g., based on Bayesian updating, information-theoretic measures) for AI systems to reason and act prudently under this normative uncertainty?
Disciplines: Decision Theory, Bayesian Statistics, Formal Epistemology, Ethics, AI Alignment.
Potential Deliverable: Working value‑update schema & reasoning engine prototype.
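A minimal sketch of what prudent reasoning under normative uncertainty could look like, assuming a simple maximize-expected-choiceworthiness rule over a credence distribution across ethical theories; the theory names, options, and numbers are placeholders, and intertheoretic comparability is itself assumed rather than solved.

```python
# Toy sketch: act under normative uncertainty by holding a credence distribution
# over ethical theories and updating it with Bayes' rule as "moral evidence" arrives.
# Theories, options, and scores are placeholders, not substantive ethical claims.

credence = {"consequentialist": 0.5, "deontological": 0.3, "contractualist": 0.2}

# Choiceworthiness of each option under each theory (intertheoretic comparability
# is assumed here, which is itself a contested step the question asks about).
choiceworthiness = {
    "deploy_now":       {"consequentialist": 0.8, "deontological": 0.1, "contractualist": 0.4},
    "deploy_with_veto": {"consequentialist": 0.6, "deontological": 0.7, "contractualist": 0.8},
    "do_not_deploy":    {"consequentialist": 0.2, "deontological": 0.9, "contractualist": 0.6},
}

def expected_choiceworthiness(option):
    return sum(credence[t] * choiceworthiness[option][t] for t in credence)

def bayes_update(likelihoods):
    """likelihoods[t] = P(observed moral evidence | theory t)."""
    global credence
    post = {t: credence[t] * likelihoods[t] for t in credence}
    z = sum(post.values())
    credence = {t: p / z for t, p in post.items()}

print("chosen option:", max(choiceworthiness, key=expected_choiceworthiness))

# Suppose deliberation yields evidence favouring the deontological view.
bayes_update({"consequentialist": 0.2, "deontological": 0.7, "contractualist": 0.4})
print("updated credences:", {t: round(p, 2) for t, p in credence.items()})
print("chosen option after update:", max(choiceworthiness, key=expected_choiceworthiness))
```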
Question 3: How can goal systems, utility functions, or objective specifications for advanced AI be designed using a formal “goal algebra” (e.g., ⊕, ⊗ operators over utility components) or operator set (e.g., bounded‑impact operators, satisficing thresholds, or quantilisation rules) that provably guarantees robustness against catastrophic instrumental convergence on undesirable sub-goals (e.g., power-seeking) and minimizes unforeseen negative externalities across diverse contexts?
Disciplines: AI Alignment Theory, Utility Theory, Formal Methods, Logic, Decision Theory.
Potential Deliverable: Proof of safety against power-seeking for specific operators in a grid‑world or equivalent formal environment.
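One operator family named in Question 3, quantilisation, is easy to sketch: instead of maximising a proxy utility (inviting Goodhart-style failures), the agent samples uniformly from the top-q fraction of actions drawn from a trusted base distribution. The base distribution, proxy utility, and q below are illustrative only.

```python
# Toy sketch of a q-quantilizer: sample uniformly from the top-q fraction of actions
# drawn from a trusted "base" distribution, rather than argmax-ing a proxy utility.
import random

random.seed(0)

def proxy_utility(action):
    # Placeholder proxy: high values may reflect reward hacking as well as real value.
    return action["benefit"] - 0.1 * action["resource_use"]

def sample_base_action():
    # Stand-in for a base distribution of "normal, human-like" actions.
    return {"benefit": random.gauss(0, 1), "resource_use": random.expovariate(1.0)}

def quantilize(q=0.1, n_samples=1000):
    candidates = [sample_base_action() for _ in range(n_samples)]
    candidates.sort(key=proxy_utility, reverse=True)
    top = candidates[: max(1, int(q * n_samples))]
    return random.choice(top)  # uniform over the top-q slice, not the single maximiser

print(quantilize(q=0.05))
```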
Question 4: (Phase A) What specific architectural design patterns (e.g., those enforcing strict energy/compute bounds on actuation) can provide intrinsic safety guarantees against certain catastrophic physical harms? (Phase B) What broader fundamental computational properties, logical constraints, or architectural designs provide intrinsic safety guarantees against diverse failure modes by construction, reducing reliance on external monitoring or alignment? Examples of candidate Phase B properties include monotonic resource ceilings and formally verified actuation whitelists.
Disciplines: Computer Architecture, Formal Methods, Security Engineering, AI Safety, Control Theory, Ethics.
Potential Deliverable (Phase A): Demo pattern that forbids kinetic harm via energy budget.
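A minimal sketch of the Phase A pattern: an actuation gate that only forwards whitelisted commands and enforces a fixed, monotonically decreasing energy budget. Command names and budget figures are hypothetical; the research question is which such patterns yield genuine guarantees rather than bypassable wrappers.

```python
# Toy sketch: intrinsic-safety gate between a planner and physical actuators.
# Commands must be whitelisted, and cumulative actuation energy can never exceed
# a fixed ceiling (a monotonic resource bound enforced outside the learned system).

class ActuationGate:
    WHITELIST = {"move_gripper": 5.0, "rotate_base": 8.0}  # joules per invocation (illustrative)

    def __init__(self, energy_ceiling_j=100.0):
        self._remaining = energy_ceiling_j  # only ever decreases

    def execute(self, command):
        if command not in self.WHITELIST:
            raise PermissionError(f"{command!r} is not on the verified actuation whitelist")
        cost = self.WHITELIST[command]
        if cost > self._remaining:
            raise RuntimeError("energy budget exhausted; actuation refused")
        self._remaining -= cost
        return f"executed {command} ({self._remaining:.1f} J remaining)"

gate = ActuationGate(energy_ceiling_j=20.0)
print(gate.execute("move_gripper"))
print(gate.execute("rotate_base"))
try:
    gate.execute("rotate_base")  # would exceed the remaining budget
except RuntimeError as exc:
    print("blocked:", exc)
```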
B. Adaptive control and recovery (Questions 5‑8)
Question 5: How can a self-modifying governance protocol (e.g., implemented via a typed language or reflective interpreter) provably guarantee termination, persistent alignment with a protected core principle set, bounded resource usage (e.g., polynomial-time), and operation within defined latency budgets after any finite sequence of rule edits, ensuring safe adaptation?
Disciplines: Formal Methods, Programming Language Theory, Type Theory, Control Theory, AI Safety.
Potential Deliverable: Typed “constitution kernel” + proof‑carrying patch language.
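The shape of such a kernel can be gestured at with a toy rule store whose protected core is unpatchable and where every proposed edit must pass explicit invariant checks before being applied. The rules and invariants are invented, and a real kernel would require machine-checked proofs (proof-carrying patches) rather than runtime assertions.

```python
# Toy sketch: a self-amending rule set with a protected, unpatchable core.
# Patches are applied only if post-patch invariants still hold; a real kernel
# would demand machine-checked proofs instead of these runtime checks.

PROTECTED_CORE = frozenset({"no_kinetic_harm", "human_override_enabled"})

class ConstitutionKernel:
    def __init__(self):
        self.rules = set(PROTECTED_CORE) | {"log_all_actions"}

    def _invariants_hold(self, candidate_rules):
        # Protected core must survive every edit; rule count stands in for a resource bound.
        return PROTECTED_CORE <= candidate_rules and len(candidate_rules) < 10_000

    def apply_patch(self, add=(), remove=()):
        candidate = (self.rules - set(remove)) | set(add)
        if not self._invariants_hold(candidate):
            raise ValueError("patch rejected: protected core or resource bound violated")
        self.rules = candidate

kernel = ConstitutionKernel()
kernel.apply_patch(add={"rate_limit_api_calls"})           # accepted
try:
    kernel.apply_patch(remove={"human_override_enabled"})  # rejected: touches the core
except ValueError as exc:
    print(exc)
print(sorted(kernel.rules))
```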
Question 6: Which objective, quantitative thresholds (informed by the early‑warning metrics identified in Question 9) should automatically trigger adaptive shifts in governance mode (e.g., from verification‑centric to dynamics‑based control), and how can these thresholds be validated, for example via digital‑twin rehearsal?
Disciplines: Control Theory, Statistical Process Control, Org. Theory, Safety Engineering, Simulation Science.
Potential Deliverable: Twin‑validated threshold library + rehearsal playbook.
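As a concrete (if simplistic) instance, the sketch below runs a standard one-sided CUSUM drift detector over a monitored metric and switches governance mode once the cumulative deviation crosses a threshold; the metric, parameters, and mode labels are placeholders, and Question 6 asks how such thresholds should actually be chosen and twin-validated.

```python
# Toy sketch: CUSUM-style trigger that shifts governance mode when a monitored
# metric drifts persistently above its expected level.
import random

random.seed(1)

def cusum_trigger(stream, target=0.0, slack=0.5, threshold=5.0):
    """Return the step at which the one-sided CUSUM statistic crosses the threshold."""
    s = 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return t
    return None

# Simulated metric: in control for 200 steps, then the mean shifts upward.
stream = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(2, 1) for _ in range(100)]
t_alarm = cusum_trigger(stream)
mode = "verification-centric" if t_alarm is None else "dynamics-based control"
print(f"alarm at step {t_alarm}; governance mode -> {mode}")
```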
Question 7: What mechanisms, potentially using cryptographic state attestations and reversible update logs, enable rapid recovery from AI failures by securely rolling back system state (including model weights and memory) to a previously validated known-good snapshot, possibly including provable sanitation of hazardous information?
Disciplines: Systems Engineering, Cryptography, AI Safety, ML Operations, Database Theory.
Potential Deliverable: Reversible LLM fine‑tuner demonstrating snapshot/rollback.
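A stdlib-only sketch of the rollback core: each state snapshot is hashed into a chain, and a rollback re-verifies the chain before restoring a known-good snapshot. A real mechanism would have to cover model weights, memory, and external side effects, none of which this toy state dictionary captures.

```python
# Toy sketch: hash-chained snapshot log with verified rollback to a known-good state.
import hashlib, json

class SnapshotLog:
    def __init__(self):
        self.entries = []  # list of (state_json, entry_hash)

    def snapshot(self, state):
        prev_hash = self.entries[-1][1] if self.entries else "genesis"
        state_json = json.dumps(state, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + state_json).encode()).hexdigest()
        self.entries.append((state_json, entry_hash))
        return len(self.entries) - 1  # snapshot index

    def rollback(self, index):
        # Re-derive the hash chain up to `index` to detect tampering before restoring.
        prev_hash = "genesis"
        for state_json, stored_hash in self.entries[: index + 1]:
            expected = hashlib.sha256((prev_hash + state_json).encode()).hexdigest()
            if expected != stored_hash:
                raise RuntimeError("snapshot log tampered; refusing rollback")
            prev_hash = stored_hash
        return json.loads(self.entries[index][0])

log = SnapshotLog()
good = log.snapshot({"weights_version": 3, "policy": "conservative"})
log.snapshot({"weights_version": 4, "policy": "experimental"})   # later found hazardous
print("restored state:", log.rollback(good))
```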
Question 8: How can oversight mechanisms (e.g., auditing protocols, monitoring systems) be designed under conditions of information asymmetry to remain effective and maintain authority while meeting required oversight latency constraints, even when the governed AI system actively models, anticipates, and potentially counter-optimizes against the oversight process itself? Can we design interaction games (e.g., using randomization) with provable regret bounds against such strategic agents? The effectiveness of such oversight would be strengthened by the verifiable explanation certificates sought in Question 14.
Disciplines: Game Theory, Mechanism Design, Control Theory, AI Safety, Security, Computational Complexity.
Potential Deliverable: Regret‑bounded randomized audit prototype tested against a simple strategic agent model.
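One hedged illustration of the game-theoretic framing: a multiplicative-weights (Hedge) auditor randomizes between auditing and skipping against an agent that cheats when audits look infrequent. For simplicity the sketch assumes the auditor eventually learns each round's true behaviour (full feedback); a bandit-feedback variant would need Exp3, and the regret guarantees Question 8 asks for are not proven here.

```python
# Toy sketch: randomized auditing via multiplicative weights (Hedge) against a
# strategic agent that cheats when it believes audits are unlikely.
import math, random

random.seed(2)
T, eta, audit_cost = 2000, 0.05, 0.2
weights = {"audit": 1.0, "skip": 1.0}
audits_seen, rounds_seen, missed, caught = 0, 0, 0, 0

for t in range(T):
    total = weights["audit"] + weights["skip"]
    p_audit = weights["audit"] / total
    action = "audit" if random.random() < p_audit else "skip"

    # Strategic agent: cheats when the observed audit frequency looks low.
    observed_rate = audits_seen / rounds_seen if rounds_seen else 1.0
    cheats = observed_rate < 0.3

    loss = {"audit": audit_cost if not cheats else 0.0,
            "skip": 1.0 if cheats else 0.0}
    if action == "audit":
        audits_seen += 1
        caught += cheats
    else:
        missed += cheats
    rounds_seen += 1

    for a in weights:                      # Hedge update on full-information losses
        weights[a] *= math.exp(-eta * loss[a])

print(f"audit rate {audits_seen/T:.2f}, violations caught {caught}, missed {missed}")
```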
C. Monitoring and containment (Questions 9‑12)
Question 9: Which concrete, externally observable, potentially non-epistemic metrics (validated via intervention-based causal discovery, considering latency constraints and specific domain risks like biorisk) serve as reliable causal precursors to misalignment, instability, or critical failures in opaque AGI/Swarm or cyber-physical systems?
Disciplines: Causal Inference, Empirical AI Safety, Cyber-Physical Systems, Statistics, Signal Processing, Control Theory.
Potential Deliverable: Benchmarked precursor metrics (latency‑aware, bio‑risk) validated in three or more domains.
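The intervention-based validation idea can be illustrated with a toy simulation in which a hidden degradation process causes failures and an observable metric is only a noisy proxy for it; comparing failure rates with and without metric-triggered intervention is a crude analogue of the causal validation the question calls for. All dynamics and thresholds below are invented.

```python
# Toy sketch: intervention-based validation of a candidate precursor metric.
# A hidden "drift" variable causes failure when it exceeds a threshold; the
# observable metric is a noisy proxy. We compare failure rates with and without
# intervening (resetting) whenever the metric crosses an alarm level.
import random

random.seed(3)

def run_episode(intervene, steps=200, alarm=1.5, fail_at=3.0):
    drift = 0.0
    for _ in range(steps):
        drift += abs(random.gauss(0.02, 0.05))   # hidden degradation process
        metric = drift + random.gauss(0, 0.3)    # observable, noisy precursor
        if intervene and metric > alarm:
            drift = 0.0                          # causal intervention: reset/retrain
        if drift > fail_at:
            return True                          # critical failure occurred
    return False

def failure_rate(intervene, n=500):
    return sum(run_episode(intervene) for _ in range(n)) / n

print("failure rate without intervention:", failure_rate(False))
print("failure rate with metric-triggered intervention:", failure_rate(True))
```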
Question 10: Can we devise formally checkable “coarse-graining” operators (drawing on techniques like renormalization, hierarchical Bayesian inference) that not only prove aggregate macro-dynamics emerge from micro-states with bounded uncertainty, but also guarantee that core ethical constraints or value properties are preserved (“renormalized”) across these scales? (Initial research could explore feasibility on simplified systems, e.g., cellular automata or basic swarm models.)
Disciplines: Complexity Science, Statistical Physics, Formal Methods, Bayesian Inference, Ethics, AI Alignment.
Potential Deliverable: 1) Checker on a cellular‑automaton model; 2) extension to a toy swarm.
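A deliberately modest illustration of deliverable 1): an empirical (not formal) checker that a block-majority coarse-graining operator preserves a density-type safety constraint within a stated error bound, on random states of a one-dimensional binary automaton. The constraint and parameters are arbitrary; the open problem is making such checks formal and extending them to richer value properties.

```python
# Toy sketch: empirical check that a block-majority coarse-graining operator
# preserves a density constraint ("fraction of active cells <= limit") within a
# bounded error, on random states of a 1-D binary cellular automaton.
import random

random.seed(4)

def coarse_grain(state, block=5):
    return [int(sum(state[i:i + block]) * 2 > block)      # majority vote per block
            for i in range(0, len(state), block)]

def density(state):
    return sum(state) / len(state)

def check_preservation(trials=1000, cells=500, block=5, limit=0.4, tolerance=0.1):
    violations = 0
    for _ in range(trials):
        p = random.uniform(0.0, limit)                     # micro-state satisfies the constraint
        state = [int(random.random() < p) for _ in range(cells)]
        if density(coarse_grain(state, block)) > limit + tolerance:
            violations += 1
    return violations

print("constraint violations after coarse-graining:", check_preservation(), "/ 1000")
```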
Question 11: What theoretical guarantees (e.g., from graph control theory, percolation theory) and resilient intervention strategies can ensure that distributed, potentially adversarial agentic swarms can be reliably contained or quarantined with extremely high probability (e.g., ≥ 99.999%), even if individual agents attempt evasion or the network topology changes?
Disciplines: Control Theory (Distributed), Graph Theory, Formal Methods, AI Safety (Swarms), Network Security.
Potential Deliverable: 1) Analytical containment bound and 1,000-agent simulation at 99.9% reliability; 2) scale-up to 10,000 agents targeting 99.999%.
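A Monte Carlo sketch in the spirit of the percolation framing: on a random contact graph, quarantine a fraction of nodes and estimate the probability that no adversarial agent retains a path to a protected asset. Graph parameters are arbitrary, and the analytical bounds the question asks for would have to replace this sampling.

```python
# Toy sketch: Monte Carlo estimate of containment probability on a random contact
# graph. Quarantining removes a node's edges; containment succeeds if no adversarial
# agent has a remaining path to the protected asset node.
import random
from collections import deque

random.seed(5)

def containment_trial(n=200, edge_p=0.02, n_adversarial=5, quarantine_frac=0.4):
    edges = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < edge_p:
                edges[i].add(j); edges[j].add(i)
    protected = 0
    adversarial = random.sample(range(1, n), n_adversarial)
    quarantined = set(random.sample(range(1, n), int(quarantine_frac * n)))

    # Breadth-first search from the protected node through non-quarantined nodes.
    seen, queue = {protected}, deque([protected])
    while queue:
        u = queue.popleft()
        for v in edges[u]:
            if v not in seen and v not in quarantined:
                seen.add(v); queue.append(v)
    return not any(a in seen for a in adversarial)   # True = contained

trials = 2000
successes = sum(containment_trial() for _ in range(trials))
print(f"estimated containment probability: {successes / trials:.3f}")
```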
Question 12: Can we build and empirically validate modelling suites that forecast key dynamics in the global AI development ecosystem (capability races, compute concentration, talent flows) over five‑year horizons, and establish public benchmarks for forecast accuracy? The suite should incorporate “compute landlord” indicators such as GPU market share and cloud‑capacity allocation, with publicly auditable data sources.
Disciplines: Complex Systems Modeling, Economics (Innovation/Growth), Data Science, AI Forecasting, Geopolitics.
Potential Deliverable: Public leaderboard comparing three or more models on five-year forecast accuracy for capability races and compute-concentration trends.
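The benchmarking half of this deliverable is simple to prototype: given each model's forecasts and the later realized values of a tracked indicator, score and rank them. The numbers below are illustrative placeholders, not real measurements; the hard part the question targets is producing forecasts worth scoring.

```python
# Toy sketch: scoring a forecast leaderboard by mean absolute error (MAE).
# Forecasts and "realized" values are illustrative placeholders, not real data.

realized = [0.52, 0.55, 0.61, 0.64, 0.70]          # e.g., a yearly compute-concentration index

forecasts = {
    "trend_extrapolation": [0.53, 0.56, 0.59, 0.62, 0.65],
    "structural_model":    [0.50, 0.54, 0.60, 0.66, 0.72],
    "expert_aggregate":    [0.55, 0.60, 0.65, 0.70, 0.75],
}

def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

leaderboard = sorted(forecasts, key=lambda m: mae(forecasts[m], realized))
for rank, model in enumerate(leaderboard, start=1):
    print(rank, model, round(mae(forecasts[model], realized), 3))
```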
D. Infrastructure, explanation, and coordination (Questions 13‑16)
Question 13: What is the smallest self-contained record (provenance packet) that captures essential epistemic metadata (provenance, justification links, uncertainty, lifecycle/governance status) for safety-critical data with minimal storage overhead (e.g., a single-digit percentage relative to payload size)? How can this schema be integrated with hardened governance infrastructures (logs, audit trails) using techniques like zero-knowledge proofs and cryptographic commitments to ensure tamper-evidence and resilience against adversarial attacks (incorporating continuous red-teaming and potentially “cognitive firewalls” for reviewers)?
Disciplines: Data Provenance Engineering, Information Theory, Schema Design, Cryptography (ZKPs), Security Engineering, Formal Verification, HCI.
Potential Deliverable: Prototype CBOR schema + Merkle‑anchored log with ZK‑proof audit path, red‑team latency test.
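A stdlib-only sketch of the packet-plus-log skeleton, using JSON rather than CBOR and omitting the zero-knowledge components entirely: packets carry provenance metadata, their hashes are anchored in a Merkle tree, and an inclusion (audit) path can be verified against the root. Field names are illustrative.

```python
# Toy sketch: a minimal provenance packet plus a Merkle root over packet hashes,
# with a verifiable inclusion path. JSON is used instead of CBOR to stay
# stdlib-only; zero-knowledge components are not sketched here.
import hashlib, json

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def packet(payload_id, source, justification, uncertainty, status):
    return json.dumps({"payload_id": payload_id, "source": source,
                       "justification": justification, "uncertainty": uncertainty,
                       "status": status}, sort_keys=True).encode()

packets = [
    packet("dataset-001", "lab-A", "peer-reviewed", 0.1, "approved"),
    packet("model-007", "lab-B", "internal-eval", 0.3, "provisional"),
    packet("eval-042", "auditor-C", "red-team-report", 0.2, "approved"),
    packet("patch-003", "lab-A", "hotfix", 0.5, "under-review"),
]

def merkle_levels(leaves):
    levels = [[h(p) for p in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:            # duplicate last node on odd levels
            prev = prev + [prev[-1]]
        levels.append([h((prev[i] + prev[i + 1]).encode()) for i in range(0, len(prev), 2)])
    return levels

def audit_path(levels, index):
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], "right" if sibling > index else "left"))
        index //= 2
    return path

def verify(leaf, path, root):
    node = h(leaf)
    for sibling, side in path:
        pair = node + sibling if side == "right" else sibling + node
        node = h(pair.encode())
    return node == root

levels = merkle_levels(packets)
root = levels[-1][0]
proof = audit_path(levels, 2)
print("packet 2 verifies against root:", verify(packets[2], proof, root))
```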
Question 14: Can we develop scalable mechanisms (e.g., based on zero-knowledge proofs, typed logic, or other formal methods) that allow an AI system to provide a cryptographic certificate attesting that a given explanation for its output is actually faithful and deception-resistant to its internal reasoning process, enabling verifiable transparency?
Disciplines: Cryptography (ZKPs), Formal Methods, Explainable AI (XAI), Logic, HCI, AI Safety.
Potential Deliverable: First ZK‑explanation certificate implementation for a non‑trivial model/task.
Question 15: What governance update workflows (potentially leveraging computational social choice or deliberative methods integrated with AI tools) can provably meet specified consent, representation, and fairness thresholds for incorporating diverse stakeholder feedback and evolving societal values into AI governance rules, robustly under potential adversarial conditions?
Disciplines: Political Science, Computational Social Choice, HCI, Mechanism Design, Ethics, Public Policy.
Potential Deliverable: Consent‑threshold voting/deliberation test‑bed integrated with a rule-proposal system.
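One toy decision rule in this spirit: a rule update passes only if overall approval clears a consent threshold and every stakeholder group clears a per-group floor, so a global majority cannot simply override a concentrated minority. Group names, ballots, and thresholds are invented, and the question asks for provable guarantees under adversarial conditions, not a single rule.

```python
# Toy sketch: a consent-threshold update rule with a per-group floor, so that a
# global majority cannot pass a rule change over the objection of any one group.

def passes(ballots, global_threshold=0.6, group_floor=0.4):
    """ballots: list of (group, approves: bool)."""
    total_yes = sum(1 for _, yes in ballots if yes) / len(ballots)
    groups = {g for g, _ in ballots}
    group_yes = {
        g: sum(1 for gg, yes in ballots if gg == g and yes) /
           sum(1 for gg, _ in ballots if gg == g)
        for g in groups
    }
    return total_yes >= global_threshold and all(r >= group_floor for r in group_yes.values())

ballots = (
    [("industry", True)] * 40 + [("industry", False)] * 10 +
    [("civil_society", True)] * 15 + [("civil_society", False)] * 15 +
    [("academia", True)] * 12 + [("academia", False)] * 8
)
print("update adopted:", passes(ballots))
```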
Question 16: What hybrid technical-institutional mechanisms (e.g., smart contract-based treaties, automated escrow / slashing systems for safety violations, reputation markets, liability frameworks) can establish minimal, enforceable safety commitments and align incentives across competing global actors (nations, corporations, laboratories), including hardware‑attested compute escrow or metering where appropriate, ensuring verifiable compliance and adaptive amendment rules? (Consider, for example, protocols for verifying adherence to compute usage thresholds via hardware attestation.)
Disciplines: Mechanism Design, Game Theory, International Relations, Law, Cryptoeconomics, Formal Methods, Political Economy.
Potential Deliverable: Sandbox treaty template + slashing market simulation tested between two virtual jurisdictions.
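The escrow-and-slashing core can be simulated in a few lines: each actor posts a bond, an imperfect verifier flags violations, and flagged actors lose a fraction of the bond, producing a visible incentive gradient between compliant and non-compliant behaviour. Bond sizes, detection rates, and slash fractions are illustrative; hardware attestation and amendment rules are not modelled.

```python
# Toy sketch: bonded safety commitments with probabilistic violation detection and
# slashing. Parameters (bond size, detection rate, slash fraction) are illustrative.
import random

random.seed(6)

def simulate(violation_rate, detection_prob=0.8, bond=100.0, slash_frac=0.5, rounds=50):
    remaining = bond
    for _ in range(rounds):
        violated = random.random() < violation_rate
        if violated and random.random() < detection_prob:
            remaining *= (1 - slash_frac)          # slash part of the escrowed bond
    return remaining

for label, rate in [("compliant actor", 0.0), ("occasional violator", 0.05), ("routine violator", 0.3)]:
    print(f"{label}: bond remaining after 50 rounds = {simulate(rate):.2f}")
```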
E. Methods and societal safeguards (Questions 17‑18)
Question 17: What formal evidence standards, including criteria for falsifiability, replicability tiers, uncertainty-reporting protocols, and adversarial peer-review, are required so that theoretical and empirical claims in AI-governance research accumulate reliable, actionable knowledge - especially when they concern novel systems or long-range projections?
Disciplines: Philosophy of Science, Statistics, Epistemology, Experimental Design, Research Methodology, Formal Methods.
Potential Deliverable: Draft and pilot a minimum “Governance Evidence Protocol” (GEP) checklist across three diverse AI-governance studies.
Question 18: What types of adaptive policy levers - e.g., wage-insurance triggers linked to automation rates, AI-driven rapid-retraining credits, or fiscal stabilizers like AI taxation schemes - and associated real-time monitoring metrics should be developed and validated to demonstrably mitigate large-scale labor-market shocks caused by task-displacing AI, while remaining politically feasible and robust?
Disciplines: Economics (Labor, Macro, Public), Political Science, Computational Social Science, Policy Analysis, Complex Systems Modeling, AI Ethics.
Potential Deliverable: Build an agent-based labor-shift model calibrated to historical automation waves, demonstrating the comparative effectiveness of ≥ 3 adaptive policy levers under simulated AI deployment scenarios incorporating potential energy/resource shocks.
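A deliberately crude agent-based sketch comparing two of the levers named above (retraining credits versus wage insurance) under a simulated automation shock, tracking employment and average income. Every parameter is an uncalibrated placeholder; calibration to historical automation waves is precisely what the deliverable requires.

```python
# Toy sketch: crude agent-based comparison of two policy levers under an
# automation shock. Each step, some automatable jobs disappear. "Retraining
# credit" moves displaced workers into non-automatable work with some
# probability; "wage insurance" replaces part of lost income.
import random

random.seed(7)

def simulate(lever, n_workers=10_000, steps=20, shock=0.08,
             retrain_prob=0.3, insured_income=0.6):
    workers = [{"employed": True, "automatable": random.random() < 0.5, "income": 1.0}
               for _ in range(n_workers)]
    for _ in range(steps):
        for w in workers:
            if w["employed"] and w["automatable"] and random.random() < shock:
                w["employed"], w["income"] = False, 0.0
                if lever == "wage_insurance":
                    w["income"] = insured_income
            if not w["employed"] and lever == "retraining_credit" and random.random() < retrain_prob:
                w.update(employed=True, automatable=False, income=0.9)
    employment = sum(w["employed"] for w in workers) / n_workers
    avg_income = sum(w["income"] for w in workers) / n_workers
    return employment, avg_income

for lever in ("none", "wage_insurance", "retraining_credit"):
    emp, inc = simulate(lever)
    print(f"{lever:18s} employment={emp:.2f} avg_income={inc:.2f}")
```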
The questions map to an abstract control loop:
- Specify values and objectives (Q1‑4).
- Monitor system state (Q9‑11).
- Steer or rollback (Q5‑8).
- Record, verify, and coordinate (Q12‑16).
Supporting these core loops are foundational methods (Q17) and societal safeguards (Q18).
Understanding these dependencies aids research coordination, though significant parallel progress appears feasible.
This agenda is necessarily provisional and incomplete. Identified limitations include:
- Geopolitical barriers to global coordination (relevant to Q16).
- Implicit handling of resource/environmental externalities.
- The significant gap between foundational concepts and deployed systems.
- Ensuring governance mechanism latency is adequate for AI speed.
- Addressing power concentration related to compute infrastructure.
- The near-certainty of "unknown unknowns" arising from AGI-level capabilities and emergent behaviour.
Many questions (6, 7, 8, 11, 12, 16, 18) require stress-testing and comparative evaluation in realistic yet safe environments. An open-source, modular AI-governance simulation platform - with plug-ins for diverse agent behaviours, governance protocols, institutional mechanisms, and socio-economic models - should be treated as critical shared infrastructure. Developing such a test-bed is a significant engineering and community effort that will enable reproducible research across the agenda.
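As a seed for such a platform, a minimal plug-in registry might look like the sketch below, in which agent behaviours and governance protocols register under a common interface so scenarios can be composed and reproduced; all component names and the interface itself are illustrative placeholders rather than a platform design.

```python
# Toy sketch: a minimal plug-in registry for a modular governance test-bed.
# Component names and the interface are illustrative placeholders only.
from typing import Callable, Dict

REGISTRY: Dict[str, Dict[str, Callable]] = {"agent": {}, "governance": {}, "economy": {}}

def register(kind: str, name: str):
    def decorator(factory: Callable):
        REGISTRY[kind][name] = factory
        return factory
    return decorator

@register("agent", "myopic_optimizer")
def myopic_agent():
    return lambda observation: "act_greedily"

@register("governance", "random_audit")
def random_audit_protocol():
    return lambda system_state: "audit" if hash(str(system_state)) % 10 == 0 else "observe"

def compose_scenario(agent_name, governance_name):
    return {"agent": REGISTRY["agent"][agent_name](),
            "governance": REGISTRY["governance"][governance_name]()}

scenario = compose_scenario("myopic_optimizer", "random_audit")
print(scenario["agent"]({"input": 1}), scenario["governance"]({"step": 42}))
```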