Training Data Attribution: Examining Its Adoption & Use Cases

By Deric Cheng, Convergence Analysis, Justin Bullock, David_Kristoffersson @ 2025-01-22T15:40 (+18)

This is a linkpost to https://www.convergenceanalysis.org/research/training-data-attribution-tda-examining-its-adoption-use-cases

Note: This report was written in June 2024 and is based on research originally commissioned by the Future of Life Foundation (FLF). The views and opinions expressed in this document are those of the authors and do not represent the positions of FLF.

This report investigates Training Data Attribution (TDA) and its potential importance to and tractability for reducing extreme risks from AI. TDA techniques aim to identify the training data points that most strongly influence specific model outputs. They are motivated by the question: how would the model's behavior change if one or more data points were removed from or added to the training dataset?
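
To make the motivating question concrete, here is a minimal sketch of the leave-one-out counterfactual on a toy problem. It is not a method from the report. The no-intercept linear model, the four data points, and the function names `fit_slope` and `leave_one_out_influence` are all assumptions chosen for illustration. The sketch refits the model once per removed point, which is only practical at toy scale.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) training example


def fit_slope(data: List[Point]) -> float:
    """Least-squares slope for the no-intercept model y = w * x."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den


def leave_one_out_influence(data: List[Point], x_query: float) -> List[float]:
    """Change in the prediction at x_query attributable to each training point.

    Score i = (prediction with all data) - (prediction with point i removed).
    This sign convention is an illustrative choice, not a standard.
    """
    base = fit_slope(data) * x_query
    scores = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]  # drop point i and refit
        scores.append(base - fit_slope(reduced) * x_query)
    return scores


# Hypothetical toy dataset: three points on y = x plus one outlier.
train = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 12.0)]
scores = leave_one_out_influence(train, x_query=5.0)

# Index of the training point whose removal changes the prediction the most.
most_influential = max(range(len(train)), key=lambda i: abs(scores[i]))
print(most_influential, scores)
```

In this toy setup the outlier is the most influential point for the query prediction. Retraining once per data point does not scale to large models, so practical TDA methods approximate this counterfactual instead of computing it directly.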

Report structure:

Key takeaways from our report:


SummaryBot @ 2025-01-22T18:21 (+1)

Executive summary: Training Data Attribution (TDA) is a promising but underdeveloped tool for improving AI interpretability, safety, and efficiency, though its public adoption faces significant barriers due to AI labs' reluctance to share training data.

Key points:

  1. TDA identifies influential training data points to understand their impact on model behavior, with gradient-based methods currently the most practical approach.
  2. Running TDA on large-scale models is now feasible but remains untested on frontier models, with efficiency improvements expected within 2-5 years.
  3. Key benefits of TDA for AI research include mitigating hallucinations, improving data selection, enhancing interpretability, and reducing model size.
  4. Public access to TDA tooling is hindered by AI labs’ desire to protect proprietary training data, avoid legal liabilities, and maintain competitive advantages.
  5. Governments are unlikely to mandate public access to training data, but selective TDA inference or alternative data-sharing mechanisms might mitigate privacy concerns.
  6. TDA’s greatest potential lies in improving AI technical safety and alignment, though it may also accelerate capabilities research, potentially increasing large-scale risks.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.