Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)

By Joe_Carlsmith @ 2023-11-27T18:01 (+11)

This is a crosspost, probably from LessWrong. Try viewing it there.

SummaryBot @ 2023-11-28T12:57 (+1)

Executive summary: This section distinguishes between two concepts of an "episode" in machine learning training - the intuitive episode and the incentivized episode. The intuitive episode is a natural unit of training (e.g. a game), while the incentivized episode is the period of time over which training directly pressures the model to optimize.

Key points:

  1. The incentivized episode is the period over which training actively pressures the model to optimize, i.e. the horizon over which failing to optimize is penalized. It may be shorter than the full training period.
  2. The intuitive episode is a natural unit picked for training (e.g. a game). It is not necessarily the same as the incentivized episode.
  3. Care is needed in assessing whether the intuitive episode matches the incentivized episode, i.e. whether training incentivizes optimization that extends across intuitive episodes.
  4. Some training methods directly pressure cross-episode optimization, others don't. Details of training algorithms matter.
  5. Conflating the two concepts can lead to inappropriate assumptions about incentivized time horizons. Empirical testing is important.
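To make the distinction concrete, here is a minimal sketch. It assumes a simple policy-gradient-style setup in which each action's training signal is its return-to-go. The function name `returns_to_go`, the `reset_at_boundaries` flag, and the toy rewards are illustrative assumptions and do not come from the original post. The sketch shows how one implementation detail decides whether the horizon training pressures (the incentivized episode) stops at the intuitive episode boundary or extends past it.

```python
from typing import List


def returns_to_go(
    rewards: List[float],
    dones: List[bool],
    gamma: float = 1.0,
    reset_at_boundaries: bool = True,
) -> List[float]:
    """Discounted return credited to each time step.

    dones[t] is True on the last step of an intuitive episode (e.g. a game).
    If reset_at_boundaries is True, credit stops at that boundary, so the
    incentivized horizon matches the intuitive episode. If it is False,
    rewards from later intuitive episodes leak into earlier steps, so the
    incentivized horizon is longer than the intuitive episode.
    """
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the rewards that follow it.
    for t in reversed(range(len(rewards))):
        if dones[t] and reset_at_boundaries:
            running = 0.0  # cut credit at the intuitive episode boundary
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Two intuitive episodes of three steps each (hypothetical toy data).
rewards = [0.0, 0.0, 1.0, 0.0, 0.0, 5.0]
dones = [False, False, True, False, False, True]

within = returns_to_go(rewards, dones, reset_at_boundaries=True)
across = returns_to_go(rewards, dones, reset_at_boundaries=False)
print(within)
print(across)
```

With the reset, steps in the first game are credited only with that game's reward. Without it, they are also credited with the second game's reward, so the training signal pressures optimization across intuitive episodes. Real training pipelines can differ in many other ways, so this only illustrates why the details of the algorithm matter.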

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.