Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)
By Joe_Carlsmith @ 2023-11-27T18:01 (+11)
This is a crosspost, probably from LessWrong.
SummaryBot @ 2023-11-28T12:57 (+1)
Executive summary: This section distinguishes between two concepts of an "episode" in machine learning training: the intuitive episode and the incentivized episode. The intuitive episode is a natural unit of training (e.g. a game), while the incentivized episode is the period of time over which training directly pressures the model to optimize.
Key points:
- The incentivized episode is the period over which training actively punishes the model for not optimizing. It may be shorter than the full training period.
- The intuitive episode is a natural unit picked for training (e.g. a game). It is not necessarily the same as the incentivized episode.
- Care is needed in assessing whether the intuitive episode matches the incentivized episode, i.e. whether training incentivizes cross-episode optimization.
- Some training methods directly pressure the model toward cross-episode optimization; others don't. The details of the training algorithm matter.
- Conflating the two concepts can lead to inappropriate assumptions about incentivized time horizons. Empirical testing is important.
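As a rough illustration of why the details of the training algorithm matter, here is a minimal Python sketch assuming a REINFORCE-style policy-gradient setup, where each action's gradient is weighted by its reward-to-go. That setup is an assumption: the summary names no specific algorithm, and the function `rewards_to_go` and its `reset_at_boundaries` flag are hypothetical. In this simplified setting, resetting returns at episode boundaries means an action is credited only with rewards from its own episode, so the update does not directly reward optimizing beyond it. Without the reset, rewards from later episodes leak into earlier actions' credit.

```python
from typing import List


def rewards_to_go(
    rewards: List[float],
    dones: List[bool],
    gamma: float = 1.0,
    reset_at_boundaries: bool = True,
) -> List[float]:
    """Discounted sum of future rewards for each step (hypothetical helper).

    dones[t] is True if step t is the last step of its episode.
    If reset_at_boundaries is True, the running sum is cleared at each
    episode's final step, so no reward from a later episode is credited
    to an earlier action.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can add its reward to the future sum.
    for t in reversed(range(len(rewards))):
        if reset_at_boundaries and dones[t]:
            running = 0.0  # drop everything after this episode's boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    rewards = [0.0, 1.0, 0.0, 5.0]
    dones = [False, True, False, True]  # two episodes of two steps each
    print(rewards_to_go(rewards, dones))  # [1.0, 1.0, 5.0, 5.0]
    print(rewards_to_go(rewards, dones, reset_at_boundaries=False))  # [6.0, 6.0, 5.0, 5.0]
```

With the reset, the first episode's actions are weighted by 1.0; without it, they are also credited with the 5.0 earned in the next episode. This is only a toy model of one way the incentivized episode can match or exceed the intuitive one, and real training setups can differ in subtler ways.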
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.