14 Ways ML Could Improve Informative Video

By Ozzie Gooen @ 2023-01-10T13:53 (+8)

This is a linkpost to https://quri.substack.com/p/14-ways-ai-could-improve-video

~30-minute brainstorm. I haven’t done ML engineering myself, but am an enthusiast.

ML systems are getting scarily good. I’m a fan of online video for sharing information. Video is tougher to automate than text alone, but I think many of the steps can be automated with roughly existing ML.

(I still think that ML could be used dangerously, but I also think there are a lot of positive uses in the short-term. I think EAs could actively help develop some uses, and expect to see other groups develop other uses.)

Some video AI integrations I'd like to see include:

Every old video should get automatically cleaned up audio.
Every video and audio file online should get a generated transcript.
Each transcript should get summarized. Ideally, summaries are individualized or at least customized for viewers.
Each transcript/summary gets evaluated. We get estimates of how accurate/outdated/relevant/important/neglected/innovative the work is.
Using something like reinforcement learning, we get better at connecting people with information that is important for them to learn, meaning information that isn't too difficult, inaccessible, or redundant for them.
If you do want to watch a video, there should be automatic breaks where extra context gets interjected. Like, "Clarification: This point is now outdated. We recommend skipping ahead 2 minutes."
Videos could also have a lot of extra text annotation. Text on the side that adds extra relevant information about different scenes.
Instead of watching full 20-minute videos, AI recommends that you only watch minutes 2-5, then 10-15. It summarizes the rest with auto-generated video snippets.
Stock footage can automatically be replaced by generated footage most preferable to the viewer.
Eventually, many videos will be completely autogenerated. AI figures out what information is best for you, using what methods, and creates videos on the fly.
Video is interactive. It's very easy to pause a video and ask it to change topic or answer a specific question.
Autogenerated and personalized video should be able to draw on user-provided data. So an autogenerated personality could say things like, “So, this concept would have been useful to you 5 days ago, when you had a conversation with Amelia.”
Once we get used to Virtual Reality, it might make sense to stop emphasizing 2D videos. It’s not clear how to best incorporate 3D videos into metaverse-like settings, but there are different options.
Once we get brain-computer interfaces, or at least strong camera-driven facial analysis, we could tune video content depending on signals of interest and engagement. If you start getting bored during an educational video, it could jump to a fun example.
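
To make the "only watch minutes 2-5, then 10-15" idea a bit more concrete, here is a minimal Python sketch of the bookkeeping step. It assumes some upstream model has already picked the segments worth watching, and it only splits the video into spans to watch and spans to summarize. The function name `plan_viewing` and the `(label, start, end)` output format are hypothetical, made up for illustration; no actual recommendation or summarization is done here.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_minute, end_minute)


def plan_viewing(total_minutes: float, watch: List[Segment]) -> List[Tuple[str, float, float]]:
    """Split a video into 'watch' spans and 'summarize' spans.

    `watch` is assumed to come from some recommender (not implemented here).
    Everything outside the recommended spans is marked for summarization.
    """
    plan = []
    cursor = 0.0  # how far into the video the plan already covers
    for start, end in sorted(watch):
        # Clamp to the video length and skip parts already covered.
        start = max(start, cursor)
        end = min(end, total_minutes)
        if end <= start:
            continue
        if start > cursor:
            plan.append(("summarize", cursor, start))
        plan.append(("watch", start, end))
        cursor = end
    if cursor < total_minutes:
        plan.append(("summarize", cursor, total_minutes))
    return plan


# The example from the list above: a 20-minute video, watch 2-5 and 10-15.
plan = plan_viewing(20, [(2, 5), (10, 15)])
watched_minutes = sum(end - start for label, start, end in plan if label == "watch")

for label, start, end in plan:
    print(f"{label}: minutes {start}-{end}")
```

In this example the viewer watches 8 of the 20 minutes, and the remaining three spans would be handed to whatever generates the summary snippets.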

I think video is better than raw text for many people. It's also more work to produce and more information-dense. But much of the pipeline definitely seems automatable to me, mostly with existing technologies. It would be a lot of engineering work, though.