ML Platform Meetup: Infra for Contextual Bandits and Reinforcement Learning
Netflix Tech
OCTOBER 18, 2019
The key insight was that by assuming a latent Gaussian Process (GP) prior on the key business metric actions like viral engagement, job applications, etc., And finally each new observation needs to update the policy, compute offline policy evaluation metrics and then push the policy back to production so it can generate new intents to treat.
Let's personalize your content