DIY LLM Evaluation, a Case Study of Rhyming in ABBA Schema
Xebia
MAY 8, 2024
It’s becoming common knowledge: you should not choose your LLMs based on static benchmarks. My hypothesis is that AABB rhyming is so common in the training data that an instruction to rhyme in ABBA is not strong enough to overcome its pull.
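To make the hypothesis testable, one needs a way to label the rhyme scheme of a generated four-line poem. The following is a minimal sketch, not the method used in the case study: the helper names (`last_word`, `naive_rhyme`, `rhyme_scheme`) are hypothetical, and the rhyme test is a crude spelling heuristic (matching final letters) that is assumed here purely for illustration. A serious evaluation would more likely compare pronunciations.

```python
import re
from typing import Sequence


def last_word(line: str) -> str:
    """Return the final word of a line, lowercased, ignoring punctuation."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def naive_rhyme(a: str, b: str, suffix_len: int = 2) -> bool:
    """Assumed heuristic: two words 'rhyme' if their last letters match.

    This is spelling-based, so it will misjudge pairs such as
    'cough' / 'though'. It is only a stand-in for a real rhyme check.
    """
    if not a or not b:
        return False
    return a[-suffix_len:] == b[-suffix_len:]


def rhyme_scheme(lines: Sequence[str]) -> str:
    """Assign a letter to each line; lines that rhyme share a letter."""
    representatives = []  # first ending word seen for each letter
    labels = []
    for line in lines:
        ending = last_word(line)
        for index, rep in enumerate(representatives):
            if naive_rhyme(ending, rep):
                labels.append(chr(ord("A") + index))
                break
        else:
            # No existing group rhymes with this ending: open a new letter.
            representatives.append(ending)
            labels.append(chr(ord("A") + len(representatives) - 1))
    return "".join(labels)


abba_poem = [
    "The sun went down behind the hill",
    "And shadows stretched across the land",
    "A quiet wind swept through the sand",
    "While all the world grew calm and still",
]
# Same lines reordered so that adjacent lines rhyme.
aabb_poem = [abba_poem[0], abba_poem[3], abba_poem[1], abba_poem[2]]

print(rhyme_scheme(abba_poem))  # ABBA
print(rhyme_scheme(aabb_poem))  # AABB
```

Running a labeller like this over many model outputs for the same ABBA prompt would give a simple count of how often the model falls back to AABB despite the instruction.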