Large language models (LLMs) such as OpenAI's ChatGPT (particularly its latest iteration, GPT-4), Claude and Gemini have demonstrated only limited decision-making ability. In this article, we'll discuss contemporary research on decision-making by LLMs and what it could mean for their future.

Effective decision-making in LLMs traditionally refers to deducing underlying patterns or rules and flexibly and appropriately applying them to new scenarios. An experiment by the Santa Fe Institute found that LLMs, including ChatGPT, could not "reason about basic core concepts." Making well-reasoned decisions relies on a nuanced understanding of the prompt's context and the output's consequences.

Poor LLM decision-making can yield disastrous results in practice. In 2023, the National Eating Disorder Association had to suspend its AI chatbot, "Tessa," after it began dispensing insensitive advice such as weekly weigh-ins and eating at a 500-to-1,000-calorie deficit. The chatbot was quickly disabled amid a storm of controversy.

Beyond providing incorrect information, LLMs can also default to generic recommendations. INSEAD noted that when ChatGPT was prompted with research questions about business strategy, the model tended to veer toward generic, conventional wisdom about participative management: collaborative working, fostering a culture of innovation and aligning employees with organizational goals. But business strategizing is a complex social and economic process that does not benefit from generic advice.
Full report: Improving decision-making in LLMs: Two contemporary approaches.