
We put up with bad software all the time. Anyone who has raged against their enterprise travel booking system or tried to decipher the interface to their corporate tool for logging employee feedback knows what I’m talking about. Despite these problems, we continue to use (and, let’s be honest, write) bad software. Yet when it comes to large language models, ChatGPT, and other aspects of our generative AI (GenAI) universe, we don’t seem to extend the same patience. As developer Simon Willison notes, “While normally you would see people complain about how hard software is to use, in this case [of GenAI], people having trouble getting good results instead assume that it’s actually useless and give up on it entirely.”
Are we holding GenAI to an unrealistic standard?
Inflated expectations
The answer is clearly yes, but who is to blame? Pretty much everyone. From the people who fear AI-driven machines will take our jobs, to vendors AI-washing their tired products, to the media looking for interesting content, to [insert demographic here], we’ve collectively come to expect too much from AI, both good and bad.
In the case of GenAI, this has led proponents to overlook or soft-pedal some of GenAI’s obvious shortcomings. Bill Gates, for example, has an incredibly ambitious vision for where GenAI is going that seems divorced from even the most optimistic present-day reality. Such hype helps no one and makes it harder to tackle some of GenAI’s core problems.
For starters, as Amelia Wattenberger argues, chat is a strange, unintuitive way to discover GenAI’s smarts. As she notes, things like ChatGPT “greet” users with a text box, with no real guidance on what to type into the box and, essentially, no visibility into why it responds in a certain way. She continues, “Of course, users can learn over time what prompts work well and which don’t, but the burden to learn what works still lies with every single user.”
Compounding this problem, researchers Zamfirescu-Pereira, Wong, Hartmann, and Yang claim, “Even for [natural language processing] experts, prompt engineering requires extensive trial and error, iteratively experimenting and assessing the effects of various prompt strategies on concrete input-output pairs before assessing them more systematically on large data sets.” We’re all trying to figure out how to create inputs that yield great output, and we’re mostly failing. It doesn’t help that the industry has been moving so fast, as Benj Edwards points out: “Whatever techniques you develop to use them well [are] obsolete in three to four months.” Surely vendors like OpenAI could be baking more guardrails into their products, making it easier for non-experts to become productive and eliminating some of these UX issues.
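To make that trial and error concrete, here’s a minimal sketch of what comparing prompt variants might look like in Python, using OpenAI’s client library. The task, the prompt wording, and the model name are illustrative assumptions on my part, not anything prescribed by the cited research:

```python
# A minimal sketch of the prompt trial-and-error loop, using the OpenAI
# Python client (v1.x). The task, prompt variants, and model name here
# are illustrative assumptions, not a prescribed recipe.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two candidate prompt strategies for the same task; which one works
# better is an empirical question you answer by inspecting outputs.
prompt_variants = [
    "Summarize this support ticket in one sentence: {ticket}",
    (
        "You are a support triage assistant. In one sentence, state the "
        "customer's core problem: {ticket}"
    ),
]

ticket = "App crashes on login after the 2.3 update; reinstalling didn't help."

for prompt in prompt_variants:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[{"role": "user", "content": prompt.format(ticket=ticket)}],
    )
    # Eyeball each concrete input-output pair first; only then assess the
    # winning variant more systematically on a larger data set.
    print(f"PROMPT: {prompt}\nOUTPUT: {response.choices[0].message.content}\n")
```

The specific prompts don’t matter; the point is that every pass through this loop is manual judgment, which is exactly the per-user burden Wattenberger and the researchers describe.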
These teething pains with GenAI, however, don’t warrant the conclusion that it’s all hype or that it simply doesn’t work.
Practical reality
The friction inherent in ChatGPT and other GenAI tools is real, as Sebastian Bensusan details in a friction log, but it is also solvable. And some of that “solving” comes down to user experience. Yes, vendors can and should bake more smarts into the interface, but it’s also true that one key way to get more value from GenAI is to keep practicing until we figure out where its sharp edges are.
Few have more experience with this than Willison, who suggests, “To get the most value out of [large language models]—and to avoid the many traps that they set for the unwary user—you need to spend time with them and work to build an accurate mental model of how they work, what they are capable of, and where they are most likely to go wrong.” Yes, the tools need to improve, but this doesn’t eliminate the need for users to get smarter and savvier as well.
For those inclined to dismiss GenAI because it’s hard, I’d urge patience and practice, as Willison does. As he concludes, GenAI “can be flawed and lying and have all [sorts of] problems … and it can also be a massive productivity boost.”