Whether you’ve attended AI conferences, kept up with press releases, or simply come across a post on your social media feed, you’ve probably seen companies promote their AI products as all-encompassing solutions. If the claims they made seemed too good to be true, you should probably trust that intuition.
When it comes to out-of-the-box enterprise chatbot and chat-with-your-documents solutions, there is a plethora of choices. So what do these companies offer? In general: flexibility, customization, and specialization. Some companies want their product to appeal to as wide a market as possible, and therefore focus on making a product that is versatile, customizable, and which clients can tailor to their specific needs. These products are quick and easy to integrate into existing systems. Users can tune the functionality to specific purposes, easily alter the UI, and access built-in analytics. Other companies specialize their product to suit a market niche. The drawback here is that unless your application precisely aligns with that niche, you will likely be paying for features you don’t need.
Training an entire model for an application such as a chatbot is expensive, and ultimately provides less control than the more common alternative, retrieval augmented generation (RAG). In RAG applications, the information available to the chatbot is stored in a knowledgebase, with a particular structure that makes it easy to search for data related to user queries. The data which best matches the user question is then passed to a large language model (LLM), which generates a response.
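The RAG flow described above can be sketched in a few lines. This is a toy illustration, not a production implementation: a simple word-overlap score stands in for a real vector search, the knowledgebase contents are made up, and the final LLM call is left as a comment.

```python
# Minimal RAG sketch: retrieve the best-matching knowledgebase entry for a
# query, then assemble the prompt that would be sent to an LLM.
from collections import Counter

# Illustrative knowledgebase; in practice this would be indexed document chunks.
KNOWLEDGE_BASE = [
    "Our support line is open Monday to Friday, 9am to 5pm.",
    "Refunds are processed within 10 business days of approval.",
    "Premium accounts include priority email support.",
]

def score(query: str, doc: str) -> int:
    """Toy relevance metric: count words shared between query and document.
    A real system would use embedding similarity instead."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(docs, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the context-grounded prompt for the LLM."""
    return ("Answer using only this context:\n"
            + "\n".join(context)
            + f"\n\nQuestion: {query}")

context = retrieve("How long do refunds take?", KNOWLEDGE_BASE)
prompt = build_prompt("How long do refunds take?", context)
# In production, `prompt` would be sent to an LLM API to generate the reply.
```

The structure is the important part: the knowledgebase is searched first, and only the retrieved data reaches the model, which is what gives RAG its control advantage over training a model end to end.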
When our team recently designed a customized chatbot for a client, we had a proof-of-concept running in only a couple of weeks. Another week or two of tweaking and testing and it worked… surprisingly well. Suspiciously well. It made us wonder, is that all there is to making a chatbot?
The 80/20 Principle in Chatbot Development
It turns out, this proof-of-concept chatbot didn’t handle edge cases well. In fact, it was downright unpredictable when not limited to questions directly answered by the knowledgebase. Beyond that, the client was a medical company, meaning there were legal restrictions on the topics the chatbot could and could not discuss, and ours had no way of identifying those topics.
It’s not difficult to design a chatbot that works “pretty well”. Unfortunately, that is not enough when your product is going to be put out into the world. This is a clear case of the 80/20 principle at work: you can get about 80% of your desired functionality in only 20% of your time spent, but achieving that last 20% of functionality consumes the majority of your time. In our case, this involved preventing the bot from discussing specific off-limit topics, making it more conversational, and formatting the reference material to make it more easily accessible. Here are a few examples of issues that an out-of-the-box or template-style chatbot solution might not handle.
Content Checks
Whether simply to prevent users from treating it as a free version of ChatGPT, or because there are liability issues at play, it is likely that you want to limit the topics your website’s chatbot discusses. These checks can range from simple, such as searching for certain keywords and phrases, to complex, such as prohibiting entire topics from being discussed. It is relatively simple to block a fixed list of forbidden phrases. It is significantly more challenging to guard against the bot discussing off-limit topics while still giving it enough freedom to answer the questions it should.
Organizing Reference Data
As the saying goes, “garbage in, garbage out”. LLMs are great at handling unstructured data, but the more you organize and process that data before generating a response, the better the result will be. This is an entirely separate problem from designing the chatbot itself.
To this end, many advanced RAG methods already exist, and more are continually being developed. These methods generally involve increasing the quality or amount of data being passed to the LLM during response generation. However, which method performs best depends on both the format of the knowledgebase data, and the use-case of the chatbot. Ensuring the right strategy is chosen for a particular application requires knowledge, experience, and experimentation, bringing us to our last point…
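One common preprocessing step behind such methods is splitting reference documents into overlapping chunks, so that each retrieved passage carries enough surrounding context to be useful on its own. A minimal sketch, with illustrative chunk-size and overlap values that would need tuning per application:

```python
# Split text into word-based chunks, with each chunk sharing `overlap` words
# with the previous one so context is not cut off at chunk boundaries.
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Choices like these (word-based vs. sentence-based splitting, how much overlap, whether to attach headings to each chunk) are exactly the format- and use-case-dependent decisions that require experimentation.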
Testing, testing, testing!
The best products are tested and iteratively improved. Out-of-the-box solutions don’t provide the opportunity to do this. There may be parameters that can be tweaked, pushing the results in the desired direction, but this will never match the performance of a fully tested and tailored solution.
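In practice, that iterative loop often takes the form of a “golden set”: a list of questions with expected behaviours that is re-run after every change. A sketch under assumed names, where `chatbot` is a trivial stand-in for the real system and the set itself is illustrative:

```python
# Regression-style evaluation: run the bot against known questions and
# report which ones failed their behavioural check.
def chatbot(question: str) -> str:
    """Trivial stand-in for the deployed bot."""
    if "diagnose" in question.lower():
        return "I'm sorry, I can't help with that topic."
    return "Refunds are processed within 10 business days."

# Each entry pairs a question with a predicate the response must satisfy.
GOLDEN_SET = [
    ("Can you diagnose my rash?", lambda r: "can't help" in r),
    ("How long do refunds take?", lambda r: "10 business days" in r),
]

def run_eval(bot) -> list[str]:
    """Return the questions whose responses failed their check."""
    return [q for q, ok in GOLDEN_SET if not ok(bot(q))]

failures = run_eval(chatbot)
```

Out-of-the-box products rarely expose a hook for this kind of harness, which is why their tunable parameters can only approximate what a tested, tailored solution achieves.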
Conclusion
Although enterprise chatbots offer an inexpensive and convenient solution, they may fall short in handling the specific nuances and edge cases required for specialized applications. The 80/20 principle illustrates why this is the case. Meeting more stringent requirements involves substantial effort, particularly in cases involving legal compliance, managing complex data structures, and providing reliable responses. In these situations, only a bespoke solution can provide the full customization that the application requires.