Wednesday, Jul 17, 2024
A Closer Look at GPTs and Assistants API
The OpenAI DevDay conference dominated the generative AI news space last week. Some will argue that OpenAI just killed a bunch of startups, as happens every time the company releases a feature that overlaps with what others are building. Others will claim it will go down in history as the official birthdate of a new decade-long wave of developer-led value creation. Either way, for many of us working with Large Language Models (LLMs), this release was genuinely exciting, as it comes with many significant updates: GPT-4 Turbo, which offers a 128K-token context window at lower prices; experimental access to GPT-4 fine-tuning; and new multimodal capabilities, including vision, image creation with DALL·E 3, and text-to-speech (TTS).
Each of these updates delivers real value to both developers and end users. For example, the drastically larger context window in GPT-4 Turbo alleviates many of the challenges associated with content-chunking strategies, enabling the development of more advanced solutions with relative ease. For the purposes of this post, however, we will focus on two brand-new features that promise to introduce entirely new paradigms for customizing models and producing AI-enabled assistants.
GPTs
GPTs are essentially specialized versions of ChatGPT, designed to cater to specific tasks with a high degree of customization. This innovation represents a leap forward in the realm of AI, offering a versatile tool for both developers and non-technical users.
GPTs provide a streamlined path for constructing advanced, multimodal chatbots. The development process is now an intuitive, no-code experience. This shift is not just about ease of use; it’s about empowering users to harness the full potential of AI without the steep learning curve of intricate programming. GPTs’ capabilities span vision, image generation with the artistic flair of DALL·E, web browsing, and Python code interpretation. They can also call public APIs, which broadens the scope of possible applications considerably.
This development can be seen as an evolution of concepts from other open-source projects, like the Agents defined by LangChain, a framework for building LLM applications. However, OpenAI has refined this concept, presenting a more accessible approach to building agents.
Previously, leveraging the OpenAI API demanded a solid grasp of the Chat Completions API and of integration tools like LangChain. That barrier has now been significantly lowered, ushering in an era of rapid, simplified development of advanced conversational bots. It’s important to note, however, that this simplification comes with trade-offs in the flexibility offered by more customized approaches.
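For context, here is a minimal sketch of what the “old way” looks like with the Chat Completions endpoint, using the openai Python SDK (v1.x). The model name and messages are illustrative placeholders:

```python
# Minimal Chat Completions call with the openai Python SDK (v1.x).
# Model name and prompts below are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # GPT-4 Turbo preview announced at DevDay
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize our return policy in two sentences."},
    ],
)
print(response.choices[0].message.content)
```

Note that Chat Completions is stateless: the developer must resend any relevant conversation history with every call, a burden that, as we will see below, the Assistants API removes.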
To create your own GPT, OpenAI’s editor is the starting point. Accessible to anyone with a ChatGPT Plus or Enterprise license, the editor is split into two easy-to-use panes: Creation and Configuration on the left, and Preview and Testing on the right. The GPT editor guides you through each step, from choosing a name and logo for your GPT to fine-tuning its behavior to suit your specific needs. This is a genuine addition to the “old school” toolset because it proactively asks you questions and steers you toward a final product that provides the right information in the right tone. It feels quite natural to develop natural-language apps and utilities using natural language. Not so long ago, this would have required thorough knowledge of text preprocessing, chunking, embeddings, vector databases, APIs, and the glue needed to put it all together.
Despite their advanced capabilities, it’s important to recognize the limitations of GPTs, such as occasional hallucinations (as with all other LLMs) or longer response times. Nonetheless, this innovation holds the potential to render some third-party chatbot generation tools obsolete, particularly those based on simpler Retrieval Augmented Generation models. For more complex and control-intensive use cases, however, the traditional, more customized approaches still hold an edge.
Assistants API
The OpenAI Assistants API is a new offering poised to improve the way developers build AI assistants into their applications. Traditionally, producing and operating custom LLM-powered assistants involved a complex interplay of many elements. Developers had to juggle infrastructure management, handle diverse data types, optimize models, and design intricate pipelines. Dealing with prompts, managing context windows, maintaining application state, and ensuring effective observability were just a few of the hurdles. Technical concerns like chunking, embeddings, storage, caching, and augmented generation added further layers of complexity. This often meant spending considerable time and effort just to stand up a functional system, leaving less bandwidth for solving actual customer problems.
The Assistants API simplifies these processes dramatically. It eliminates the burdensome need to manage conversation histories and brings convenient access to OpenAI-hosted tools like the Code Interpreter and Retrieval. On top of that, its enhanced function-calling capability streamlines integration with various third-party tools.
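As a sketch of how this looks in practice with the beta Python SDK: you declare an assistant once, enabling the hosted tools you want. The name, instructions, and model below are our own illustrative choices:

```python
# Sketch: creating an assistant with OpenAI-hosted tools enabled (beta API).
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Data Helper",  # illustrative name
    instructions="Answer questions and run calculations when needed.",
    model="gpt-4-1106-preview",
    tools=[
        {"type": "code_interpreter"},  # hosted Python sandbox
        {"type": "retrieval"},         # hosted file retrieval
    ],
)
print(assistant.id)
```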
A cornerstone feature of the Assistants API is its implementation of persistent threading for ongoing conversations. This means each user interaction is neatly organized into its own thread, keeping conversations distinct and manageable. These threads not only store message history but also intelligently truncate it to fit within the model’s context length. This is a significant shift from the previous developer-intensive model, where keeping track of conversation states and determining the relevance of past messages was a complex and time-consuming task. This transforms OpenAI’s models from being stateless to stateful, offloading the responsibility of memory management from the developer to OpenAI and streamlining application development as a result.
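To make the thread lifecycle concrete, here is a hedged sketch (reusing the `client` and `assistant` from the previous snippet, with a placeholder user question). Note that runs are asynchronous, so the client polls until the assistant has finished:

```python
# Sketch of the thread/run lifecycle in the beta Assistants API.
import time

thread = client.beta.threads.create()  # one persistent thread per conversation

client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the average of 3, 14 and 15?",  # placeholder question
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run completes; OpenAI manages history and truncation.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first
```

Notice what is absent: no history array to rebuild on each call, and no logic for deciding which past messages still fit the context window.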
The Assistants API also introduces an innovative approach to retrieval. Previously, enhancing LLM responses necessitated the use of Retrieval Augmented Generation (RAG) techniques, which involved integrating data from external sources into the model’s responses. This process, typically reliant on vector databases and embedding techniques, was complex and labor-intensive. The Assistants API abstracts these complexities, allowing for faster and more efficient development of tools that leverage RAG.
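A sketch of what that abstraction looks like, assuming a hypothetical local document named `product_manual.pdf`: you upload a file and attach it to an assistant with the retrieval tool enabled, and OpenAI handles the chunking, embedding, and lookup behind the scenes:

```python
# Sketch: grounding an assistant on a document via hosted retrieval (beta API).
file = client.files.create(
    file=open("product_manual.pdf", "rb"),  # placeholder document
    purpose="assistants",
)

rag_assistant = client.beta.assistants.create(
    name="Manual Expert",  # illustrative name
    instructions="Answer questions using the attached manual.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],  # OpenAI handles chunking, embedding, and search
)
```

Compare this to a hand-rolled RAG pipeline, where each of those steps would be your own code and infrastructure.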
While these advancements are noteworthy, it’s important to recognize the limitations of the Assistants API, particularly for developers seeking to build highly complex agents and workflows. The current iteration of OpenAI’s retrieval model offers limited customization, which may not suffice for advanced RAG-based solutions that require features like multitenancy, metadata handling, hybrid queries, and custom embedding models. Additionally, there’s a concern about being reliant on a single LLM provider. Tools like LangChain, which offer more flexibility and customization, are likely to remain relevant and in use for the foreseeable future.
In summary, both GPTs and Assistants API are significant steps forward, offering easier and faster development of AI-enabled applications. However, for more complex and customized solutions, the search for the perfect tool continues.
How can we help?
Mono began exploring Large Language Models years ago, back when the term was largely associated with experimental models in their infancy. Today, we develop sophisticated systems that leverage Natural Language Understanding technologies for clients around the globe. Contact us and let’s start brainstorming!