
Best Practices for Developing a Generative AI Co-Pilot for Enterprises

Since the launch of ChatGPT, every customer has been asking how they can leverage generative AI for their business. From internal efficiency and productivity to external products and services, companies are rushing to deploy generative AI technologies across all sectors of the economy.

While GenAI is still in its infancy, its capabilities are expanding rapidly: from vertical search to photo editing to writing assistants, the common thread is leveraging conversational interfaces to make software more accessible and powerful. Chatbots, now rebranded as “copilots” and “assistants,” are in fashion once again, and while best practices are still emerging, step one in developing a chatbot is to scope the problem and start small.

A co-pilot is an orchestrator that helps the user complete many different tasks through a free-text interface. There is an infinite space of possible input prompts, and all of them need to be handled gracefully and safely. Instead of setting out to solve every task and risking falling short of user expectations, developers should start by solving a single task really well and learn along the way.

LLM development: open or closed?

By early 2023, the LLM performance leaderboard was clear: OpenAI led with GPT-4, but well-capitalized competitors such as Anthropic and Google were determined to catch up. Open source offered promising ideas, but its performance on text-generation tasks was not competitive with closed models.

Experience with AI over the past decade suggested that open source would come back with a vengeance, and that is exactly what has happened. The open source community has been increasing performance while reducing cost and latency. LLaMA, Mistral, and other models offer powerful foundations for innovation, and major cloud providers such as Amazon, Google, and Microsoft are largely taking a multi-vendor approach, including supporting and amplifying open source.

While open source still trails closed models on published performance benchmarks, it has clearly surpassed them in the set of trade-offs any developer has to make when bringing a product to the real world. The five S’s of model selection can help developers decide which type of model is right for them:

  • Smarts: Through fine-tuning, open source models can absolutely outperform closed models on narrow tasks. This has been demonstrated many times.
  • Spend: Open source is free apart from fixed GPU time and engineering operations. At reasonable volumes, this will always scale more efficiently than usage-based pricing.
  • Speed: By owning the full stack, developers can continually optimize latency, and the open source community produces new ideas every day. Training small models with the knowledge of large ones can reduce latency from seconds to milliseconds.
  • Stability: Drift in model performance is inherent to closed models, and when the only lever of control is prompt engineering, that drift will inevitably undermine the product experience. In contrast, collecting training data and periodically retraining against a fixed model baseline allows for systematic evaluation of model performance over time (see the evaluation sketch after this list). Larger upgrades to new open source models can also be planned and evaluated like any major product release.
  • Security: Serving the model yourself can ensure end-to-end control of your data. More broadly, AI security in general is best served by a strong and thriving open source community.
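
To make the stability point concrete, here is a minimal sketch of the kind of regression evaluation that a fixed model baseline enables. The `generate` stub and the golden-set JSONL format are assumptions for illustration, not any specific library’s API:

```python
# Minimal regression-evaluation sketch for a fixed model baseline.
# `generate` is a hypothetical stand-in for whatever inference call
# your stack exposes (self-hosted model or vendor API).

import json

def generate(prompt: str) -> str:
    """Hypothetical model call; replace with your serving stack."""
    raise NotImplementedError

def evaluate(golden_path: str) -> float:
    """Score a model against a fixed golden set of prompt/expectation pairs.

    Each JSONL line is assumed to look like:
      {"prompt": "...", "must_contain": ["term1", "term2"]}
    """
    passed = total = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)
            output = generate(case["prompt"]).lower()
            total += 1
            if all(term.lower() in output for term in case["must_contain"]):
                passed += 1
    return passed / total if total else 0.0

# Run this on every retraining or candidate model upgrade and track the
# score over time; a drop flags drift before users feel it.
```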

Closed models will continue to play an important role, particularly for bespoke business use cases and for prototyping new use cases that push the boundaries of AI capability in specific content areas or domains. However, open source will provide the foundation for all major products where GenAI is core to the end-user experience.

LLM development: training the model

Developing a high-performing LLM requires a commitment to building the world’s best dataset for the task at hand. This may seem daunting, but two facts should be considered. First, better does not mean bigger: cutting-edge performance on specific tasks can often be achieved with hundreds of high-quality examples. Second, for many tasks in an enterprise or product context, a company’s unique data assets and understanding of the problem offer an advantage over closed-model vendors, who must collect training data broad enough to serve thousands of customers and use cases.

Distillation is a critical tool for optimizing this investment in high-quality training data. Open source models come in a range of sizes, from over 70 billion parameters down to 34 billion, 13 billion, 7 billion, 3 billion, and smaller. For many specific tasks, smaller models can achieve sufficient “smarts” with significantly better “spend” and “speed.” Distillation is the process of training a large model on high-quality, human-generated training data and then asking that model to generate orders of magnitude more synthetic data to train smaller models. Having multiple models with different performance, cost, and latency characteristics provides great flexibility for optimizing the user experience in production.
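
As a minimal sketch of that pipeline, the snippet below uses a hypothetical `teacher_generate` call standing in for the large, human-data fine-tuned model, and writes synthetic examples to a JSONL file that a smaller student model can be fine-tuned on. The seed task templates and file schema are illustrative assumptions:

```python
# Distillation data-generation sketch: a large "teacher" model produces
# synthetic training pairs for a smaller "student" model.
# `teacher_generate`, the seed tasks, and the JSONL schema are illustrative.

import json

def teacher_generate(prompt: str) -> str:
    """Hypothetical call to the large, fine-tuned teacher model."""
    raise NotImplementedError

SEED_TASKS = [
    "Summarize the following contract clause: {text}",
    "Extract the governing law from this agreement: {text}",
]

def build_synthetic_dataset(documents: list[str], out_path: str) -> None:
    """For each document and task template, record (prompt, teacher output)."""
    with open(out_path, "w") as f:
        for doc in documents:
            for template in SEED_TASKS:
                prompt = template.format(text=doc)
                completion = teacher_generate(prompt)
                f.write(json.dumps({"prompt": prompt,
                                    "completion": completion}) + "\n")

# The resulting file can then be used to fine-tune a much smaller student
# model, trading a little "smarts" for far better spend and speed.
```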

RAG: Retrieval-Augmented Generation

When developing products with LLMs, developers quickly learn that the output of these systems is only as good as the quality of the input. ChatGPT, trained on broad swaths of the internet, inherits both the benefits (access to a vast body of published human knowledge) and the disadvantages (misleading, copyrighted, and unsafe content) of the open internet.

In an enterprise context, that level of risk may not be acceptable to customers who make critical decisions every day. In that case, developers can turn to retrieval-augmented generation, or RAG. RAG grounds the LLM in authoritative content by asking it to reason only over information retrieved from a database, rather than reproducing knowledge from its training dataset. Current LLMs can effectively process thousands of words as input context for RAG, but almost all real-world applications must handle many orders of magnitude more content than that. As a result, retrieving the right context to feed the LLM is a critical step.
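
To show the basic shape of RAG, here is a minimal, self-contained sketch. The tiny corpus, the naive term-overlap scorer, and the `generate` stub are all assumptions for illustration; a production system would use real indexes and a real model call:

```python
# Minimal RAG sketch: retrieve relevant passages, then ask the model to
# answer *only* from them. The scorer and model call are illustrative stubs.

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your serving stack."""
    raise NotImplementedError

CORPUS = {
    "doc-001": "The 2023 master services agreement is governed by Delaware law.",
    "doc-002": "Invoices are due within 45 days of receipt.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Naive term-overlap retrieval; real systems use keyword + vector indexes."""
    terms = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: len(terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query: str) -> str:
    passages = retrieve(query)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    prompt = (
        "Answer using ONLY the passages below, citing passage IDs. "
        "If the answer is not present, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```

Grounding the prompt this way is also what makes answers citable, which matters for the auditability point discussed later.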

Expect to invest more in building the information retrieval system than in LLM training. Since both keyword-based and vector-based retrieval systems currently have limitations, a hybrid approach is better for most use cases. Retrieval will be one of the most dynamic areas of GenAI research in the coming years.
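
Here is a minimal sketch of hybrid scoring, assuming a normalized keyword score (e.g., from BM25) and document embeddings are already computed upstream; the 50/50 weighting is purely an illustrative assumption, not a recommendation:

```python
# Hybrid retrieval sketch: blend a keyword score with vector similarity.
# Both inputs are assumed to come from upstream systems; the alpha
# weighting here is illustrative and should be tuned per use case.

import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query_vec, doc_embeddings, keyword_scores, alpha=0.5):
    """doc_embeddings: {doc_id: vector}; keyword_scores: {doc_id: [0,1] score}.

    Returns doc_ids ordered by a blend of keyword and vector relevance.
    """
    blended = {
        doc_id: alpha * keyword_scores.get(doc_id, 0.0)
        + (1 - alpha) * cosine(query_vec, emb)
        for doc_id, emb in doc_embeddings.items()
    }
    return sorted(blended, key=blended.get, reverse=True)
```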

User experience and design: integrating chat seamlessly

From a design perspective, a chatbot should fit seamlessly into the rest of an existing platform and should not feel like a bolted-on extra. It should add unique value and leverage existing design patterns where they make sense. Guardrails should help the user understand how to use the system and its limitations, handle user input that cannot or should not be responded to, and allow automatic injection of application context. Here are three key integration points to consider:

  • Chat versus GUI: For most common workflows, users would rather not chat. Graphical user interfaces were invented because they are a great way to guide users through complex workflows. Chat is a fantastic solution when a user needs to provide hard-to-anticipate context to describe their problem. Consider carefully when and where to surface chat in an app.
  • Setting context: As mentioned above, a key limitation of today’s LLMs is how much context they can maintain, and a retrieval-grounded conversation can quickly grow to millions of words of candidate content. Traditional search controls and filters are a fantastic solution to this problem: users can set the context for a conversation, know that it persists over time, and adjust it along the way (see the sketch after this list). This reduces cognitive load while increasing the likelihood of accurate and useful responses in the conversation.
  • Auditability: Ensure that any GenAI output is cited back to the original source documents and is auditable in context. Verification speed is a key barrier to trust in and adoption of GenAI systems in an enterprise context, so invest in this workflow.
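
As a minimal sketch of the context-setting pattern, the snippet below applies user-chosen filters to narrow the document set before retrieval. The document schema and filter fields are illustrative assumptions:

```python
# Context-setting sketch: user-chosen filters narrow the corpus *before*
# retrieval, shrinking millions of words of candidates to a relevant slice.
# The document schema and filter fields are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    source: str      # e.g., "contracts", "support-tickets"
    year: int
    text: str

def apply_filters(docs: list[Doc], sources: set[str] | None = None,
                  year_from: int | None = None) -> list[Doc]:
    """Keep only documents matching the conversation-level filters."""
    kept = []
    for d in docs:
        if sources is not None and d.source not in sources:
            continue
        if year_from is not None and d.year < year_from:
            continue
        kept.append(d)
    return kept

# The filtered list then feeds the retrieval step from the RAG sketch above,
# and every retrieved passage keeps its doc_id so answers stay auditable.
```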

The launch of ChatGPT alerted the world to the arrival of GenAI and demonstrated the potential of the next generation of AI-powered applications. As more companies and developers build, scale, and deploy AI chat applications, it is important to keep these best practices in mind and to align technology and business strategy to create an innovative product with real long-term impact and value. Focusing on doing one task well while looking for opportunities to expand a chatbot’s functionality will help a developer succeed.
