Practical LLM Enhancement Strategies for Product Managers
Learn how to effectively use LLMs to empower AI products
One of the most formidable challenges in AI product development is managing costs, especially the expense of training Large Language Models (LLMs). The costs can be substantial: training GPT-3, which has 175 billion parameters, was estimated to cost around $4.6 million, and projections suggest that a future GPT-5, with even more complexity, could potentially require an investment of approximately $2 billion. To put this into perspective, even a more modest 7-billion-parameter model might demand around $25,000 to train from scratch.
Given these figures, product leaders, particularly those steering groundbreaking (0-1) projects, are tasked with a critical responsibility: devising strategies that are not only innovative but also cost-efficient. This balancing act is crucial in harnessing the power of AI while maintaining financial sustainability.
In this post, we will discuss two critical strategies to leverage the power of LLMs:
Retrieval-Augmented Generation (RAG) strategies
Prompt engineering strategies
Retrieval-Augmented Generation (RAG) strategies
What is RAG?
Retrieval-Augmented Generation (RAG) is an AI framework introduced in a research paper by Meta in 2020. This innovative framework equips generative AI models with the ability to access and utilize external information beyond what is contained in their initial training data or model parameters. This external data can be sourced from either a closed domain, which includes specific, domain-focused or proprietary documents, or from an open domain comprising publicly accessible, internet-indexed documents. RAG leverages this external data to expand and refine the knowledge base of the generative AI model, thereby enhancing its ability to respond to various prompts. This adaptability makes it possible to tailor the AI model’s responses according to the unique needs of users or businesses.
How does RAG work?
RAG is a process that consists of two main steps:
Retrieval step: retrieve information relevant to a user's prompt from a collection of data sources ingested into the system, and
Generation/Synthesis step: ask the LLM to generate a response using the additional retrieved information
In other words, RAG allows the system to look up information (that is, perform semantic search) to automatically provide better, more informed answers to user queries.
Retrieval step. The collection of data sources is ingested into the system and stored in a vector database, which uses an embedding model to convert data (text, images, audio, or video) into vectors based on their meaning. When a user writes a query, it is converted into a vector with the same embedding model so that it is comparable with the database contents. During the retrieval process, the RAG system performs a nearest-neighbor search to identify the database items closest in meaning to the user's query.
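To make the retrieval step concrete, here is a minimal Python sketch. It assumes OpenAI's embeddings API and a small in-memory index with brute-force cosine similarity; a production system would typically use a dedicated vector database, and the documents and model name are illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # expects an API key in OPENAI_API_KEY

# A tiny illustrative corpus; a real system would ingest whole documents.
documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "The X200 smartphone ships with a 5,000 mAh battery.",
    "Support is available 24/7 via chat and email.",
]

def embed(texts):
    """Convert texts into embedding vectors based on their meaning."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=texts,
    )
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)  # ingested and indexed ahead of time

def retrieve(query, k=2):
    """Nearest-neighbor search: return the k documents closest in meaning."""
    query_vector = embed([query])[0]
    # Cosine similarity is the dot product of L2-normalized vectors.
    doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    query_norm = query_vector / np.linalg.norm(query_vector)
    scores = doc_norm @ query_norm
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("How long do I have to return a product?"))
```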
Generation step. The generation process follows the same principles that we have explained in previous posts on how LLMs work, the only difference being that the LLM's input is now combined with the output of the retrieval process before the final response is returned to the user.
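A minimal sketch of the generation step, continuing the retrieval example above: the retrieved passages are prepended to the user's question before calling the chat model. The model name and system instructions are illustrative.

```python
from openai import OpenAI

client = OpenAI()

def generate(query, passages):
    """Combine retrieved passages with the user's question and call the LLM."""
    context = "\n\n".join(passages)
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat model works
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context is insufficient, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

# Wiring the two steps together (retrieve() comes from the sketch above).
question = "How long do I have to return a product?"
print(generate(question, retrieve(question)))
```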
A RAG system combines a retrieval component with an LLM. The basic retrieval process searches static, indexed databases: the retrieval component does not search the web or query databases on the fly, but instead accesses a fixed corpus that has been indexed beforehand. When a query is made, the retrieval component finds the most relevant documents or entries from this indexed dataset, and these documents are then provided to the generative component of the RAG system, which uses them to inform its response. That said, RAG can also be combined with web search to retrieve and index data in real time.
RAG Benefits
Some of the benefits that we need to be aware of when evaluating AI-enabled products are the following:
Reduced hallucinations - The RAG system significantly enhances the capabilities of large, static language models that rely on pre-existing datasets by providing them with the means to access up-to-date and specialized information. This integration effectively minimizes the occurrence of inaccuracies and "hallucinations" in the model's responses. By addressing the gaps in the foundational knowledge of the LLM, RAG supplies relevant context, which is critical in enabling the model to generate precise and informed answers to user inquiries.
User prompt (without RAG):
Create a graphic of a holiday cake.
GPT-4 answer:
Here is the graphic of a festive holiday cake, richly decorated with red and green icing and set in a cozy kitchen setting.
User prompt (with RAG):
Create a graphic of a holiday cake after searching on Bing for the most popular holiday cakes in 2023.
GPT-4 answer:
I did a quick search for more information and here's what I found.
The most popular holiday cakes of 2023 seem to cover a range of flavors and styles, from the rich and creamy to the fruity and nutty. Some standout options include:
Eggnog Pound Cake: A cake that starts with a yellow cake mix base and is enhanced with eggnog and nutmeg for a festive flavor.
Chocolate Gingerbread Yule Log: A classic yule log cake filled with ginger flavor and finished with dark chocolate bark, perfect for those who appreciate the gingerbread tradition with a chocolate twist.
….
These cakes are not only delicious but also embody the spirit of the holiday season with their flavors and decorations. Each of these would make a delightful graphic representation of a holiday cake for 2023.
User prompt:
Create a chocolate Gingerbread Yule Log.
GPT-4 answer:
The chocolate gingerbread yule log graphic has been created for you.
Up-to-date information - RAG effectively addresses the limitations imposed by the training data's time cutoff. It does this by enabling the model to access up-to-date information about events and topics that have emerged since the completion of its training. This approach not only diminishes the occurrence of inaccuracies or 'hallucinations' in the model's outputs but also significantly enhances the precision and contextual relevance of its responses.
Domain-specific knowledge - RAG represents a highly effective and efficient method for enriching foundational models with specialized, domain-specific data. The use of vector databases in this context is both scalable and cost-effective, primarily because they alleviate the need for labeled datasets or input from subject matter experts (SMEs). Given these advantages, RAG stands out as the most advanced method for tailoring models to specific needs to date. It surpasses other techniques such as building proprietary models, fine-tuning, and prompt engineering in terms of efficiency and effectiveness in model specialization.
User prompt:
Can you provide me with a table that contains all the W-shapes dimensions according to the American Institute of Steel Construction (AISC) Steel Construction standards?
GPT-4 answer:
I can't provide a table with all the W-shapes dimensions from the AISC Steel Construction Manual, as the manual is a copyrighted document. However, the AISC Steel Construction Manual includes detailed information on W-shapes (wide flange beams), including their dimensions, properties, and other relevant data. This manual is widely used by professionals in the field for structural steel design and construction. For comprehensive and specific information on W-shapes and other aspects of steel construction, you can refer to the latest edition of the AISC Steel Construction Manual, which is available for purchase through the AISC website or authorized distributors.
After using OpenAI’s “Create a GPT” agent feature and uploading the AISC steel construction manual:
GPT-4 answer:
Here is a section of the AISC Steel Construction Manual's Table 1-1, showing dimensions for various W-Shapes:
W24×370h: Area 109 in.², Depth 28.0 in., Web Thickness 1.52 in.
W24×335h: Area 98.3 in.², Depth 27.5 in., Web Thickness 1.38 in.
W24×306h: Area 89.7 in.², Depth 27.1 in., Web Thickness 1.26 in.
W24×279h: Area 81.9 in.², Depth 26.7 in., Web Thickness 1.16 in.
W24×250: Area 73.5 in.², Depth 26.3 in., Web Thickness 1.04 in.
W24×229: Area 67.2 in.², Depth 26.0 in., Web Thickness 0.960 in.
W24×207: Area 60.7 in.², Depth 25.7 in., Web Thickness 0.870 in.
W24×192: Area 56.5 in.², Depth 25.5 in., Web Thickness 0.810 in.
W24×176: Area 51.7 in.², Depth 25.2 in., Web Thickness 0.750 in.
W24×162: Area 47.8 in.², Depth 25.0 in., Web Thickness 0.705 in.
W24×146: Area 43.0 in.², Depth 24.7 in., Web Thickness 0.650 in.
W24×131: Area 38.6 in.², Depth 24.5 in., Web Thickness 0.605 in.
W24×117c: Area 34.4 in.², Depth 24.3 in., Web Thickness 0.550 in.
W24×104c: Area 30.7 in.², Depth 24.1 in., Web Thickness 0.500 in.
These details are part of a larger table that includes further specifications and measurements for various W-Shapes.
Easy updates - RAG frameworks offer a streamlined solution for updating foundational models, effectively removing the need for expensive and time-consuming retraining processes. With RAG, databases can be easily refreshed by incorporating new documents. For instance, updates about the latest products, cutting-edge procedures, or evolving policies can be seamlessly integrated. Additionally, RAG frameworks can organically update their knowledge base by accessing current internet resources.
For example, a RAG-enhanced customer service bot can be promptly updated with information on a newly launched smartphone, allowing it to provide accurate, up-to-date advice and troubleshooting tips to users without the need for extensive retraining of the underlying model.
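To illustrate how lightweight such an update can be, here is a minimal sketch that continues the retrieval example from earlier; the new document text is illustrative.

```python
# Continuation of the retrieval sketch above: refreshing the knowledge
# base amounts to embedding a new document and appending it to the
# index; no retraining of the underlying model is involved.
def add_document(text):
    global doc_vectors  # `documents`, `doc_vectors`, `embed` come from the retrieval sketch
    documents.append(text)
    doc_vectors = np.vstack([doc_vectors, embed([text])])

add_document("The new X300 smartphone launches in March with satellite SOS.")
```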
RAG vs. Fine-Tuning
Fine-tuning
Purpose: Fine-tuning is aimed at modifying and adjusting the model's behavior, teaching the model new skills or specific tasks, such as writing patent litigation claims.
Method: Involves changing the actual weights of a pre-trained model.
Data Requirements: Utilizes highly domain- and task-specific data. Developing a high-quality, relevant dataset for the desired use cases is crucial.
Techniques: Includes methods such as instruction fine-tuning (using prompt-response pairs) and Reinforcement Learning from Human Feedback (RLHF); a sketch of instruction fine-tuning follows this list.
Application Example: For a lawyer, fine-tuning would be akin to specialized training in tax law.
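For illustration, here is a minimal sketch of instruction fine-tuning using OpenAI's fine-tuning API; the training example, file name, and base model are illustrative, and a real dataset needs many high-quality prompt-response pairs.

```python
import json
from openai import OpenAI

client = OpenAI()

# One illustrative prompt-response pair; real instruction fine-tuning
# datasets need many vetted examples for the target task.
examples = [
    {"messages": [
        {"role": "user",
         "content": "Draft a patent claim for a foldable phone hinge."},
        {"role": "assistant",
         "content": "1. A hinge assembly for a foldable device, comprising..."},
    ]},
]

# Write the examples to a JSONL file (the format the API expects).
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Upload the file and launch the fine-tuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # a base model that supports fine-tuning
)
print(job.id)
```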
RAG (Retrieval-Augmented Generation)
Purpose: RAG is focused on providing additional, relevant context to the model at the time it is called upon to make a decision or provide a response.
Method: It involves inserting data directly into the prompt or context window of the model during inference. It does NOT involve further training of the LLM.
Process Steps: The steps include embedding, creating vector stores, and reranking, among others.
Application Example: For a tax lawyer using RAG, it would be like having access to relevant case documents and current laws at their fingertips for reference.
Key Difference and Complementarity
Combination for Effectiveness: These two techniques are often needed in concert. For example, a tax lawyer benefits from both specialized training (fine-tuning) and access to relevant case documents (RAG) to effectively assess content.
False Dichotomy: The debate between using RAG and fine-tuning is a false dichotomy as they serve different purposes and are complementary, not mutually exclusive.
Prompt engineering strategies
Mastering the art of prompt design is crucial for unlocking the full capabilities of AI models. A meticulously crafted prompt can transform an ordinary AI response into an extraordinary one, demonstrating the remarkable difference that thoughtful and precise prompt engineering can make.
1. Write clear and concise instructions
It is essential to include all relevant details and context in your request to ensure the response is closely aligned with your needs. This approach minimizes the need for the model to make assumptions about your intent, leading to more accurate and relevant outcomes.
Less Clear Instruction: "Analyze the survey results."
More Clear and Detailed Instruction: "Please conduct a thorough analysis of the recent customer satisfaction survey results. Start by summarizing the overall satisfaction trends observed in a concise paragraph. Then, create a detailed breakdown in a table format, showing the percentage of responses for each satisfaction level (e.g., very satisfied, satisfied, neutral, dissatisfied, very dissatisfied) across different product categories. Conclude with a bullet-point list of three key areas for improvement based on customer feedback, and suggest potential strategies for each area."
2. Use specific response formats
For optimal clarity and effectiveness, specify the response format explicitly. Certain tasks are best delineated as a series of consecutive steps, while others suit a table or bullet list; clearly articulating the expected structure enhances the model's ability to accurately interpret and execute the instructions.
Generic user prompt: "Provide travel recommendations for a week-long trip to Japan."
Improved user prompt: "Provide an itinerary for a week-long trip to Japan in a tabular format where the columns are the days and the rows are the places to visit."
ChatGPT Response:
[Image: itinerary table with days as columns and places to visit as rows]
—
Adding references. When the input is enhanced with pertinent information from a provided document, it becomes easy to instruct the model to include citations in its responses, drawn from the supplied documents.
User Prompt: "You will receive a document enclosed in triple quotes, followed by a specific question. Your task is to answer the question using only the information contained in the provided document. Make sure to cite the passage(s) from the document that you use to construct your answer. If the document lacks the necessary information to answer the question, please respond with 'Insufficient information.' When citing, use the format ({"citation": …}) to annotate relevant passages. Your attention to these instructions will ensure accurate and well-sourced responses."
Example:
""" [Document Content Here] """
Question:
What are the main findings of the research presented in the document?
ChatGPT Response:
The main findings of the research are that ... ({"citation": "Relevant passage from the document"}).
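As a sketch of how this pattern can be wired up programmatically, assuming the OpenAI chat API; the helper name, model, and document handling are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Illustrative helper: wrap a document in triple quotes and ask for a
# cited answer, mirroring the instruction prompt above.
INSTRUCTIONS = (
    "Answer the question using only the document enclosed in triple quotes. "
    'Cite the passage(s) you rely on in the format ({"citation": ...}). '
    "If the document lacks the necessary information, reply 'Insufficient information.'"
)

def cited_answer(document: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any capable chat model works
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user",
             "content": f'"""{document}"""\n\nQuestion: {question}'},
        ],
    )
    return response.choices[0].message.content
```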
3. Add contextual information
Offering overarching guidelines applicable to various scenarios tends to be more efficient than exhaustively illustrating every possible variation of a task through examples. However, in certain instances, adding examples might be more straightforward. This approach is particularly useful when the task involves replicating a specific response style that is challenging to define in words. Such a method of prompting, where only a limited number of examples are provided to guide the model's response, is referred to as few-shot prompting.
User Prompt: "I would like you to respond to questions in a concise, bullet-point format. Here are some examples to illustrate the style I'm looking for:
Example 1:
Question: What are key environmental considerations for businesses?
Response:
Reducing carbon footprint through sustainable practices
Implementing waste reduction and recycling programs
Investing in eco-friendly technologies
Example 2:
Question: How can companies improve their social responsibility?
Response:
Promote diversity and inclusion in the workplace
Engage in community development and philanthropy
Ensure fair labor practices and employee well-being
Based on these examples, please respond to the following question in a similar bullet-point format.
What are effective strategies for a company to achieve its sustainability and ESG goals?"
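When calling a model through an API, few-shot examples like those above can also be supplied as alternating user and assistant messages instead of being embedded in a single prompt. A minimal sketch, with the model name illustrative:

```python
from openai import OpenAI

client = OpenAI()

# The few-shot examples from the prompt above, expressed as alternating
# user/assistant messages; many models follow this structure reliably.
few_shot_messages = [
    {"role": "system",
     "content": "Respond to questions in a concise, bullet-point format."},
    {"role": "user",
     "content": "What are key environmental considerations for businesses?"},
    {"role": "assistant",
     "content": "- Reducing carbon footprint through sustainable practices\n"
                "- Implementing waste reduction and recycling programs\n"
                "- Investing in eco-friendly technologies"},
    {"role": "user",
     "content": "What are effective strategies for a company to achieve "
                "its sustainability and ESG goals?"},
]

response = client.chat.completions.create(
    model="gpt-4",  # illustrative
    messages=few_shot_messages,
)
print(response.choices[0].message.content)
```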
—
Reference articles. By supplying a model with reliable and pertinent information (e.g., articles, blog posts, URL links) that aligns with the user query, we can direct the model to utilize this specific information as the basis for formulating its response.
User Prompt:
Provided Articles:
"""
Article 1:
Title: Innovations in Electric Vehicle Technology
Electric vehicles (EVs) are revolutionizing the transport sector. Recent innovations include improved battery life and faster charging solutions...
Article 2:
Title: Conservation Efforts in Rainforests
Global efforts to conserve rainforests are intensifying. Key strategies include protecting wildlife habitats and promoting sustainable logging practices...
Article 3:
Title: The Role of Technology in Modern Education
Technology is increasingly integral to education. Interactive learning tools and online resources are enhancing student engagement and facilitating remote learning...
"""
Question: What are the recent innovations mentioned in the articles regarding electric vehicles?
ChatGPT Response:
Recent innovations in electric vehicles, as mentioned in the provided article, include improved battery life and faster charging solutions. ({"citation": "Electric vehicles (EVs) are revolutionizing the transport sector. Recent innovations include improved battery life and faster charging solutions..." - Article 1})
👉 More resources & strategies
https://platform.openai.com/docs/guides/prompt-engineering
https://cookbook.openai.com/articles/related_resources
Concluding thoughts
Navigating the terrain of AI product development, particularly in the training of LLMs, presents a tradeoff of progress versus cost. The astronomical expenses associated with developing models like GPT-3, reaching millions, and potentially billions for future iterations like GPT-5, pose a daunting challenge. This financial burden forces product leaders to make tough decisions, innovating within the limits of economic feasibility, a task that requires not just technological expertise but also strategic thinking.
At the heart of this challenge lie two innovative strategies: Retrieval-Augmented Generation (RAG) and prompt engineering. RAG revolutionizes the way AI models access and integrate external information, significantly enhancing their accuracy and contextual relevance. This method marks a shift from traditional, costly retraining processes to a more dynamic, cost-effective approach. On the other hand, prompt engineering represents a nuanced art of communication with AI, demanding meticulous attention to detail and context to extract precise and relevant responses. These strategies underscore a broader narrative in AI development: the constant balancing act between technological ambition and fiscal responsibility, reflecting a deeper, ongoing negotiation about the future trajectory and ethical considerations of AI's role in society.
Embarking on this journey requires a blend of technical expertise, strategic foresight, and a willingness to embrace new methodologies. As we look to the future, the promise of AI in transforming industries and everyday experiences is immense, but it hinges on our ability to navigate these challenges with creativity and precision. The road ahead is as exciting as it is daunting, and it is up to us to chart a course that maximizes the potential of AI while remaining mindful of the economic realities that accompany such groundbreaking work.
On behalf of the Hatch Labs team and Eva Agapaki, we wish you a wonderful holiday season and we will be back with more articles and exciting announcements for what is next for Hatch Labs in 2024!