A visual guide to using IBM watsonx.ai for generative AI applications.

Comprehensive Guide to IBM watsonx.ai and Generative AI

Tutorial: What is Generative AI and How Can It Be Applied?

Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on. A Large Language Model (LLM) is a type of language model notable for its ability to achieve general-purpose language understanding and generation.

The goal of this lab is to show how you can use prompt engineering with LLMs to elicit more accurate, relevant, and context-aware responses related to travel information on countries. We'll leverage that information when building the application in Lab 2.

Note: The following images show actual results from the watsonx.ai prompt lab. The light gray text is what we provided to the model; the blue highlighted text is how the model responded. Be aware that the outputs you receive when using the prompt lab may not resemble the outputs shown here.

Steps

Step 1: Getting Started

When you open the watsonx.ai prompt lab and click the Freeform mode option, this is what you will see. The large central text area is the prompt editor. On the right side, you can display the model parameters used to tune how the model responds to your prompt. At the bottom left is a summary of the number of tokens used during execution. For more information on prompt engineering, see watsonx prompt lab.

Note: The models provided in the prompt lab are foundation models hosted on IBM Cloud, which we can call, or "inference," when we run a prompt. LLMs are the type of foundation model we will use in this lab.

Step 2: First Prompt

When starting with the initial prompt, it's best to try something quickly, and if it doesn't give you the result you want, go ahead and improve it over time. We will do this in a step-by-step process.

Let's start with:

  • Model: flan-t5-xxl-11b
  • Prompt text: I am thinking of traveling to Thailand.

When we call the model (in other words, click the "Generate" button), this produces an output that isn't very useful, as it doesn't provide any information about Thailand. This is analogous to questioning someone about a specific topic: the more open-ended the question, the more generic the answer, while more closed questions yield more specific answers. So we need to try again and provide more specific context in the prompt to help guide the model toward generating relevant information about Thailand.
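If you prefer to reproduce this call outside the UI, the sketch below shows roughly how it can be made with the ibm-watsonx-ai Python SDK. This is an assumption-laden illustration rather than part of the lab: the API key, project ID, and region URL are placeholders you would replace with your own values, and the model ID string is the identifier watsonx.ai uses for flan-t5-xxl-11b at the time of writing.

    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    # Placeholder credentials -- substitute your own IBM Cloud API key and project ID.
    credentials = Credentials(
        url="https://us-south.ml.cloud.ibm.com",
        api_key="YOUR_IBM_CLOUD_API_KEY",
    )

    model = ModelInference(
        model_id="google/flan-t5-xxl",   # shown as flan-t5-xxl-11b in the prompt lab
        credentials=credentials,
        project_id="YOUR_PROJECT_ID",
        params={"decoding_method": "greedy", "max_new_tokens": 20},
    )

    # Equivalent to typing the prompt and clicking "Generate".
    print(model.generate_text(prompt="I am thinking of traveling to Thailand."))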

Step 3: Being More Direct

Let's be more direct in our prompt:

  • Prompt text: I am thinking of traveling to Thailand. Tell me about Thailand.

This produces a more promising output, as we are starting to receive some information about Thailand. However, it seems to finish mid-sentence. The summary of the number of tokens (on the bottom-left) indicates:

Stop reason: Max tokens parameter reached.

This means the model stopped generating because it hit the limit set by the "Max tokens" parameter, not because the response was complete. Tokens are the smallest units of text handled by the model architecture; for our purposes in this lab, we can think of tokens roughly as words. We therefore need to increase "Max tokens."
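In the prompt lab this is the "Max tokens" slider on the right. If you are following the SDK sketch from Step 2, the same setting corresponds (to the best of my knowledge) to the max_new_tokens generation parameter; the values below mirror what we do in the next step and are otherwise illustrative.

    # Generation parameters for the sketch above (names per the watsonx.ai
    # generation API; values illustrative). Raising max_new_tokens gives the
    # model room to finish its sentences instead of stopping mid-response.
    params = {
        "decoding_method": "greedy",
        "max_new_tokens": 200,   # Step 4 raises "Max tokens" to 200
    }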

Step 4: Tinkering with Model Parameters

When we increase Max tokens to 200 and run the same prompt again, we produce output similar to the following:

This time, it finished the sentence, and the summary of the number of tokens reads:

Stop reason: End of sequence token encountered.

All good from that aspect. However, every time we query the model, it will return the same answer. This is because we are using Greedy decoding, which always selects the token the model considers most probable, so the output is deterministic. Let's change the decoding method to Sampling and see what the model returns for the same prompt each time.

With this adjustment, you will witness different responses, reminiscent of the saying: "Variety is the spice of life!" There are additional parameters available for configuring the model, but we will not cover them in this lab. For more details on parameters and how to use them, see the watsonx prompt lab.
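For those following along programmatically, switching from Greedy to Sampling is just a change of generation parameters. A minimal sketch, with illustrative values for temperature, top_p, and top_k (the lab does not prescribe specific settings):

    # Sampling decoding: the model draws the next token from its probability
    # distribution rather than always taking the single most likely token,
    # so repeated calls with the same prompt can return different answers.
    sampling_params = {
        "decoding_method": "sample",
        "max_new_tokens": 200,
        "temperature": 0.7,   # higher values give more varied output (illustrative)
        "top_p": 1.0,
        "top_k": 50,
    }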

Step 5: Getting More Specific

Having tinkered with the prompt text and model parameters, we've guided the model to provide information about Thailand. However, this information is still quite generic. For the design of our travel application, we want tailored information for users' interests. So, let's update the prompt text as follows to obtain information on water sports and food:

  • Prompt text: I am thinking of traveling to Thailand. I like water sports and food. Tell me about Thailand.

This response is very limited; it needs to be more informative to be of practical use in our application. We've experimented with various prompts and parameters, but we still aren't getting the relevant information we need. Perhaps it's time to explore a different model?

Step 6: Checking Out Other Models

The watsonx.ai prompt lab provides information cards regarding its supported models. Each card includes:

  • Provider and source of the model
  • Tasks the model excels at
  • How to tune the model
  • Research paper it is based on
  • Bias, risks, and limitations

To access information on the models, click the dropdown alongside the model name and select "View all foundation models." Here’s a list of models currently supported on the free tier prompt lab:

Click on the llama-2-70b-chat model to check its information card. It states that the model is "optimized for dialogue use cases," which could make it a strong candidate for our information-gathering needs. Click on the "Select model" button, and let's try this model to see if it yields more information on Thailand.

Step 7: Using a Different Model

After selecting the llama-2-70b-chat model and maintaining the same prompt and parameters, let's check what this model returns.

This clearly appears to be an improvement, yet the response is once again cut off, a pattern we saw previously. This is confirmed by the Stop reason indicating that the Max tokens limit was reached.

As the number of tokens increases, so does the cost of calling a model. This time, instead of merely increasing the token count, it would be prudent to add constraints to the prompt.
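For reference, reproducing this step outside the UI only requires swapping the model ID; the prompt and parameters stay the same. The sketch below continues the earlier SDK example, and the model ID string is again the identifier watsonx.ai uses for this model at the time of writing, which may change.

    # Same prompt, same parameters -- only the model changes.
    model = ModelInference(
        model_id="meta-llama/llama-2-70b-chat",
        credentials=credentials,
        project_id="YOUR_PROJECT_ID",
        params=sampling_params,
    )
    print(model.generate_text(
        prompt="I am thinking of traveling to Thailand. "
               "I like water sports and food. Tell me about Thailand."
    ))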

Step 8: Adding Limits to the Prompt

We will update the prompt text to include constraints on the response. This way, we guide the model to return only the information we ask for:

  • Prompt text: I am thinking of traveling to Thailand. I like water sports and food. Give me 5 sentences on Thailand.

At last, we now have a response from our query regarding Thailand that is useful, informative, and tailored to user preferences. Additionally, it conforms to our maximum token allowance.

Conclusion and Next Steps

Prompt engineering with an LLM offers an alternative to training and building a model tailored to your needs. As illustrated in this lab, the process requires iterative testing and tweaking to converge on a model and prompt that meet your requirements. While the examples here represent a basic case, the principles transfer to many scenarios. More context in the prompt, or even worked examples (i.e., few-shot prompting), may be needed to guide the responses accurately.
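As a purely hypothetical illustration of few-shot prompting (not something this lab walks through), you might prepend one or two worked examples so the model mimics their structure and tone:

    Country: Japan
    Interests: hiking, street food
    Summary: Japan offers well-marked mountain trails, lively food stalls,
    and regional dishes worth planning a whole trip around.

    Country: Thailand
    Interests: water sports, food
    Summary:

The model then completes the final "Summary:" in the same format as the example above it.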

It was highlighted earlier that the models are hosted on IBM Cloud and are called (inferenced) when you click the "Generate" button. You can view the underlying REST API call by clicking the "View code" dropdown as shown, which reveals all the details of the call, including the model name, prompt text, and parameters.
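As a rough sketch of what such a call can look like from Python, here is an illustration using the requests library. The endpoint path, API version date, and response shape are assumptions based on the public watsonx.ai text-generation API and may differ from what "View code" shows for your region; the bearer token would be exchanged from your IBM Cloud API key, and the project ID is a placeholder.

    import requests

    # Assumed text-generation endpoint; confirm against the "View code" output.
    url = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-29"
    headers = {
        "Authorization": "Bearer YOUR_IAM_TOKEN",   # exchanged from an IBM Cloud API key
        "Content-Type": "application/json",
    }
    payload = {
        "model_id": "meta-llama/llama-2-70b-chat",
        "input": ("I am thinking of traveling to Thailand. "
                  "I like water sports and food. Give me 5 sentences on Thailand."),
        "parameters": {"decoding_method": "sample", "max_new_tokens": 200},
        "project_id": "YOUR_PROJECT_ID",
    }

    response = requests.post(url, headers=headers, json=payload, timeout=60)
    print(response.json()["results"][0]["generated_text"])

The same results entry should also report a stop reason, which corresponds to the "Stop reason" messages we watched throughout the lab.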
