In the previous post, we took our first dive into Spring AI and its APIs. We talked about classes like ChatClient and ChatModel, which are the main entry points for interacting with LLMs. In this post, we continue exploring the module, looking at features such as multimodality, prompts, and a cross-cutting concern like observability in models.
Multimodality
Humans acquire knowledge from various "data sources" (images, sound, text, etc.), so it's fair to say that our experiences are multimodal. In contrast, Machine Learning traditionally focused on a single modality—until recently, when models began to appear that can process multiple inputs, such as image and text, or audio and video at the same time.
Multimodality refers to a model's ability to process information from multiple sources, and Spring AI, through its Message API, provides this support. The content field in the UserMessage class is used for text, while the media field allows the inclusion of additional content such as images, audio, or video, specified by the corresponding MimeType.
@RestController
public class MultimodalController {

    private final OllamaChatModel chatModel;

    public MultimodalController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/multimodal")
    String multimodal() {
        // The image is sent alongside the text, identified by its MIME type
        ClassPathResource imageResource = new ClassPathResource("/static/multimodal.jpg");
        UserMessage userMessage = new UserMessage(
                "Explain to me what you see in the image",
                new Media(MimeTypeUtils.IMAGE_JPEG, imageResource));
        return chatModel.call(new Prompt(List.of(userMessage)))
                .getResult().getOutput().getText();
    }
}
To see it in action, we’ll request a description of the following image:

We asked it in Spanish, and we get responses like:


As shown, the model used can produce odd responses, even mixing languages. Another important aspect to keep in mind is response time (for model calls in general), so it's essential to handle these calls asynchronously and to apply performance improvements in the code, such as using virtual threads.
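As a minimal sketch of that idea (the controller and endpoint names here are illustrative, not part of the demo), the blocking model call can be offloaded to a virtual-thread executor so that request threads are freed while the model is working:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class AsyncChatController {

    private final OllamaChatModel chatModel;

    // One virtual thread per call (Java 21+); creating them is cheap.
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

    public AsyncChatController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/multimodal-async")
    CompletableFuture<String> multimodalAsync() {
        // Spring MVC completes the HTTP response when the future completes,
        // so the servlet thread is not blocked during the (slow) model call.
        return CompletableFuture.supplyAsync(
                () -> chatModel.call("Explain to me what you see in the image"),
                executor);
    }
}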
Prompts
As mentioned earlier, prompts are the inputs sent to the models. Their design and structure significantly influence the model’s responses. Ultimately, Spring AI handles prompts with models similarly to how the “view” layer is handled in Spring MVC. This involves creating markers that are replaced with the appropriate value to send content dynamically.
The structure of prompts has evolved along with AI. Initially, they were just simple strings; now many include placeholders for specific inputs (for example: USER: <user>) that are replaced with the actual values at runtime.
Roles
Each message is associated with a role. Roles categorize the message, providing context and purpose for the model, improving response effectiveness. The main roles are:
- System: guides the behavior and style of the response by setting rules for how it should respond. It’s like giving instructions before starting a conversation.
- User: the input provided by the user. This is the primary role to generate a response.
- Assistant: the AI's "reply" to the user, which maintains the flow of the conversation. Tracking these responses ensures coherent interactions. It may also include tool/function call requests to perform actions such as calculations or retrieving external data.
- Tool/Function: returns the results of the tool calls requested in Assistant messages, providing additional information to the model.
In Spring AI, roles are defined in the MessageType enum:
public enum MessageType {

    USER("user"),
    ASSISTANT("assistant"),
    SYSTEM("system"),
    TOOL("tool");

    ...
}
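A quick sketch of how these roles map to the message classes used to assemble a Prompt (the texts are made up for illustration):

// Each role has its own Message implementation in org.springframework.ai.chat.messages.
SystemMessage system = new SystemMessage("You are a concise travel assistant.");
UserMessage user = new UserMessage("Recommend a city for a weekend trip.");
AssistantMessage previousReply = new AssistantMessage("Last time I suggested Porto.");

// The order of the list is the order in which the model "sees" the conversation.
Prompt prompt = new Prompt(List.of(system, previousReply, user));
ChatResponse response = chatModel.call(prompt);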
Prompt Template
The PromptTemplate class is key to managing prompt templates, making it easier to create structured prompts.
public class PromptTemplate implements PromptTemplateActions, PromptTemplateMessageActions {}
These interfaces, implemented directly or inherited through PromptTemplateActions, cover different aspects of prompt creation:
- PromptTemplateStringActions: for creating prompts based on Strings.
- PromptTemplateMessageActions: for creating prompts using Message objects.
- PromptTemplateActions: for creating a Prompt object that can be sent to the ChatModel.
Some examples of using PromptTemplate:
@GetMapping("/simple")
String simplePromptTemplate(@RequestParam String adjective, @RequestParam String country) {
PromptTemplate promptTemplate = new PromptTemplate("Give a {adjective} city from {country}");
Prompt prompt = promptTemplate.create(Map.of("adjective", adjective, "country", country));
return chatModel.call(prompt).getResult().getOutput().getText();
}
@GetMapping("/system")
String systemPromptTemplate(@RequestParam String topic) {
String userText = """
Give me information about Barcelona.
Respond with at least 5 lines.
""";
Message userMessage = new UserMessage(userText);
String systemText = """
You are an AI assistant that helps people with information about cities.
You must respond with information about the city on the topic {topic}.
Respond as if you were writing a travel blog.
""";
SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemText);
Message systemMessage = systemPromptTemplate.createMessage(Map.of("topic", topic));
Prompt prompt = new Prompt(List.of(userMessage, systemMessage));
return chatModel.call(prompt).getResult().getOutput().getText();
}
In addition to using Strings for prompt creation, Spring AI supports creating prompts through resources (Resource), allowing you to define the prompt in a file that can then be used as a PromptTemplate:
@Value("classpath:/static/prompts/system-message.st")
private Resource systemResource;
SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemResource);
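For example, a sketch of an endpoint using the external template (assuming the .st file contains the same system text as before, including the {topic} placeholder):

@GetMapping("/prompt-template/resource")
String resourcePromptTemplate(@RequestParam String topic) {
    // Same flow as the previous example, but the system template now comes from the classpath resource.
    SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemResource);
    Message systemMessage = systemPromptTemplate.createMessage(Map.of("topic", topic));
    Message userMessage = new UserMessage("Give me information about Barcelona.");

    Prompt prompt = new Prompt(List.of(userMessage, systemMessage));
    return chatModel.call(prompt).getResult().getOutput().getText();
}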
Prompt Engineering
As mentioned earlier, the quality and structure of prompts have a significant impact on model responses, to the point of giving rise to a new profession. The developer community continuously analyzes and shares best practices for improving prompts in different scenarios, which boils down to a few key points for creating effective prompts:
- Instructions: Create clear and concise instructions, just as you would when communicating with a person. This helps the AI understand what is expected.
- External context: Provide relevant contextual information to help the AI understand the broader scenario.
- User input: This is the user’s direct question or request — the core of the prompt.
- Output format: Specify the desired response format, even though this may sometimes be loosely followed. Providing request-response examples helps the AI better understand user intent.
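As a rough illustration (the template text and the userQuestion variable are made up), a prompt combining these four elements could be assembled like this:

// Instructions + external context + user input placeholder + output format, in a single template.
String template = """
        You are a travel assistant writing for a blog.
        The reader is planning a short city break in Europe.
        {question}
        Respond with a bullet list of at most 5 cities, one line each.
        """;
Prompt prompt = new PromptTemplate(template).create(Map.of("question", userQuestion));
String answer = chatModel.call(prompt).getResult().getOutput().getText();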
Spring’s official documentation links to multiple resources to improve prompts, and there is also a large community online dedicated to this task.
Observability
As with any Spring framework/module, observability is essential for maintaining applications and understanding how they behave, especially when issues arise. Spring AI provides metrics and traces for various components discussed earlier, such as ChatClient, ChatModel, and others to be covered later.
The metrics and traces offer insights such as the parameters passed to the model, response times, token usage, and the models used. Special care must be taken with what gets logged, as it could leak sensitive user data. You can find all the available metrics and data in the official documentation.
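As a reference, a minimal setup could look like the following application.properties, together with the spring-boot-starter-actuator, micrometer-tracing-bridge-otel and opentelemetry-exporter-zipkin dependencies (exact property names may vary between versions):

# Sample every request so all model calls show up in Zipkin (not recommended in production).
management.tracing.sampling.probability=1.0
# Endpoint of the Zipkin server started later in this post.
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
# Include the user input in the observations (this is what triggers the warning mentioned below).
spring.ai.chat.client.observations.include-input=true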
By enabling tracing with Actuator, Zipkin, and OpenTelemetry dependencies, you can access metrics such as:

These include information such as tokens, model used, response times, etc.


Moreover, when enabling tracing with Zipkin and OpenTelemetry, you will also get traces in the Zipkin server:


If you look closely, you can see the user input (property spring.ai.chat.client.user.text). This is possible because it was enabled using the property spring.ai.chat.client.observations.include-input, though Spring AI does show a warning about this in the trace logs, as it may expose sensitive user information.

Demo with Ollama
With all the prior knowledge, we can now build a demo using Ollama. Once Ollama is up and running on your system, the app can be created with various endpoints to test different Spring AI features:
- travel-recommendation: an endpoint that recommends travel destinations (it actually accepts any user input, so it could be used for other types of queries).

- travel-recommendation-population: an endpoint that recommends cities below a specified population. It illustrates how to pass a parameter to the system input. You may notice the model has limited accuracy when filtering results.

- travel-recommendation/chat-response: works similarly to the first endpoint but returns a ChatResponse object that includes metadata such as tokens and model used (the text field contains the actual reply, shortened here for simplicity).
{
  "result": {
    "output": {
      "messageType": "ASSISTANT",
      "metadata": {
        "messageType": "ASSISTANT"
      },
      "toolCalls": [],
      "media": [],
      "text": "..."
    },
    "metadata": {
      "finishReason": "stop",
      "contentFilters": [],
      "empty": true
    }
  },
  "metadata": {
    "id": "",
    "model": "llama3.2:1b",
    "rateLimit": {
      "requestsLimit": 0,
      "requestsRemaining": 0,
      "requestsReset": "PT0S",
      "tokensLimit": 0,
      "tokensRemaining": 0,
      "tokensReset": "PT0S"
    },
    "usage": {
      "promptTokens": 34,
      "completionTokens": 428,
      "totalTokens": 462
    },
    "promptMetadata": [],
    "empty": false
  },
  "results": [
    {
      "output": {
        "messageType": "ASSISTANT",
        "metadata": {
          "messageType": "ASSISTANT"
        },
        "toolCalls": [],
        "media": [],
        "text": "..."
      },
      "metadata": {
        "finishReason": "stop",
        "contentFilters": [],
        "empty": true
      }
    }
  ]
}
- travel-recommendation/entity: works just like the initial endpoint, but transforms the model response into domain model classes from our application:

Note that this endpoint is quite error-prone, since the model's response often cannot be correctly transformed into JSON (this remains one of the key areas for improvement in both Spring AI and the models themselves, which frequently fail to generate valid JSON).
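For reference, a sketch of how such an endpoint can map the response onto a domain class using the fluent ChatClient API (the chatClient field and the TravelRecommendation record are made-up examples, not the demo's actual classes):

// Hypothetical domain class used only for this sketch.
record TravelRecommendation(String city, String country, String reason) {}

@GetMapping("/travel-recommendation/entity")
TravelRecommendation travelRecommendationEntity(@RequestParam String question) {
    // Spring AI appends format instructions to the prompt and parses the JSON reply;
    // when the model returns malformed JSON, this is where the conversion fails.
    return chatClient.prompt()
            .user(question)
            .call()
            .entity(TravelRecommendation.class);
}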
- chat-client-programmatically: this endpoint illustrates how to create a ChatClient bean programmatically instead of letting Spring auto-inject it, which gives more flexibility when using multiple models in the same application (see the sketch after this list). Functionally, it behaves the same as the travel-recommendation endpoint.
- multimodality: this endpoint allows you to send both text and image inputs to the model. In this case, the model is asked to describe the image. This example is detailed in the multimodality section.
- city-name-generation: generates 5 fictional city names. This serves as an example of how to modify the model's "temperature" setting (for more creative responses). When calling the endpoint, the response might look like this:

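A sketch of how the temperature can be raised for this kind of creative generation (the exact builder method names depend on the Spring AI version in use):

@GetMapping("/city-name-generation")
String cityNameGeneration() {
    // A higher temperature makes the output more random, i.e. more "creative".
    OllamaOptions options = OllamaOptions.builder()
            .temperature(0.9)
            .build();
    Prompt prompt = new Prompt("Generate 5 fictional city names.", options);
    return chatModel.call(prompt).getResult().getOutput().getText();
}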
- /prompt-template/simple: endpoint that provides information about a city using an adjective (e.g., large, small, coastal) and country. It serves as an example of how prompt templates are generated.

- /prompt-template/system: endpoint that provides information on a specific topic (technology, business, industry, etc.) related to the city of Barcelona. It showcases how to generate system prompt templates.

- /prompt-template/resource: this endpoint performs the same task as the previous one but retrieves the systemPromptTemplate from an external file.

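Going back to the chat-client-programmatically endpoint, a minimal sketch of building the ChatClient by hand could look like this (the bean name and default system text are assumptions):

@Configuration
public class ChatClientConfig {

    // Building the ChatClient ourselves, instead of relying on the auto-configured builder,
    // makes it easy to define several clients, each wrapping a different model or defaults.
    @Bean
    ChatClient travelChatClient(OllamaChatModel chatModel) {
        return ChatClient.builder(chatModel)
                .defaultSystem("You are an assistant that recommends travel destinations.")
                .build();
    }
}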
The application includes the dependencies and configuration properties needed to enable observability of the various requests made to the models. To export traces to a Zipkin server, you can spin up a local Zipkin instance with Docker using the following command:
docker run -d -p 9411:9411 openzipkin/zipkin
Finally, note that depending on the prompt provided and the accuracy of the model, you may get nonsensical responses or outputs that do not meet the specified requirements, which can lead to formatting errors. You may also run into memory-related issues, with traces such as:
java.lang.RuntimeException: [500] Internal Server Error - {"error":"model requires more system memory (6.1 GiB) than is available (5.3 GiB)"}
You can download the sample application code at this link.
Conclusion
In this second part of the Spring AI series, we’ve explored how to run multimodal models as well as how to manage model input through various prompt techniques. Another crucial section in any Spring module is observability, and here it offers relevant metrics on model behavior, which are essential for both the application’s performance and potential LLM usage billing.
While what we’ve seen so far can be considered the "Hello World" of AI, it’s important to note that this is the most basic layer of the Spring AI module — and it already provides plenty of capabilities to consider for real-world use cases, especially those involving chatbots.
In future posts, we’ll dive into more advanced features of Spring AI. Let me know what you think in the comments! 👇