In the previous post, we took our first dive into Spring AI and its APIs. We talked about classes like ChatClient and ChatModel, which are the main entry points for interacting with LLMs. In this post, we continue exploring the module, looking at features like multimodality, prompts, and a cross-cutting concern: observability.

Multimodality

Humans acquire knowledge from various "data sources" (images, sound, text, etc.), so it's fair to say that our experiences are multimodal. In contrast, Machine Learning traditionally focused on a single modality—until recently, when models began to appear that can process multiple inputs, such as image and text, or audio and video at the same time.

Multimodality refers to a model's ability to process information from multiple sources, and Spring AI, through its Message API, provides this support. The content field in the UserMessage class is used for text, while the media field allows the inclusion of additional content such as images, audio, or video, specified by the corresponding MimeType.

@RestController
public class MultimodalController {

    private final OllamaChatModel chatModel;

    public MultimodalController(OllamaChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/multimodal")
    String multimodal() {

        // Image bundled with the application, sent along with the text as JPEG media
        ClassPathResource imageResource = new ClassPathResource("/static/multimodal.jpg");

        UserMessage userMessage = new UserMessage(
                "Explain to me what you see in the image",
                new Media(MimeTypeUtils.IMAGE_JPEG, imageResource));

        return chatModel.call(new Prompt(List.of(userMessage)))
                .getResult().getOutput().getText();
    }
}

To see it in action, we’ll request a description of the following image:

robot arm pointing upward against a blue background with white dots connected by white lines

We asked the model in Spanish and got responses like:

The image shows a blue background with a series of white dots, similar to a star constellation or a star map. In the foreground, there is a person holding what appears to be a smartphone or tablet in their hands, with the screen showing more of these white dots, possibly indicating connectivity or data transmission. The overall effect gives the impression of digital interconnectivity, with the human element suggesting that technology is an essential part of our lives.
What we see in the image is a scene from a futuristic or technological environment. In the background, there is a figure illuminated by a blue light source, with silhouettes of several data points or connector nodes forming a network. This suggests that the setting is a place where advanced technologies are used, possibly related to neural networks, IoT (Internet of Things), Big Data, or similar.

In the foreground, a hand emerging from the left side of the frame appears to be reaching for or manipulating the virtual data network. This action may represent real-time data system control or manipulation, indicating a context of security, application development, or science.

The environment is dark, but the blue lighting creates a magical and technological atmosphere, combining elements of fantasy with the reality of technology. The image appears to be part of a presentation or promotional material related to technology and its impact on the future.

As shown, the model used can produce strange responses, even mixing languages. Another important aspect to keep in mind is the response time (in general, for all model calls), so it's essential to handle asynchrony as well as performance improvements in the code (such as using virtual threads).
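Since a single model call can take several seconds, one way to avoid holding a request thread for the whole response is to offload the call to a virtual-thread executor. The sketch below is hypothetical and not part of the demo code; slowModelCall stands in for the real blocking chatModel.call(...):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCallDemo {

    // Stand-in for a blocking model call: sleeps to simulate model latency.
    static String slowModelCall(String prompt) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response to: " + prompt;
    }

    public static void main(String[] args) {
        // Each task gets its own virtual thread (Java 21+), so blocking is cheap.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            CompletableFuture<String> future =
                    CompletableFuture.supplyAsync(() -> slowModelCall("describe the image"), executor);
            // The caller thread is free to do other work here.
            System.out.println(future.join());
        }
    }
}
```

The same pattern applies to the controller above: return the CompletableFuture (or use Spring's async support) instead of blocking on the model call.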

Prompts

As mentioned earlier, prompts are the inputs sent to the models. Their design and structure significantly influence the model's responses. Ultimately, Spring AI handles prompts much like Spring MVC handles the "view" layer: you define placeholders that are replaced with the appropriate values to build content dynamically.

The structure of prompts has evolved along with AI. Initially, they were just simple strings, but now many include placeholders for specific inputs (for example: USER:<user>). OpenAI has introduced more structured prompts, categorizing different strings by roles.

Roles

Each message is associated with a role. Roles categorize the message, giving the model context and purpose and improving response effectiveness. The main roles are:

- system: sets the model's behavior and constraints for the conversation.
- user: the input coming from the end user.
- assistant: the model's previous responses, kept for conversational context.
- tool: results returned by tools or functions the model has invoked.

In Spring AI, roles are defined as an enum.

public enum MessageType {

    USER("user"),
    ASSISTANT("assistant"),
    SYSTEM("system"),
    TOOL("tool");

    ...
}

Prompt Template

The PromptTemplate class is key to managing prompt templates, making it easier to create structured prompts.

public class PromptTemplate implements PromptTemplateActions, PromptTemplateMessageActions {}

The interfaces implemented by the class cover different aspects of prompt creation: PromptTemplateActions for building Prompt objects and PromptTemplateMessageActions for building Message objects.

Some examples of using PromptTemplate:

@GetMapping("/simple")
String simplePromptTemplate(@RequestParam String adjective, @RequestParam String country) {

    PromptTemplate promptTemplate = new PromptTemplate("Give a {adjective} city from {country}");

    Prompt prompt = promptTemplate.create(Map.of("adjective", adjective, "country", country));

    return chatModel.call(prompt).getResult().getOutput().getText();
}

@GetMapping("/system")
String systemPromptTemplate(@RequestParam String topic) {

    String userText = """
            Give me information about Barcelona.
            Respond with at least 5 lines.
            """;

    Message userMessage = new UserMessage(userText);

    String systemText = """
            You are an AI assistant that helps people with information about cities.
            You must respond with information about the city on the topic {topic}.
            Respond as if you were writing a travel blog.
            """;

    SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemText);
    Message systemMessage = systemPromptTemplate.createMessage(Map.of("topic", topic));

    Prompt prompt = new Prompt(List.of(userMessage, systemMessage));

    return chatModel.call(prompt).getResult().getOutput().getText();
}

In addition to using Strings for prompt creation, Spring AI supports creating prompts through resources (Resource), allowing you to define the prompt in a file that can then be used as a PromptTemplate:

@Value("classpath:/static/prompts/system-message.st")
private Resource systemResource;

SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemResource);
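For illustration, such a system-message.st file could contain the same template text used earlier, with the {topic} placeholder filled at runtime (hypothetical content, shown only as an example):

```
You are an AI assistant that helps people with information about cities.
You must respond with information about the city on the topic {topic}.
Respond as if you were writing a travel blog.
```

Keeping prompts in resource files makes them easier to version, review, and tweak without touching Java code.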

Prompt Engineering

As mentioned earlier, the quality and structure of prompts have a significant impact on model responses, giving rise to a new profession. In the developer community, there's ongoing analysis and sharing of best practices to improve prompts in various scenarios, which leads to several key points for creating effective prompts, such as writing clear and specific instructions, providing relevant context and examples (few-shot prompting), and constraining the expected output format and length.

Spring’s official documentation links to multiple resources to improve prompts, and there is also a large community online dedicated to this task.

Observability

As with any Spring framework/module, observability is essential for maintaining applications and understanding how they behave, especially when issues arise. Spring AI provides metrics and traces for the components discussed earlier, such as ChatClient and ChatModel, as well as others we'll cover later.

The metrics and traces offer insights such as the parameters passed to the model, response times, token usage, and the models used. Special care must be taken with what gets logged, as it could potentially leak sensitive user data. You can find all available metrics and data in the official documentation.

By enabling tracing with Actuator, Zipkin, and OpenTelemetry dependencies, you can access metrics such as:

list of metrics that can be enabled

These include information such as tokens, model used, response times, etc.

Metric extraction for token
Metric extraction for gen ai client operation

Moreover, when enabling traceability with Zipkin and OpenTelemetry, you will also get traces in the Zipkin server:

Zipkin server traces screenshot
More traces in the Zipkin server

If you look closely, you can see the user input (property spring.ai.chat.client.user.text). This is possible because it was enabled using the property spring.ai.chat.client.observations.include-input, though Spring AI does show a warning about this in the trace logs, as it may expose sensitive user information.

Spring AI warning about sensitive data exposure
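For reference, a minimal application.properties for this setup might look like the following (a sketch based on standard Spring Boot and Spring AI property names; adjust to your versions, and keep in mind the sensitive-data warning above):

```properties
# Expose Actuator endpoints and sample every trace (development only)
management.endpoints.web.exposure.include=*
management.tracing.sampling.probability=1.0
# Where the local Zipkin server listens
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
# Include prompt content in observations -- may expose sensitive user data
spring.ai.chat.client.observations.include-input=true
```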

Demo with Ollama

With all the prior knowledge, we can now build a demo using Ollama. Once Ollama is up and running on your system, the app can be created with various endpoints to test different Spring AI features:

Example recommendation of small cities to travel to, with descriptions
Example of travel recommendation filtered by population under 1000
{
    "result": {
        "output": {
            "messageType": "ASSISTANT",
            "metadata": {
                "messageType": "ASSISTANT"
            },
            "toolCalls": [],
            "media": [],
            "text": "..."
        },
        "metadata": {
            "finishReason": "stop",
            "contentFilters": [],
            "empty": true
        }
    },
    "metadata": {
        "id": "",
        "model": "llama3.2:1b",
        "rateLimit": {
            "requestsLimit": 0,
            "requestsRemaining": 0,
            "requestsReset": "PT0S",
            "tokensLimit": 0,
            "tokensRemaining": 0,
            "tokensReset": "PT0S"
        },
        "usage": {
            "promptTokens": 34,
            "completionTokens": 428,
            "totalTokens": 462
        },
        "promptMetadata": [],
        "empty": false
    },
    "results": [
        {
            "output": {
                "messageType": "ASSISTANT",
                "metadata": {
                    "messageType": "ASSISTANT"
                },
                "toolCalls": [],
                "media": [],
                "text": "..."
            },
            "metadata": {
                "finishReason": "stop",
                "contentFilters": [],
                "empty": true
            }
        }
    ]
}
Travel recommendation / entity

Note that this endpoint is very prone to failure, since transforming the model's response into valid JSON often fails (this remains one of the key areas pending improvement in both Spring AI and the models themselves, as they frequently do not generate valid JSON).
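One common mitigation, shown here as a hypothetical helper rather than a Spring AI API, deals with the frequent case where the model wraps JSON in markdown fences or surrounding prose: extract the JSON object from the raw reply before handing it to a converter.

```java
public class ModelJsonCleaner {

    // Models often wrap JSON in ```json fences or add prose around it.
    // This (simplified) helper extracts the first {...} block from the reply;
    // a real implementation would also validate the result before mapping it.
    static String extractJson(String modelReply) {
        int start = modelReply.indexOf('{');
        int end = modelReply.lastIndexOf('}');
        if (start == -1 || end <= start) {
            throw new IllegalArgumentException("no JSON object found in model reply");
        }
        return modelReply.substring(start, end + 1);
    }

    public static void main(String[] args) {
        String reply = "Sure! Here is the data:\n```json\n{\"city\":\"Madrid\"}\n```";
        System.out.println(extractJson(reply)); // {"city":"Madrid"}
    }
}
```

Even with cleanup like this, retries or a stricter output format in the prompt are often needed for reliable entity mapping.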

Sure! Here are 5 made-up city names:

1. Aerthys - a city nestled in a valley surrounded by mountains, known for its natural beauty.
2. Eldrida - a medieval city with ancient walls and a historic bridge crossing the river.
3. Nyxoria - a futuristic city on the west coast, known for its solar towers and vibrant nightlife.
4. Luminaria - a charming city with white stone-paved streets and glass-roofed buildings reflecting sunlight.
5. Caelum - a centrally located city known for its modern architecture and stunning botanical gardens.
Madrid is the capital and largest city of Spain. It is located in the center of the country and is known for its rich history, vibrant culture, and modern architecture. Some popular attractions in Madrid include the Royal Palace, Prado Museum, Puerta del Sol, and Retiro Park.
Hi! As an AI assistant, I’m excited to help with any question you have about the city of Barcelona. Barcelona is a vibrant and exciting city in northeastern Spain. It’s known for its beautiful beaches, unique architecture, and delicious traditional dishes like paella and fried calamari. It’s also a city that has developed into one of the most technological hubs in the world. Known for being a pioneer in the tech sector with leading companies like La Caixa Móvil, Vodafone, and Nokia, among others. Innovation in science and health. Numerous startups are working on technological solutions that could change the world. If you’re interested in technology, you may want to visit key companies or attend tech events and conferences held regularly in Barcelona. You can also visit science museums like the Natural and Technological Sciences Museum to learn more about technology and its advances. If you’re thinking of moving to or working in Barcelona, it’s worth noting that the city also has top educational institutions like the University of Barcelona and Pompeu Fabra University offering various programs and tech specializations. In short, if you're into tech, Barcelona is a vibrant and leading city in the field.
London is a city full of history and culture, located in southern England. It's known as one of the most important financial centers in the world, home to numerous banks and companies. In terms of business and commerce, London offers a wide range of job opportunities across sectors. The city hosts major multinational firms as well as many small and medium enterprises that drive economic growth. The city also holds key trade shows and events year-round such as the International Fashion and Accessories Fair, the London Auto Show, and the London International Exhibition Fair. But it's not just a business city — it also provides visitors with a rich and diverse cultural atmosphere. Known for its Gothic and Neoclassical architecture, London features many historic and artistic landmarks. It’s also home to renowned museums and art galleries with valuable collections. Lastly, London offers a unique culinary experience with restaurants and pubs offering traditional English fare and global cuisines. In summary, London provides a unique mix of culture, history, and business opportunities.

The application includes the dependencies and configuration properties needed to enable observability of the various requests made to the models. To export traces to a Zipkin server, you can spin up a local Zipkin instance with Docker using the following command:

docker run -d -p 9411:9411 openzipkin/zipkin

In general, it should also be noted that depending on the prompt provided and the accuracy of the model, nonsensical responses or outputs that do not meet the specified requirements may occur. This can result in formatting errors or even memory-related issues, with traces such as:

java.lang.RuntimeException: [500] Internal Server Error - {"error":"model requires more system memory (6.1 GiB) than is available (5.3 GiB)"}

You can download the sample application code at this link.

Conclusion

In this second part of the Spring AI series, we’ve explored how to run multimodal models as well as how to manage model input through various prompt techniques. Another crucial section in any Spring module is observability, and here it offers relevant metrics on model behavior, which are essential for both the application’s performance and potential LLM usage billing.

While what we’ve seen so far can be considered the "Hello World" of AI, it’s important to note that this is the most basic layer of the Spring AI module — and it already provides plenty of capabilities to consider for real-world use cases, especially those involving chatbots.

In future posts, we’ll dive into more advanced features of Spring AI. Let me know what you think in the comments! 👇
