Typesense, an open-source search engine

Typesense is an open-source search engine that can be installed on-premises, although it is also available as a managed cloud service (SaaS). It is known for being lightweight and for supporting typo tolerance, voice queries, and image queries, among many other features. It is heavily focused on retail, where search is often mission-critical.

Retail focused

Typo tolerance

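It offers vector search for similarity-based queries (using embeddings generated by machine-learning models), and its keyword search is especially good at handling typos: a query term still matches an indexed word when the two differ by only one or two character edits.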

A traditional SQL LIKE '%term%' search can match a pattern, but the term must still appear exactly as typed inside one of the row's values.

In Typesense, which is closer to a document database, the search runs across one or more fields of each document, and it is typo-tolerant out of the box, with no additional setup needed in your queries.
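To make the contrast concrete, here is a minimal JavaScript sketch, not Typesense's actual implementation: it compares a LIKE-style substring check with a word match that tolerates one edit, measured with the classic Levenshtein distance. The function names are illustrative.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions or substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // row[j] holds the distance between the current prefix of `a` and b[0..j).
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row, column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row, column j
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// SQL-style LIKE '%term%': the term must appear verbatim (case-insensitive here).
function likeMatch(value, term) {
  return value.toLowerCase().includes(term.toLowerCase());
}

// Typo-tolerant match: some word of the field is within `maxTypos` edits of the term.
function fuzzyMatch(value, term, maxTypos = 1) {
  const needle = term.toLowerCase();
  return value
    .toLowerCase()
    .split(/\s+/)
    .some((word) => levenshtein(word, needle) <= maxTypos);
}

const title = "The Shawshank Redemption";
console.log(likeMatch(title, "shawshnk"));  // false
console.log(fuzzyMatch(title, "shawshnk")); // true
```

Typesense's real matching does more than this (it also supports prefix matching and ranks matches with fewer typos higher), but the principle of counting character edits is the same.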

Analytics rules

Another advantage is that it includes event-based rules for collection items. In a retail search engine, concepts like popularity and ranking are crucial. Our search system should be “alive,” meaning it must continuously adapt based on user activity.

For example, a product that is frequently viewed should appear higher in search results, and a product that converts (is sold) should gain even more popularity than one that's simply browsed.

Typesense simplifies the implementation of this behavior.

Features of Typesense

In addition to the features mentioned above, Typesense offers:

Faceted navigation: filter and count results by facet values.
Geo-Search: GIS-based search. Supports radius and polygon-based searches.
LLM Augmentation: features powered by Large Language Models, such as conversational search over your documents.
Voice Query: voice-based search.
Image Query: image-based search.
Federated MultiSearch: runs several queries, across one or more collections, in a single request.
JOIN Queries: enables filtered queries across 2 or more collections, including one-to-one, one-to-many, many-to-many, and left joins.
Synonyms: allows you to define synonyms for search terms. For instance, searching “nike” can also match “footwear” if defined as a synonym.
Alias: a collection alias is a virtual name that can point to different versions of the same collection, enabling updates or reindexing without downtime.
In-Memory Search: to boost performance, Typesense keeps the entire index in memory, while also persisting data to disk for backups and restarts, which means you’ll need a host with plenty of RAM.

It also supports clustering, replicating data across nodes with the Raft consensus protocol, which gives you the high availability and read scalability needed for production environments.
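As an illustration of the Federated MultiSearch feature listed above, the following JavaScript sketch builds (but does not send) a request to Typesense's /multi_search endpoint, which takes a JSON body with a searches array. The series_v1 collection and its title field are hypothetical, and sending per_page as a URL parameter shared by all searches is an assumption worth checking against the API documentation.

```javascript
// Builds (but does not send) a request for POST /multi_search.
// `searches` holds one parameter object per search; `commonParams` are sent
// as URL query parameters (assumed to apply to every search in the batch).
function buildMultiSearchRequest(host, apiKey, searches, commonParams = {}) {
  const url = new URL("/multi_search", host);
  for (const [key, value] of Object.entries(commonParams)) {
    url.searchParams.set(key, String(value));
  }
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "X-TYPESENSE-API-KEY": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ searches }),
    },
  };
}

const request = buildMultiSearchRequest(
  "http://localhost:8108",
  "xyz",
  [
    { collection: "films_v1", q: "godfather", query_by: "name_en_GB" },
    { collection: "series_v1", q: "godfather", query_by: "title" }, // hypothetical collection and field
  ],
  { per_page: 5 },
);

console.log(request.url); // http://localhost:8108/multi_search?per_page=5
// With a running server (Node 18+ or a browser):
//   const response = await fetch(request.url, request.init);
//   const { results } = await response.json(); // one result set per search
```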

Client integration

Typesense offers a wide range of client libraries for many programming languages.

It also provides web integration libraries that allow you to build a fully functional search experience with just a few lines of code.

On the server side, it offers client wrappers that make it easy to integrate with the Typesense HTTP API.

Use case: building a video store

Let’s put the theory into practice and build a search engine for an online store, step by step.

1 Infrastructure

We deploy a Docker image to test the server. Here's a sample docker-compose.yaml:

version: '3.7'

networks:
  typesense-demo-network:
    name: typesense-demo-network

services:
  typesense:
    image: typesense/typesense:27.1
    ports:
      - "8108:8108"
    volumes:
      - ./typesense-data:/data
      - ./typesense-analytics:/analytics-data   # persist analytics data across container restarts
    command: '--data-dir /data --api-key=xyz --enable-cors --enable-search-analytics=true --analytics-dir=/analytics-data --analytics-flush-interval=60'
    networks:
      - typesense-demo-network

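Although most of it is self-explanatory, note that --enable-search-analytics must be set to true in order to use the analytics rules we discussed earlier; --analytics-dir sets where analytics data is stored, and --analytics-flush-interval sets how often, in seconds, it is flushed.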

UI Infrastructure

Typesense servers don’t include a web admin interface, but you can use a third-party dashboard to manage the server locally. When logging in, provide your local connection details, which simply means pointing it at localhost:8108 and entering the API key “xyz”.

Typesense dashboard login screen with "API key" field to be filled with "xyz"

Once logged in, you’ll see the dashboard, which shows server metrics such as each CPU core and its status. The admin interface is quite comprehensive.

2 Administration: creating collections

The first step is to define a collection with its attributes. In this case, we’ll create a collection where we can add movies to our video store. We define the structure of the elements in a file:

films_collection_v1.json

{
    "name": "films_v1",
    "fields": [
      {
        "name": "filmId",
        "type": "string",
        "optional": false
      },
      {
        "name": "name_es_ES",
        "type": "string"
      },
      {
        "name": "name_en_GB",
        "type": "string"
      },
      {
        "name": "actors",
        "type": "string[]",
        "facet": true
      },
      {
        "name": "popularity",
        "type": "int32",
        "sort": true,
        "optional": false
      },
      {
        "name": "image",
        "type": "string",
        "facet": false
      },
      {
        "name": "quantity",
        "type": "int64",
        "optional": false
      }
    ],
    "default_sorting_field": "popularity"
  }

It’s worth highlighting the attributes:

default_sorting_field: when a search doesn't specify sort_by, results are sorted by this field in descending order (it should be a numeric field).
type: indicates the data type.
optional: defines whether the field may be omitted. Fields are required by default, so "optional": false (set on filmId, popularity, and quantity) only makes that explicit; every field in this schema is mandatory.
facet: enables faceting.
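To make the type and optional attributes concrete, here is a hypothetical client-side check in JavaScript. It is not part of any Typesense library; it only mirrors the rules described above (fields are required unless optional is true, values must match the declared type) for the handful of types used in our schema.

```javascript
// Value checks for the subset of Typesense field types used in films_v1.
const typeChecks = {
  "string": (v) => typeof v === "string",
  "string[]": (v) => Array.isArray(v) && v.every((s) => typeof s === "string"),
  "int32": (v) => Number.isInteger(v) && v >= -(2 ** 31) && v < 2 ** 31,
  "int64": (v) => Number.isSafeInteger(v), // JS numbers cannot hold every int64
};

// Returns a list of problems; an empty list means the document fits the schema.
// Fields count as required unless they declare optional: true.
function validateDocument(schema, doc) {
  const errors = [];
  for (const field of schema.fields) {
    const value = doc[field.name];
    if (value === undefined || value === null) {
      if (!field.optional) errors.push(`missing required field ${field.name}`);
      continue;
    }
    const check = typeChecks[field.type];
    if (check && !check(value)) errors.push(`${field.name} is not a valid ${field.type}`);
  }
  return errors;
}

// Trimmed copy of the films_v1 schema.
const filmsSchema = {
  name: "films_v1",
  fields: [
    { name: "filmId", type: "string", optional: false },
    { name: "name_en_GB", type: "string" },
    { name: "actors", type: "string[]", facet: true },
    { name: "popularity", type: "int32", sort: true, optional: false },
  ],
};

console.log(validateDocument(filmsSchema, {
  filmId: "001-1", name_en_GB: "Inception", actors: ["Leonardo DiCaprio"], popularity: 1,
})); // []
// Reports the missing name_en_GB and the non-numeric popularity:
console.log(validateDocument(filmsSchema, { filmId: "001-1", actors: [], popularity: "high" }));
```

The server can be more lenient than this sketch: depending on the dirty_values import parameter, Typesense may coerce a value such as the string "1" into an int32 instead of rejecting it.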

Next, we create the v1 collection using the API:

curl "http://localhost:8108/collections" \
      -X POST \
      -H "X-TYPESENSE-API-KEY: xyz" \
      --data-binary @./films_collection_v1.json

NOTE: we could also have created the collection using the Java SDK.

We can view the collection through the web UI under the “Collections” section:

Typesense "collection" section where we can browse our collections

Registering movies

Let’s register a few movies and their quantities. The import API accepts JSON Lines (.jsonl): a file in which each line is a complete JSON object and lines are separated by newline characters.

films.jsonl:

{"id":"0", "filmId": "001-0","name_es_ES": "Sueños de fuga", "name_en_GB": "The Shawshank Redemption", "actors": ["Tim Robbins", "Morgan Freeman", "Bob Gunton"], "popularity": 8,"image": "https://picsum.photos/200", "quantity":23  }
{"id":"1","filmId": "001-1","name_es_ES": "Origen", "name_en_GB": "Inception", "actors": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"], "popularity": 1,"image": "https://picsum.photos/200", "quantity":100 }
{"id":"2","filmId": "022-2","name_es_ES": "El caballero oscuro", "name_en_GB": "The Dark Knight","actors": ["Christian Bale", "Heath Ledger", "Aaron Eckhart"],  "popularity": 2,"image": "https://picsum.photos/200", "quantity":33 } 
{"id":"3","filmId": "023-3","name_es_ES": "Pulp Fiction","name_en_GB": "Pulp Fiction", "actors": ["John Travolta", "Uma Thurman", "Samuel L. Jackson"], "popularity": 8,"image": "https://picsum.photos/200", "quantity":47 }
{"id":"4","filmId": "023-4","name_es_ES": "Forrest Gump","name_en_GB": "Forrest Gump","actors": ["Tom Hanks", "Robin Wright", "Gary Sinise"], "popularity": 5,"image": "https://picsum.photos/200", "quantity":125 }
{"id":"5","filmId": "d32-5","name_es_ES": "El padrino","name_en_GB": "The Godfather","actors": ["Marlon Brando", "Al Pacino", "James Caan"], "popularity": 5,"image": "https://picsum.photos/200", "quantity":1727 }
{"id":"6","filmId": "011-6","name_es_ES": "Matrix","name_en_GB": "Matrix","actors": ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"], "popularity": 1,"image": "https://picsum.photos/200", "quantity":345 }
{"id":"7","filmId": "011-7","name_es_ES": "Gladiator","name_en_GB": "Gladiator","actors": ["Russell Crowe", "Joaquin Phoenix", "Connie Nielsen"], "popularity": 1,"image": "https://picsum.photos/200", "quantity":876 }
{"id":"8","filmId": "077-8","name_es_ES": "El rey león","name_en_GB": "The Lion King","actors": ["Matthew Broderick", "James Earl Jones", "Jeremy Irons"], "popularity": 2,"image": "https://picsum.photos/200", "quantity":734 }
{"id":"9","filmId": "077-9","name_es_ES": "Titanic","name_en_GB": "Titanic","actors": ["Leonardo DiCaprio", "Kate Winslet", "Billy Zane"], "popularity": 5,"image": "https://picsum.photos/200", "quantity":976 }
{"id":"10","filmId":"df0-10","name_es_ES": "Los vengadores","name_en_GB": "The Avengers","actors": ["Robert Downey Jr", "Chris Hemsworth", "Scarlett Johansson"], "popularity": 0,"image": "https://picsum.photos/200", "quantity":35 }
{"id":"11","filmId":"a01-11","name_es_ES": "Parque Jurásico","name_en_GB": "Jurassic Park","actors": [ "Sam Neill", "Laura Dern", "Jeff Goldblum"], "popularity": 5,"image": "https://picsum.photos/200", "quantity":11 }
{"id":"12","filmId":"001-12","name_es_ES": "El lobo de Wall Street","name_en_GB": "The Wolf of Wall Street","actors": ["Leonardo DiCaprio", "Jonah Hill", "Margot Robbie"], "popularity": 3,"image": "https://picsum.photos/200", "quantity":437 }
{"id":"13","filmId":"023-13","name_es_ES": "El padrino: Parte II","name_en_GB": "The Godfather: Part II","actors": [ "Al Pacino", "Robert De Niro", "Diane Keaton"], "popularity": 7,"image": "https://picsum.photos/200", "quantity":221 }
{"id":"14","filmId":"002-14","name_es_ES": "El silencio de los corderos","name_en_GB": "The Silence of the Lambs","actors": [ "Jodie Foster", "Anthony Hopkins", "Lawrence A. Bonney"], "popularity": 6,"image": "https://picsum.photos/200", "quantity":732 }
{"id":"15","filmId":"000-15","name_es_ES": "La vida es bella","name_en_GB": "La vita e bella","actors": [ "Roberto Benigni", "Horst Buchholz", "Marisa Paredes"], "popularity": 12,"image": "https://picsum.photos/200", "quantity":15 }
....
....
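If the catalogue lives in application code rather than in a file, a JSONL payload like this one can be produced with a couple of lines of JavaScript. A minimal sketch; the helper names are our own:

```javascript
// Serializes an array of documents into JSON Lines: one JSON object per line.
// JSON.stringify escapes newlines inside strings, so a document never spans lines.
function toJsonl(docs) {
  return docs.map((doc) => JSON.stringify(doc)).join("\n");
}

// Parses JSON Lines back into objects, tolerating \r\n endings and blank lines.
function fromJsonl(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Fields trimmed for brevity; a real import needs every required field of films_v1.
const films = [
  { id: "1", filmId: "001-1", name_en_GB: "Inception", popularity: 1, quantity: 100 },
  { id: "6", filmId: "011-6", name_en_GB: "Matrix", popularity: 1, quantity: 345 },
];

const jsonl = toJsonl(films);
console.log(jsonl.split("\n").length); // 2
```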

Next, we import films.jsonl through the API:

curl "http://localhost:8108/collections/films_v1/documents/import?action=create" \
      -X POST \
      -H "X-TYPESENSE-API-KEY: xyz" \
      --data-binary @./films.jsonl
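The import endpoint answers with one JSON object per imported line, either {"success": true} or {"success": false, ...} with an error message, so it is worth checking every line, since individual documents can fail even when the request as a whole succeeds. A minimal JavaScript sketch that summarizes such a response; the sample text below is hand-written, not real server output:

```javascript
// Splits an import response (JSON Lines) into successes and failures.
// Each line is expected to look like {"success": true} or
// {"success": false, "error": "...", "document": "..."}.
function summarizeImport(responseText) {
  const results = responseText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  const failures = results
    .map((result, index) => ({ index, ...result })) // keep the position of each line
    .filter((result) => !result.success);
  return { total: results.length, succeeded: results.length - failures.length, failures };
}

// Hand-written sample response for illustration.
const sample = [
  '{"success": true}',
  '{"success": false, "error": "example error message", "document": "{\\"id\\":\\"1\\"}"}',
].join("\n");

const summary = summarizeImport(sample);
console.log(summary.total, summary.succeeded); // 2 1
console.log(summary.failures[0].index, summary.failures[0].error); // 1 example error message
```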

Retrieving movies through the API

Let's request a complete list without applying any filters:

curl "http://localhost:8108/collections/films_v1/documents/search?q=*" \
      -X GET \
      -H "X-TYPESENSE-API-KEY: xyz" | jq .

{
  "facet_counts": [],
  "found": 16,
  "hits": [
    {
      "document": {
        "actors": [
          "Roberto Benigni",
          "Horst Buchholz",
          "Marisa Paredes"
        ],
        "filmId": "000-15",
        "id": "15",
        "image": "https://picsum.photos/200",
        "name_en_GB": "La vita e bella",
        "name_es_ES": "La vida es bella",
        "popularity": 12,
        "quantity": 15
      },
      "highlight": {},
      "highlights": []
    },
    {
      "document": {
        "actors": [
          "John Travolta",
          "Uma Thurman",
          "Samuel L. Jackson"
        ],
        "filmId": "023-3",
        "id": "3",
        "image": "https://picsum.photos/200",
        "name_en_GB": "Pulp Fiction",
        "name_es_ES": "Pulp Fiction",
        "popularity": 8,
        "quantity": 47
      },

.....

The hits array is sorted by the default_sorting_field, popularity, in descending order.
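To make that ordering explicit, or to change it, we can pass sort_by ourselves. The following JavaScript sketch builds the equivalent search URL; we assume that popularity:desc reproduces the default order shown above, and the helper name is hypothetical.

```javascript
// Builds a Typesense search URL for a collection (or alias) from plain parameters.
// URLSearchParams takes care of encoding (":" becomes %3A, "," becomes %2C).
function buildSearchUrl(host, collection, params) {
  const url = new URL(`/collections/${encodeURIComponent(collection)}/documents/search`, host);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const url = buildSearchUrl("http://localhost:8108", "films_v1", {
  q: "*",
  sort_by: "popularity:desc", // explicit version of the default ordering
  per_page: 10,
});
console.log(url);
// http://localhost:8108/collections/films_v1/documents/search?q=*&sort_by=popularity%3Adesc&per_page=10
```

The resulting URL can be used with fetch() and the X-TYPESENSE-API-KEY header, exactly like the curl call above.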

Fast UI Creation

InstantSearch.js, an open-source search UI library, can talk to Typesense through the typesense-instantsearch-adapter. Both are available on public CDNs and let us build a functional search interface in just a few minutes.

<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4.44.0"></script>
<script src="https://cdn.jsdelivr.net/npm/typesense-instantsearch-adapter@2/dist/typesense-instantsearch-adapter.min.js"></script>

We take the example from the adapter's official GitHub repository and adapt it to our collection's model, changing the connector and widget sections (the page also needs elements with the ids searchbox, hits, and pagination):

<script>
    const typesenseInstantsearchAdapter = new TypesenseInstantSearchAdapter({
        server: {
            apiKey: 'xyz', // Be sure to use an API key that only allows searches, in production
            nodes: [
                {
                    host: 'localhost',
                    port: '8108',
                    protocol: 'http',
                },
            ],
        },
        // The following parameters are directly passed to Typesense's search API endpoint.
        //  So you can pass any parameters supported by the search endpoint below.
        //  queryBy is required.
        //  filterBy is managed and overridden by InstantSearch.js. To set it, you want to use one of the filter widgets like refinementList or use the `configure` widget.
        additionalSearchParameters: {
            queryBy: 'name_es_ES,name_en_GB,actors',
        },
    });
    const searchClient = typesenseInstantsearchAdapter.searchClient;

    const search = instantsearch({
        searchClient,
        indexName: 'films_v1',
    });

    search.addWidgets([
        instantsearch.widgets.searchBox({
            container: '#searchbox',
        }),
        instantsearch.widgets.configure({
            hitsPerPage: 8,
        }),
        instantsearch.widgets.hits({
            container: '#hits',
            templates: {
                item(item) {
                    return `
                        <div>
                          <img src="${item.image}" alt="${item.name_es_ES}" height="100" />
                          <div class="hit-name">
                            ${item._highlightResult.name_es_ES.value} (${item._highlightResult.name_en_GB.value})
                          </div>
                          <div class="hit-authors">
                          ${item._highlightResult.actors.map((a) => a.value).join(', ')}
                          </div>
                          <div class="hit-publication-year">Quantity ${item.quantity}</div>
                          <div class="hit-rating">${item.popularity} rating</div>
                        </div>
                      `;
                },
            },
        }),
        instantsearch.widgets.pagination({
            container: '#pagination',
        }),
    ]);

    search.start();
</script>

We visit the page and... voilà! We have the items paginated and sorted by popularity.

Collections on the page sorted by popularity

Search Text Box

In the additionalSearchParameters passed to the adapter, queryBy specifies which fields are matched against the user's input. In this case, the input is matched against name_es_ES, name_en_GB, and actors.

additionalSearchParameters: {
    queryBy: 'name_es_ES,name_en_GB,actors',
},

Typo Tolerant

Let’s see how it behaves when we search for “obert”:

Results when searching for "obert" on the portal

However, if we search for just “obe”, no results are returned.

Strange, right? This happens because typo correction is only applied to search terms of a minimum length, to avoid overly broad matches.

This threshold is called min_len_1typo, and its default is 4: if a search term has fewer than 4 characters, typo-tolerant matching is not applied. This makes sense, since tolerating a typo in a term like “a” would match almost every document. A related parameter, min_len_2typo (default 7), sets the minimum length for tolerating two typos.

min_len_1typo configuration: default minimum length is 4
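The rule can be summarized in a few lines. This is a simplified JavaScript model of how num_typos, min_len_1typo, and min_len_2typo combine for a single query token (defaults 2, 4, and 7); it ignores prefix matching and other settings, so treat it as an approximation rather than the server's logic:

```javascript
// Maximum typos tolerated for one query token under a simplified model:
// tokens shorter than minLen1Typo get none, tokens of at least minLen2Typo
// characters get two, and the result is capped by numTypos.
function allowedTypos(token, { numTypos = 2, minLen1Typo = 4, minLen2Typo = 7 } = {}) {
  let typos = 0;
  if (token.length >= minLen1Typo) typos = 1;
  if (token.length >= minLen2Typo) typos = 2;
  return Math.min(typos, numTypos);
}

console.log(allowedTypos("obe"));                     // 0: too short with the defaults
console.log(allowedTypos("obe", { minLen1Typo: 3 })); // 1: what min_len_1typo=3 enables
console.log(allowedTypos("obert"));                   // 1
console.log(allowedTypos("godfather"));               // 2
```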

To test it out, let’s make a request that lowers min_len_1typo to 3 characters; this time, results are returned:

curl "http://localhost:8108/collections/films_v1/documents/search?q=obe&query_by=name_es_ES,name_en_GB,actors&min_len_1typo=3" \
      -X GET \
      -H "X-TYPESENSE-API-KEY: xyz" | jq .

{
  "facet_counts": [],
  "found": 3,
  "hits": [
    {
      "document": {
        "actors": [
          "Roberto Benigni",
          "Horst Buchholz",
          "Marisa Paredes"
        ],
        "filmId": "000-15",
        "id": "15",
        "image": "https://picsum.photos/200",
        "name_en_GB": "La vita e bella",
        "name_es_ES": "La vida es bella",
        "popularity": 12,
        "quantity": 15
      },
      "highlight": {
        "actors": [
          {
            "matched_tokens": [

.... 
.... 
.... 
      "document": {
        "actors": [
          "Al Pacino",
          "Robert De Niro",
          "Diane Keaton"
        ],
        "filmId": "023-13",
        "id": "13",
        "image": "https://picsum.photos/200",
        "name_en_GB": "The Godfather: Part II",
        "name_es_ES": "El padrino: Parte II",
        "popularity": 7,
        "quantity": 221
      },
      "highlight": {
        "actors": [
....
...
....
    {
      "document": {
        "actors": [
          "Robert Downey Jr",
          "Chris Hemsworth",
          "Scarlett Johansson"
        ],
        "filmId": "df0-10",
        "id": "10",
        "image": "https://picsum.photos/200",
        "name_en_GB": "The Avengers",
        "name_es_ES": "Los vengadores",
        "popularity": 0,
        "quantity": 35
      },
      "highlight": {
        "actors": [
          {
  ...
  ...

The results still respect the popularity order. All the typo-tolerance parameters are described in the search parameters section of the Typesense API documentation.

3 Analytics Rules

As we mentioned at the beginning of this post, we can create rules based on events occurring within our search system.

Let’s create a rule so that, when a movie is viewed, its popularity increases by 1 point, and if a unit of the movie is purchased, its popularity increases by 2.

We define the rule in films_v1_click_rule.json:

{
    "name": "films_click_events",
    "type": "counter",
    "params": {
        "source": {
            "collections": ["films_v1"],
            "events":  [
                {"type": "click", "weight": 1, "name": "films_click_events"},
                {"type": "conversion","weight": 2,"name": "films_purchase_event"}
            ]
        },
        "destination": {
            "collection": "films_v1",
            "counter_field": "popularity"
        }
    }
}

Typesense supports these 3 types of events: click, conversion, and visit:

Table showing the event types supported by Typesense: click, conversion, visit

We create the rule, which registers both events for the films_v1 collection, through the API:

curl "http://localhost:8108/analytics/rules" \
      -X POST \
      -H "X-TYPESENSE-API-KEY: xyz" \
      -H "Content-Type: application/json" \
      --data-binary @./films_v1_click_rule.json

Triggering the rule

Recall that the movie "Life is Beautiful" (id 15) has a popularity of 12. Let’s fire an event indicating that someone has viewed it:

curl "http://localhost:8108/analytics/events" -X POST \
     -H "X-TYPESENSE-API-KEY: xyz" \
     -d '{
            "type": "click",
            "name": "films_click_events",
            "data": {
                  "doc_id": "15",
                  "user_id": "Antonio Volkaniski Garcia"
            }
        }'

{"ok": true}
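Click events will usually come from the frontend, for example when a user opens a hit. A minimal JavaScript sketch that builds the same request as the curl call above, ready to pass to fetch; the helper name and user id are hypothetical, and in a real page you should use a restricted API key rather than the admin key xyz:

```javascript
// Builds (but does not send) the request that records an analytics event,
// mirroring the curl call above: POST /analytics/events with a JSON body.
function buildAnalyticsEvent(host, apiKey, { type, name, docId, userId }) {
  return {
    url: new URL("/analytics/events", host).toString(),
    init: {
      method: "POST",
      headers: { "X-TYPESENSE-API-KEY": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ type, name, data: { doc_id: docId, user_id: userId } }),
    },
  };
}

// Hypothetical wiring: call this when a user clicks a hit on the results page.
const clickEvent = buildAnalyticsEvent("http://localhost:8108", "xyz", {
  type: "click",
  name: "films_click_events",
  docId: "15",
  userId: "user-123", // hypothetical user identifier
});
console.log(clickEvent.url); // http://localhost:8108/analytics/events
// In a browser or Node 18+: await fetch(clickEvent.url, clickEvent.init);
```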

Note: if we query again right away, the popularity value will most likely not have increased yet. Changing a record’s value (which should now become 13) requires a “mini reindexing” of that document, a costly process for the collection. The analytics-flush-interval=60 setting means that events are stored but only materialized every 60 seconds, so this process runs just once for all modified documents.

After 60 seconds (analytics-flush-interval) we can see that the rule has been materialized and the movie has gained one popularity point, increasing to 13.
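The batching behaviour described in the note can be modelled with a small buffer. This JavaScript sketch only illustrates the idea as the article describes it, not Typesense's internal code: events accumulate their rule weights per document, and nothing touches popularity until flush() runs, just as the server only materializes counters every analytics-flush-interval seconds.

```javascript
// Event weights taken from the films_click_events rule.
const WEIGHTS = { films_click_events: 1, films_purchase_event: 2 };

// Buffers weighted events per document and applies them in one batch on flush().
class CounterBuffer {
  constructor(weights) {
    this.weights = weights;
    this.pending = new Map(); // doc_id -> accumulated weight since the last flush
  }

  record(eventName, docId) {
    const weight = this.weights[eventName];
    if (weight === undefined) throw new Error(`unknown event ${eventName}`);
    this.pending.set(docId, (this.pending.get(docId) ?? 0) + weight);
  }

  // Applies all pending increments at once and returns the touched doc ids.
  flush(popularityById) {
    const touched = [...this.pending.keys()];
    for (const [docId, delta] of this.pending) {
      popularityById[docId] = (popularityById[docId] ?? 0) + delta;
    }
    this.pending.clear();
    return touched;
  }
}

const popularity = { "15": 12, "3": 8 };
const buffer = new CounterBuffer(WEIGHTS);
buffer.record("films_click_events", "15");
console.log(popularity["15"]); // 12: nothing is applied before the flush
buffer.flush(popularity);
console.log(popularity["15"]); // 13
```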

In the case of a conversion event, it would make more sense not to trigger it from the frontend but from a business process on the server, as it’s tied to a purchase flow. The official documentation shows how to trigger this rule from Java, and the following would be an example approach:

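import java.util.Map;
import org.typesense.model.AnalyticsEventCreateSchema;

// `client` is an org.typesense.api.Client already configured with the server nodes and API key.
AnalyticsEventCreateSchema analyticsEvent = new AnalyticsEventCreateSchema()
        .type("conversion")
        .name("films_purchase_event")
        .data(Map.of(
                "doc_id", "15",
                "user_id", "Paco el de los palotes",
                "amount", 1
        ));

// Registers the conversion event; its weight (2) comes from the analytics rule.
client.analytics().events().create(analyticsEvent);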

4 Alias Creation

It’s highly recommended to send collection queries through an alias.
Imagine a reindexing process triggered because we want to modify the schema of our movie model (films_v1).

All the documents must be reindexed before searches are ready again.
This is a costly process, and the collection will be locked and inaccessible until it finishes.
In a production environment, this is not acceptable.

An alias is a virtual name that points to a collection and can be repointed at any time, so we create one called films, pointing to films_v1:

curl "http://localhost:8108/aliases/films" -X PUT \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: xyz" -d '{
        "collection_name": "films_v1"
    }'

Now, all queries will point to the alias “films”, which will internally reference the collection films_v1.

curl "http://localhost:8108/collections/films/documents/search?q=*" \
      -X GET \
      -H "X-TYPESENSE-API-KEY: xyz"

Using alias-based access makes it very easy to perform changes without service disruption:

We can create a new collection called films_v2 with the new fields and start a data migration process from films_v1 to films_v2.
Our alias points to films_v1, so there is no service downtime.
Once the migration is complete, we simply update the alias to point to films_v2.

curl "http://localhost:8108/aliases/films" -X PUT \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: xyz" -d '{
        "collection_name": "films_v2"
    }'
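Putting the steps together, a migration boils down to four HTTP calls. The JavaScript sketch below only lists them (nothing is sent); the films_v2 schema with a year field is a hypothetical example, and the export and import endpoints are used to move the documents across:

```javascript
// Lists (without sending) the HTTP calls for a zero-downtime schema change
// behind an alias: create v2, copy the documents, then repoint the alias.
function aliasMigrationPlan(alias, oldCollection, newSchema) {
  return [
    { step: "create the new collection", method: "POST", path: "/collections", body: newSchema },
    { step: "export documents (JSONL)", method: "GET", path: `/collections/${oldCollection}/documents/export` },
    {
      step: "import documents into the new collection",
      method: "POST",
      path: `/collections/${newSchema.name}/documents/import?action=create`,
    },
    { step: "repoint the alias", method: "PUT", path: `/aliases/${alias}`, body: { collection_name: newSchema.name } },
  ];
}

// Hypothetical, trimmed v2 schema: the remaining v1 fields would be repeated here.
const filmsV2 = {
  name: "films_v2",
  fields: [
    { name: "filmId", type: "string" },
    { name: "popularity", type: "int32" },
    { name: "year", type: "int32", optional: true }, // the new field
  ],
  default_sorting_field: "popularity",
};

for (const { method, path, step } of aliasMigrationPlan("films", "films_v1", filmsV2)) {
  console.log(`${method} ${path}  (${step})`);
}
```

In practice the exported documents may need transforming to the new schema before the import, and any writes made to films_v1 during the migration must be replayed on films_v2 before the alias is switched.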

Conclusions

We’ve seen that Typesense is lightweight, fast, typo-tolerant, packed with features such as voice and image queries, deeply retail-focused, and, above all, very easy to use.

Additionally, it is already integrated into Spring AI's new Spring Boot starters as a vector store, even though they are still at the 1.0.0 beta stage.