Continuing the series on running LLMs locally, in this post we’ll look at an alternative to Ollama that is also widely used in the market, so we can better understand their differences and similarities. In this case, we’ll focus on LM Studio and how it works.

Would you like to check out the previous posts in the series?

What is LM Studio?

Like Ollama, LM Studio is an application for managing LLMs locally, which you can install on different operating systems (macOS, Linux, and Windows) with the corresponding minimum system requirements. Its key features include a graphical chat interface, model search and download, a command-line interface (lms), an OpenAI-compatible local API server, and SDKs for Python and TypeScript.

Installation

In this case, we run LM Studio using the Linux installation. Once the installer is downloaded (you may need to grant it execution permissions with the command chmod +x LM-Studio-0.3.23-3-x64.AppImage), the following command is executed:

./LM-Studio-0.3.23-3-x64.AppImage --no-sandbox

With this, the application will be launched:

LM Studio running

By following the installer steps, you can choose the level of customization you want:

Customization level in LM Studio

Next, you’ll be presented with the chat interface (if a step to download a model directly appears, it can be skipped):

Chat interface in LM Studio

In addition to searching for models within the application itself, the LM Studio website also provides a list of available models, including descriptions and possible configurations:

Available models in LM Studio

CLI commands

LM Studio provides a CLI, lms, to interact with models via commands. The CLI is one of the sections included in the graphical interface:

CLI in LM Studio

Another option is to install the CLI directly on our system so it can be executed from the terminal. This is the option we’ll follow in this article using the Ubuntu operating system. To do so, you need to run the following command:

npx lmstudio install-cli
CLI commands
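
As a quick check (assuming the lms binary supports the usual --help convention, which is an assumption on my part), you can confirm the CLI is reachable from the terminal:

lms --help   # verify the CLI is on the PATH and show the available commands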

The commands available in LM Studio are:

  1. LOAD

A command that loads a model into memory. You can specify parameters such as the context length, disabling the GPU, or a TTL. It’s important to note that there is no command to directly interact with the loaded model via the CLI; instead, the model is loaded so it becomes available for interaction through the graphical interface.

lms load google/gemma-3-1b

lms load google/gemma-3-1b --context-length 4096 

lms load google/gemma-3-1b --gpu off

lms load google/gemma-3-1b --ttl 3600
CLI commands: load
CLI commands: load context length
CLI commands: load gpu off
  2. UNLOAD

A command that unloads a model from memory. You can specify the --all option to unload all models.

lms unload google/gemma-3-1b

lms unload --all
CLI commands: unload
  3. GET

A command used to search for and download models from remote repositories. If no model name is specified, some recommended models are displayed. Downloaded models are typically located in the ~/.cache/lm-studio/ or ~/.lm-studio/models directories.

lms get google/gemma-3-1b

lms get --mlx    # filter by MLX model format

lms get --gguf   # filter by GGUF model format

lms get --limit 5   # limit the number of results
CLI commands: get
CLI commands: get mlx
CLI commands: get gguf
CLI commands: get limit 5
  4. SERVER START

Command used to start the local LM Studio server, allowing you to specify the port and enable CORS support.

lms server start

lms server start --port 3000

lms server start --cors
CLI commands: server start
  5. SERVER STATUS

Command that shows the current status of the local LM Studio server, as well as its configuration.

lms server status

lms server status --verbose

lms server status --quiet

lms server status --log-level debug
CLI commands: server status
  6. SERVER STOP

Command used to stop the local server.

lms server stop
CLI commands: server stop
  7. LS

Command that lists the models downloaded locally, showing information such as size, architecture, and number of parameters.

lms ls

lms ls --llm        # only show LLM-type models

lms ls --embedding  # only show embedding models

lms ls --detailed

lms ls --json
CLI commands: LS
CLI commands: LS LLM
CLI commands: LS JSON
  8. PS

Command that lists the models currently loaded in memory.

lms ps

lms ps --json
CLI commands: PS
CLI commands: PS JSON
  9. CLONE

Command used to download a model’s model.yaml file (this file is explained in more detail in a later section), its README, and other metadata files (it does not download the model weights).

lms clone google/gemma-3-1b
CLI commands: clone
  10. LOG STREAM

Command that allows you to inspect the prompts sent to the model, exactly as the model receives them.

lms log stream
CLI commands: log stream
  11. PUSH

Command that packages the contents of the current directory and uploads it to the LM Studio Hub so models can be shared with other users.

lms push

API

As with other tools, there are two types of API endpoints: OpenAI-compatible endpoints and native ones. This functionality is especially important from a developer’s perspective, as it enables integrations with applications.

OpenAI-compatible endpoints

In this case, the available endpoints are the following (a sample request is shown after the list):

/v1/models
/v1/chat/completions
/v1/embeddings
/v1/completions
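
As a minimal sketch of calling these endpoints, the request below hits the chat completions endpoint with curl. It assumes the server was started with lms server start on its default port (1234) and that the google/gemma-3-1b model used throughout this post is available locally; adjust the host, port, and model identifier to your setup.

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Summarize what LM Studio does in one sentence." }
    ],
    "temperature": 0.7
  }'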

Native endpoints

It is important to note that this is a beta feature and requires LM Studio version 0.3.6 or higher. The available endpoints are listed below, with a short example after the list:

/api/v0/models
/api/v0/models/{model}
/api/v0/chat/completions
/api/v0/completions
/api/v0/embeddings
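
The native endpoints are called the same way. As a sketch (again assuming the default port 1234), the first request lists the models known to the server together with LM Studio-specific metadata, and the second sends a chat completion through the beta REST API:

curl http://localhost:1234/api/v0/models

curl http://localhost:1234/api/v0/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "google/gemma-3-1b", "messages": [{ "role": "user", "content": "Hello!" }] }'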

In addition to the interaction methods described above (UI, CLI, and API), there are SDKs for Python and TypeScript, allowing you to call LM Studio directly using preconfigured methods and functions.
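
If you want to try the SDKs, they are installed with the usual package managers; the package names below (lmstudio on PyPI and @lmstudio/sdk on npm) are the ones documented by LM Studio at the time of writing:

pip install lmstudio        # Python SDK
npm install @lmstudio/sdk   # TypeScript SDK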

Model.yaml

LM Studio is also building (still in draft form) a centralized and standardized way to manage different models. In this case, it does so through a yaml file, which allows you to describe a model and all its variants, custom metadata, or even custom logic. This approach delegates responsibility to the runtime, which then selects the most appropriate model variant to download and run.

There are several sections for building a model.yaml:

model: google/gemma-3-1b
base:
  - key: lmstudio-community/gemma-3-1B-it-QAT-GGUF
    sources:
      - type: huggingface
        user: lmstudio-community
        repo: gemma-3-1B-it-QAT-GGUF
  - key: mlx-community/gemma-3-1b-it-qat-4bit
    sources:
      - type: huggingface
        user: mlx-community
        repo: gemma-3-1b-it-qat-4bit
metadataOverrides:
  domain: llm
  architectures:
    - gemma3
  compatibilityTypes:
    - gguf
    - safetensors
  paramsStrings:
    - 1B
  minMemoryUsageBytes: 754974720
  trainedForToolUse: false
  vision: false
config:
  operation:
    fields:
      - key: llm.prediction.topKSampling
        value: 20
      - key: llm.prediction.minPSampling
        value:
          checked: true
          value: 0
customFields:
  - key: enableThinking
    displayName: Enable Thinking
    description: Controls whether the model will think before replying
    type: boolean
    defaultValue: true
    effects:
      - type: setJinjaVariable
        variable: enable_thinking

For this example to work, the Jinja template must have the enable_thinking variable defined.

suggestions:
  - message: The following parameters are recommended for thinking mode
    conditions:
      - type: equals
        key: $.enableThinking
        value: true
    fields:
      - key: llm.prediction.temperature
        value: 0.6

I’m sharing here an example of a complete file.

It is important to note that, at the time of writing this post, model.yaml is focused on customizing models to publish them on the LM Studio Hub, so they can later be downloaded (using the lms get command) and used.

Since this feature is currently marked as beta, it is possible that in the future it will work in a way similar to Ollama and its Modelfiles, allowing you to create and run these custom models directly from your own machine, without the need to upload them to a registry or Hub.

Importing external models

Also in an experimental phase, LM Studio allows importing GGUF-format models that were downloaded outside of the LM Studio ecosystem. To use these models, we first run the import command:

lms import ./llama-3.2-1b-instruct-q4_k_m.gguf

And then we can run it just like any other model already available in the system.

Importing external models in LM Studio
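
Once imported, the model is managed with the same CLI commands we saw earlier. As a short sketch (the exact identifier assigned to the imported model is an assumption and may differ on your machine):

lms ls                           # the imported model now appears in the local list
lms load llama-3.2-1b-instruct   # load it into memory like any other model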

Conclusions

Continuing the series on running LLMs locally, we’ve explored LM Studio as an alternative to Ollama. LM Studio offers a set of features very similar to Ollama’s, while also providing a user interface for model interaction and management, making it a solid option for running LLMs locally.

In the next post, we’ll talk about a third option: Llamafile, and finally about running LLMs locally with Docker. I’ll see you in the comments!
