Dan Davis

Local Function Calling With Mistral 7B and vLLM

Setup

My server specs:

I recommend using Ubuntu 22.04 to simplify setting up the Nvidia drivers and CUDA. Also be sure to download the Nvidia Container Toolkit.
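Once the toolkit is installed, a quick way to confirm Docker can see the GPU is to run nvidia-smi inside a throwaway container (this is the standard sanity check; it requires an Nvidia GPU and driver on the host):

```shell
# The container should print the same nvidia-smi table as the host.
# If this fails, the Nvidia Container Toolkit is not set up correctly.
sudo docker run --rm --runtime nvidia --gpus all ubuntu nvidia-smi
```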

I am using the unquantized bfloat16 version of Mistral-7B-Instruct-v0.3 instead of a quantized model because I want to optimize for throughput over latency. See this article from Neural Magic for more info on quantization trade-offs.
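As a rough sanity check on what the unquantized model costs in VRAM, some back-of-the-envelope math (the parameter count is approximate, and the KV cache and activations come on top of this):

```python
# Approximate memory footprint of unquantized bfloat16 weights.
params = 7.2e9          # Mistral-7B parameter count, roughly
bytes_per_param = 2     # bfloat16 = 16 bits
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.1f} GB for weights alone")
```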

Here is how to run the server with Docker:

sudo docker run \
    --runtime nvidia \
    --gpus all \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -v $HOME/chat-templates/tool_chat_template_mistral.jinja:/root/tool_chat_template_mistral.jinja \
    -p 8000:8000 \
    --ipc=host \
    -it vllm/vllm-openai:latest \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --served-model-name mistral-7B \
    --gpu-memory-utilization 0.95 \
    --tool-call-parser mistral \
    --chat-template /root/tool_chat_template_mistral.jinja \
    --enable-auto-tool-choice

I had to download the correct chat template for function calling and mount it into the Docker container (the second -v flag above).
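vLLM ships a Mistral tool-calling template in its examples directory, so something like the following should fetch it (the exact path within the repository may differ between vLLM versions, so check it against the release you are running):

```shell
mkdir -p $HOME/chat-templates
curl -fsSL -o $HOME/chat-templates/tool_chat_template_mistral.jinja \
  https://raw.githubusercontent.com/vllm-project/vllm/main/examples/tool_chat_template_mistral.jinja
```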

Ignore this warning:

FutureWarning: It is strongly recommended to run mistral models with `--tokenizer_mode "mistral"` to ensure correct encoding and decoding.

Function calling will not work if you pass this flag to vLLM, because --tokenizer_mode "mistral" switches to Mistral's own tokenizer, which does not allow overriding the chat template.

Here is a quick script to check if function calling is working:

import argparse

import instructor
import requests
from openai import OpenAI
from openai.types.chat import ChatCompletionMessageParam
from pydantic import BaseModel, Field


# Model must match vllm --served-model-name
MODEL = "mistral-7B"
# set a default seed and temperature for more determinism
SEED = 32**4
TEMPERATURE = 0
# HOST is the ip of vLLM server
HOST = "192.168.2.9"


client = instructor.from_openai(
    OpenAI(base_url=f"http://{HOST}:8000/v1", api_key="mistral"),
    mode=instructor.Mode.TOOLS,
)

# make pyright/mypy happy
Messages = list[ChatCompletionMessageParam]


class WeatherForecast(BaseModel):
    city: str
    state: str = Field(
        description="Either the state abbreviation if US or ISO 3166-1 alpha-2 country code"
    )

    def execute(self) -> str:
        url = f"https://wttr.in/{self.city},{self.state}"
        r = requests.get(url)
        return r.text


def main():
    parser = argparse.ArgumentParser(description="Mistral weather bot")
    parser.add_argument("prompt", type=str, help="Your prompt")
    args = parser.parse_args()

    messages: Messages = [
        {
            "role": "user",
            "content": args.prompt,
        }
    ]

    forecast, completion = client.chat.completions.create_with_completion(
        model=MODEL,
        seed=SEED,
        temperature=TEMPERATURE,
        response_model=WeatherForecast,
        tool_choice="auto",
        messages=messages,
    )

    # mistral tool call ids must be exactly 9 characters
    openai_tool_call_id = completion.choices[0].message.tool_calls[0].id
    mistral_tool_call_id = openai_tool_call_id[-9:]

    messages.append(
        {
            "role": "tool",
            # must match the tool name the model called, i.e. the class name
            "name": "WeatherForecast",
            "content": forecast.execute(),
            "tool_call_id": mistral_tool_call_id,
        }
    )

    answer = client.chat.completions.create(
        model=MODEL, response_model=str, messages=messages
    )

    print(f"\n> {answer}")


if __name__ == "__main__":
    main()
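The id-truncation step above is easy to get wrong, so here it is in isolation (the id value below is made up for illustration):

```python
# vLLM returns OpenAI-style tool call ids, but Mistral's chat template
# validates that ids are exactly 9 characters, so the script keeps only
# the last 9 characters of the id.
openai_tool_call_id = "chatcmpl-tool-3f8a1c9d2e4b"  # hypothetical id
mistral_tool_call_id = openai_tool_call_id[-9:]
assert len(mistral_tool_call_id) == 9
print(mistral_tool_call_id)
```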

Now we can run this like:

$ python mistral-tools.py "I'm flying to new york tomorrow, How should I pack?"

> To pack for your trip to New York, consider the following: Since the weather forecast shows overcast and cloudy conditions with temperatures ranging from 62°F to 68°F, you should pack layers, including a light jacket or sweater. Rain is expected on Wednesday, so it would be a good idea to bring an umbrella and waterproof shoes. The wind is expected to be around 8-14 mph, so remember to pack a hat or headwear to protect against the wind. Finally, be prepared for some rain showers on Thursday with temperatures ranging from 73°F to 78°F. Have a great trip!