Building an Ollama + MCP Server from Scratch

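The Model Context Protocol (MCP) has been everywhere over the past few months, and plenty of cool integration examples have appeared. I'm convinced it will become a standard, because it defines a new way for tools to integrate with agents, and for software to integrate with AI models.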

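Since small language models are what I most enjoy writing about, I decided to connect a small LLM running in Ollama to an MCP server and get a feel for the new standard. Today I want to show one possible way to integrate Ollama with an MCP server.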

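The Magic Recipe

The main steps of the integration are:

1. Create a test MCP server.
2. Create a client file that starts the server and sends requests to it.
3. Fetch the server's tools into the client.
4. Convert the tools into Pydantic models.
5. Pass the tools (as Pydantic models) to Ollama through the `format` field, which constrains the response.
6. Send the conversation through Ollama and receive structured output.
7. If the response contains a tool, send a request to the server.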

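Installing Dependencies

To run this project, install the required packages. The `fastmcp` library works best when you run your code with `uv`. Like Poetry and pip, `uv` is a Python package manager, and it is easy to install and simple to use.

Add the required libraries to your project with:

uv add fastmcp ollama

This installs the MCP server library and the Ollama chat library, on top of which you can build the client and server logic.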

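File Structure

Once everything is set up, your folder should look like this:

your folder
├── server.py
└── client.py

The server.py file contains the MCP server and the tools you want to expose. The client.py file starts the server as a background process, fetches the available tools, and connects to Ollama.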

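Example MCP Server

Let's start by creating a simple MCP server with the fastmcp library. The server exposes a single tool named magicoutput. The function takes two string inputs and returns a fixed string.

The @mcp.tool() decorator registers the function as an available tool on the MCP server. Once the server is running, any client can fetch and call this tool.

The server is started by calling mcp.run() in the main block.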

# server.py
from fastmcp import FastMCP

# Create an MCP server
mcp = FastMCP("TestServer")


# My tool:
@mcp.tool()
def magicoutput(obj1: str, obj2: str) -> str:
    """Use this function to get the magic output"""
    return "WomboWombat"


if __name__ == "__main__":
    mcp.run()

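Fetching the Server's Tools

To connect to the MCP server and list the available tools, we use ClientSession, StdioServerParameters, and stdio_client, all of which come from the mcp library.

We define a class named OllamaMCP that handles the server connection and tool retrieval. Inside the class, the _async_run method opens an asynchronous session, initializes it, and fetches the tool list from the server.

We use threading.Event() to track when the session is ready, and store the tool list in self.tools.

At the end of the script, we define the server parameters and run the client through _run_background; a dedicated background thread is added in a later listing. This starts the connection and prints the tool metadata returned by the server.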

# client.py
import asyncio
import threading
from pathlib import Path
from typing import Any

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


class OllamaMCP:
    """A simple integration between Ollama and FastMCP."""

    def __init__(self, server_params: StdioServerParameters):
        self.server_params = server_params
        self.initialized = threading.Event()
        self.tools: list[Any] = []

    def _run_background(self):
        asyncio.run(self._async_run())

    async def _async_run(self):
        try:
            # Launch the server as a subprocess and talk to it over stdio.
            async with stdio_client(self.server_params) as (read, write):
                async with ClientSession(read, write) as session:
                    await session.initialize()
                    self.session = session
                    tools_result = await session.list_tools()
                    self.tools = tools_result.tools
                    print(tools_result)
        except Exception as e:
            print(f"Error initializing the MCP server: {e}")


if __name__ == "__main__":
    server_parameters = StdioServerParameters(
        command="uv",
        args=["run", "python", "server.py"],
        cwd=str(Path.cwd()),
    )
    ollamamcp = OllamaMCP(server_params=server_parameters)
    ollamamcp._run_background()

After running the code above, you will get the following response from the server, which lists the tools available on it.

[04/14/25 22:29:08] INFO Starting server "TestServer"... server.py:171
INFO Processing request of type server.py:534
ListToolsRequest
meta=None nextCursor=None tools=[Tool(name='magicoutput', description='Use this function to get the magic output', inputSchema={'properties': {'obj1': {'title': 'Obj1', 'type': 'string'}, 'obj2': {'title': 'Obj2', 'type': 'string'}}, 'required': ['obj1', 'obj2'], 'title': 'magicoutputArguments', 'type': 'object'})]
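
The class above creates a threading.Event in self.initialized but does not use it yet; the later listings call set() once the tool list has arrived. As a minimal sketch of that readiness pattern, the block below uses a hypothetical BackgroundWorker class in place of OllamaMCP and a short sleep in place of the real MCP handshake, so it runs without any server.

```python
import threading
import time


class BackgroundWorker:
    """Hypothetical stand-in for OllamaMCP: does slow setup on another thread."""

    def __init__(self) -> None:
        self.initialized = threading.Event()
        self.tools: list[str] = []
        self.thread = threading.Thread(target=self._setup, daemon=True)
        self.thread.start()

    def _setup(self) -> None:
        time.sleep(0.1)  # pretend to connect to the server and list its tools
        self.tools = ["magicoutput"]
        self.initialized.set()  # signal that self.tools is now safe to read


worker = BackgroundWorker()
# wait() returns True once set() has been called, or False if the timeout expires first.
ready = worker.initialized.wait(timeout=5)
print(ready, worker.tools)
```

The timeout matters: without it, a server that never starts would leave the caller blocked indefinitely.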

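Converting Tools into Pydantic Models

Now that we have the tool list from the server, the next step is to convert the tools into Pydantic models. We use Pydantic's create_model to dynamically define a new response schema based on the server's tool definitions. There is also a helper function that maps JSON types to valid Python types.

Defining the models dynamically lets the LLM know exactly what structure to use when it returns tool arguments.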

# client.py
import asyncio
import threading
from pathlib import Path
from typing import Any, Optional, Union

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import BaseModel, Field, create_model


class OllamaMCP:
    """A simple integration between Ollama and FastMCP."""

    def __init__(self, server_params: StdioServerParameters):
        self.server_params = server_params
        self.initialized = threading.Event()
        self.tools: list[Any] = []

    def _run_background(self):
        asyncio.run(self._async_run())

    async def _async_run(self):
        try:
            async with stdio_client(self.server_params) as (read, write):
                async with ClientSession(read, write) as session:
                    await session.initialize()
                    self.session = session
                    tools_result = await session.list_tools()
                    self.tools = tools_result.tools
        except Exception as e:
            print(f"Error initializing the MCP server: {e}")

    def create_response_model(self):
        """Build a Pydantic response model from the fetched tools."""
        dynamic_classes = {}
        for tool in self.tools:
            class_name = tool.name.capitalize()
            properties = {}
            # One model field per property in the tool's input schema.
            for prop_name, prop_info in tool.inputSchema.get("properties", {}).items():
                json_type = prop_info.get("type", "string")
                properties[prop_name] = self.convert_json_type_to_python_type(json_type)

            model = create_model(
                class_name,
                __base__=BaseModel,
                __doc__=tool.description,
                **properties,
            )
            dynamic_classes[class_name] = model

        if dynamic_classes:
            all_tools_type = Union[tuple(dynamic_classes.values())]
            Response = create_model(
                "Response",
                __base__=BaseModel,
                __doc__="LLM response class",
                response=(str, Field(..., description="Message confirming that the function will be called")),
                tool=(all_tools_type, Field(
                    ...,
                    description="Tool to run to get the magic output"
                )),
            )
        else:
            Response = create_model(
                "Response",
                __base__=BaseModel,
                __doc__="LLM response class",
                response=(str, ...),
                tool=(Optional[Any], Field(None, description="Use this tool if it is not None")),
            )

        self.response_model = Response
        print(Response.model_fields)

    @staticmethod
    def convert_json_type_to_python_type(json_type: str):
        """Simple mapping from JSON types to Python (Pydantic) types."""
        if json_type == "integer":
            return (int, ...)
        if json_type == "number":
            return (float, ...)
        if json_type == "string":
            return (str, ...)
        if json_type == "boolean":
            return (bool, ...)
        return (str, ...)


if __name__ == "__main__":
    server_parameters = StdioServerParameters(
        command="uv",
        args=["run", "python", "server.py"],
        cwd=str(Path.cwd()),
    )
    ollamamcp = OllamaMCP(server_params=server_parameters)
    ollamamcp._run_background()
    ollamamcp.create_response_model()
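
The helper above marks every field as required. The tool's inputSchema also carries a required list, so one possible refinement is to read it and treat the remaining properties as optional. The sketch below is an illustration rather than part of the original client: field_specs and JSON_TO_PYTHON are hypothetical names, and the count property is invented to show the optional case. It uses only the standard library.

```python
from typing import Any

# JSON Schema primitive types mapped to Python types.
# Unknown types fall back to str, mirroring the article's helper.
JSON_TO_PYTHON: dict[str, type] = {
    "integer": int,
    "number": float,
    "string": str,
    "boolean": bool,
}


def field_specs(input_schema: dict[str, Any]) -> dict[str, tuple[type, bool]]:
    """Return {property_name: (python_type, is_required)} for a tool's inputSchema."""
    required = set(input_schema.get("required", []))
    specs: dict[str, tuple[type, bool]] = {}
    for name, info in input_schema.get("properties", {}).items():
        py_type = JSON_TO_PYTHON.get(info.get("type", "string"), str)
        specs[name] = (py_type, name in required)
    return specs


# The magicoutput schema from the tool listing, plus a hypothetical optional "count".
schema = {
    "properties": {
        "obj1": {"title": "Obj1", "type": "string"},
        "obj2": {"title": "Obj2", "type": "string"},
        "count": {"title": "Count", "type": "integer"},  # invented for illustration
    },
    "required": ["obj1", "obj2"],
    "type": "object",
}

specs = field_specs(schema)
print(specs)
```

When building the Pydantic model, a non-required entry could then be passed to create_model as (Optional[py_type], None) instead of (py_type, ...).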

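When the code runs, the output of print(Response.model_fields) shows the full structure of the response model we just built. The model has two parts: the message the assistant sends back to the user, and a tool field that holds the tool arguments (required in this version, made optional in the next listing).

If the model fills in the tool field, we use it to call the server. Otherwise, we use only the plain response string.

uv run python -m convert_tools
[04/15/25 10:15:32] INFO Starting server "TestServer"... server.py:171
INFO Processing request of type server.py:534
ListToolsRequest
{'response': FieldInfo(annotation=str, required=True, description='Message confirming that the function will be called'), 'tool': FieldInfo(annotation=Magicoutput, required=True, description='Tool to run to get the magic output')}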
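
The class name Magicoutput in this output comes from tool.name.capitalize(), and the final client recovers the tool name by lowercasing the class name. That round trip works for magicoutput, but it would not survive a tool name containing uppercase letters. A small sketch of one alternative, keeping a lookup from class name to original tool name, is shown below; model_name_for and name_registry are hypothetical names, and only magicoutput exists on the article's server.

```python
def model_name_for(tool_name: str) -> str:
    """Class name used for the dynamic model, as in the article: str.capitalize()."""
    return tool_name.capitalize()


# Hypothetical tool names; only "magicoutput" exists on the article's server.
tool_names = ["magicoutput", "getWeather", "list_files"]

# Registry from model class name back to the exact name the server registered.
name_registry = {model_name_for(name): name for name in tool_names}

for class_name, original in name_registry.items():
    round_trip = class_name.lower()
    print(f"{class_name}: lower() -> {round_trip!r}, registry -> {original!r}")
```

For getWeather, lower() yields getweather, which the server would not recognize, while the registry returns the original name unchanged.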

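Calling Tools with a Background Thread and Queues

Now that the tools are available as Pydantic models, we can move on to tool calling. To do this, we use a background thread and set up two queues: one for sending requests to the server, and one for receiving responses.

The call_tool method puts a request on the request queue, which the background thread polls. Once the tool has been called through the MCP session, the result is put on the response queue.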

import asyncio
import queue
import threading
from pathlib import Path
from typing import Any, Optional, Union

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import BaseModel, Field, create_model


class OllamaMCP:
    """A simple integration between Ollama and FastMCP."""

    def __init__(self, server_params: StdioServerParameters):
        self.server_params = server_params
        self.initialized = threading.Event()
        self.tools: list[Any] = []
        self.request_queue = queue.Queue()
        self.response_queue = queue.Queue()
        # Start a background thread that handles requests asynchronously.
        self.thread = threading.Thread(target=self._run_background, daemon=True)
        self.thread.start()

    def _run_background(self):
        asyncio.run(self._async_run())

    async def _async_run(self):
        try:
            async with stdio_client(self.server_params) as (read, write):
                async with ClientSession(read, write) as session:
                    await session.initialize()
                    self.session = session
                    tools_result = await session.list_tools()
                    self.tools = tools_result.tools
                    self.initialized.set()
                    while True:
                        try:
                            tool_name, arguments = self.request_queue.get(block=False)
                        except queue.Empty:
                            await asyncio.sleep(0.01)
                            continue
                        if tool_name is None:
                            print("Shutdown signal received.")
                            break
                        try:
                            result = await session.call_tool(tool_name, arguments)
                            self.response_queue.put(result)
                        except Exception as e:
                            self.response_queue.put(f"Error: {e}")
        except Exception as e:
            print("MCP session initialization error:", str(e))
            self.initialized.set()  # Unblock waiting threads even if initialization fails.
            self.response_queue.put(f"MCP initialization error: {e}")

    def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        """Send a tool call request and wait for the result."""
        if not self.initialized.wait(timeout=30):
            raise TimeoutError("MCP session initialization timed out.")
        self.request_queue.put((tool_name, arguments))
        result = self.response_queue.get()
        return result

    def shutdown(self):
        """Clean up and close the persistent session."""
        self.request_queue.put((None, None))
        self.thread.join()
        print("Persistent MCP session closed.")

    def create_response_model(self):
        dynamic_classes = {}
        for tool in self.tools:
            class_name = tool.name.capitalize()
            properties = {}
            for prop_name, prop_info in tool.inputSchema.get("properties", {}).items():
                json_type = prop_info.get("type", "string")
                properties[prop_name] = self.convert_json_type_to_python_type(json_type)
            model = create_model(
                class_name,
                __base__=BaseModel,
                __doc__=tool.description,
                **properties,
            )
            dynamic_classes[class_name] = model
        if dynamic_classes:
            all_tools_type = Union[tuple(dynamic_classes.values())]
            Response = create_model(
                "Response",
                __base__=BaseModel,
                response=(str, ...),
                tool=(Optional[all_tools_type], Field(None, description="Use this tool if it is not None")),
            )
        else:
            Response = create_model(
                "Response",
                __base__=BaseModel,
                response=(str, ...),
                tool=(Optional[Any], Field(None, description="Use this tool if it is not None")),
            )
        self.response_model = Response

    @staticmethod
    def convert_json_type_to_python_type(json_type: str):
        """Simple mapping from JSON types to Python (Pydantic) types."""
        if json_type == "integer":
            return (int, ...)
        if json_type == "number":
            return (float, ...)
        if json_type == "string":
            return (str, ...)
        if json_type == "boolean":
            return (bool, ...)
        return (str, ...)


if __name__ == "__main__":
    server_parameters = StdioServerParameters(
        command="uv",
        args=["run", "python", "server.py"],
        cwd=str(Path.cwd()),
    )
    ollamamcp = OllamaMCP(server_params=server_parameters)
    if ollamamcp.initialized.wait(timeout=30):
        print("Ready to call tools.")
        result = ollamamcp.call_tool(
            tool_name="magicoutput",
            arguments={"obj1": "dog", "obj2": "cat"},
        )
        print(result)
    else:
        print("Error: initialization timed out.")

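Note that at this point we pass the function name and arguments by hand through the call_tool method. In the next section, we trigger this call from the structured output that Ollama returns.

Running this code confirms that everything works as expected: the server recognizes the tool, executes it, and returns the result.

[04/15/25 11:37:47] INFO Starting server "TestServer"... server.py:171
INFO Processing request of type server.py:534
ListToolsRequest
Ready to call tools.
INFO Processing request of type server.py:534
CallToolRequest
meta=None content=[TextContent(type='text', text='WomboWombat', annotations=None)] isError=False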
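
The handoff between the synchronous caller and the asyncio loop can be tried in isolation, without an MCP server. The sketch below keeps the two queues and the (None, None) shutdown sentinel from the listing above, but replaces the MCP session with a hypothetical _fake_tool coroutine. QueueBridge is an invented name, and the timeout on the response queue is an addition that the original call_tool does not have.

```python
import asyncio
import queue
import threading
from typing import Any


class QueueBridge:
    """Hypothetical stand-in for OllamaMCP, with the MCP session replaced by a fake tool."""

    def __init__(self) -> None:
        self.request_queue: queue.Queue = queue.Queue()
        self.response_queue: queue.Queue = queue.Queue()
        self.thread = threading.Thread(
            target=lambda: asyncio.run(self._loop()), daemon=True
        )
        self.thread.start()

    async def _fake_tool(self, name: str, arguments: dict[str, Any]) -> str:
        await asyncio.sleep(0.01)  # stands in for session.call_tool(...)
        return f"{name} called with {sorted(arguments)}"

    async def _loop(self) -> None:
        while True:
            try:
                name, arguments = self.request_queue.get(block=False)
            except queue.Empty:
                await asyncio.sleep(0.01)  # yield to the event loop instead of blocking it
                continue
            if name is None:  # (None, None) is the shutdown sentinel
                break
            self.response_queue.put(await self._fake_tool(name, arguments))

    def call_tool(self, name: str, arguments: dict[str, Any], timeout: float = 5.0) -> Any:
        self.request_queue.put((name, arguments))
        # The timeout keeps the caller from hanging forever if the worker has died.
        return self.response_queue.get(timeout=timeout)

    def shutdown(self) -> None:
        self.request_queue.put((None, None))
        self.thread.join()


bridge = QueueBridge()
result = bridge.call_tool("magicoutput", {"obj1": "dog", "obj2": "cat"})
print(result)
bridge.shutdown()
```

The non-blocking get followed by a short asyncio.sleep is what lets a thread-safe queue.Queue be polled from inside the event loop without stalling it.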

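Ollama + MCP

With the queues and the call_tool function in place, it's time to integrate Ollama. We pass the response class to Ollama's format field, which tells our LLM (Gemma here) to follow that schema when generating output.

We also define an ollama_chat method that sends the conversation, validates the model's response against the schema, and checks whether it contains a tool. If it does, the method extracts the function name and arguments and calls the tool on the background thread through the persistent MCP session.

In the main function, we set up the server connection, start the background loop, and wait until everything is ready. We then prepare a system prompt and a user message, send them to Ollama, and wait for the structured output.

Finally, we print the server's result and shut down the session.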

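import asyncio
import queue
import threading
from pathlib import Path
from typing import Any, Optional, Union

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from ollama import chat
from pydantic import BaseModel, Field, create_model


class OllamaMCP:
    """A simple integration between Ollama and FastMCP."""

    def __init__(self, server_params: StdioServerParameters):
        self.server_params = server_params
        self.request_queue = queue.Queue()
        self.response_queue = queue.Queue()
        self.initialized = threading.Event()
        self.tools: list[Any] = []
        self.response_model = None  # set by create_response_model()
        self.thread = threading.Thread(target=self._run_background, daemon=True)
        self.thread.start()

    def _run_background(self):
        asyncio.run(self._async_run())

    async def _async_run(self):
        try:
            async with stdio_client(self.server_params) as (read, write):
                async with ClientSession(read, write) as session:
                    await session.initialize()
                    self.session = session
                    tools_result = await session.list_tools()
                    self.tools = tools_result.tools
                    self.initialized.set()

                    while True:
                        try:
                            tool_name, arguments = self.request_queue.get(block=False)
                        except queue.Empty:
                            await asyncio.sleep(0.01)
                            continue

                        if tool_name is None:
                            break
                        try:
                            result = await session.call_tool(tool_name, arguments)
                            self.response_queue.put(result)
                        except Exception as e:
                            self.response_queue.put(f"Error: {e}")
        except Exception as e:
            print("MCP session initialization error:", str(e))
            self.initialized.set()  # Unblock waiting threads even if initialization fails.
            self.response_queue.put(f"MCP initialization error: {e}")

    def call_tool(self, tool_name: str, arguments: dict[str, Any]) -> Any:
        """Send a tool call request and wait for the result."""
        if not self.initialized.wait(timeout=30):
            raise TimeoutError("MCP session initialization timed out.")
        self.request_queue.put((tool_name, arguments))
        return self.response_queue.get()

    def shutdown(self):
        """Clean up and close the persistent session."""
        self.request_queue.put((None, None))
        self.thread.join()
        print("Persistent MCP session closed.")

    def create_response_model(self):
        """Dynamically create the Pydantic response model from the fetched tools."""
        dynamic_classes = {}
        for tool in self.tools:
            class_name = tool.name.capitalize()
            properties: dict[str, Any] = {}
            for prop_name, prop_info in tool.inputSchema.get("properties", {}).items():
                json_type = prop_info.get("type", "string")
                properties[prop_name] = self.convert_json_type_to_python_type(json_type)

            model = create_model(
                class_name,
                __base__=BaseModel,
                __doc__=tool.description,
                **properties,
            )
            dynamic_classes[class_name] = model

        if dynamic_classes:
            all_tools_type = Union[tuple(dynamic_classes.values())]
            tool_type = Optional[all_tools_type]
        else:
            tool_type = Optional[Any]
        self.response_model = create_model(
            "Response",
            __base__=BaseModel,
            response=(str, ...),
            tool=(tool_type, Field(None, description="Use this tool if it is not None")),
        )

    @staticmethod
    def convert_json_type_to_python_type(json_type: str):
        """Simple mapping from JSON types to Python (Pydantic) types."""
        if json_type == "integer":
            return (int, ...)
        if json_type == "number":
            return (float, ...)
        if json_type == "boolean":
            return (bool, ...)
        return (str, ...)

    async def ollama_chat(self, messages: list[dict[str, str]]) -> Any:
        """
        Send messages to Ollama using the dynamic response model.
        If the response contains a tool, call it through the persistent session.
        """
        conversation = [{"role": "assistant", "content": f"You must use a tool. You have the following functions available. {[tool.name for tool in self.tools]}"}]
        conversation.extend(messages)
        if self.response_model is None:
            raise ValueError("The response model has not been created. Call create_response_model() first.")

        # Get the JSON schema that the chat response must follow.
        format_schema = self.response_model.model_json_schema()

        # Call Ollama (assumed to be synchronous) and parse the response.
        response = chat(
            model="gemma3:latest",
            messages=conversation,
            format=format_schema,
        )
        print("Ollama response", response.message.content)
        response_obj = self.response_model.model_validate_json(response.message.content)
        maybe_tool = response_obj.tool

        if maybe_tool:
            # The model class name is the capitalized tool name, so lowercase it again.
            function_name = maybe_tool.__class__.__name__.lower()
            func_args = maybe_tool.model_dump()
            # Use asyncio.to_thread to run the synchronous call_tool method in a thread.
            output = await asyncio.to_thread(self.call_tool, function_name, func_args)
            return output
        else:
            print("No tool detected in the response. Returning the plain text response.")
            return response_obj.response


async def main():
    server_parameters = StdioServerParameters(
        command="uv",
        args=["run", "python", "server.py"],
        cwd=str(Path.cwd()),
    )

    # Create the persistent session.
    persistent_session = OllamaMCP(server_parameters)

    # Wait until the session is fully initialized.
    if persistent_session.initialized.wait(timeout=30):
        print("Ready to call tools.")
    else:
        print("Error: initialization timed out.")
        return

    # Create the dynamic response model from the fetched tools.
    persistent_session.create_response_model()

    # Prepare the messages for Ollama.
    messages = [
        {
            "role": "system",
            "content": (
                "You are an obedient assistant with a list of tools in your context. "
                "Your task is to use the function to get the magic output. "
                "Do not generate the magic output yourself. "
                "Reply with a short message saying that the function will be called, "
                "but do not provide the function output itself. "
                "Put that short message in the 'response' property. "
                "For example: 'OK, I will run the magicoutput function and return the output.' "
                "Also fill in the 'tool' property with the correct arguments. "
            ),
        },
        {
            "role": "user",
            "content": "Use the function to get the magic output with these arguments (obj1 = Wombat and obj2 = Dog)",
        },
    ]

    # Call Ollama and handle the response.
    result = await persistent_session.ollama_chat(messages)
    print("Final result:", result)

    # Shut down the persistent session.
    persistent_session.shutdown()


if __name__ == "__main__":
    asyncio.run(main())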

As you can see, the output is exactly what we wanted. We received a response containing a short message, along with the tool and the arguments that will be sent to the MCP server. Finally, we got the server's output, shown below:

[04/15/25 09:52:49] INFO Starting server "TestServer"... server.py:171
INFO Processing request of type server.py:534
ListToolsRequest
Ready to call tools.
Ollama response {
"response": "OK, I will run magicoutput with obj1 = Wombat and obj2 = Dog.",
"tool": {"obj1": "Wombat", "obj2": "Dog"}
}

[04/15/25 09:52:52] INFO Processing request of type server.py:534
CallToolRequest
Final result: meta=None content=[TextContent(type='text', text='WomboWombat', annotations=None)] isError=False
Persistent MCP session closed.
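
Note that the tool object in the model's reply carries only arguments, not a tool name. In the article's client, Pydantic works out which tool model the arguments belong to when it validates the Union. To make that step visible, here is a rough standard-library approximation that matches the argument names against each tool's expected properties. TOOL_ARGS and pick_tool are hypothetical names, and matching on key sets is a simplification of what Pydantic validation does.

```python
import json
from typing import Any, Optional

# Hypothetical catalogue: tool name -> argument names, as read from each inputSchema.
TOOL_ARGS: dict[str, set[str]] = {
    "magicoutput": {"obj1", "obj2"},
}


def pick_tool(raw: str) -> tuple[str, Optional[str], dict[str, Any]]:
    """Parse the model's JSON reply and return (message, tool_name_or_None, arguments)."""
    data = json.loads(raw)
    message = data.get("response", "")
    arguments = data.get("tool") or {}
    for name, expected in TOOL_ARGS.items():
        # A tool matches when the reply supplies exactly its argument names.
        if arguments and set(arguments) == expected:
            return message, name, arguments
    return message, None, {}


raw_reply = (
    '{"response": "OK, I will run magicoutput.", '
    '"tool": {"obj1": "Wombat", "obj2": "Dog"}}'
)
message, tool_name, arguments = pick_tool(raw_reply)
print(tool_name, arguments)
```

If no tool matches, the function returns None for the tool name, which corresponds to the branch where the client falls back to the plain response string.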

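Summary

That's it: we have just walked through one way to connect a small model in Ollama to a local MCP server.

We started by creating a simple MCP server with a single tool. We then built a client that connects to the server, fetches the tool definitions, and converts them into Pydantic models. On top of that, we passed the response model to Ollama through the format field. The model returned a structured response, and we handled the tool call on the client side with a background thread and queues.

If you enjoyed this, like and follow for more content like it. Later, I plan to extend this into a full agent loop and start building useful workflows with real MCP servers.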
