
Installing QwQ-32B and the DeepSeek distilled models on a Huawei Ascend server (Ascend 300I Pro, 310P chip, 310P3)

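Prerequisite: Docker is already installed on the server.

1. Download the image: 1.0.0-300I-Duo-py311-openeuler24.03-lts

Note: downloading the image from the official site requires an application, and approval takes one or two days. If you would rather not wait, I have a copy ready: send me a private message.

Application page: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f

2. Download the model from the Modelers community (https://modelers.cn/models/Models_Ecosystem/QwQ-32B)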

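Downloading directly on the server is faster. The commands below install the ModelScope client and use it to fetch the weights: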

pip install modelscope

modelscope download --model "Qwen/QwQ-32B" --local_dir "/home/models/qwq"

Note: once the model is on the server, set the permissions on config.json in the model directory (run this from /home/models/qwq):

chmod 750 config.json
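The permission step can also be scripted so that it fails early when config.json is missing. The following is only a minimal sketch: the helper name ensure_config_mode is hypothetical, and the default mode 750 is simply the value used in the chmod command above.

```python
import os
import stat


def ensure_config_mode(model_dir: str, mode: int = 0o750) -> str:
    """Set the mode of <model_dir>/config.json and return it as an octal string."""
    config_path = os.path.join(model_dir, "config.json")
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"config.json not found in {model_dir}")
    os.chmod(config_path, mode)
    # Read the mode back so the caller sees what is actually on disk.
    return oct(stat.S_IMODE(os.stat(config_path).st_mode))


# Example call (path taken from the download step above):
# ensure_config_mode("/home/models/qwq")
```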

3. Start the Docker container

Make sure the model directory on the host is mounted into the container:

docker run -it -d --net=host --shm-size=50g --privileged --name qwq-i \
  --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
  -v /usr/local/sbin:/usr/local/sbin:ro \
  -v /home/models/qwq:/home/models/qwq:rw \
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-300I-Duo-py311-openeuler24.03-lts

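4. Enter the Docker container, for example with docker exec -it qwq-i bash (all of the following steps are run inside the container)

Edit the configuration file. Fields to check:

ipAddress: the IP address of the local server
httpsEnabled: false, which turns HTTPS off
modelName: the model name
modelWeightPath: the model path (as seen inside the container)
npuDeviceIds: the NPU device IDs (these depend on your machine; check with npu-smi info)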

vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

{
"Version" : "1.1.0",
"LogConfig" :
{
"logLevel" : "Info",
"logFileSize" : 20,
"logFileNum" : 20,
"logPath" : "logs/mindservice.log"
},

"ServerConfig" :
{
"ipAddress" : "192.168.0.203",
"managementIpAddress" : "127.0.0.2",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : true,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm"
},

"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : true,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 32580,
"maxInputTokenLen" : 30000,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "qwen",
"modelWeightPath" : "/home/models/qwq",
"worldSize" : 4,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},

"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,

"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 30000,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,

"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,

"maxBatchSize" : 200,
"maxIterTimes" : 4096,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
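The fields called out in the notes before the config can also be set with a small script instead of editing by hand. This is a hedged sketch, not part of MindIE: the function names are hypothetical, and setting worldSize to the number of NPU IDs is an assumption based on the sample config (four IDs, worldSize 4).

```python
import json


def patch_mindie_config(cfg, ip_address, model_name, weight_path, npu_ids):
    """Fill in the deployment-specific fields; cfg is modified in place and returned."""
    server = cfg["ServerConfig"]
    server["ipAddress"] = ip_address
    server["httpsEnabled"] = False  # plain HTTP, as in the walkthrough

    backend = cfg["BackendConfig"]
    backend["npuDeviceIds"] = [list(npu_ids)]  # one device group for one model instance

    model = backend["ModelDeployConfig"]["ModelConfig"][0]
    model["modelName"] = model_name
    model["modelWeightPath"] = weight_path  # path inside the container
    # Assumption: worldSize equals the number of NPUs in the group.
    model["worldSize"] = len(npu_ids)
    return cfg


def patch_file(path, **fields):
    """Load the JSON file at path, patch it, and write it back."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    patch_mindie_config(cfg, **fields)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=4)


# Example call with the values used in this article:
# patch_file("/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json",
#            ip_address="192.168.0.203", model_name="qwen",
#            weight_path="/home/models/qwq", npu_ids=[0, 1, 2, 3])
```

Back up the original config.json before running anything like this, since the file is rewritten in full.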

Start the service:

cd /usr/local/Ascend/mindie/latest/mindie-service/bin

./mindieservice_daemon

The service has started successfully once the daemon reports a successful start in its console output.
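Loading a 32B model takes a while, so it can help to poll the service port rather than watch the console. The sketch below waits until a TCP connection to the configured port succeeds. The helper name wait_for_port is hypothetical, and an open port only shows that the listener is up, not that inference is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call with the address and port from config.json:
# wait_for_port("192.168.0.203", 1025)
```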

Testing: if the firewall is enabled, open port 1025.

sudo firewall-cmd --permanent --add-port=1025/tcp

sudo firewall-cmd --reload

Endpoint: POST http://192.168.0.203:1025/v1/chat/completions (use the ipAddress and port set in config.json)

{
"model": "qwen",
"messages": [{"role": "user", "content": "你是谁"}],
"max_tokens": 32768,
"stream": false
}
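The same request can be sent from Python with only the standard library. This is a minimal sketch: build_request and chat are hypothetical helper names, and reading the reply from choices[0].message.content assumes the response follows the OpenAI chat-completions layout, which the source does not show.

```python
import json
import urllib.request


def build_request(host, port, model, prompt, max_tokens=1024):
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(host, port, model, prompt, max_tokens=1024, timeout=600):
    """Send the request and return the assistant's reply text."""
    req = build_request(host, port, model, prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]


# Example call; "qwen" must match modelName in config.json:
# print(chat("192.168.0.203", 1025, "qwen", "你是谁"))
```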

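NPU utilization during the test: reaches 88%.

DeepSeek: the 310P chip supports only FP16 precision and does not support data types such as BF16 or INT8, so config.json in the model weights directory has to be modified. Everything else is the same as the steps above: change the dtype field in the downloaded model's config.json to float16 and save the file.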
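The dtype change can be made with a few lines of Python instead of a manual edit. This is a sketch under assumptions: the function name force_float16 is hypothetical, and it assumes the dtype is stored at the top level of config.json under the key torch_dtype (or dtype in some exports), so check your model's file first.

```python
import json


def force_float16(config_path: str) -> dict:
    """Rewrite the dtype field(s) of a model config.json to float16; return the old values."""
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)

    changed = {}
    # Assumption: the dtype lives under "torch_dtype" or "dtype" at the top level.
    for key in ("torch_dtype", "dtype"):
        if key in cfg and cfg[key] != "float16":
            changed[key] = cfg[key]
            cfg[key] = "float16"

    # Only rewrite the file when something actually changed.
    if changed:
        with open(config_path, "w", encoding="utf-8") as f:
            json.dump(cfg, f, indent=2, ensure_ascii=False)
    return changed


# Example call (the path is a placeholder for your model directory):
# force_float16("/path/to/model/config.json")
```

The return value lists the previous settings, which makes it easy to confirm what was overwritten.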
