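Qwen TTS is a local AI skill that provides high-fidelity speech synthesis without relying on cloud APIs. Built on the Qwen3-TTS-12Hz-1.7B-CustomVoice model, this addition to the Openclaw Skills library lets developers generate speech in 10 languages and control emotion, tone, and style through natural-language instructions. After the initial model download, the skill runs entirely offline, which keeps voice data private and reduces latency.
The skill ships with 9 premium speaker profiles, ranging from young and bright to mature and mellow, which makes it versatile across content-creation needs. It suits developers who want reliable, high-quality audio generation while keeping full control over their hardware and data. As a component of Openclaw Skills, it integrates into broader automation and agent workflows.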
Download: https://github.com/openclaw/skills/tree/main/skills/paki81/qwen-tts
The fastest way to install the skill directly from the source:
npx clawhub@latest install qwen-tts
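To install manually, copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
Copy this prompt into OpenClaw to install the skill automatically:
Please help me install qwen-tts using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).
First, navigate to the skill directory and run the setup script, which creates a virtual environment and installs the dependencies: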
cd skills/public/qwen-tts
bash scripts/setup.sh
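Note that the first time you generate speech, the roughly 1.7GB model is downloaded automatically from Hugging Face. Make sure Python 3.10-3.12 is available in your environment. Users in certain regions (for example, mainland China) can use a mirror by setting export HF_ENDPOINT=https://hf-mirror.com before running the script.
The skill manages its data through a CLI and local file storage. It generates uncompressed audio files and uses a local model cache.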
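The Python version requirement above can be checked before running the setup script. The following is a minimal sketch, not part of the skill itself; the function name `is_supported_python` is hypothetical, and the only fact taken from the text is the 3.10-3.12 range.

```python
import sys

# Supported range taken from the text: Python 3.10 through 3.12 (inclusive).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) pair falls within the documented range.

    `version` is any sequence whose first two items are major and minor,
    such as sys.version_info. This helper is illustrative only.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported_python():
        print(f"Python {current} is within the supported 3.10-3.12 range")
    else:
        print(f"Python {current} is outside the supported 3.10-3.12 range")
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as "3.9" sorting after "3.10".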
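| Property | Specification |
|---|---|
| Output format | WAV (uncompressed) |
| Sample rate | 16kHz |
| Model size | ~1.7GB (stored in the local cache) |
| Environment size | ~500MB |
| Supported languages | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian |
| Metadata | Speaker name, language tag, and instruction string |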
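To confirm that a generated file matches the output properties listed above, the WAV header can be read with Python's standard-library `wave` module. This is a rough sketch: the helper name `describe_wav` is hypothetical, and it assumes the skill writes PCM-encoded WAV, which is the only encoding `wave` can read.

```python
import wave


def describe_wav(path):
    """Read header fields from a PCM WAV file (illustrative helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("output.wav")["sample_rate_hz"]` returns the sample rate actually stored in the file, which can then be compared with the value in the table.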
name: qwen-tts
description: Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, downloaded from Hugging Face.
Generate speech from text:
scripts/tts.py "Ciao, come va?" -l Italian -o output.wav
With voice instruction (emotion/style):
scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav
First-time setup (one-time):
cd skills/public/qwen-tts
bash scripts/setup.sh
This creates a local virtual environment and installs the qwen-tts package (~500MB).
Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.
scripts/tts.py [options] "Text to speak"
-o, --output PATH - Output file path (default: qwen_output.wav)
-s, --speaker NAME - Speaker voice (default: Vivian)
-l, --language LANG - Language (default: Auto)
-i, --instruct TEXT - Voice instruction (emotion, style, tone)
--list-speakers - Show available speakers
--model NAME - Model name (default: CustomVoice 1.7B)
Basic Italian speech:
scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav
With emotion/instruction:
scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav
Different speaker:
scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav
List available speakers:
scripts/tts.py --list-speakers
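When calling the script from Python rather than a shell, the options shown above can be assembled into an argument list. The sketch below is an assumption about how one might wrap the CLI, not code shipped with the skill; `build_tts_command` is a hypothetical name, and only the flags `-o`, `-s`, `-l`, and `-i` come from the option list.

```python
def build_tts_command(text, output="qwen_output.wav", speaker=None,
                      language=None, instruct=None, script="scripts/tts.py"):
    """Assemble an argv list for scripts/tts.py from the documented options.

    Options left as None are omitted so the script applies its own defaults.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    cmd = [script]
    if speaker:
        cmd += ["-s", speaker]
    if language:
        cmd += ["-l", language]
    if instruct:
        cmd += ["-i", instruct]
    cmd += ["-o", output]
    cmd.append(text)  # positional text goes last, matching the usage line
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a shell string means quotes or apostrophes in the text need no escaping.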
The CustomVoice model includes 9 premium voices:
| Speaker | Language | Description |
|---|---|---|
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle_Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese (Beijing) | Youthful Beijing male, clear |
| Eric | Chinese (Sichuan) | Lively Chengdu male, husky |
| Ryan | English | Dynamic male, rhythmic |
| Aiden | English | Sunny American male |
| Ono_Anna | Japanese | Playful female, light nimble |
| Sohee | Korean | Warm female, rich emotion |
Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
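The recommendation above can be turned into a small lookup that prefers a speaker whose native language matches the text. This is an illustrative sketch: the mapping is copied from the speaker table, while the helper names and the fallback policy (use the documented default, Vivian, when no native speaker exists) are assumptions.

```python
# Native language of each speaker, copied from the speaker table.
NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",    # Beijing dialect
    "Eric": "Chinese",     # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

DEFAULT_SPEAKER = "Vivian"  # documented default of -s/--speaker


def speakers_for(language):
    """List speakers whose native language matches, in table order."""
    return [name for name, native in NATIVE_LANGUAGE.items() if native == language]


def pick_speaker(language):
    """Prefer a native speaker; otherwise fall back to the default voice."""
    matches = speakers_for(language)
    return matches[0] if matches else DEFAULT_SPEAKER
```

Languages without a native speaker in the table, such as Italian or German, fall through to the default voice, which the text says still supports all 10 languages.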
Use -i, --instruct to control emotion, tone, and style:
Italian examples:
"Parla con entusiasmo"
"Tono serio e professionale"
"Voce calma e rilassante"
"Leggi come un narratore"
English examples:
"Speak with excitement"
"Very happy and energetic"
"Calm and soothing voice"
"Read like a narrator"
The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:
# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
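The same capture can be done from Python by reading the last line of stdout. The sketch below is an assumption about how a caller might do this, not OpenClaw's actual implementation; `last_stdout_line` is a hypothetical helper, and a stand-in command replaces scripts/tts.py so the example runs without the skill installed.

```python
import subprocess
import sys


def last_stdout_line(cmd):
    """Run cmd, discard stderr, and return the last non-empty stdout line."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("command produced no output on stdout")
    return lines[-1]


# Stand-in for scripts/tts.py: prints a progress line, then a path,
# the way the text describes the script's output.
demo_cmd = [sys.executable, "-c", "print('loading model'); print('/tmp/audio.wav')"]
print(last_stdout_line(demo_cmd))  # prints /tmp/audio.wav
```

Taking the last line rather than the whole of stdout keeps the result correct even if the script prints progress messages before the path.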
Setup fails:
# Ensure Python 3.10-3.12 is available
python3.12 --version
# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
Model download slow/fails:
# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
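When the script is launched from Python, the mirror can be set for that one child process instead of being exported globally. This is a minimal sketch; `env_with_mirror` is a hypothetical helper, and the only values taken from the text are the `HF_ENDPOINT` variable name and the mirror URL.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror URL given in the text


def env_with_mirror(base_env=None, endpoint=HF_MIRROR):
    """Return a copy of the environment with HF_ENDPOINT set.

    The original mapping is left untouched, so the mirror applies only to
    processes started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HF_ENDPOINT"] = endpoint
    return env
```

A call such as `subprocess.run(["scripts/tts.py", "Test", "-o", "test.wav"], env=env_with_mirror(), check=True)` would then run the download through the mirror without changing the parent shell's settings.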
Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.
Audio quality issues:
Try a different speaker: --list-speakers
Add an instruction: -i "Speak clearly and slowly"
Match the language to the text: -l Italian for Italian text