POST TIME:2021-08-22 21:18
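By implementing a UniMRCP plugin, we can wrap the ASR interfaces of vendors such as iFLYTEK (XFyun), Baidu, and Alibaba, and build our own MRCP server.
The Media Resource Control Protocol (MRCP) is a communication protocol that lets media resource servers provide speech services to clients. The media resource services defined so far are speech recognition, speech synthesis, recording, and speaker verification and identification. MRCP does not define session setup and does not care how the server and client are connected; MRCP messages rely on a control protocol such as RTSP or SIP. The latest version, MRCPv2, uses SIP as its control protocol. (This article uses MRCPv2.)
All operations in this article were performed on CentOS 7.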
UniMRCP is an open source cross-platform implementation of the MRCP client and server in the C/C++ language distributed under the terms of the Apache License 2.0. The implementation encapsulates SIP, RTSP, SDP, MRCPv2, RTP/RTCP stacks and provides integrators with an MRCP version consistent API.
First, download "UniMRCP 1.5.0" and "UniMRCP Deps 1.5.0" from the official website.
Switch to the root account, then enter the Deps directory and install the dependencies:
./build-dep-libs.sh
To install UniMRCP itself, follow the instructions on the official website:
./bootstrap

The usual "configure", "make", "make install" sequence of commands should follow in order to build and install the project from source.

./configure
make
make install

As a result, the project will be installed in the directory "/usr/local/unimrcp" with the following layout:

bin      binaries (unimrcpserver, unimrcpclient, ...)
conf     configuration files (unimrcpserver.xml, unimrcpclient.xml, ...)
data     data files
include  header files
lib      shared (convenience) libraries
log      log files
plugin   run-time loadable modules
Once installation is complete, go to /usr/local/unimrcp/bin and run the server:
./unimrcpserver -o 3
On a successful start the server prints "MRCP Server Started". We can verify it with the bundled client:
./unimrcpclient
.
.
.
>help
usage:
- run [app_name] [profile_name] (run demo application)
app_name is one of 'synth', 'recog', 'bypass', 'discover'
profile_name is one of 'uni2', 'uni1', ...
examples:
run synth
run recog
run synth uni1
run recog uni1
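As shown above, once the client is running you can enter commands such as run synth and watch the logs on both the server and the client. synth is speech synthesis and recog is speech recognition.
Diving straight into the source code is hard going, so we use the server-side log output to trace the corresponding call path in the source. The call path is fairly complex, so only the key parts are listed below.
Start with the log. Here we filtered for the Demo Recog lines; the other plugins work the same way in principle:
[INFO] Load Plugin [Demo-Recog-1] [/usr/local/unimrcp/plugin/demorecog.so]
[INFO] Register MRCP Engine [Demo-Recog-1]
[INFO] Open Engine [Recorder-1]
[INFO] Start Task [Demo Recog Engine]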
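With this information we can search the source code to see how a plugin gets loaded.
Below is the flow from the plugin entry being parsed out of the configuration file to the .so being loaded: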
unimrcp_server.c
/** Load plugin */
static apt_bool_t unimrcp_server_plugin_load(unimrcp_server_loader_t *loader, const apr_xml_elem *root) {
...
engine = mrcp_server_engine_load(loader->server,plugin_id,plugin_path,config);
...
}
mrcp_server.c
/** Load MRCP engine */
MRCP_DECLARE(mrcp_engine_t*) mrcp_server_engine_load(
mrcp_server_t *server,
const char *id,
const char *path,
mrcp_engine_config_t *config) {
...
engine = mrcp_engine_loader_plugin_load(server->engine_loader,id,path,config);
...
}
mrcp_engine_loader.c
/** Load engine plugin */
MRCP_DECLARE(mrcp_engine_t*) mrcp_engine_loader_plugin_load(mrcp_engine_loader_t *loader, const char *id, const char *path, mrcp_engine_config_t *config) {
...
apr_dso_load(&plugin,path,loader->pool)
...
}
Once the load succeeds, the engine is registered:
unimrcp_server.c
/** Load plugin */
static apt_bool_t unimrcp_server_plugin_load(unimrcp_server_loader_t *loader, const apr_xml_elem *root) {
...
return mrcp_server_engine_register(loader->server,engine);
...
}
It is eventually added to a hash table:
mrcp_engine_factory.c
/** Register new engine */
MRCP_DECLARE(apt_bool_t) mrcp_engine_factory_engine_register(mrcp_engine_factory_t *factory, mrcp_engine_t *engine)
{
...
apr_hash_set(factory->engines,engine->id,APR_HASH_KEY_STRING,engine);
...
}
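The registration step boils down to a keyed insert into a table of engines. The sketch below illustrates that idea only: it uses a small fixed-size array instead of an APR hash, and `engine_t`, `registry_t`, `registry_register`, and `registry_find` are hypothetical names invented for this example, not UniMRCP API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENGINES 8

/* Hypothetical stand-in for mrcp_engine_t: only the id matters here. */
typedef struct {
    const char *id;
} engine_t;

/* Hypothetical registry; UniMRCP itself keys an APR hash by engine->id. */
typedef struct {
    engine_t *slots[MAX_ENGINES];
    size_t count;
} registry_t;

/* Look up an engine by id; returns NULL when the id is unknown. */
static engine_t *registry_find(const registry_t *reg, const char *id)
{
    size_t i;
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->slots[i]->id, id) == 0) {
            return reg->slots[i];
        }
    }
    return NULL;
}

/* Register an engine. Registering an id twice replaces the earlier entry.
 * Returns 1 on success, 0 when the table is full. */
static int registry_register(registry_t *reg, engine_t *engine)
{
    size_t i;
    assert(reg != NULL && engine != NULL && engine->id != NULL);
    for (i = 0; i < reg->count; i++) {
        if (strcmp(reg->slots[i]->id, engine->id) == 0) {
            reg->slots[i] = engine;
            return 1;
        }
    }
    if (reg->count == MAX_ENGINES) {
        return 0;
    }
    reg->slots[reg->count++] = engine;
    return 1;
}
```

Keying by id is what later lets a profile refer to an engine such as "Demo-Recog-1" by name.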
All of the loading above is triggered by the call to unimrcp_server_load. Once it succeeds, the server is started:
unimrcp_server.c
/** Start UniMRCP server */
MRCP_DECLARE(mrcp_server_t*) unimrcp_server_start(apt_dir_layout_t *dir_layout)
{
...
unimrcp_server_load(server,dir_layout,pool)
...
mrcp_server_start(server)
...
}
Starting the server in turn opens each registered engine:
mrcp_engine_iface.c
/** Open engine */
apt_bool_t mrcp_engine_virtual_open(mrcp_engine_t *engine) {
...
engine->method_vtable->open(engine)
...
}
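method_vtable is where the question of how a plugin is actually invoked comes in.
After walking through the concrete call flow and then comparing it with the plugin implementation manual on the official website, it becomes easy to understand what each interface the manual asks us to implement is for.
We will not go into the call details here. In the end, every operation on a plugin is triggered as a callback through the function pointers in the following three virtual tables.
First come the engine-level callbacks, which cover opening, closing, and destroying the plugin's engine, plus creating channels: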
/** Table of MRCP engine virtual methods */
struct mrcp_engine_method_vtable_t {
/** Virtual destroy */
apt_bool_t (*destroy)(mrcp_engine_t *engine);
/** Virtual open */
apt_bool_t (*open)(mrcp_engine_t *engine);
/** Virtual close */
apt_bool_t (*close)(mrcp_engine_t *engine);
/** Virtual channel create */
mrcp_engine_channel_t* (*create_channel)(mrcp_engine_t *engine, apr_pool_t *pool);
};
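To make the virtual-table mechanism concrete, here is a minimal, self-contained sketch of the pattern: the plugin supplies a table of function pointers and the server calls through it without knowing which plugin is behind it. The types and names (`engine_vtable_t`, `sketch_open`, `engine_virtual_open`, and so on) are hypothetical and trimmed down for illustration; the real table is mrcp_engine_method_vtable_t.

```c
#include <assert.h>
#include <stddef.h>

typedef struct engine_t engine_t;

/* Hypothetical, trimmed-down vtable with only two of the callbacks. */
typedef struct {
    int (*open)(engine_t *engine);
    int (*close)(engine_t *engine);
} engine_vtable_t;

struct engine_t {
    const engine_vtable_t *vtable; /* supplied by the plugin */
    int is_open;                   /* state kept by this sketch only */
};

/* Plugin side: the concrete callbacks. */
static int sketch_open(engine_t *engine)  { engine->is_open = 1; return 1; }
static int sketch_close(engine_t *engine) { engine->is_open = 0; return 1; }

static const engine_vtable_t sketch_vtable = { sketch_open, sketch_close };

/* Server side: dispatch through the table; a missing callback counts as failure. */
static int engine_virtual_open(engine_t *engine)
{
    assert(engine != NULL && engine->vtable != NULL);
    return engine->vtable->open ? engine->vtable->open(engine) : 0;
}

static int engine_virtual_close(engine_t *engine)
{
    assert(engine != NULL && engine->vtable != NULL);
    return engine->vtable->close ? engine->vtable->close(engine) : 0;
}
```

Writing a plugin then mostly means filling in such a table with your own functions.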
When a client talks to a server plugin, a channel is created within a session and destroyed when the session ends. These are the channel-related callbacks:
/** Table of channel virtual methods */
struct mrcp_engine_channel_method_vtable_t {
/** Virtual destroy */
apt_bool_t (*destroy)(mrcp_engine_channel_t *channel);
/** Virtual open */
apt_bool_t (*open)(mrcp_engine_channel_t *channel);
/** Virtual close */
apt_bool_t (*close)(mrcp_engine_channel_t *channel);
/** Virtual process_request */
apt_bool_t (*process_request)(mrcp_engine_channel_t *channel, mrcp_message_t *request);
};
ASR needs audio data flowing in, and TTS needs audio data flowing out. The following callbacks handle the audio data:
/** Table of audio stream virtual methods */
struct mpf_audio_stream_vtable_t {
/** Virtual destroy method */
apt_bool_t (*destroy)(mpf_audio_stream_t *stream);
/** Virtual open receiver method */
apt_bool_t (*open_rx)(mpf_audio_stream_t *stream, mpf_codec_t *codec);
/** Virtual close receiver method */
apt_bool_t (*close_rx)(mpf_audio_stream_t *stream);
/** Virtual read frame method */
apt_bool_t (*read_frame)(mpf_audio_stream_t *stream, mpf_frame_t *frame);
/** Virtual open transmitter method */
apt_bool_t (*open_tx)(mpf_audio_stream_t *stream, mpf_codec_t *codec);
/** Virtual close transmitter method */
apt_bool_t (*close_tx)(mpf_audio_stream_t *stream);
/** Virtual write frame method */
apt_bool_t (*write_frame)(mpf_audio_stream_t *stream, const mpf_frame_t *frame);
/** Virtual trace method */
void (*trace)(mpf_audio_stream_t *stream, mpf_stream_direction_e direction, apt_text_stream_t *output);
};
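For ASR, the callback that matters most in this table is write_frame, which receives the incoming audio one frame at a time. As a rough sketch of what such a callback might do, the code below appends each frame to a per-channel buffer and refuses frames that would overflow it. `frame_t`, `audio_sink_t`, `sink_write_frame`, and the buffer size are assumptions made for this example; the real frame type is mpf_frame_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AUDIO_BUF_SIZE 3200 /* assumption: 200 ms of 8 kHz, 16-bit mono audio */

/* Hypothetical frame: a pointer to raw audio bytes and their length. */
typedef struct {
    const void *data;
    size_t size;
} frame_t;

/* Hypothetical per-channel sink that collects incoming audio. */
typedef struct {
    unsigned char buf[AUDIO_BUF_SIZE];
    size_t used;
} audio_sink_t;

/* write_frame-style callback: append one frame to the buffer.
 * Returns 1 on success, 0 when the frame does not fit. */
static int sink_write_frame(audio_sink_t *sink, const frame_t *frame)
{
    assert(sink != NULL && frame != NULL);
    if (frame->size == 0) {
        return 1; /* nothing to copy */
    }
    if (frame->size > AUDIO_BUF_SIZE - sink->used) {
        return 0; /* would overflow; leave the buffer untouched */
    }
    memcpy(sink->buf + sink->used, frame->data, frame->size);
    sink->used += frame->size;
    return 1;
}
```

A real plugin would usually hand the bytes to the recognizer instead of keeping them, but the shape of the callback is the same.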
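By implementing the callbacks in these three virtual tables, we can handle the corresponding requests sent by the client.
Modify configure.ac
Because UniMRCP uses automake to manage the source build, adding source code is not enough; we also need to add the matching build configuration.
First edit configure.ac and add the following. It defines a conditional that the Makefile uses later, and registers the new Makefile that we add below: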
dnl XFyun recognizer plugin.
UNI_PLUGIN_ENABLED(xfyunrecog)
AM_CONDITIONAL([XFYUNRECOG_PLUGIN],[test "${enable_xfyunrecog_plugin}" = "yes"])
...
plugins/xfyun-recog/Makefile
...
echo XFyun recognizer plugin....... : $enable_xfyunrecog_plugin
Add the source code and directories
Under the plugins directory, create an xfyun-recog directory, and inside it a src directory. You can copy demo_recog_engine.c into it, rename it to xfyun_recog_engine.c, and replace every "demo" in the source with "xfyun". Of course, you can also type it all out from scratch.
Create a Makefile.am file in the xfyun-recog directory with the following content:
AM_CPPFLAGS = $(UNIMRCP_PLUGIN_INCLUDES)

plugin_LTLIBRARIES = xfyunrecog.la

xfyunrecog_la_SOURCES = src/xfyun_recog_engine.c
xfyunrecog_la_LDFLAGS = $(UNIMRCP_PLUGIN_OPTS)

include $(top_srcdir)/build/rules/uniplugin.am
Edit the Makefile.am in the plugins directory and add the following:
if XFYUNRECOG_PLUGIN
SUBDIRS += xfyun-recog
endif
XFYUNRECOG_PLUGIN is the conditional we added in configure.ac.
The final directory layout is shown in the figure below (ignore the files outside the red box):

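Once this is done, rebuild UniMRCP from the first step; you should see xfyunrecog.so being generated.
Next, download the SDKs for speech dictation and online speech synthesis (the latter is used later for the TTS implementation) from the iFLYTEK open platform.
Under the plugins directory, create a third-party directory and copy the iFLYTEK SDK into it: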

Edit the Makefile.am of xfyun-recog to link against the iFLYTEK library and to copy it at install time:
plugin_LTLIBRARIES = xfyunrecog.la
xfyunrecog_la_SOURCES = src/xfyun_recog_engine.c
xfyunrecog_la_LDFLAGS = $(UNIMRCP_PLUGIN_OPTS) \
-L$(top_srcdir)/plugins/third-party/xfyun/libs/x64 \
-lmsc -ldl -lpthread -lrt
xfyunrecog_ladir = $(libdir)
xfyunrecog_la_DATA = $(top_srcdir)/plugins/third-party/xfyun/libs/x64/libmsc.so
include $(top_srcdir)/build/rules/uniplugin.am
UNIMRCP_PLUGIN_INCLUDES += -I$(top_srcdir)/plugins/third-party/xfyun/include
For the iFLYTEK side of the implementation, refer to the official documentation and the asr_sample shipped with the SDK.

Include the headers
#include <stdlib.h>
#include <unistd.h> /* usleep, used by the recognition loop */
#include "qisr.h"
#include "msp_cmn.h"
#include "msp_errors.h"
New channel fields
struct xfyun_recog_channel_t {
...
const char *session_id; // iFLYTEK session_id
const char *last_result; // holds the recognition result
apt_bool_t recog_started; // whether recognition has started
};
iFLYTEK login
static apt_bool_t xfyun_login(void)
{
int ret = MSP_SUCCESS;
const char* login_params = "appid = 5ac1c462, work_dir = ."; // login parameters; the appid is bound to the msc library, so do not change it arbitrarily
/* user login */
ret = MSPLogin(NULL, NULL, login_params); // first argument is the user name, second is the password (NULL is fine for both), third is the login parameters
if (MSP_SUCCESS != ret)
{
apt_log(RECOG_LOG_MARK,APT_PRIO_ERROR,"[xfyun] MSPLogin failed , Error code %d.", ret);
return FALSE; // login failed
}
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] MSPLogin success");
return TRUE;
}
We just need to call this function when the engine is created.
Creating and ending the iFLYTEK session
First we need to find the right moments to create and end the session. xfyun_recog_msg_process is the callback that handles requests on a channel, and RECOGNIZER_RECOGNIZE is the recognition request, so we create the session when that request arrives and end it when recognition finishes or on RECOGNIZER_STOP.
/** Process RECOGNIZE request */
static apt_bool_t xfyun_recog_channel_recognize(mrcp_engine_channel_t *channel, mrcp_message_t *request, mrcp_message_t *response)
{
...
/* reset */
int errcode = MSP_SUCCESS;
const char* session_begin_params = "sub = iat, domain = iat, language = zh_cn, accent = mandarin, sample_rate = 8000, result_type = plain, result_encoding = utf8";
recog_channel->session_id = QISRSessionBegin(NULL, session_begin_params, &errcode); // dictation needs no grammar, so the first argument is NULL
if (MSP_SUCCESS != errcode)
{
apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRSessionBegin failed! error code:%d", errcode);
return FALSE;
}
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] QISRSessionBegin success!");
recog_channel->last_result = NULL;
recog_channel->recog_started = FALSE;
recog_channel->recog_request = request;
return TRUE;
}
void xfyun_recog_end_session(xfyun_recog_channel_t *recog_channel){
if(recog_channel->session_id) {
/* end the session first, then report what actually happened */
int ret = QISRSessionEnd(recog_channel->session_id, "mrcp channel closed");
if (MSP_SUCCESS != ret) {
apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRSessionEnd failed! error code:%d", ret);
} else {
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] QISRSessionEnd success!");
}
recog_channel->session_id = NULL;
}
}
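The session handling above follows a simple lifecycle: open a vendor session when RECOGNIZE arrives, and close it exactly once, no matter whether the trigger is completion, STOP, or the channel closing. The sketch below shows that lifecycle in isolation, with a stub in place of the SDK call. `channel_state_t`, `stub_session_begin`, `channel_begin`, and `channel_end` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel state; session_id mirrors the field added to the channel. */
typedef struct {
    const char *session_id; /* NULL while no vendor session is open */
    int ended_count;        /* sketch-only counter: how often a session was ended */
} channel_state_t;

/* Stand-in for QISRSessionBegin; it always succeeds in this sketch. */
static const char *stub_session_begin(void)
{
    return "stub-session";
}

/* Called when a RECOGNIZE request arrives. Returns 1 on success. */
static int channel_begin(channel_state_t *ch)
{
    assert(ch != NULL);
    if (ch->session_id != NULL) {
        return 0; /* a session is already open; refuse rather than leak it */
    }
    ch->session_id = stub_session_begin();
    return ch->session_id != NULL;
}

/* Called on STOP, on completion, and on channel close; safe to call repeatedly. */
static void channel_end(channel_state_t *ch)
{
    assert(ch != NULL);
    if (ch->session_id != NULL) {
        ch->ended_count++;     /* real code would call QISRSessionEnd here */
        ch->session_id = NULL; /* clearing the id makes later calls no-ops */
    }
}
```

Resetting the id to NULL is what makes the end function idempotent, so it can be called from several code paths without ending the same session twice.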
Processing the audio stream
xfyun_recog_stream_write is the callback invoked when audio arrives, so the actual recognition should clearly be driven from there. Here is the recognition function itself:
static apt_bool_t xfyun_recog_stream_recog(xfyun_recog_channel_t *recog_channel,
const void *voice_data,
unsigned int voice_len
) {
// int MSPAPI QISRAudioWrite(const char* sessionID, const void* waveData, unsigned int waveLen, int audioStatus, int *epStatus, int *recogStatus);
int aud_stat = MSP_AUDIO_SAMPLE_CONTINUE; // audio status
int ep_stat = MSP_EP_LOOKING_FOR_SPEECH; // endpoint detection status
int rec_stat = MSP_REC_STATUS_SUCCESS; // recognition status
int ret = 0;
if(FALSE == recog_channel->recog_started) {
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] start recog");
recog_channel->recog_started = TRUE;
aud_stat = MSP_AUDIO_SAMPLE_FIRST;
} else if(0 == voice_len) {
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] finish recog");
aud_stat = MSP_AUDIO_SAMPLE_LAST;
}
if(NULL == recog_channel->session_id) {
return FALSE;
}
ret = QISRAudioWrite(recog_channel->session_id, voice_data, voice_len, aud_stat, &ep_stat, &rec_stat);
if (MSP_SUCCESS != ret)
{
apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRAudioWrite failed! error code:%d", ret);
return FALSE;
}
if(MSP_REC_STATUS_SUCCESS != rec_stat && MSP_AUDIO_SAMPLE_LAST != aud_stat) {
// apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] no need recog,rec_stat=%d,aud_stat=%d",rec_stat,aud_stat);
return TRUE;
}
while (1)
{
const char *rslt = QISRGetResult(recog_channel->session_id, &rec_stat, 0, &ret);
if (MSP_SUCCESS != ret)
{
apt_log(RECOG_LOG_MARK,APT_PRIO_WARNING,"[xfyun] QISRGetResult failed, error code: %d", ret);
return FALSE;
}
if (NULL != rslt)
{
if(NULL == recog_channel->last_result) {
recog_channel->last_result = apr_pstrdup(recog_channel->channel->pool,rslt);
} else {
// recog_channel->last_result = apr_psprintf(recog_channel->channel->pool,"%s%s",recog_channel->last_result,rslt);
recog_channel->last_result = apr_pstrcat(recog_channel->channel->pool, recog_channel->last_result, rslt, NULL); /* apr_pstrcat is variadic and needs a terminating NULL */
}
}
apt_log(RECOG_LOG_MARK,APT_PRIO_INFO,"[xfyun] Get recog result:%s",rslt);
if(MSP_AUDIO_SAMPLE_LAST == aud_stat && MSP_REC_STATUS_COMPLETE != rec_stat) {
usleep(150*1000);
continue;
}
break;
}
return TRUE;
}
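The part of the function above that decides the audio status is easy to get wrong, so here it is isolated as a small pure function: the first chunk is marked as first, an empty chunk after that marks the end of input, and everything else is a continuation. The enum and the name `next_chunk_status` are hypothetical; the SDK's own constants are MSP_AUDIO_SAMPLE_FIRST, MSP_AUDIO_SAMPLE_CONTINUE, and MSP_AUDIO_SAMPLE_LAST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SDK's audio status constants. */
typedef enum { CHUNK_FIRST, CHUNK_CONTINUE, CHUNK_LAST } chunk_status_t;

/* Label the next chunk. The branch order matches the recognition function:
 * the "not started yet" check wins, then an empty chunk means end of input. */
static chunk_status_t next_chunk_status(int *started, size_t chunk_len)
{
    assert(started != NULL);
    if (!*started) {
        *started = 1;
        return CHUNK_FIRST;
    }
    if (chunk_len == 0) {
        return CHUNK_LAST;
    }
    return CHUNK_CONTINUE;
}
```

Keeping this decision separate from the SDK call makes it straightforward to test without a network connection.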
Sending the recognition result
When xfyun_recog_stream_write detects the end of speech or no input at all, it calls xfyun_recog_recognition_complete to send the completion message. In that function we can read out the final recognition result and send it:
/* Load xfyun recognition result */
static apt_bool_t xfyun_recog_result_load(xfyun_recog_channel_t *recog_channel, mrcp_message_t *message)
{
apt_str_t *body = &message->body;
if(!recog_channel->last_result) {
return FALSE;
}
body->buf = apr_psprintf(message->pool,
"<?xml version=\"1.0\"?>\n"
"<result>\n"
" <interpretation confidence=\"%d\">\n"
" <instance>%s</instance>\n"
" <input mode=\"speech\">%s</input>\n"
" </interpretation>\n"
"</result>\n",
99,
recog_channel->last_result,
recog_channel->last_result);
if(body->buf) {
mrcp_generic_header_t *generic_header;
generic_header = mrcp_generic_header_prepare(message);
if(generic_header) {
/* set content type */
apt_string_assign(&generic_header->content_type,"application/x-nlsml",message->pool);
mrcp_generic_header_property_add(message,GENERIC_HEADER_CONTENT_TYPE);
}
body->length = strlen(body->buf);
}
return TRUE;
}
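One thing the function above does not do is escape the recognized text before embedding it in XML, so a result containing `<` or `&` would produce a malformed body. The following is a minimal sketch of one way to handle that, using plain C buffers and snprintf rather than APR pools. `xml_escape` and `nlsml_build` are hypothetical helpers written for this example, and the 256-byte scratch buffer is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity cap), escaping &, < and >.
 * Returns 1 on success, 0 when dst is too small. */
static int xml_escape(char *dst, size_t cap, const char *src)
{
    size_t used = 0;
    assert(dst != NULL && src != NULL);
    if (cap == 0) {
        return 0;
    }
    for (; *src != '\0'; src++) {
        char one[2];
        const char *rep;
        size_t len;
        switch (*src) {
            case '&': rep = "&amp;"; break;
            case '<': rep = "&lt;"; break;
            case '>': rep = "&gt;"; break;
            default:  one[0] = *src; one[1] = '\0'; rep = one; break;
        }
        len = strlen(rep);
        if (used + len + 1 > cap) {
            return 0; /* no room for the replacement plus the terminator */
        }
        memcpy(dst + used, rep, len);
        used += len;
    }
    dst[used] = '\0';
    return 1;
}

/* Build a minimal NLSML body around the escaped text. Returns 1 on success. */
static int nlsml_build(char *out, size_t cap, const char *text, int confidence)
{
    char escaped[256]; /* assumption: results are short enough for this sketch */
    int n;
    if (!xml_escape(escaped, sizeof(escaped), text)) {
        return 0;
    }
    n = snprintf(out, cap,
        "<?xml version=\"1.0\"?>\n"
        "<result>\n"
        "  <interpretation confidence=\"%d\">\n"
        "    <instance>%s</instance>\n"
        "    <input mode=\"speech\">%s</input>\n"
        "  </interpretation>\n"
        "</result>\n",
        confidence, escaped, escaped);
    return n >= 0 && (size_t)n < cap; /* reject truncated output */
}
```

In the plugin itself the same escaping could be done into pool-allocated memory before the apr_psprintf call.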
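The endpoint detection problem
The function below performs endpoint detection on the audio. During real-world debugging, we ran into calls whose level never dropped below 8, which is above the default threshold of 2, so silence was never detected. Raising the default threshold appropriately avoids the situation where the end of speech is never recognized.
MPF_DECLARE(mpf_detector_event_e) mpf_activity_detector_process(mpf_activity_detector_t *detector, const mpf_frame_t *frame)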
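To see why a noise floor of 8 defeats a threshold of 2, here is a simplified sketch of level-based activity detection. It assumes the level of a frame is the average absolute amplitude of its 16-bit samples and that a frame counts as speech when the level reaches the threshold. This is an illustration of the idea, not the actual mpf_activity_detector code, and `frame_level` and `frame_is_speech` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average absolute amplitude of a frame of 16-bit PCM samples. */
static size_t frame_level(const int16_t *samples, size_t count)
{
    size_t sum = 0;
    size_t i;
    assert(samples != NULL || count == 0);
    if (count == 0) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        int32_t s = samples[i]; /* widen first so negating -32768 is safe */
        sum += (size_t)(s < 0 ? -s : s);
    }
    return sum / count;
}

/* A frame counts as speech when its level reaches the threshold. */
static int frame_is_speech(const int16_t *samples, size_t count, size_t threshold)
{
    return frame_level(samples, count) >= threshold;
}
```

With a threshold of 2, line noise at level 8 is classified as speech on every frame, so the detector never sees the silence it needs to report the end of input; a threshold above the noise floor fixes that.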
After rebuilding and reinstalling, we still need to change the configuration file to use our own engine. Edit conf/unimrcpserver.xml and enable it:
<engine id="Demo-Recog-1" name="demorecog" enable="false"/>
<engine id="XFyun-Recog-1" name="xfyunrecog" enable="true"/>
Once the server is running, you can see that xfyunrecog has been loaded.