The VCDx API

This section defines the API endpoints for the VCDx service.

The https://{{host}}:{{port}}/tenants/{{tenant}}/audios/{{serviceName}} endpoint is the main VCDx endpoint. Depending on the headers and parameters passed in, it handles both health checks and audio analysis.

Example:

POST https://xDeTECH.customer.com:8099/tenants/idxauth/audios/testService
Content-Type: application/json
x-api-key: JKUGCNMVIA76UOU4VMSIWDSYIQ.A9D2EW1A37E8E2B5C097ACF49010AA68
stream-id: 7f3a323b-a19d-4fab-8e43-8b2d71460598

{
    "audioDataWav": "UklGRlSWAQB...."
}

{{tenant}} is the TrustX tenant against which the x-api-key is validated. This tenant is provided by support@daon.com. See Initial Configuration for details.

{{serviceName}} is required and is used for tracking purposes. See Metrics (VCDx < v1.1.2.15) for details.
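Below is a minimal Python sketch that fills in the URL template using only the standard library. The function name `build_audio_url` is hypothetical and is not part of any Daon SDK. Percent-encoding the path segments is a defensive choice made by this sketch, not a documented requirement.

```python
from urllib.parse import quote


def build_audio_url(host: str, port: int, tenant: str, service_name: str) -> str:
    """Fill in https://{{host}}:{{port}}/tenants/{{tenant}}/audios/{{serviceName}}.

    Hypothetical helper for illustration only.
    """
    if not tenant:
        raise ValueError("tenant is required")
    if not service_name:
        # {{serviceName}} is required; VCDx uses it for tracking purposes.
        raise ValueError("serviceName is required")
    # Percent-encode each path segment so unusual characters cannot alter the path.
    return (
        f"https://{host}:{port}/tenants/{quote(tenant, safe='')}"
        f"/audios/{quote(service_name, safe='')}"
    )


print(build_audio_url("xDeTECH.customer.com", 8099, "idxauth", "testService"))
```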

Headers

The following headers are supported by the VCDx service.

Header Name	Description	Required
Content-Type	Set to application/json	Y
x-api-key	Provided by support@daon.com. See Initial Configuration for details.	Y
stream-id	Audio streams are identified by the stream-id HTTP header. It groups related requests together and lets customers count the number of unique streams that have been processed. When the voice gateway is used, the gateway sets the stream-id and keeps it consistent across all requests for the same audio call. If you call VCDx directly, either use one stream-id for all segments of the same call, or generate a unique GUID for each sample if each sample is treated as a standalone audio sample.	Y
x-sp-process	If x-sp-process is included and set to `NO_PROCESS`, no audio processing is performed. This is useful for ping/health checks.	N
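The header rules above can be sketched in Python as follows. `build_headers` is a hypothetical helper. It reuses a caller-supplied stream-id (for example, one shared by all segments of a call). Otherwise it generates a fresh GUID, which treats the sample as standalone.

```python
import uuid
from typing import Optional


def build_headers(api_key: str, stream_id: Optional[str] = None,
                  health_check: bool = False) -> dict:
    """Assemble VCDx request headers (hypothetical helper for illustration)."""
    if not api_key:
        raise ValueError("x-api-key is required")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
        # Shared id for segments of one call; otherwise a new GUID per sample.
        "stream-id": stream_id or str(uuid.uuid4()),
    }
    if health_check:
        # NO_PROCESS tells VCDx to skip audio processing (ping/health check).
        headers["x-sp-process"] = "NO_PROCESS"
    return headers
```

For two segments of the same call, generate `call_id = str(uuid.uuid4())` once. Then pass it to both `build_headers(api_key, call_id)` calls so that VCDx groups the segments under one stream.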

Ping/Health Check

Request

GET https://xDeTECH.customer.com:8099/ping

Response

Success
{
    "serviceName": "SentinelVCDX",
    "serviceId": "1",
    "systemTime": "2025-09-30T10:06:21.993+00:00",
    "sentinelVersion": "2.0.0.0",
    "redisEnabled": true,
    "redisAvailable": true,
    "daonVoiceCloneNativeVersion": "2.0.0.1",
    "daonVoiceQCNativeVersion": "2.0.1.0",
    "daonVoiceReplayNativeVersion": "2.2.0.1"
}
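Below is a small Python sketch for checking the /ping response. The helper name `summarize_ping` is hypothetical. The sketch assumes Redis needs attention only when `redisEnabled` is true and `redisAvailable` is false. Confirm that interpretation with Daon Support before alerting on it.

```python
import json

# Sample body copied from the /ping response above.
PING_BODY = """{"serviceName": "SentinelVCDX", "serviceId": "1",
"systemTime": "2025-09-30T10:06:21.993+00:00", "sentinelVersion": "2.0.0.0",
"redisEnabled": true, "redisAvailable": true,
"daonVoiceCloneNativeVersion": "2.0.0.1", "daonVoiceQCNativeVersion": "2.0.1.0",
"daonVoiceReplayNativeVersion": "2.2.0.1"}"""


def summarize_ping(body: str) -> dict:
    """Extract the fields an operator usually cares about (hypothetical helper)."""
    info = json.loads(body)
    # Assumption: Redis is only a problem if it is enabled but unavailable.
    redis_ok = (not info.get("redisEnabled", False)) or info.get("redisAvailable", False)
    return {
        "service": info["serviceName"],
        "version": info["sentinelVersion"],
        "redis_ok": redis_ok,
        # Native model versions are the keys ending in "NativeVersion".
        "native_versions": {k: v for k, v in info.items() if k.endswith("NativeVersion")},
    }
```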

Health Actuator

Request

GET https://xDeTECH.customer.com:8099/actuator/health

Response

Success
{
    "status": "UP",
    "components": {
        "livenessState": {
            "status": "UP"
        },
        "readinessState": {
            "status": "UP"
        }
    },
    "groups": [
        "liveness",
        "readiness"
    ]
}
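Below is a minimal Python check for the health actuator response. It assumes a probe should pass only when the top-level status and every listed component report `UP`. `is_healthy` is a hypothetical name.

```python
import json


def is_healthy(body: str) -> bool:
    """True only if the overall status and every reported component are UP."""
    health = json.loads(body)
    if health.get("status") != "UP":
        return False
    # Each entry under "components" (e.g. livenessState) carries its own status.
    components = health.get("components", {})
    return all(c.get("status") == "UP" for c in components.values())
```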

Analyze Audio Segment

Request

Simple Request
POST https://xDeTECH.customer.com:8099/tenants/idxauth/audios/testService
Content-Type: application/json
x-api-key: JKUGCNMVIA76UOU4VMSIWDSYIQ.A9D2EW1A37E8E2B5C097ACF49010AA68
stream-id: 7f3a323b-a19d-4fab-8e43-8b2d71460598

{
    "audioDataWav": "UklGRnhIAQBXQVZFSlVOS..."
}
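Below is a sketch of preparing the analysis request in Python with the standard library. The `audioDataWav` values in the examples begin with `UklGR`, which is the base64 encoding of the `RIFF` header bytes. This sketch therefore assumes that the full WAV file, header included, is base64-encoded. `build_analyze_request` is hypothetical. It only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_analyze_request(url: str, api_key: str, stream_id: str,
                          wav_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the audio analysis endpoint."""
    body = {
        # Assumption: the whole WAV file, RIFF header included, base64-encoded.
        "audioDataWav": base64.b64encode(wav_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "stream-id": stream_id,
        },
        method="POST",
    )
```

To send the request, read the audio with `open(path, "rb").read()` and build the request. Then pass it to `urllib.request.urlopen(req)` and decode the JSON body of the response.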

Specifying Thresholds

The following parameters may be specified in the JSON request body to override the default thresholds.

Property	Default Value	Description
replayConfidenceR1	`0.6293`	Threshold for `r1`, the score obtained from the replay detection model that detects low-quality pattern devices. The score ranges from 0 to 1; the lower the score, the more likely the file is a spoof.
replayConfidenceR2	`0.05794`	Threshold for `r2`, the score obtained from the replay detection model that detects high-quality pattern devices. The score ranges from 0 to 1; the lower the score, the more likely the file is a spoof.
cloneThresholds	`{"v1": 0.01625, "v2": 0.06082, "v3": 0.678, "v4": 0.4003}`	Sets the clone detection thresholds. v1, v2, v3 and v4 are internal models. Do not adjust these thresholds without confirmation from Daon Support.
minStoi	`0.73707`	Minimum STOI (Short-Time Objective Intelligibility) score. STOI is an objective measure designed to predict the intelligibility of speech, especially in noisy or processed audio.
minSiSdr	`-7.478`	Minimum SI-SDR (Scale-Invariant Signal-to-Distortion Ratio). SI-SDR is an objective metric that calculates the ratio of the power of the original signal to the power of the distortion (e.g., noise or artifacts introduced during processing).
minPesq	`1.2676`	Minimum PESQ (Perceptual Evaluation of Speech Quality) score. PESQ, standardized as ITU-T P.862, is an objective metric that assesses the quality of narrowband and wideband speech signals. It models human perception of speech quality by analyzing both the time and frequency domains, and is often used to evaluate speech codecs and other processing that affects audio quality.
noQcTemplateIfQcFailed	`false`	Defines whether the QC snap-in omits the QC template when the quality check fails. The default value, false, means that the QC template is still returned when the quality check fails.
inferenceDuration	`4`	Minimum amount of speech, in seconds, required to trigger replay and voice cloning inference.
qcTemplateUpdateRatio	`1`	Defines the percentage of the audio buffer that is discarded (or shifted) after an inference is computed.
minSpeechRatio	`0.55`	Minimum ratio of detected speech to total audio length. For example, a 7-second WAV file that contains only 3 seconds of speech has a speech ratio of about 0.43 (3 / 7), which is below the default of 0.55.
minSpeechDuration	`2`	Minimum duration of speech, in seconds, that must be detected in the submitted audio sample.
minSnr	`5`	Quality metric that should only be modified after consulting with Daon Support. This is the signal-to-noise ratio threshold that factors into the quality assessment of the submitted audio sample.
minLoudness1	`-0.6`	Quality metric that should only be modified after consulting with Daon Support.
minLoudness2	`-38`	Quality metric that should only be modified after consulting with Daon Support.
maxSaturation	`-0.04`	Quality metric that should only be modified after consulting with Daon Support.
maxFrameSaturationRatio	`-0.3`	Quality metric that should only be modified after consulting with Daon Support.
codecDiscard	`amr-nb-0,amr-nb-1,amr-nb-2,amr-nb-3`	Quality metric that should only be modified after consulting with Daon Support.

Before modifying any of these settings, discuss the change with Daon Support to understand its impact.
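Below is a Python sketch of adding threshold overrides to the request body. The original does not show where these properties sit in the JSON. This sketch assumes they are top-level properties next to `audioDataWav`; verify this with Daon Support. `build_body` and `THRESHOLD_KEYS` are hypothetical names. Rejecting unknown keys locally is a design choice of this sketch, not documented behavior.

```python
import json

# Property names copied from the "Specifying Thresholds" table.
THRESHOLD_KEYS = {
    "replayConfidenceR1", "replayConfidenceR2", "cloneThresholds",
    "minStoi", "minSiSdr", "minPesq", "noQcTemplateIfQcFailed",
    "inferenceDuration", "qcTemplateUpdateRatio", "minSpeechRatio",
    "minSpeechDuration", "minSnr", "minLoudness1", "minLoudness2",
    "maxSaturation", "maxFrameSaturationRatio", "codecDiscard",
}


def build_body(audio_b64: str, **overrides) -> str:
    """Return a JSON body; only the thresholds passed in are overridden."""
    unknown = set(overrides) - THRESHOLD_KEYS
    if unknown:
        # Catch typos locally rather than silently sending an ignored property.
        raise ValueError(f"unknown threshold(s): {sorted(unknown)}")
    body = {"audioDataWav": audio_b64}
    body.update(overrides)  # assumption: thresholds sit beside audioDataWav
    return json.dumps(body)
```

For example, `build_body(audio_b64, minSpeechDuration=3)` overrides only the minimum speech duration and leaves every other threshold at its default.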

Response

Valid Sample
{
    "result": "PROCESSED",
    "streamResult": "NO_ANOMALY_DETECTED",
    "confidenceIndicator": "HIGH",
    "replayResponse": {
        "model": "DaonVoice-ReplayTelephony-2.2.0-JIT",
        "version": "2.2.0",
        "speechDuration": 4.0,
        "frequency": 8000,
        "processCompleted": true,
        "isReplay": false,
        "r1": 0.9991,
        "r2": 0.8951,
        "processingTimeMillis": 93
    },
    "cloneResponse": {
        "model": "DaonVoice-VoiceCloningTelephony-2.0.0-JIT",
        "version": "2.0.0",
        "speechDuration": 4.0,
        "frequency": 8000,
        "processCompleted": true,
        "spoof": false,
        "scores": {
            "v1": 0.9989,
            "v2": 0.896,
            "v3": 1.0,
            "v4": 0.9991
        },
        "processingTimeMillis": 138
    },
    "qcResponse": {
        "model": "DaonVoice-QC-1.1.1.1-JIT",
        "version": "1.1.1.1",
        "speechDuration": 3.93,
        "frequency": 8000,
        "processCompleted": true,
        "audioDuration": 4.04,
        "snr": 1000.0,
        "loudness1": 10.365,
        "loudness2": -17.063,
        "maxSaturationRatio": -0.025,
        "frameSaturationRatio": -0.0,
        "stoi": 0.88184,
        "pesq": 1.8722,
        "siSdr": 10.175,
        "aggregatedSpeechInTpl": 7.8602,
        "qcFailed": false,
        "codec": "raw",
        "qcStatus": 0,
        "speechRatio": 0.97277,
        "processingTimeMillis": 435
    },
    "processingTimeMillis": 596
}

Property	Description
result	Chunk-level processing result for a single audio sample. `PROCESSED`: enough valid speech has been found for processing. `NOT_PROCESSED`: no valid speech has been found yet.
streamResult	Top-level processing result for all audio samples grouped by the stream-id header. `NOT_PROCESSED`: no valid speech has been found yet. `NO_ANOMALY_DETECTED`: no anomaly (replay or cloned voice) has been detected yet. `ANOMALY_DETECTED`: an anomaly (replay or cloned voice) has been detected.
confidenceIndicator	An overall confidence indicator based on the BPCER (bona fide presentation classification error rate) of the lowest confidence score. The value is either `HIGH`, if an anomaly is detected or there is high confidence that there is no anomaly, or `MEDIUM`, if there is medium confidence that there is no anomaly.
replayResponse	Results of replay detection.
replayResponse.speechDuration	Duration of the audio sample analyzed. Unlike speechDuration in qcResponse, this shows the entire audio length rather than the speech detected within the audio, because the entire sample is used for replay detection.
replayResponse.isReplay	true/false indicator of whether replay was detected in the segment being processed. If multiple audio segments have been processed for the same stream-id, isReplay may not match streamResult.
cloneResponse	Results of clone detection.
cloneResponse.speechDuration	Duration of the audio sample analyzed. Unlike speechDuration in qcResponse, this shows the entire audio length rather than the speech detected within the audio, because the entire sample is used for clone detection.
cloneResponse.spoof	true/false indicator of whether a cloned or synthetic voice was detected in the segment being processed. If multiple audio segments have been processed for the same stream-id, this value may not match streamResult.
qcResponse	Results of the quality assessment.
qcResponse.qcFailed	true/false indicator of whether the segment failed quality processing (true means the quality check failed). If QC processing fails, an HTTP
qcResponse.qcStatus	If qcFailed is true, qcStatus and qcMessage are present. See QC Responses below.
qcResponse.qcMessage	If qcFailed is true, qcStatus and qcMessage are present. See QC Responses below.
qcResponse.speechDuration	Amount of speech, in seconds, detected in the audio sample.
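Below is a Python sketch that turns an analysis response into a single verdict. The precedence it uses is an assumption of this sketch, not documented VCDx behavior: a stream-level anomaly first, then a per-segment QC failure, then the remaining streamResult values. Field names follow the sample response (for example, `spoof` inside `cloneResponse`). `interpret` is a hypothetical name.

```python
import json


def interpret(body: str) -> str:
    """Reduce a VCDx analysis response to a short verdict string."""
    resp = json.loads(body)
    stream = resp.get("streamResult")
    # Per-segment detector flags, named as in the sample response.
    flags = []
    if resp.get("replayResponse", {}).get("isReplay"):
        flags.append("replay")
    if resp.get("cloneResponse", {}).get("spoof"):
        flags.append("clone")
    if stream == "ANOMALY_DETECTED":
        return "anomaly detected in stream; this segment: " + (", ".join(flags) or "none")
    qc = resp.get("qcResponse", {})
    if qc.get("qcFailed"):
        # qcStatus and qcMessage are present only when the quality check failed.
        return f"segment failed QC ({qc.get('qcStatus')}): {qc.get('qcMessage')}"
    if stream == "NO_ANOMALY_DETECTED":
        return f"no anomaly detected yet, confidence {resp.get('confidenceIndicator')}"
    return "not processed: no valid speech found yet"
```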

QC Responses

If a quality error is present (qcFailed==true), a qcStatus and qcMessage will be present.

qcStatus	qcMessage
111	Audio is too soft
112	Audio is too noisy
113	Audio has poor intelligibility (Audio too soft)
115	Audio is too loud
116	Too few speech samples are being detected.
117	Audio recording is saturated.
118	Network coverage is too poor
119	Failed objective quality check (stoi, pesq and si-sdr).
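Client code can keep this table as a lookup. This Python sketch uses the hypothetical names `QC_MESSAGES` and `describe_qc`. It prefers the `qcMessage` returned by the service and falls back to the documented text only when that message is missing.

```python
from typing import Optional

# qcStatus -> qcMessage, copied from the table above.
QC_MESSAGES = {
    111: "Audio is too soft",
    112: "Audio is too noisy",
    113: "Audio has poor intelligibility (Audio too soft)",
    115: "Audio is too loud",
    116: "Too few speech samples are being detected.",
    117: "Audio recording is saturated.",
    118: "Network coverage is too poor",
    119: "Failed objective quality check (stoi, pesq and si-sdr).",
}


def describe_qc(qc_response: dict) -> Optional[str]:
    """Return None if the segment passed QC, otherwise 'status: message'."""
    if not qc_response.get("qcFailed"):
        return None
    status = qc_response.get("qcStatus")
    # Prefer the server's qcMessage; fall back to the documented table.
    message = qc_response.get("qcMessage") or QC_MESSAGES.get(status, "unknown QC status")
    return f"{status}: {message}"
```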

Error Responses

An example error response is provided below:

Audio Length Too Long
{
    "code": "7",
    "text": "Request is not valid as audio length is exceeding the configured maximum value of 15000.0 millis"
}

Error	Code	Description
UNEXPECTED_ERROR	1	An unexpected error occurred.
VOICE_REQUEST_INVALID	2	Voice request is not valid - request is null or audioDataWav inside it is null.
REQUEST_NOT_READABLE	3	Http request is not readable, client aborted.
VALIDATION_FAILED	4	Request validation failed
INPUT_STRING_DOT_VALIDATION_FAILED	5	Request is not valid - %s contains dot (.), which is forbidden.
API_KEY_NOT_PROVIDED	6	Request is not valid as api key header is not provided (x-api-key must be passed).
AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED	7	Request is not valid as audio length is exceeding the configured maximum value of %s millis
REQUEST_HEADER_IS_MISSING	8	Request is not valid as mandatory %s header is missing
INVALID_AUDIO_FORMAT_ERROR	12	Invalid audio data or audio of wrong format was supplied. Error message: %s
AUDIO_DURATION_CALCULATION_ERROR	13	Error happened when trying to calculate audio duration : %s.
MINIMUM_THREAD_POOL_SIZE_ERROR	14	Defined thread pool size: %s is lower than minimum allowed thread pool size: %s.
ACCESS_TOKEN_EMPTY	100	API Key can't be empty.
ARTHR_SERVICE_DOWN	101	Arthr service is down.
SENTINEL_SERVICE_DOWN	102	Arthr service is down.
JWT_NOT_VALIDATED	104	Arthr service is down.
JWT_NOT_VALID	105	Provided JWT is not valid.
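Below is a Python sketch of decoding an error body. In the sample error response, `code` is a JSON string (`"7"`), so this sketch converts it to an integer before the lookup. `ERROR_NAMES` and `parse_error` are hypothetical names. Codes not listed in the table map to `UNKNOWN`.

```python
import json

# Error code -> error name, copied from the table above.
ERROR_NAMES = {
    1: "UNEXPECTED_ERROR",
    2: "VOICE_REQUEST_INVALID",
    3: "REQUEST_NOT_READABLE",
    4: "VALIDATION_FAILED",
    5: "INPUT_STRING_DOT_VALIDATION_FAILED",
    6: "API_KEY_NOT_PROVIDED",
    7: "AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED",
    8: "REQUEST_HEADER_IS_MISSING",
    12: "INVALID_AUDIO_FORMAT_ERROR",
    13: "AUDIO_DURATION_CALCULATION_ERROR",
    14: "MINIMUM_THREAD_POOL_SIZE_ERROR",
    100: "ACCESS_TOKEN_EMPTY",
    101: "ARTHR_SERVICE_DOWN",
    102: "SENTINEL_SERVICE_DOWN",
    104: "JWT_NOT_VALIDATED",
    105: "JWT_NOT_VALID",
}


def parse_error(body: str) -> tuple:
    """Return (code, name, text) for a body like {"code": "7", "text": "..."}."""
    err = json.loads(body)
    code = int(err["code"])  # the sample encodes the code as a JSON string
    return code, ERROR_NAMES.get(code, "UNKNOWN"), err.get("text", "")
```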
