The VCDx API
This section defines the API endpoints for the VCDx service.
The https://{{host}}:{{port}}/tenants/{{tenant}}/audios/{{serviceName}} endpoint is the only exposed endpoint for audio requests. It serves both health checks and audio analysis, depending on the headers and parameters passed in.
Example:
POST https://xDeTECH.customer.com:8099/tenants/idxauth/audios/testService
Content-Type: application/json
x-api-key: JKUGCNMVIA76UOU4VMSIWDSYIQ.A9D2EW1A37E8E2B5C097ACF49010AA68
stream-id: 7f3a323b-a19d-4fab-8e43-8b2d71460598
{
"audioDataWav": "UklGRlSWAQB...."
}
The {{tenant}} represents the tenant within TrustX that the x-api-key will be validated against. This tenant will be provided to you by support@daon.com. See Initial Configuration for details.
{{serviceName}} is required and is used for tracking purposes. See Metrics (VCDx < v1.1.2.15) for details.
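As a rough illustration, the endpoint URL can be assembled from the four template parts described above. The helper name `build_audio_url` is hypothetical, and percent-encoding the path segments is an assumption rather than documented VCDx behaviour.

```python
from urllib.parse import quote


def build_audio_url(host: str, port: int, tenant: str, service_name: str) -> str:
    """Assemble the audio endpoint URL from the documented template."""
    if not tenant:
        raise ValueError("tenant is required")
    if not service_name:
        # {{serviceName}} is required and is used for tracking purposes.
        raise ValueError("service_name is required")
    # Percent-encoding the path segments is an assumption, not documented behaviour.
    return (
        f"https://{host}:{port}"
        f"/tenants/{quote(tenant, safe='')}"
        f"/audios/{quote(service_name, safe='')}"
    )


# Values taken from the example request in this section.
print(build_audio_url("xDeTECH.customer.com", 8099, "idxauth", "testService"))
```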
Headers
The following headers are supported by the VCDx service.
Header Name | Description | Required |
---|---|---|
Content-Type | Set to application/json | Y |
x-api-key | Provided by support@daon.com. See Initial Configuration for details. | Y |
stream-id | Identifies the audio stream. It groups related requests together and lets customers count the unique streams that have been processed. When the voice gateway is used, the gateway sets the stream-id and keeps it consistent across requests for the same audio call. When calling VCDx directly, either specify a stream-id that is shared across the segments of a call, or generate a unique GUID for each sample if each sample is treated as a standalone audio sample. | Y |
x-sp-process | If x-sp-process is included and set to NO_PROCESS, no audio processing is performed. This is useful for a ping/health check. | N |
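The header rules in the table can be sketched as a small helper. The function name `build_headers` and its parameters are hypothetical; the sketch assumes a direct caller that either reuses one stream-id across the segments of a call or lets a fresh GUID be generated for a standalone sample.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: str,
                  stream_id: Optional[str] = None,
                  health_check: bool = False) -> Dict[str, str]:
    """Build the headers listed in the table above."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
        # Reuse the caller's stream-id for segments of one call,
        # otherwise generate a new GUID for a standalone sample.
        "stream-id": stream_id or str(uuid.uuid4()),
    }
    if health_check:
        # NO_PROCESS asks the service to skip audio processing.
        headers["x-sp-process"] = "NO_PROCESS"
    return headers


call_headers = build_headers("YOUR_API_KEY", "7f3a323b-a19d-4fab-8e43-8b2d71460598")
ping_headers = build_headers("YOUR_API_KEY", health_check=True)
```

Passing the same `stream_id` for every segment of a call keeps the segments grouped under one stream; omitting it treats the sample as its own stream.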
Ping/Health Check
Request
GET https://xDeTECH.customer.com:8099/ping
Response
{
"serviceName": "SentinelVCDX",
"serviceId": "1",
"systemTime": "2025-09-30T10:06:21.993+00:00",
"sentinelVersion": "2.0.0.0",
"redisEnabled": true,
"redisAvailable": true,
"daonVoiceCloneNativeVersion": "2.0.0.1",
"daonVoiceQCNativeVersion": "2.0.1.0",
"daonVoiceReplayNativeVersion": "2.2.0.1"
}
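A monitoring script might inspect the ping response as sketched below. The function names are hypothetical, and treating "Redis enabled but not available" as the problem state is an interpretation of the two flags, not documented behaviour.

```python
import json
from typing import Dict


def redis_unavailable(ping_body: str) -> bool:
    """Return True when Redis is enabled but reported as not available."""
    info = json.loads(ping_body)
    return bool(info.get("redisEnabled")) and not bool(info.get("redisAvailable"))


def native_versions(ping_body: str) -> Dict[str, str]:
    """Collect the native library versions reported by the ping response."""
    info = json.loads(ping_body)
    return {key: value for key, value in info.items() if key.endswith("NativeVersion")}


# Trimmed copy of the example response above.
sample = """{
  "serviceName": "SentinelVCDX",
  "redisEnabled": true,
  "redisAvailable": true,
  "daonVoiceCloneNativeVersion": "2.0.0.1",
  "daonVoiceQCNativeVersion": "2.0.1.0",
  "daonVoiceReplayNativeVersion": "2.2.0.1"
}"""
print(redis_unavailable(sample), native_versions(sample))
```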
Health Actuator
Request
GET https://xDeTECH.customer.com:8099/actuator/health
Response
{
"status":"UP",
"components":{
"livenessState":{
"status":"UP"
},
"readinessState":{
"status":"UP"
}
},
"groups":[
"liveness",
"readiness"
]
}
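The health response can be reduced to a single boolean, for example in a readiness probe. This is a minimal sketch; the function name `is_up` is hypothetical, and requiring every listed component to be UP is a design choice of the example rather than a VCDx requirement.

```python
import json


def is_up(health_body: str) -> bool:
    """Return True when the overall status and every component report UP."""
    health = json.loads(health_body)
    components = health.get("components", {})
    return health.get("status") == "UP" and all(
        component.get("status") == "UP" for component in components.values()
    )


# Shape copied from the example response above.
healthy = (
    '{"status":"UP","components":{"livenessState":{"status":"UP"},'
    '"readinessState":{"status":"UP"}},"groups":["liveness","readiness"]}'
)
print(is_up(healthy))
```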
Analyze Audio Segment
Request
POST https://xDeTECH.customer.com:8099/tenants/idxauth/audios/testService
Content-Type: application/json
x-api-key: JKUGCNMVIA76UOU4VMSIWDSYIQ.A9D2EW1A37E8E2B5C097ACF49010AA68
stream-id: 7f3a323b-a19d-4fab-8e43-8b2d71460598
{
"audioDataWav": "UklGRnhIAQBXQVZFSlVOS..."
}
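The request above could be prepared in Python roughly as follows. The sample values beginning with "UklGR" suggest that audioDataWav carries a base64-encoded WAV file, so the sketch assumes that encoding. The helper names are hypothetical, the generated silence only stands in for real audio, and the 8000 Hz rate simply mirrors the frequency shown in the example response. The request is built but not sent.

```python
import base64
import io
import json
import urllib.request
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 8000) -> bytes:
    """Create a mono 16-bit PCM WAV of silence as placeholder audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def build_analyze_request(url: str, api_key: str, stream_id: str,
                          wav_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request for one audio segment."""
    # Assumption: audioDataWav is the base64 text of the WAV file.
    body = {"audioDataWav": base64.b64encode(wav_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "stream-id": stream_id,
        },
        method="POST",
    )


request = build_analyze_request(
    "https://xDeTECH.customer.com:8099/tenants/idxauth/audios/testService",
    "YOUR_API_KEY",
    "7f3a323b-a19d-4fab-8e43-8b2d71460598",
    make_silent_wav(),
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```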
Specifying Thresholds
The following parameters may be specified in the JSON request (see the Specify Thresholds example above).
Property | Default Value | Description |
---|---|---|
replayConfidenceR1 | 0.6293 | r1 is the score from the replay detection model that detects low-quality pattern devices. Values range from 0 to 1; the lower the score, the more likely the file is a spoof. |
replayConfidenceR2 | 0.05794 | r2 is the score from the replay detection model that detects high-quality pattern devices. Values range from 0 to 1; the lower the score, the more likely the file is a spoof. |
cloneThresholds | | This config allows clone detection thresholds to be set. v1, v2, v3 and v4 are internal models. These thresholds should not be adjusted without confirmation from Daon Support. |
minStoi | 0.73707 | Short-Time Objective Intelligibility, an objective measure designed to predict the intelligibility of speech, especially in noisy or processed audio. |
minSiSdr | -7.478 | Scale-Invariant Signal-to-Distortion Ratio, an objective metric that calculates the ratio of the power of the original signal to the power of the distortion (e.g., noise, artifacts introduced during processing). |
minPesq | 1.2676 | Perceptual Evaluation of Speech Quality, standardized as ITU-T P.862. It is an objective metric designed to assess the quality of narrowband and wideband speech signals. It models human perception of speech quality by analysing both time and frequency domains, and is often used to evaluate speech codecs and other processing that affects audio quality. |
noQcTemplateIfQcFailed | false | Defines whether the QC snapin withholds the QC template when the quality check fails. The default value, false, means the QC template is returned even when the quality check fails. |
inferenceDuration | 4 | This defines the minimum amount of speech required to trigger a replay and a voice cloning inference |
qcTemplateUpdateRatio | 1 | Defines the percentage of the audio buffer that is discarded (or shifted) after an inference is computed. |
minSpeechRatio | 0.55 | This is the ratio of speech to total audio length. For example, a 7 second wav file may contain only 3 seconds of speech. |
minSpeechDuration | 2 | Minimum seconds of speech detected in the submitted audio sample. |
minSnr | 5 | Quality metric that should only be modified after consulting with Daon Support. This is the threshold for signal to noise ratio that factors into quality assessment of the audio sample provided. |
minLoudness1 | -0.6 | Quality metric that should only be modified after consulting with Daon Support. |
minLoudness2 | -38 | Quality metric that should only be modified after consulting with Daon Support. |
maxSaturation | -0.04 | Quality metric that should only be modified after consulting with Daon Support. |
maxFrameSaturationRatio | -0.3 | Quality metric that should only be modified after consulting with Daon Support. |
codecDiscard | amr-nb-0,amr-nb-1,amr-nb-2,amr-nb-3 | Quality metric that should only be modified after consulting with Daon Support. |
Before modifying configurations, it is recommended to discuss with Daon Support to understand the impact of changes made.
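A client could guard threshold overrides before sending them, as in the sketch below. It assumes the thresholds are top-level properties next to audioDataWav, which the source does not state explicitly. The helper name `build_body` is hypothetical, and the only range check applied is the documented 0 to 1 range of the replay scores.

```python
import json

# Property names copied from the table above.
DOCUMENTED_THRESHOLDS = {
    "replayConfidenceR1", "replayConfidenceR2", "cloneThresholds",
    "minStoi", "minSiSdr", "minPesq", "noQcTemplateIfQcFailed",
    "inferenceDuration", "qcTemplateUpdateRatio", "minSpeechRatio",
    "minSpeechDuration", "minSnr", "minLoudness1", "minLoudness2",
    "maxSaturation", "maxFrameSaturationRatio", "codecDiscard",
}


def build_body(audio_b64: str, **overrides) -> str:
    """Serialize a request body, rejecting undocumented threshold names."""
    unknown = set(overrides) - DOCUMENTED_THRESHOLDS
    if unknown:
        raise ValueError(f"Unknown threshold(s): {sorted(unknown)}")
    for name in ("replayConfidenceR1", "replayConfidenceR2"):
        # The replay scores are documented as ranging from 0 to 1.
        if name in overrides and not 0 <= overrides[name] <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    # Assumption: overrides sit at the top level of the JSON body.
    return json.dumps({"audioDataWav": audio_b64, **overrides})


print(build_body("UklGR...", minSpeechDuration=3, minSpeechRatio=0.6))
```

Rejecting unknown names locally catches typos such as `minSpeachRatio` before a request is sent; it does not replace the advice to consult Daon Support before changing values.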
Response
{
"result": "PROCESSED",
"streamResult": "NO_ANOMALY_DETECTED",
"confidenceIndicator": "HIGH",
"replayResponse": {
"model": "DaonVoice-ReplayTelephony-2.2.0-JIT",
"version": "2.2.0",
"speechDuration": 4.0,
"frequency": 8000,
"processCompleted": true,
"isReplay": false,
"r1": 0.9991,
"r2": 0.8951,
"processingTimeMillis": 93
},
"cloneResponse": {
"model": "DaonVoice-VoiceCloningTelephony-2.0.0-JIT",
"version": "2.0.0",
"speechDuration": 4.0,
"frequency": 8000,
"processCompleted": true,
"spoof": false,
"scores": {
"v1": 0.9989,
"v2": 0.896,
"v3": 1.0,
"v4": 0.9991
},
"processingTimeMillis": 138
},
"qcResponse": {
"model": "DaonVoice-QC-1.1.1.1-JIT",
"version": "1.1.1.1",
"speechDuration": 3.93,
"frequency": 8000,
"processCompleted": true,
"audioDuration": 4.04,
"snr": 1000.0,
"loudness1": 10.365,
"loudness2": -17.063,
"maxSaturationRatio": -0.025,
"frameSaturationRatio": -0.0,
"stoi": 0.88184,
"pesq": 1.8722,
"siSdr": 10.175,
"aggregatedSpeechInTpl": 7.8602,
"qcFailed": false,
"codec": "raw",
"qcStatus": 0,
"speechRatio": 0.97277,
"processingTimeMillis": 435
},
"processingTimeMillis": 596
}
Property | Description |
---|---|
result | The chunk-level processing result for a single audio sample, for example PROCESSED. |
streamResult | The top-level processing result for the audio sample(s) grouped by the stream-id header, for example NO_ANOMALY_DETECTED. |
confidenceIndicator | An overall confidence indicator based on the BPCER of the lowest confidence score, for example HIGH. |
replayResponse | Results of replay detection |
speechDuration | Duration of audio sample analyzed. Unlike results in qcResponse, this will show entire audio length rather than speech detected within the audio. This is because the entire sample is used for replay detection. |
isReplay | true/false indicator that replay was detected within the segment being processed. It is possible for isReplay to not match result if multiple audio segments have been processed for the same stream-id. |
cloneResponse | Results of clone detection |
speechDuration | Duration of audio sample analyzed. Unlike results in qcResponse, this will show entire audio length rather than speech detected within the audio. This is because the entire sample is used for clone detection. |
spoof | true/false indicator that clone or synthetic voice was detected within the segment being processed. It is possible for spoof to not match result if multiple audio segments have been processed for the same stream-id. |
qcResponse | Results of quality assessment. |
qcFailed | true/false indicator of whether the segment failed quality processing. If QC processing fails, an HTTP |
speechDuration | Amount of speech detected in audio sample. |
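The response can be condensed into the fields a caller usually acts on, as in this sketch. Field names are taken from the example response above; the function name `summarize` and the output keys are hypothetical. Segment-level flags are kept separate from streamResult because they may differ once several segments share a stream-id.

```python
import json


def summarize(response_body: str) -> dict:
    """Pick out stream-level and segment-level outcomes from a response."""
    data = json.loads(response_body)
    replay = data.get("replayResponse") or {}
    clone = data.get("cloneResponse") or {}
    qc = data.get("qcResponse") or {}
    return {
        "result": data.get("result"),
        "streamResult": data.get("streamResult"),
        "confidenceIndicator": data.get("confidenceIndicator"),
        # Segment-level flags; these may not match the stream-level result.
        "segmentReplay": replay.get("isReplay"),
        "segmentSpoof": clone.get("spoof"),
        "qcFailed": qc.get("qcFailed"),
    }


# Trimmed copy of the example response above.
example = """{
  "result": "PROCESSED",
  "streamResult": "NO_ANOMALY_DETECTED",
  "confidenceIndicator": "HIGH",
  "replayResponse": {"isReplay": false, "r1": 0.9991, "r2": 0.8951},
  "cloneResponse": {"spoof": false},
  "qcResponse": {"qcFailed": false, "speechDuration": 3.93}
}"""
print(summarize(example))
```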
Error Responses
An example error response is shown below:
{
"code": "7",
"text": "Request is not valid as audio length is exceeding the configured maximum value of 15000.0 millis"
}
Error | Code | Description |
---|---|---|
UNEXPECTED_ERROR | 1 | An unexpected error occurred. |
VOICE_REQUEST_INVALID | 2 | Voice request is not valid - request is null or audioDataWav inside it is null. |
REQUEST_NOT_READABLE | 3 | Http request is not readable, client aborted. |
VALIDATION_FAILED | 4 | Request validation failed |
INPUT_STRING_DOT_VALIDATION_FAILED | 5 | Request is not valid - %s contains dot (.), which is forbidden. |
API_KEY_NOT_PROVIDED | 6 | Request is not valid as api key header is not provided (x-api-key must be passed). |
AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED | 7 | Request is not valid as audio length is exceeding the configured maximum value of %s millis |
REQUEST_HEADER_IS_MISSING | 8 | Request is not valid as mandatory %s header is missing |
INVALID_AUDIO_FORMAT_ERROR | 12 | Invalid audio data or audio of wrong format was supplied. Error message: %s |
AUDIO_DURATION_CALCULATION_ERROR | 13 | Error happened when trying to calculate audio duration : %s. |
MINIMUM_THREAD_POOL_SIZE_ERROR | 14 | Defined thread pool size: %s is lower than minimum allowed thread pool size: %s. |
ACCESS_TOKEN_EMPTY | 100 | API Key can't be empty. |
ARTHR_SERVICE_DOWN | 101 | Arthr service is down. |
SENTINEL_SERVICE_DOWN | 102 | Sentinel service is down. |
JWT_NOT_VALIDATED | 104 | JWT could not be validated. |
JWT_NOT_VALID | 105 | Provided JWT is not valid. |
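Client code might map the numeric codes back to the error names in the table, as sketched below. The enum name `VcdxError` and the function `parse_error` are hypothetical. Note that the example error response carries the code as a JSON string, so it is converted to an integer first.

```python
import json
from enum import IntEnum
from typing import Optional, Tuple


class VcdxError(IntEnum):
    """Error codes copied from the table above."""
    UNEXPECTED_ERROR = 1
    VOICE_REQUEST_INVALID = 2
    REQUEST_NOT_READABLE = 3
    VALIDATION_FAILED = 4
    INPUT_STRING_DOT_VALIDATION_FAILED = 5
    API_KEY_NOT_PROVIDED = 6
    AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED = 7
    REQUEST_HEADER_IS_MISSING = 8
    INVALID_AUDIO_FORMAT_ERROR = 12
    AUDIO_DURATION_CALCULATION_ERROR = 13
    MINIMUM_THREAD_POOL_SIZE_ERROR = 14
    ACCESS_TOKEN_EMPTY = 100
    ARTHR_SERVICE_DOWN = 101
    SENTINEL_SERVICE_DOWN = 102
    JWT_NOT_VALIDATED = 104
    JWT_NOT_VALID = 105


def parse_error(body: str) -> Tuple[Optional[VcdxError], str]:
    """Return the matching error (or None for an unlisted code) and the text."""
    payload = json.loads(body)
    try:
        error = VcdxError(int(payload["code"]))
    except ValueError:
        # Code is not numeric or is not listed in the table.
        error = None
    return error, payload.get("text", "")


error, text = parse_error(
    '{"code": "7", "text": "Request is not valid as audio length is exceeding '
    'the configured maximum value of 15000.0 millis"}'
)
print(error.name if error else "UNKNOWN", "-", text)
```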