The VCDx API

This section defines the API endpoints for the VCDx service.

The https://{{host}}:{{port}}/tenants/{{tenant}}/audios/{{serviceName}} endpoint is the only exposed endpoint. It serves both health checks and audio analysis, depending on the headers and parameters passed in.

{{tenant}} is the tenant within TrustX against which the x-api-key is validated. This tenant is provided to you by support@daon.com. See Initial Configuration for details.

{{serviceName}} is required and is used for tracking purposes. See Metrics (VCDx < v1.1.2.15) for details.
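As a minimal sketch of how the endpoint template can be filled in, the helper below substitutes the four placeholders. The function name and the example host, port, tenant, and service name are hypothetical; only the path layout comes from this section.

```python
from urllib.parse import quote


def build_vcdx_url(host: str, port: int, tenant: str, service_name: str) -> str:
    """Fill in the VCDx endpoint template described above."""
    if not tenant:
        raise ValueError("tenant is required")
    if not service_name:
        raise ValueError("serviceName is required")
    # Percent-encode each path segment so unusual characters cannot alter the path.
    return (
        f"https://{host}:{port}"
        f"/tenants/{quote(tenant, safe='')}"
        f"/audios/{quote(service_name, safe='')}"
    )


# Hypothetical values, for illustration only.
print(build_vcdx_url("vcdx.example.com", 8443, "my-tenant", "ivr-frontdoor"))
# https://vcdx.example.com:8443/tenants/my-tenant/audios/ivr-frontdoor
```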

Headers

The following headers are supported by the VCDx service.

| Header Name | Description | Required |
| --- | --- | --- |
| Content-Type | Set to application/json. | Y |
| x-api-key | Provided by support@daon.com. See Initial Configuration for details. | Y |
| stream-id | Identifies an audio stream. It groups related requests together and lets customers count the unique streams that have been processed. When the voice gateway is used, the gateway sets the stream-id and keeps it consistent across requests for the same audio call. When calling VCDx directly, either specify a stream-id that is shared across the segments of one call, or generate a unique GUID for each sample if each sample is treated as standalone. | Y |
| x-sp-process | If x-sp-process is included and set to NO_PROCESS, no audio processing is performed. This is useful for a ping/health check. | N |
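The header rules above can be sketched as a small helper. This is an illustration, not part of the VCDx API: the function name and its parameters are hypothetical, and only the header names and the NO_PROCESS value come from this section.

```python
import uuid
from typing import Dict, Optional


def build_vcdx_headers(
    api_key: str,
    stream_id: Optional[str] = None,
    ping: bool = False,
) -> Dict[str, str]:
    """Assemble the headers supported by the VCDx service."""
    if not api_key:
        raise ValueError("x-api-key is required")
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
        # Reuse one stream-id for every segment of the same call; when none is
        # given, treat the sample as standalone and generate a fresh GUID.
        "stream-id": stream_id or str(uuid.uuid4()),
    }
    if ping:
        # Optional header: skips audio processing (ping/health check).
        headers["x-sp-process"] = "NO_PROCESS"
    return headers


call_id = str(uuid.uuid4())
first_segment = build_vcdx_headers("YOUR_API_KEY", stream_id=call_id)
second_segment = build_vcdx_headers("YOUR_API_KEY", stream_id=call_id)
print(first_segment["stream-id"] == second_segment["stream-id"])  # True
```

Passing the same stream_id for every segment of a call is what lets the service group those segments into one stream.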

Ping/Health Check

Request
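A hedged sketch of a ping request using only the Python standard library follows. This section does not state the HTTP method or the request body, so the POST default and the empty JSON object are assumptions, as are the function name and the example URL. The request is only constructed here, not sent.

```python
import json
import urllib.request
import uuid


def build_ping_request(url: str, api_key: str, method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a health-check request with x-sp-process: NO_PROCESS."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
        "stream-id": str(uuid.uuid4()),   # stream-id is listed as a required header
        "x-sp-process": "NO_PROCESS",     # tells the service to skip audio processing
    }
    body = json.dumps({}).encode("utf-8")  # assumption: an empty JSON object is accepted
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Hypothetical URL and key, for illustration only.
req = build_ping_request(
    "https://vcdx.example.com:8443/tenants/my-tenant/audios/healthcheck",
    "YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
```

To actually send it, pass the object to urllib.request.urlopen(req) and inspect the status code of the returned response.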

Response

Success

Health Actuator

Request

Response

Success

Analyze Audio Segment

Request

Simple Request
Specify Thresholds

Specifying Thresholds

The following parameters may be specified in the JSON request (see the Specify Thresholds example above).

| Property | Default Value | Description |
| --- | --- | --- |
| replayConfidenceR1 | 0.6293 | r1 is the score obtained from the replay detection model that detects low quality pattern devices. The value ranges from 0 to 1. The lower the score, the more likely the file is a spoof. |
| replayConfidenceR2 | 0.05794 | r2 is the score obtained from the replay detection model that detects high quality pattern devices. The value ranges from 0 to 1. The lower the score, the more likely the file is a spoof. |
| cloneThresholds | {"v1": 0.01625, "v2": 0.06082, "v3": 0.678, "v4": 0.4003} | Sets the clone detection thresholds. v1, v2, v3 and v4 are internal models. These thresholds should not be adjusted without confirmation from Daon Support. |
| minStoi | 0.73707 | Short-Time Objective Intelligibility, an objective measure designed to predict the intelligibility of speech, especially in noisy or processed audio. |
| minSiSdr | -7.478 | Scale-Invariant Signal-to-Distortion Ratio, an objective metric that calculates the ratio of the power of the original signal to the power of the distortion (e.g., noise or artifacts introduced during processing). |
| minPesq | 1.2676 | Perceptual Evaluation of Speech Quality, standardized as ITU-T P.862. It is an objective metric designed to assess the quality of narrowband and wideband speech signals. It models human perception of speech quality by analysing both the time and frequency domains, and is often used to evaluate speech codecs and other processing that affects audio quality. |
| noQcTemplateIfQcFailed | false | Defines whether the QC snapin still returns the QC template when the quality check fails. The default value false means the QC template is returned even when the quality check fails. |
| inferenceDuration | 4 | Defines the minimum amount of speech required to trigger a replay and a voice cloning inference. |
| qcTemplateUpdateRatio | 1 | Defines the percentage of the audio buffer that is discarded (or shifted) after an inference is computed. |
| minSpeechRatio | 0.55 | The ratio of speech to total audio length. For example, a 7 second wav file may contain only 3 seconds of speech. |
| minSpeechDuration | 2 | Minimum seconds of speech detected in the submitted audio sample. |
| minSnr | 5 | Quality metric that should only be modified after consulting with Daon Support. This is the threshold for the signal-to-noise ratio that factors into the quality assessment of the audio sample provided. |
| minLoudness1 | -0.6 | Quality metric that should only be modified after consulting with Daon Support. |
| minLoudness2 | -38 | Quality metric that should only be modified after consulting with Daon Support. |
| maxSaturation | -0.04 | Quality metric that should only be modified after consulting with Daon Support. |
| maxFrameSaturationRatio | -0.3 | Quality metric that should only be modified after consulting with Daon Support. |
| codecDiscard | amr-nb-0,amr-nb-1,amr-nb-2,amr-nb-3 | Quality metric that should only be modified after consulting with Daon Support. |
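To make the minSpeechRatio and minSpeechDuration rows concrete, the sketch below works through the 7 second file with 3 seconds of speech. How the service combines these checks internally is not described here; the function name and the use of greater-than-or-equal comparisons are assumptions for illustration.

```python
def speech_checks(
    speech_seconds: float,
    total_seconds: float,
    min_speech_ratio: float = 0.55,   # documented default for minSpeechRatio
    min_speech_duration: float = 2.0,  # documented default for minSpeechDuration
) -> dict:
    """Compare a sample's speech content against the two documented minimums."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    ratio = speech_seconds / total_seconds
    return {
        "speechRatio": ratio,
        "ratioOk": ratio >= min_speech_ratio,            # assumption: inclusive bound
        "durationOk": speech_seconds >= min_speech_duration,  # assumption: inclusive bound
    }


# The 7 second wav file with 3 seconds of speech from the table.
checks = speech_checks(speech_seconds=3.0, total_seconds=7.0)
print(round(checks["speechRatio"], 3), checks["ratioOk"], checks["durationOk"])
# 0.429 False True
```

With the default values, that sample contains enough speech in absolute terms (3 s against a 2 s minimum) but falls short on the ratio (about 0.43 against 0.55).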

Before modifying any of these settings, discuss the change with Daon Support to understand its impact.
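A rough sketch of a JSON request body that overrides selected thresholds is shown below. The audioDataWav field name is taken from the VOICE_REQUEST_INVALID error description; encoding the audio as base64 and placing the overrides at the top level of the JSON object are assumptions, and the helper name is hypothetical.

```python
import base64
import json
from typing import Optional

# Property names from the thresholds table.
KNOWN_PROPERTIES = {
    "replayConfidenceR1", "replayConfidenceR2", "cloneThresholds",
    "minStoi", "minSiSdr", "minPesq", "noQcTemplateIfQcFailed",
    "inferenceDuration", "qcTemplateUpdateRatio", "minSpeechRatio",
    "minSpeechDuration", "minSnr", "minLoudness1", "minLoudness2",
    "maxSaturation", "maxFrameSaturationRatio", "codecDiscard",
}


def build_analysis_body(wav_bytes: bytes, overrides: Optional[dict] = None) -> str:
    """Serialize an analysis request, rejecting property names not in the table."""
    overrides = overrides or {}
    unknown = set(overrides) - KNOWN_PROPERTIES
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    # Assumption: the WAV payload travels as a base64 string in audioDataWav.
    body = {"audioDataWav": base64.b64encode(wav_bytes).decode("ascii")}
    # Assumption: overrides sit next to audioDataWav at the top level.
    body.update(overrides)
    return json.dumps(body)


# Placeholder bytes stand in for a real WAV file.
print(build_analysis_body(b"RIFF", {"minSpeechDuration": 3, "minSpeechRatio": 0.6}))
```

Rejecting unknown names on the client side catches typos such as minSpeechratio before a request is sent.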

Response

Valid Sample
Clone

| Property | Description |
| --- | --- |
| result | The chunk-level processing result for a single audio sample. PROCESSED: enough valid speech has been found for processing. NOT_PROCESSED: no valid speech has been found yet. |
| streamResult | The top-level processing result for the audio sample(s) grouped by the stream-id header. NOT_PROCESSED: no valid speech has been found yet. NO_ANOMALY_DETECTED: no anomalies (replay or cloned voice) have been detected yet. ANOMALY_DETECTED: an anomaly (replay or cloned voice) has been detected. |
| confidenceIndicator | An overall confidence indicator based on the BPCER of the lowest confidence score. HIGH: an anomaly is detected, or there is a high confidence of no anomaly. MEDIUM: there is a medium confidence of no anomaly. |
| replayResponse | Results of replay detection. |
| speechDuration | Duration of the audio sample analyzed. Unlike the results in qcResponse, this shows the entire audio length rather than the speech detected within the audio, because the entire sample is used for replay detection. |
| isReplay | true/false indicator that replay was detected within the segment being processed. isReplay may not match result if multiple audio segments have been processed for the same stream-id. |
| cloneResponse | Results of clone detection. |
| speechDuration | Duration of the audio sample analyzed. Unlike the results in qcResponse, this shows the entire audio length rather than the speech detected within the audio, because the entire sample is used for clone detection. |
| clone | true/false indicator that a cloned or synthetic voice was detected within the segment being processed. clone may not match result if multiple audio segments have been processed for the same stream-id. |
| qcResponse | Results of quality assessment. |
| qcFailed | true/false indicator of whether the segment failed quality processing. If QC processing fails, an HTTP |
| speechDuration | Amount of speech detected in the audio sample. |
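The response properties above can be read with a small helper like the following sketch. The nesting of isReplay, clone, and qcFailed under replayResponse, cloneResponse, and qcResponse is an assumption based on the property names, and the sample payload is hypothetical rather than a captured response.

```python
def summarize_response(payload: dict) -> dict:
    """Reduce an analysis response to the flags a caller is likely to act on."""
    # Assumption: per-detector flags are nested under their *Response objects.
    replay = payload.get("replayResponse") or {}
    clone = payload.get("cloneResponse") or {}
    qc = payload.get("qcResponse") or {}
    return {
        "segment_processed": payload.get("result") == "PROCESSED",
        # streamResult covers every segment sharing the stream-id, so it can
        # report an anomaly even when this segment's own flags are false.
        "stream_anomaly": payload.get("streamResult") == "ANOMALY_DETECTED",
        "segment_replay": bool(replay.get("isReplay")),
        "segment_clone": bool(clone.get("clone")),
        "qc_failed": bool(qc.get("qcFailed")),
        "confidence": payload.get("confidenceIndicator"),
    }


# Hypothetical payload: an earlier segment of the stream was flagged, this one was not.
sample = {
    "result": "PROCESSED",
    "streamResult": "ANOMALY_DETECTED",
    "confidenceIndicator": "HIGH",
    "replayResponse": {"speechDuration": 6.2, "isReplay": False},
    "cloneResponse": {"speechDuration": 6.2, "clone": False},
    "qcResponse": {"qcFailed": False, "speechDuration": 4.1},
}
summary = summarize_response(sample)
print(summary["stream_anomaly"], summary["segment_replay"], summary["segment_clone"])
# True False False
```

For a decision about the whole call, streamResult is the value to act on; the per-segment flags describe only the segment just submitted.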

Error Responses

Example error responses are provided below:

Audio Length Too Long
Invalid Audio

| Error | Code | Description |
| --- | --- | --- |
| UNEXPECTED_ERROR | 1 | An unexpected error occurred. |
| VOICE_REQUEST_INVALID | 2 | Voice request is not valid - request is null or audioDataWav inside it is null. |
| REQUEST_NOT_READABLE | 3 | Http request is not readable, client aborted. |
| VALIDATION_FAILED | 4 | Request validation failed |
| INPUT_STRING_DOT_VALIDATION_FAILED | 5 | Request is not valid - %s contains dot (.), which is forbidden. |
| API_KEY_NOT_PROVIDED | 6 | Request is not valid as api key header is not provided (x-api-key must be passed). |
| AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED | 7 | Request is not valid as audio length is exceeding the configured maximum value of %s millis |
| REQUEST_HEADER_IS_MISSING | 8 | Request is not valid as mandatory %s header is missing |
| INVALID_AUDIO_FORMAT_ERROR | 12 | Invalid audio data or audio of wrong format was supplied. Error message: %s |
| AUDIO_DURATION_CALCULATION_ERROR | 13 | Error happened when trying to calculate audio duration : %s. |
| MINIMUM_THREAD_POOL_SIZE_ERROR | 14 | Defined thread pool size: %s is lower than minimum allowed thread pool size: %s. |
| ACCESS_TOKEN_EMPTY | 100 | API Key can't be empty. |
| ARTHR_SERVICE_DOWN | 101 | Arthr service is down. |
| SENTINEL_SERVICE_DOWN | 102 | Arthr service is down. |
| JWT_NOT_VALIDATED | 104 | Arthr service is down. |
| JWT_NOT_VALID | 105 | Provided JWT is not valid. |
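For client-side handling, the codes in the table can be kept in a lookup such as the sketch below. The shape of the error response JSON is not shown in this section, so the helper takes a bare numeric code. The function name and the fallback label are hypothetical.

```python
# Numeric codes and names copied from the error table.
ERROR_CODES = {
    1: "UNEXPECTED_ERROR",
    2: "VOICE_REQUEST_INVALID",
    3: "REQUEST_NOT_READABLE",
    4: "VALIDATION_FAILED",
    5: "INPUT_STRING_DOT_VALIDATION_FAILED",
    6: "API_KEY_NOT_PROVIDED",
    7: "AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED",
    8: "REQUEST_HEADER_IS_MISSING",
    12: "INVALID_AUDIO_FORMAT_ERROR",
    13: "AUDIO_DURATION_CALCULATION_ERROR",
    14: "MINIMUM_THREAD_POOL_SIZE_ERROR",
    100: "ACCESS_TOKEN_EMPTY",
    101: "ARTHR_SERVICE_DOWN",
    102: "SENTINEL_SERVICE_DOWN",
    104: "JWT_NOT_VALIDATED",
    105: "JWT_NOT_VALID",
}


def error_name(code: int) -> str:
    """Map a numeric VCDx error code to its documented name."""
    # Codes missing from the table (for example 9 or 103) get a generic label.
    return ERROR_CODES.get(code, f"UNKNOWN_ERROR_{code}")


print(error_name(7))    # AUDIO_LENGTH_IS_GREATER_THAN_MAX_ALLOWED
print(error_name(103))  # UNKNOWN_ERROR_103
```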