PKI 2670 Speech Transcription automatically converts speech into plain text, so that the full content of voice recordings can be easily searched. The technology is based on state-of-the-art acoustic modelling techniques, including neural network-based methods. One focus is spontaneous telephone conversations. PKI 2670 applies channel compensation techniques to handle the widest possible range of audio sources: GSM/CDMA, 3G, VoIP, landline, satellite phone, etc. The latest generation of the model supports adding further words to the model. New languages can be trained on request.
PKI 2670 has the following input requirements: WAV or RAW (unsigned PCM 8 or 16 bit, IEEE float 32 bit, A-law or Mu-law, ADPCM), FLAC, OPUS; 8 kHz+ sampling (other audio formats are automatically converted).
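As an illustration of the input requirements above, the following minimal sketch checks whether a PCM WAV file already has a sampling rate of at least 8 kHz and 8- or 16-bit samples. It is not part of PKI 2670; the function name `check_pcm_wav` and the constants are hypothetical, and the Python standard-library `wave` module used here reads only PCM WAV, so the other listed encodings (IEEE float, A-law, Mu-law, ADPCM, FLAC, OPUS) are outside its scope.

```python
import wave

MIN_SAMPLE_RATE_HZ = 8000   # "8 kHz+" from the stated input requirements
PCM_SAMPLE_WIDTHS = (1, 2)  # bytes per sample: 8-bit or 16-bit PCM


def check_pcm_wav(path):
    """Return (ok, details) for a PCM WAV file.

    Hypothetical helper: covers only the PCM WAV case, because the
    standard-library wave module cannot read the other listed encodings.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    ok = rate >= MIN_SAMPLE_RATE_HZ and width in PCM_SAMPLE_WIDTHS
    details = {
        "sample_rate_hz": rate,
        "sample_width_bytes": width,
        "channels": channels,
    }
    return ok, details
```

A file that fails this check is not necessarily unusable, since other audio formats are stated to be converted automatically; the check only reports whether the file is already in one of the listed PCM forms.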
PKI 2670 uses the following output formats: XML/JSON with all results, or result files:
- One-best transcription
- N-best transcription
The following languages are supported by PKI 2670
4th and older generation
The 5th generation PKI 2670 processes audio approximately 7x faster than real time on one CPU core.
The 4th and older generations of the model are about 1.2x faster than real time.
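To make the speed figures concrete, the sketch below estimates processing time from audio duration. It assumes that both factors are relative to real time (the 1.2x figure is read that way here) and that throughput scales linearly when several recordings are processed on separate CPU cores; neither assumption is confirmed by the figures above, and the names `SPEED_FACTORS` and `estimated_processing_seconds` are hypothetical.

```python
# Speed factors quoted above, interpreted as multiples of real time
# per CPU core (an assumption for the 4th and older generations).
SPEED_FACTORS = {
    "5th": 7.0,
    "4th_and_older": 1.2,
}


def estimated_processing_seconds(audio_seconds, speed_factor, cpu_cores=1):
    """Estimate wall-clock processing time for a batch of audio.

    Assumes linear scaling across cores, i.e. independent recordings
    are distributed evenly and processed in parallel.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return audio_seconds / (speed_factor * cpu_cores)


one_hour = 3600
for generation, factor in SPEED_FACTORS.items():
    minutes = estimated_processing_seconds(one_hour, factor) / 60
    print(f"{generation}: about {minutes:.1f} min per hour of audio on one core")
```

Under these assumptions, one hour of audio takes roughly 8.6 minutes on one core with the 5th generation and 50 minutes with the 4th and older generations.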