Secure speech solutions
This section deals with secure speech equipment, such as voice encryption
devices, from a variety of manufacturers. Such devices come in many flavours,
ranging from simple speech scramblers to digital voice encryptors.
Most of the devices shown below are also featured elsewhere on this website,
as they fall into multiple categories.
Secure telephones are a class of their own,
but since they also belong to the group of voice encryption devices,
they are linked from this page.
Voice encryption units on this website
Secure speech systems are known by various names, such as Voice Privacy
Unit, Secure Speech System, Voice Protection Device, Speech Encryptor, etc.
In principle, there are only two systems for voice protection, analogue scrambling (A.1 to A.3) and digital encryption (B):
- A.1 - Frequency domain voice scrambler
In this analogue system, the frequency domain of the human speech is mirrored
around a given center frequency, so that it becomes unintelligible.
Such systems can easily be broken, even if the audio band is split
into multiple smaller bands first.
- A.2 - Time domain voice scrambler
In this system, the human speech is first stored in some kind of memory,
after which the individual parts are then scrambled in the time domain.
It is more secure than a frequency domain scrambler, but can still be
broken as the individual sound samples still bear the properties of speech.
- A.3 - Frequency and Time Domain voice scrambler
This system, also known as the F/T Scrambler, is a combination of the
above methods. It is the most complex one, but can still be broken with
the right equipment, no matter how complex the randomizer is, as the
individual samples still bear the properties of speech.
- B - Digital Encryption
This method uses a digital representation of the analogue voice signal
(samples), which is mixed with a digital key stream. This method is much
safer than the ones above and is the only one that can really be called secure.
Before digital speech encryption became widely available,
an analogue technique was used to protect voice transmissions.
This technique is commonly known as voice scrambling and comes
in three flavours which are further explained below.
Scramblers are inherently insecure and only provide protection against
an occasional eavesdropper, such as the telephone exchange operator.
A.1 — Frequency domain scrambling
The oldest method uses frequency inversion
and is also known as voice inversion.
It is based on mirroring of the audio frequency spectrum around a given
center frequency, and can be applied to a discrete number of sub-bands.
This principle is best explained using a simplified model:
The audio spectrum of the voice data (1) is mixed with a fixed
carrier frequency fc (2).
This results in two spectra: one that is the sum
of the original spectrum and the carrier (3),
and one that is the difference of the two signals (4).
A low-pass filter (LPF) is then applied to filter off the
sum and leave only the difference, effectively resulting in a mirrored
audio band (5).
At the receiving end, this process of mirroring the spectrum is repeated
to make the speech intelligible again:
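As a rough numerical sketch of the mixing and filtering steps described above, the following Python fragment inverts a single test tone and then inverts it again to recover it. The sample rate, the 3300 Hz carrier, the 1000 Hz test tone and all function names are assumptions chosen for illustration and are not taken from any particular device. The low-pass filter is a generic windowed-sinc FIR design, not the filter used in real scramblers.

```python
import math

def mix(samples, fc, fs):
    """Ring-mix the signal with a fixed carrier fc (gives sum and difference spectra)."""
    return [s * math.cos(2 * math.pi * fc * n / fs) for n, s in enumerate(samples)]

def lowpass(samples, cutoff, fs, taps=101):
    """Windowed-sinc FIR low-pass filter (Hamming window), direct convolution."""
    mid = (taps - 1) / 2
    h = []
    for i in range(taps):
        x = i - mid
        if x == 0:
            ideal = 2 * cutoff / fs
        else:
            ideal = math.sin(2 * math.pi * cutoff * x / fs) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (taps - 1))
        h.append(ideal * window)
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(min(taps, n + 1)):
            acc += h[k] * samples[n - k]
        out.append(acc)
    return out

def invert(samples, fc, fs):
    """Mirror the audio band around fc/2: mix with fc, keep only the difference."""
    # The factor 2 compensates for the amplitude halved by the mixing step.
    return [2 * y for y in lowpass(mix(samples, fc, fs), fc, fs)]

def tone_level(samples, f, fs):
    """Amplitude of one frequency component (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * f * n / fs) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * f * n / fs) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

FS = 16000  # sample rate in Hz (assumed)
FC = 3300   # inversion carrier in Hz (assumed)

clear = [math.cos(2 * math.pi * 1000 * n / FS) for n in range(1600)]
scrambled = invert(clear, FC, FS)     # the 1000 Hz tone moves to FC - 1000 Hz
restored = invert(scrambled, FC, FS)  # repeating the process moves it back

for name, sig in (("clear", clear), ("scrambled", scrambled), ("restored", restored)):
    print(name, round(tone_level(sig, 1000, FS), 2), round(tone_level(sig, FC - 1000, FS), 2))
```

The same function is used at both ends, which also illustrates why an eavesdropper who knows or guesses the carrier frequency needs nothing more than a copy of the scrambler.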
The advantage of this technique is that it completely takes place within the
audio bandwidth of a channel, whereas digital encryption generally requires
more space. This allows scrambling to be used in existing systems.
At the time, scramblers were also cheaper than
digital encryptors, which is why scramblers were used by the police
in many countries from the 1970s well into the 1990s.
The disadvantage of this method is that an eavesdropper can easily reverse
the mirroring process with a simple electronic circuit.
In addition, experienced listeners could sometimes even extract useful
information from the seemingly garbled speech directly, without a descrambling device.
Although voice inversion is commonly achieved by using an electronic
diode-based ring mixer, the French inventor and engineer
Jules Carpentier showed in 1919
that the same effect can be obtained mechanically, by
using a motor-driven commutator running at the centre frequency.
In a more complex scheme, one could vary the carrier frequency and also
split up the audio band into several (e.g. five) smaller bands that are then
mirrored individually. In addition, the individual frequency bands can be
swapped as shown in the rightmost diagram below.
Continuously varying these parameters by putting them
under digital control can make it harder to decode the signal.
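The band-splitting scheme can be illustrated with a small frequency-mapping sketch. It only computes where a given input frequency ends up and does not process audio. The band edges (300 to 3000 Hz in five equal sub-bands), the slot assignment and the inversion flags are invented example values, and the function name is hypothetical.

```python
# Assumed example parameters; in a real unit these would be under digital control.
EDGES = [300, 840, 1380, 1920, 2460, 3000]     # five equal sub-bands, in Hz
SLOTS = [2, 0, 4, 1, 3]                        # band i is moved to slot SLOTS[i]
INVERTED = [True, False, True, False, False]   # whether band i is mirrored

def scrambled_frequency(f, edges, slot_of_band, inverted):
    """Return the output frequency (Hz) for input frequency f (Hz).

    Assumes all sub-bands have the same width.
    """
    for band in range(len(edges) - 1):
        lo, hi = edges[band], edges[band + 1]
        if lo <= f < hi:
            offset = f - lo
            if inverted[band]:
                offset = (hi - lo) - offset    # mirror within the sub-band
            return edges[slot_of_band[band]] + offset
    raise ValueError("frequency outside the scrambled audio band")

for f in (500, 1000, 2500):
    print(f, "Hz ->", scrambled_frequency(f, EDGES, SLOTS, INVERTED), "Hz")
```

Changing `SLOTS` and `INVERTED` over time corresponds to the continuously varying parameters mentioned above.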
Examples of frequency domain scramblers
A.2 — Time domain scrambling
Another method for speech protection is the so-called time-division or
time-domain (TD) speech scrambling.
This method is more secure than the simpler
frequency-inversion system, but far less secure than modern
digital speech encryptors.
The simplified diagram below shows how it works.
Sampled speech data is cut into a number of small fragments which are then
scrambled in an ever-changing order. The order in which the packets are
scrambled is determined by a pseudo-random number generator (PRNG)
which is seeded, or initialised, by the user by means of a key.
In the diagram above, the top row shows the clear speech (input) in time.
The second row shows the speech after it is scrambled.
The bottom row finally shows the speech once it is descrambled again (output).
The whole process of scrambling and descrambling causes a noticeable delay
which is typically in the range of 0.3 to 0.6 seconds.
These delays may lead to confusion.
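The fragment shuffling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The fragment length, the number of fragments per frame and the function names are invented. Python's `random.Random` merely stands in for the PRNG of a real scrambler.

```python
import random

FRAG_LEN = 4          # samples per fragment (assumed; real units use far longer fragments)
FRAGS_PER_FRAME = 8   # fragments that are shuffled together (assumed)

def _split(frame):
    """Cut one frame into fragments of FRAG_LEN samples."""
    return [frame[i:i + FRAG_LEN] for i in range(0, len(frame), FRAG_LEN)]

def _frames(samples):
    """Cut the sample stream into frames of FRAGS_PER_FRAME fragments."""
    frame_len = FRAG_LEN * FRAGS_PER_FRAME
    if len(samples) % frame_len:
        raise ValueError("sample count must be a multiple of the frame length")
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

def scramble(samples, key):
    rng = random.Random(key)            # stand-in PRNG, seeded with the user's key
    out = []
    for frame in _frames(samples):
        frags = _split(frame)
        order = list(range(len(frags)))
        rng.shuffle(order)              # a new order for every frame
        for idx in order:
            out.extend(frags[idx])
    return out

def descramble(samples, key):
    rng = random.Random(key)            # same key gives the same sequence of orders
    out = []
    for frame in _frames(samples):
        frags = _split(frame)
        order = list(range(len(frags)))
        rng.shuffle(order)
        restored = [None] * len(frags)
        for pos, idx in enumerate(order):
            restored[idx] = frags[pos]  # fragment sent at position pos belongs at idx
        for frag in restored:
            out.extend(frag)
    return out

clear = list(range(64))                 # dummy "speech" samples
scrambled = scramble(clear, key=1234)
print(scrambled[:16])
print(descramble(scrambled, key=1234) == clear)
```

In this sketch the receiver cannot output the first fragment of a frame until the whole frame has arrived, because that fragment may have been sent last. This buffering is one contribution to the delay mentioned above.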
As the time segments are scrambled in an ever-changing pattern, it is important
that transmitter and receiver are correctly synchronised. To ensure that both
ends are kept in sync, a pilot signal is transmitted with the
scrambled speech by means of Audio Frequency Shift Keying (AFSK).
An example of a speech scrambler that uses Time Domain Scrambling is the
BBC Cryptophon 1100.
Although scramblers of this type are not safe, many police and other
law enforcement agencies around the world used this method for securing
their conversations for many years, as it has the advantage that it can be
used on existing narrow-band FM radio channels.
Although even an experienced listener can't make any sense
of the garbled audio, the system is prone to cryptanalytic attacks.
It is possible to reconstruct the original signal,
without knowing the key or the PRNG,
by using a computer to analyse the signal, find the discontinuities
between fragments and reorder the fragments accordingly.
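The idea behind such an attack can be shown with a deliberately simple toy model. The sketch below reorders shuffled fragments by always choosing the next fragment that causes the smallest jump at the boundary, and it needs neither the key nor the PRNG. The "signal" is a plain integer ramp rather than speech, and the function names and the greedy strategy are assumptions made for illustration. An attack on real speech would need a far better continuity measure.

```python
def boundary_jump(prev_frag, next_frag):
    """Size of the discontinuity when next_frag directly follows prev_frag."""
    return abs(next_frag[0] - prev_frag[-1])

def reorder(fragments):
    """Greedy search for the fragment order with the smallest total discontinuity."""
    best_order, best_cost = None, None
    for start in range(len(fragments)):
        order, cost = [start], 0
        remaining = set(range(len(fragments))) - {start}
        while remaining:
            last = fragments[order[-1]]
            nxt = min(remaining, key=lambda i: boundary_jump(last, fragments[i]))
            cost += boundary_jump(last, fragments[nxt])
            order.append(nxt)
            remaining.remove(nxt)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = order, cost
    return best_order

# Toy signal: a smooth ramp, cut into 8 fragments of 4 samples each.
clear = list(range(32))
frags = [clear[i:i + 4] for i in range(0, 32, 4)]

# Fragments as they might arrive after scrambling (order unknown to the attacker).
received = [frags[i] for i in (5, 2, 7, 0, 3, 6, 1, 4)]

order = reorder(received)
recovered = [sample for i in order for sample in received[i]]
print(recovered == clear)
```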
Examples of time domain scramblers
A.3 — Frequency and Time domain scrambling
The third and most complex type of voice scrambler is the so-called
Frequency and Time Domain Scrambler, also known as the F/T Scrambler,
which is basically a combination of the two methods explained above.
This solution is also known as two-dimensional voice scrambling.
Although scrambling and descrambling with this method is much more complex,
the system is just as prone to cryptanalysis as the previous ones.
Any kind of analogue scrambling is inherently insecure.
Below are some examples of scrambled speech.
These samples were recorded by Barry Wels from the built-in analogue
voice scrambler of the Icom IC-H11 radio. If you listen carefully to the
scrambled audio, you may be able to descramble some of it yourself
with a little practice.
Examples of frequency and time domain (F/T) scramblers
Most - if not all - modern secure voice terminals use digital encryption.
Speech is first digitized by means of an Analog-to-Digital Converter (ADC) or
a Vocoder. The resulting digital data stream is then mixed by means of
an XOR-operation with a data stream from a pseudo-random number
generator (PRNG), which in turn is seeded by a KEY. This principle is also
known as the Vernam Cipher.
The resulting encrypted data stream is then transmitted by means of
a so-called modulator-demodulator (MODEM).
This process is shown in the simplified diagram below:
The Pseudo Random Number Generator (PRNG) is seeded by a KEY that is either
entered manually or with a key fill device.
Modern systems sometimes use asymmetric encryption methods (e.g. RSA)
to exchange the keys over an insecure channel. This is known as
Public Key Encryption (PKE).
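The XOR mixing of digitized speech with a key stream, as described above, can be sketched as follows. This is an illustration only. Python's `random.Random` is used as a stand-in for the key stream generator and is not cryptographically secure. The function names, the key value and the sample bytes are invented.

```python
import random

def keystream(key, length):
    """Generate `length` key stream bytes from a PRNG seeded with `key`.

    random.Random is NOT a cryptographic generator; it only stands in
    for the key generator of a real device.
    """
    rng = random.Random(key)
    return bytes(rng.randrange(256) for _ in range(length))

def xor_crypt(data, key):
    """XOR the data with the key stream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

speech = bytes(range(16))                  # dummy digitized speech samples
ciphertext = xor_crypt(speech, key=0xC0FFEE)
plaintext = xor_crypt(ciphertext, key=0xC0FFEE)

print(ciphertext.hex())
print(plaintext == speech)
```

Because XOR is its own inverse, the receiver recovers the speech data simply by generating the same key stream from the same key and applying it again.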
In the 1970s, many systems (e.g. the KY-57)
used Continuously Variable Slope Delta modulation (CVSD) to convert speech into digital data. This wide-band solution was only suitable for VHF and UHF radios.
In the 1980s, narrow-band systems were introduced,
such as the KY-99, that used (enhanced)
Linear Predictive Coding (LPC),
limiting the data-rate to 2400 baud or even 800 baud.
Before speech can be encrypted, it has to be converted from the analogue to the
digital domain, by means of a sampler, or digitizer, or vocoder.
Generally speaking, a digital signal needs more bandwidth than its analogue
equivalent (typically twice the bandwidth), but methods have been developed
to reduce this by analysing the properties of speech and
sending them as numeric values to the other end, where they are used to
reconstruct, or synthesize, the original signal.
This method is known as a Vocoder, and the reconstructed speech is not always good enough to recognise a person's voice. The first vocoder,
named VODER, was developed
at Bell Labs in 1939. Its principle was first used during WWII on the
transatlantic SIGSALY cryptographic telephone.
A speech analyser/synthesizer is also known as a CODEC (coder-decoder).
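To get a feeling for the reduction a vocoder achieves, the short calculation below compares a plain sampled signal with the narrow-band rates mentioned earlier. The figures for the plain digitizer (8000 samples per second, 8 bits per sample) are an assumption typical of telephone-quality audio and do not come from this page. The 2400 and 800 figures are read here as bits per second.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a plain sampled signal, without any compression."""
    return sample_rate_hz * bits_per_sample

pcm = pcm_bit_rate(8000, 8)   # assumed telephone-quality sampling

for vocoder_rate in (2400, 800):
    ratio = pcm / vocoder_rate
    print(f"{vocoder_rate} bit/s is about {ratio:.0f} times less than {pcm} bit/s")
```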
➤ Different vocoders (speech CODECs)
Once analogue speech has been digitized, it can be encrypted digitally,
by means of a variety of encryption algorithms. Some devices use
publicly available algorithms such as DES,
Triple-DES (3DES) or AES,
but many others use proprietary
encryption algorithms that are kept secret.
Below are some sound samples of digitally encrypted speech,
recorded by Barry Wels from an Icom IC-H10SR radio.
The first file contains the original audio file. The second file plays
the encrypted audio. The last file finally contains the resulting
audio once it has been decrypted.
Examples of digital voice encryptors
© Crypto Museum. Created: Tuesday 04 August 2009. Last changed: Monday, 06 March 2023 - 08:31 CET.