Make your Arduino really talk ... or play sound effects and music.

VoiceShield Intro



The Voice Shield™ (VS) is an analog audio shield for the Arduino that allows you to play audio sound bytes. It can be used in many different Arduino projects, such as a talking clock, a DIY talking GPS, robots, alarms, and motion-based sound effects.

It uses a unique, yet very user-friendly, way to access different sound bytes, so it is easy to build a "talking" device. It can work with words, complete sentences, or sound effects. With the VS your Arduino can also build phrases "on the fly" that sound a little like an automated telephone operator, for example: "you" "entered" "one" "two" "three" "press" "pound" "if" "that" "is" "correct".


- The Voice Shield™:

The VS is designed to work with a "full size" Arduino as a plug-on shield. You could use it with any other Arduino, but you would have to wire it manually.

It has a stackable design, meaning that if you have other plug-on shields they can be stacked on top of the VS. (Make sure that they are not using the same pins! The VS uses digital pins 2, 3, 4 and 5 of the Arduino.)

The ISD voice chip on the VS runs at 3 volts, but the Arduino uses 5 volts. To interface the two, the VS has an on-board 3 volt regulator with filtering capacitors, plus level-shifting circuitry to safely connect both devices.

There are two forms of audio output possible with the VS. The first is a simple on-board audio amp with speaker connectors and space for a 15mm speaker (a bigger speaker may be used, but it won't fit onto the VS PCB). It seems that the bigger the speaker, the better the VS sounds. The second is a non-amplified line out, available through a 3.5mm stereo jack (though the VS is MONO), which you can connect to powered speakers or your sound system for really big sound! When a cable is plugged into the line out, the on-board speaker and amp are automatically disconnected.

All of the audio input is analog, via another 3.5mm stereo jack labelled "Audio In". Even though the VS is a MONO device, you can input stereo; the VS blends the two channels through a set of coupling capacitors into a MONO signal.

Description of the Voice Shield parts

- The ChipCorder (Winbond / Nuvoton):

The VS is based on the ISD4003 from Winbond. This chip is called a "voice chip" because it was designed to record and play back messages for devices such as answering machines, talking cars, etc. This version of the ISD can hold 4 minutes of audio at a sampling rate of 8 kHz (the best quality presently available in this type of chip). Other versions can hold longer recordings, but at a lower sampling rate. Spec sheet.

With 4 minutes of audio at an 8 kHz sampling rate, the VS is not meant to replace your iPod! Rather, it is designed for electronics projects to which you want to easily add speech or sound effects. The VS even sounds better than most typical robots, because the voice can be a recorded human voice! The VS lets you really make it talk!

What does it do:

Check out these videos to see and hear the VS in action. (You can find the Arduino sketches in the software section.)

Demo of the VoiceShield used for sound effects.

Demo of the VoiceShield being used as a talking clock.

Demo of the VoiceShield being used as a talking volt meter.

Demo of the VoiceShield speaking a made up phrase.

How it works:

The VS taps into the ISD's ability to record to and play from specific memory addresses. The entire 4 minutes (240 seconds) of ISD memory is divided into 1200 unique addresses, which would give 1200 sound bytes of 0.2 seconds each (240 seconds / 1200 = 0.2 seconds). At 0.2 seconds, these samples are too short to be useful, so the VS lets you decide how to divide up the memory. For example, if you choose to have 80 sound bytes, each one spans 15 addresses and lasts 3 seconds.
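The slot arithmetic above can be checked with a few lines of code. This is a minimal sketch in Python for running on a desktop, not an Arduino sketch and not the VS library. The function names are made up for illustration, and numbering the slots from 0 is an assumption; only the 240 seconds and 1200 addresses come from the text.

```python
# Figures from the text: 240 seconds of audio spread over 1200 addresses.
TOTAL_SECONDS = 240
TOTAL_ADDRESSES = 1200


def slot_layout(num_slots):
    """Return (addresses_per_slot, seconds_per_slot) for equal-sized slots."""
    if not 1 <= num_slots <= TOTAL_ADDRESSES:
        raise ValueError("num_slots must be between 1 and 1200")
    # Only whole addresses can be used, so any remainder is left over.
    addresses_per_slot = TOTAL_ADDRESSES // num_slots
    seconds_per_slot = addresses_per_slot * TOTAL_SECONDS / TOTAL_ADDRESSES
    return addresses_per_slot, seconds_per_slot


def slot_start_address(slot, num_slots):
    """Return the first address of a slot (assumption: slots count from 0)."""
    if not 0 <= slot < num_slots:
        raise ValueError("slot is out of range")
    addresses_per_slot, _ = slot_layout(num_slots)
    return slot * addresses_per_slot


print(slot_layout(80))            # (15, 3.0)
print(slot_start_address(7, 80))  # 105
```

With 80 slots, each slot covers 15 addresses of 0.2 seconds, which gives the 3 seconds quoted above. In the tape analogy, slot 7 would start at counter position 105, or 21 seconds into the memory.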

You can think of the ISD memory as an audio tape, and the tape counter as the addresses. If you wanted to record or play something at a particular address, you would fast forward to that address on the tape counter and then press record or play.

When you finish recording, the VS inserts an EOM (End of Message) marker into the memory right after the sound byte. During playback, the EOM pin is pulled LOW when this marker is reached, which lets the VS know that the sound is finished. This way your Arduino does not have to handle this task, saving memory and processing power for the rest of your own Arduino project.

Illustration of the VS audio memory. 

The green bars represent the memory divided equally into sound slots; the leftmost side of each slot is where the sound starts to play when that slot is addressed. The red bars mark the EOM (End of Message) at the end of each sound recording. Each EOM is inserted into the audio memory automatically at the point where the recording was stopped, so it can fall anywhere within the slot, depending on the length of the recorded sound.
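To make the role of the EOM concrete, here is a small model in Python. It is only a sketch under stated assumptions: the slot length of 3 seconds follows the 80-slot example, the recording lengths are invented, any gap between consecutive sounds is ignored, and recordings longer than one slot are simply rejected because the text does not say how the VS handles them.

```python
SLOT_MS = 3000  # 3-second slots, as in the 80-slot example


def record(memory, slot, sound_ms):
    """Store a recording; its EOM lands where the recording stopped."""
    if not 0 < sound_ms <= SLOT_MS:
        raise ValueError("recording must fit inside one slot in this model")
    memory[slot] = sound_ms  # EOM offset from the start of the slot


def play_time_ms(memory, sequence):
    """Total playing time: each slot plays only up to its EOM."""
    return sum(memory[slot] for slot in sequence)


memory = {}
record(memory, 7, 450)    # hypothetical short word
record(memory, 16, 300)   # hypothetical short word
record(memory, 3, 2800)   # hypothetical sound that nearly fills its slot

total = play_time_ms(memory, [7, 16, 3])
print(total)  # 3550
```

Because playback stops at each EOM rather than at the end of the slot, the three sounds take 3550 ms in this model instead of the 9000 ms that three full slots would take.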

In the illustration above, if you wanted the VS to say “Could you please take me to the hotel”, the Arduino would simply tell the VS to play sound byte numbers 7, 16, 3, 12, 17, 20, 5 and 1 in sequence, where in this example “Could” is recorded into sound byte #7, and so on.
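The lookup behind such a phrase can be sketched as a simple word-to-slot table. The Python below is illustrative only. The text states just that “Could” is in sound byte #7; the other table entries are read off the number sequence in order, and the function name is hypothetical. On the Arduino, each resulting number would be handed to whatever play call the VS software provides, which is not shown here.

```python
# Word-to-slot table. Only "could" = 7 is stated in the text; the remaining
# entries are inferred by pairing the words with the sequence in order.
SLOTS = {
    "could": 7,
    "you": 16,
    "please": 3,
    "take": 12,
    "me": 17,
    "to": 20,
    "the": 5,
    "hotel": 1,
}


def phrase_to_slots(phrase):
    """Translate a phrase into the list of sound byte numbers to play."""
    slots = []
    for word in phrase.lower().split():
        if word not in SLOTS:
            raise KeyError(f"no sound byte recorded for {word!r}")
        slots.append(SLOTS[word])
    return slots


print(phrase_to_slots("Could you please take me to the hotel"))
# [7, 16, 3, 12, 17, 20, 5, 1]
```

The same table can build other phrases from words that are already recorded, for example "take me to the hotel", without recording anything new.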

SpikenzieLabs user’s VoiceShield Project:

We received a message from a group at the University of Ottawa, Department of Mechanical Engineering, that had just finished a 3rd year electronics for mechanical engineers’ course project.

In their awesome project, called the “Coin Bot”, they used a VoiceShield for the audio feedback. You can find some great documentation on their project web page: Coin Bot.

“CoinBot plays Heads or Tails in a best 2 out of 3 match with a player. CoinBot will flip the coin using a solenoid attached to its hand, recognizes the outcome by using phototransistor with reflective properties of colors, and responds accordingly with his voice.  Tallies and announces the final score of the match.”

Copyright SpikenzieLabs 2019