Activity 2: Audio Recognition
Train and Test the Audio Classifier
First, open Google Teachable Machine and create an audio recognition model.
You will need to be in a quiet room for this. Decide how many classes you want and name them. Each class name should be a single word. For an audio model, the first class must always be “Background Noise.” The example shown here was trained with background noise, the word “go,” and the word “stop.”
The training will take about a minute. Make sure to leave the tab open while the model is training, even if your browser pops up a warning that the window is unresponsive.
When the training is complete, you will be able to test your model in the Preview panel. Make sure that your model works the way that you want it to before moving on. If it doesn’t, you may need to add more audio samples for each class and train again. When you are happy with your model, click Export Model.
Remember to save your model in case you want to reference or change it later. Click on the Teachable Machine menu and either download the file or save it to your Google Drive.
Using the Audio Classifier in Snap!


Using the audio classifier in Snap! is very similar to the process you used in Activity 1 for the image classifier. If you are using the BlueBird Connector, open this project in Snap! and save a copy for yourself. Then click on the Settings menu and enable JavaScript extensions.
If you are using snap.birdbraintechnologies.com, import this project into Snap!.
Press the spacebar to see your classifier make predictions in Snap!. Remember, it may take up to a minute for the classification to start the first time you run the script. The prediction data is in the same format as it was for the image classifier. The table on the stage lists each class and the probability that the current sound belongs to that class.
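To make the prediction table easier to reason about, here is a minimal sketch in Python (not Snap! blocks) of how a program might pick one answer from it. It assumes the table can be treated as a list of (class, probability) pairs. The function name top_prediction and the 0.8 threshold are made up for illustration and are not part of the Snap! project.

```python
def top_prediction(predictions, threshold=0.8):
    """Return the most probable class, or None if no class is confident enough.

    predictions is assumed to be a list of (class name, probability) pairs,
    like the rows of the table on the stage.
    """
    label, probability = max(predictions, key=lambda pair: pair[1])
    if probability >= threshold:
        return label
    return None


# Made-up example values, not real model output.
example = [("Background Noise", 0.05), ("go", 0.90), ("stop", 0.05)]
print(top_prediction(example))
```

Requiring a minimum probability is one way to keep the program from reacting when the model is unsure. You can build the same idea in Snap! by comparing the probability column to a number before acting.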
Challenge: Write a program to make the Finch respond to each of your words. As you test your program, notice what happens if you say a word that your model does not know. What happens if a different voice says the trained words?
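If you want a starting point for planning, the decision your program needs to make can be sketched in Python as a lookup from each word to an action. This is only an outline of the logic, assuming the example classes “go” and “stop.” The function choose_action and the action descriptions are hypothetical placeholders, and you will still need to build the real behavior with Finch blocks in Snap!.

```python
def choose_action(label):
    """Map a predicted class to a description of what the Finch should do.

    The words and actions here are placeholders. Replace them with your
    own classes and your own Finch behaviors.
    """
    actions = {
        "go": "move forward",
        "stop": "stop moving",
    }
    # Background noise, or any class not listed above, leads to no action.
    return actions.get(label, "do nothing")


print(choose_action("go"))
print(choose_action("Background Noise"))
```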