
Birdsong generation project

Generating birdsong with WaveNet!

Table of Contents

  - Quick execution
  - Generated song
  - Overview
  - Model configuration
  - Result
  - Discussion
  - Notice

Quick execution

Requirements

Command

# preparation
git clone https://github.com/shiba24/birdsong-generation-project.git
cd birdsong-generation-project
bash preparation.sh

# training  
cd tensorflow-wavenet
python train.py --data_dir=../corpus

# generation
cd tensorflow-wavenet
python generate.py --wav_out_path=generated.wav --samples 80000 logdir/train/{DATE_HERE}/model.ckpt-{XXX}

Generated song

Listen to the natural song on SoundCloud

Listen to the generated song on SoundCloud

Overview

Abstract in one sentence

Simulate bird song with WaveNet.

Background

What is a songbird?

The songbird is one of the best model animals for neuroscientific studies of human language, vocalization, and auditory processing. Many laboratories around the world, spanning molecular biology, physiology, acoustics, and ethology, use songbirds to answer questions such as: why do only humans have language? and what is the neural mechanism of language?

Song structure

Birdsong is considered to have syntax like human language, although it has no semantics in itself. In most songbird species only the male sings, while in a few species both sexes sing. One function of the song is thought to be attracting females. Below is the typical song structure of a songbird: we can see a bout of several song elements (called syllables or notes).

You can listen to an example of a Java sparrow's (文鳥) song here. This is a visualized image, or spectrogram, of a zebra finch's (錦華鳥) song. The letters above the spectrogram indicate the note types. Both the Java sparrow and the zebra finch are songbirds.

Interestingly, the song structure can be expressed as a finite-state automaton, which can be regarded as a high-order Markov process. The transitions between notes are probabilistic, so a song can be described by a probabilistic finite-state transition diagram. This is considered a parallel with human language (Berwick et al., 2011). Below is a song expressed as a finite-state transition diagram; line thickness represents the transition probability from one note to the next.

(Both figures cited from Honda & Okanoya, 1999)
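To make the finite-state picture concrete, here is a minimal Python sketch of sampling a song bout from a probabilistic note-transition diagram. The note labels and probabilities below are made up for illustration and are not taken from any real bird.

import random

# Toy probabilistic finite-state transition diagram (all labels and numbers
# are made up): each note maps to possible next notes and their probabilities.
transitions = {
    "start": {"a": 0.9, "b": 0.1},
    "a": {"b": 0.7, "c": 0.3},
    "b": {"c": 0.6, "end": 0.4},
    "c": {"a": 0.2, "end": 0.8},
}

def sample_song(max_notes=20):
    """Sample one song bout by walking the transition diagram."""
    note, song = "start", []
    for _ in range(max_notes):
        nxt = random.choices(list(transitions[note]),
                             weights=list(transitions[note].values()))[0]
        if nxt == "end":
            break
        song.append(nxt)
        note = nxt
    return "".join(song)

print(sample_song())  # e.g. "abcbc"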

Brain structure

Many researchers have investigated the neural mechanism that enables this finite-state vocalization, and one hypothesis is a Markov chain-like representation within neurons in the motor areas. The figure on the right shows the neural pathway of vocalization (cited from Bolhuis et al., 2010). A more detailed brain circuit, including the auditory pathway, can be seen here, for example.

We can see a brain region named HVC (a proper name), which is a premotor area. Many of its neurons show activity phase-locked to the song. HVC neurons project to RA (the robust nucleus of the arcopallium), which is a motor area, and RA sends motor signals to the muscles of the vocal organ to generate each song element.

The next figure (a) is another expression of the finite-state transitions of a song, and (b) is a simple model of HVC and RA neurons. The hypothesis assumes that neurons in HVC fire in turn, like a _chain_. (Cited from Katahira et al., 2007)
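As a toy illustration of the chain hypothesis (not the actual model of Katahira et al.), one can think of each HVC unit firing in its own time step and driving RA units through a fixed projection; all sizes and weights below are arbitrary.

import numpy as np

# Toy chain model: each HVC unit is active in exactly one time step,
# and RA activity is a weighted sum of the currently active HVC units.
n_hvc, n_ra = 20, 5
hvc_activity = np.eye(n_hvc)                               # shape (time, HVC units): a "chain" of firings
rng = np.random.default_rng(0)
w_hvc_to_ra = rng.uniform(0.0, 1.0, size=(n_hvc, n_ra))    # fixed HVC -> RA projection (made up)
ra_output = hvc_activity @ w_hvc_to_ra                     # shape (time, RA units): motor-like output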

There are many studies modelling birdsong and its neural mechanism, some even using neural networks.

WaveNet

WaveNet is a generative neural network model for raw audio. The original paper was published by the Google DeepMind team in 2016. It uses dilated causal convolutions to generate the audio waveform sample by sample. (GIF image cited from the DeepMind blog post)

The inputs and outputs of the model are raw waveforms only. Hence the model itself does NOT assume that syllables are expressed as finite states, nor as a Markov chain.
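As a rough illustration of the dilated causal convolutions (a minimal NumPy sketch, not the tensorflow-wavenet code), each layer looks one fixed, dilated step into the past, and stacking layers with dilations 1, 2, 4, 8, ... doubles the receptive field at every layer.

import numpy as np

def causal_dilated_conv(x, w, dilation):
    # 1-D causal convolution with filter width len(w): output sample t depends
    # only on x[t], x[t - dilation], x[t - 2*dilation], ... (never on the future).
    pad = dilation * (len(w) - 1)
    x_padded = np.concatenate([np.zeros(pad), x])   # left-pad to keep the output causal
    return np.array([
        sum(w[k] * x_padded[t + pad - k * dilation] for k in range(len(w)))
        for t in range(len(x))
    ])

signal = np.random.randn(16)
out = signal
for d in (1, 2, 4, 8):                              # receptive field grows to 16 samples
    out = causal_dilated_conv(out, w=np.array([0.5, 0.5]), dilation=d)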

This project

This project is my own (it does not belong to my supervisor); it combines a recent machine learning result with neuroscientific knowledge about songbirds.

In one sentence: using WaveNet to simulate bird song.

As mentioned above, birdsong is thought to have a Markov-model structure and syntax like human speech. However, the song itself has no semantics.

If the mechanisms are similar between such birds and humans, WaveNet (original blog post and TensorFlow implementation) might also succeed in simulating birdsong, because it succeeds in generating completely meaningless but locally speech-like waveforms.

Why is this interesting?

WaveNet itself does not use the Markov property of the song; it only uses information from the raw waveform. Therefore, if WaveNet succeeds in generating birdsong:

  1. WaveNet might be able to capture the Markov property implicitly. This is not demonstrated explicitly if we only generate human speech with the model.

  2. The representation learned by the trained model (i.e. the activation patterns of units in the model) might be comparable with the neural representation in the actual songbird brain.

  3. The similarity between human speech and birdsong at the level of syntax could be further supported.

Model configuration

Data size (??), sampling rate, and other settings will be described here.

Result

Training epoch

After 2500 epochs, the loss is about 1.5 to 2.0.

Generated sound

Listen to the natural song on SoundCloud

Listen to the generated song on SoundCloud

It sounds like the original (natural) song!! This is a visualized image, or spectrogram, of the generated song. (It also shows that the sound is a bit chattery, though.)
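For reference, a spectrogram like the one above can be drawn with a few lines of Python. This is a minimal sketch assuming librosa and matplotlib are installed (they are not part of this repository), and the file names are placeholders.

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Placeholder file names: use the actual natural recording and generated.wav.
for i, path in enumerate(["natural_song.wav", "generated.wav"]):
    y, sr = librosa.load(path, sr=None)                           # keep the original sampling rate
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    plt.subplot(2, 1, i + 1)
    librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="hz")
    plt.title(path)
plt.tight_layout()
plt.savefig("spectrogram_comparison.png")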

Discussion

The next step of this project would be:

  1. Investigating whether the Markov-chain structure of the generated song is similar to that of the natural song (see the sketch after this list).

  2. Comparing neural firing patterns known to date with the activation patterns of units in the model.
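For step 1, a first pass could estimate note-to-note transition probabilities from labelled note sequences and compare them between natural and generated songs. The sketch below uses made-up note strings; real sequences would come from labelled spectrograms.

from collections import Counter, defaultdict

def transition_probabilities(songs):
    # Estimate first-order (Markov) note-to-note transition probabilities
    # from a list of songs, each given as a string of note labels.
    counts = defaultdict(Counter)
    for song in songs:
        for cur, nxt in zip(song, song[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
            for cur, cnt in counts.items()}

# Made-up example sequences; compare the two estimated transition tables.
natural_songs = ["abcbc", "abccbc", "abcbcc"]
generated_songs = ["abcbb", "abcbc"]
print(transition_probabilities(natural_songs))
print(transition_probabilities(generated_songs))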

TODOs

Notice

Copyright

The WaveNet implementation used here is by ibab (tensorflow-wavenet).

All rights reserved Shintaro Shiba.

Any questions or comments are welcome! Thank you.