Update: This post now has a Part 2.
If you follow machine learning topics in the news, I am sure you have come across Andrej Karpathy's blog post on The Unreasonable Effectiveness of Recurrent Neural Networks.[1] Apart from the post itself, I have found it fascinating to read about the diverse applications its readers have found for it. Since then, I have spent several hours hacking with different machine learning models to compose tabla rhythms:
Inspired by @seaandsailor, used @karpathy's char-rnn to make a tabla rhythm https://t.co/kqzZG3q2A2 Amazed how well it learnt on small data
— Gaurav Trivedi (@trivedigaurav) May 26, 2015
Although tabla does not have a standardized musical notation that is accepted by all, it does have a language based on bols (from the Hindi bolna, "to speak"), the sounds of the strokes played on it. These bols can be expressed in written form, and when pronounced in Indian languages they mimic the sounds of the drum. For example, the theka for Teental, the commonly used 16-beat cycle, is written as follows:
Dha | Dhin | Dhin | Dha | Dha | Dhin | Dhin | Dha
Dha | Tin | Tin | Ta | Ta | Dhin | Dhin | Dha
For this task, I made use of Abhijit Patait's software, TaalMala, which provides a GUI environment for composing tabla rhythms in this language. The bols can then be synthesized to produce the sound of the drum. In his software, Abhijit extended the tabla language to make it easier for users to compose rhythms: a value in square brackets after a bol specifies the number of beats within which it must be played, and '+' symbols after a bol lay extra emphasis on it, increasing its intensity in the synthesized sound. Variations of the standard bols can be defined as well, based on the different hand strokes used:
Dha1 = Na + First Closed then Open Ge
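To make the notation concrete, here is a minimal parsing sketch in Python. The rules it encodes are my reading of TaalMala's output rather than its documented spec: I am assuming that a bracketed value sets the duration (in beats) for its bol and stays in effect for the bols that follow until a new value appears, and that trailing '+' signs mark emphasis.

```python
import re

# Token: a bol name (possibly a numbered variation such as "Dha1"),
# optional '+' emphasis marks, and an optional "[duration]" annotation.
TOKEN = re.compile(r"(?P<bol>[A-Za-z]+\d*)(?P<accent>\+*)\s*(?:\[(?P<dur>[\d.]+)\])?")

def parse_bols(text, default_dur=0.25):
    """Yield (bol, duration_in_beats, emphasis_level) triples.

    Assumptions: a bracketed duration persists for subsequent bols
    until the next bracketed value changes it, and default_dur is a
    guess at the implicit starting duration.
    """
    dur = default_dur
    for chunk in text.split("|"):
        m = TOKEN.search(chunk)
        if not m:
            continue  # skip empty segments
        if m.group("dur"):
            dur = float(m.group("dur"))
        yield m.group("bol"), dur, len(m.group("accent"))

def total_beats(text):
    """Rough length of a composition, under the assumptions above."""
    return sum(dur for _, dur, _ in parse_bols(text))
```

For example, under these assumptions parse_bols("Dha [0.50] | Ti [0.25] | Ra | Ki | Ta") yields Dha at half a beat followed by four quarter-beat bols.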
Now that we are armed with this background knowledge, it is easy to see how we may attempt to learn tabla like a language model using natural language processing techniques. Predictive modeling of tabla has been explored before, in "N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition" (Avinash Sastry, 2011), but I was not able to get access to the datasets used in that study and had to rely on the compositions that came with the TaalMala software.[2] This is a much smaller database than what you would otherwise use to train a neural network: it comprises 207 rhythms with 6,840 bols in all. I trained a char-rnn on it and sampled compositions after priming the network with different seed texts such as "Dha" and "Na" (a sketch of the setup follows the sample below). Given below is a minute-long composition sampled from my network. We can see that not only has the network learned the TaalMala notation, but it has also picked up some common phrases used in compositions, such as "TiRa KiTa" and repetitions of "Tun Na":
Ti [0.50] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki | Ta | Tun [0.50] | Na | Dhin | Na | Tun | Na | Tun | Na | Dha | Dhet | Dha | Dhet | Dha | Dha | Tun | Na | Dha | Tun | Na | Ti | Na | Dha | Ti | Te | Ki | Ti | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] | Dhin | Dhin | Dha | Ge | Ne | Dha | Dha | Tun | Na | Ti [0.25] | Ra | Ki | Ta | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dha [1.00] | Ti | Dha | Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Dhet | Dhin | Dha | Tun | Na | Ti [0.25] | Ra | Ki | Ta | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Ti | Ka | Tra [0.50] | Ti | Ti | Te | Na [0.50] | Ki [0.50] | Dhin [0.13] | Ta | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] | Na [0.25] | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Dha [0.34] | Ti [0.25] | Ra | Ki | Ta | Tra | Ka | Tra [0.50] | Ki [0.50] | Tun [0.50] | Dha [0.50] | Ti [0.25] | Ra | Ki | Ta | Tra | Ka | Ta | Te | Ti | Ta | Kat | Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Dhin | Dhin | Dhin | Dha | Tun | Na | Ti | Na | Ki | Ta | Dha [0.50] | Dha | Ti [0.50] | Ra | Ki | Te | Tun [0.50] | Tra [0.25] | Ti [0.25] | Ra | Ki | Te | Tun | Ka | Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ki [0.25] | Ti | Dha | Ti | Ta | Dha | Ti | Dha [0.50] | Ti | Na | Dha | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] | Na | Ti [0.25] | Ra | Ki | Te | Tra | Ka | Dha [0.50] | Ti [0.50] | Ra | Ki | Te | Tun [0.50] | Na | Ki [0.25] | Te | Dha | Ki | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Ti [0.25] | Ra | Ki | Te | Dha [0.50] | Tun | Ti [0.25] | Ra | Ki | Te | Dhin [0.50] | Na | Ti [0.25] | Te | Dha | Ki [0.25] | Te | Ki | Te | Dhin [0.50] | Dhin | Dhin | Dhin | Dha | Dha | Tun | Na | Na | Na | Ti [0.25] | Ra | Ki | Ta | Ta | Ka | Dhe [0.50] | Ti [0.25] | Ra | Ki | Te | Ti | Re | Ki | Te | Dha [0.50] | Ti | Dha | Ge | Na | Dha | Ti [0.25] | Ra | Ki | Te | Ti | Te | Ti | Te | Ti | Te | Dha [0.50] | Ti [0.25] | Te | Ra | Ki | Te | Dha [0.50] | Ki | Te | Dha | Ti [0.25]
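As promised above, here is roughly what the setup looked like. char-rnn expects its training text in a single data/&lt;name&gt;/input.txt file; the corpus directory, hyperparameters, and checkpoint name below are placeholders rather than my exact settings.

```python
from pathlib import Path

# Concatenate the TaalMala compositions into the single input file
# that char-rnn expects. "taalmala_compositions" is a placeholder
# for wherever the exported rhythm files live.
compositions = sorted(Path("taalmala_compositions").glob("*.txt"))
corpus = "\n".join(p.read_text() for p in compositions)

out = Path("char-rnn/data/tabla/input.txt")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(corpus)

# Training and sampling then happen from the char-rnn checkout
# (Torch/Lua); the checkpoint name below is a placeholder:
#   th train.lua -data_dir data/tabla -rnn_size 128 -num_layers 2
#   th sample.lua cv/checkpoint.t7 -primetext "Dha" -length 2000
```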
Here's a loop that I synthesized by pasting a sampled composition 4 times, one after another:
Of course, I also tried training n-gram models with different smoothing methods using the SRILM toolkit. Adding spaces between letters is a quick hack for training character-level models with existing word-level toolkits. Which approach produces better compositions? I can't tell yet, but I am trying to collect more data and hope to update this post as and when I find time to work on it. I am also not confident that simple perplexity scores are enough to judge the differences between two models, especially on the rhythmic quality of the compositions.
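For the curious, the hack is just a preprocessing step, sketched below. SRILM tokenizes on whitespace, so spacing out the characters makes ngram-count build what is effectively a character-level model; the file names are placeholders.

```python
# Space out each character so SRILM's whitespace tokenizer sees one
# token per character; '_' stands in for the original spaces so the
# text can be reconstructed after sampling or scoring.
def to_char_tokens(line):
    return " ".join("_" if ch == " " else ch for ch in line.rstrip("\n"))

with open("corpus.txt") as src, open("chars.txt", "w") as dst:
    for line in src:
        dst.write(to_char_tokens(line) + "\n")

# Training and scoring with SRILM, e.g. a 7-gram model with
# Kneser-Ney smoothing:
#   ngram-count -text chars.txt -order 7 -kndiscount -lm tabla.lm
#   ngram -lm tabla.lm -order 7 -ppl heldout.txt
```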
There are many ways in which one can extend this work. One possibility is training on different kinds of compositions: kaidas, relas, laggis, etc., on different rhythm cycles, and on compositions from different gharanas. All of this would require collecting a bigger composition database:

If you have access to any good tabla compositions database(s) please do let me know. Thanks! — Gaurav Trivedi (@trivedigaurav) May 26, 2015
And then there is scope for letting humans interactively edit compositions at the places where the AI goes wrong. You could also use the samples it generates as an endless source of inspiration.
Finally, here's a link to the work-in-progress playlist of the rhythms I have sampled so far.
References
- Avinash Sastry (2011). N-gram modeling of tabla sequences using variable-length hidden Markov models for improvisation and composition. Master's thesis, Georgia Institute of Technology. Available: https://smartech.gatech.edu/bitstream/handle/1853/42792/sastry_avinash_201112_mast.pdf?sequence=1
Footnotes
- If you encountered a lot of new topics in this post, you may find this post on Understanding natural language using deep neural networks and Quoc Le's series of videos on deep neural networks helpful. ^
- On the other hand, Avinash Sastry's work uses the more elaborate Humdrum notation for writing tabla compositions, which is not as easy for tabla players to comprehend. ^
Comments

Did you think about trying a genetic algorithm to see where it could lead, or something more advanced like a neurogenetic algorithm, to improve the drums?
By the way, you should try multiple things and combine them, such as stacking several layers, and see what you get.
And how did you establish in your algorithm that the current song is good enough?
Hello Gaurav, I stumbled across this blog in a search. It is great that you have been able to use TaalMala's composer in your machine learning experiments. It would be great to get in touch so that we can collaborate further on this, if you are interested.
Thanks Abhijit. I’d be happy to connect with you over email!