Chord Recognition System

Project Title

Project Area of Specialization

Software Engineering

Project Summary

We evaluate the use of a Convolutional Neural Network (CNN) for chord recognition from Mel-spectrograms. The Mel-spectrogram is used because its frequency scale is non-linear and resembles the human auditory system. The data is obtained from MIREX and was annotated by Harte in his Ph.D. thesis. A model with three 2D convolutional layers is trained primarily on major and minor chords, and the system will gradually be extended to other kinds of chords. The system achieved acceptable accuracy on major and minor chords, which are the most widely used chords. Our goal is to improve accuracy on rare chords using data augmentation. A comparison between the CNN and a Recurrent Neural Network (RNN) will be made to determine which model works better for chord recognition.
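To illustrate the non-linearity mentioned above, here is a minimal sketch of one widely used Hz-to-mel conversion formula. The summary does not say which mel formula or how many bands the project uses, so the formula choice, the function names, and the band count are assumptions for illustration only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 band edges in Hz that are equally spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


# Hypothetical settings: 8 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 8)
print([round(e) for e in edges])
```

Because the edges are evenly spaced in mels, the bands are narrow at low frequencies and grow wider at high frequencies, which is the resemblance to human hearing that motivates the Mel-spectrogram.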

Project Objectives

Music has been an inherent form of expression throughout human history. It is an auditory medium for conveying emotion between player and listener. Listening to music has been found to release dopamine in the brain and to evoke various kinds of feelings. Learning to compose music and to master an instrument is a dream pursued by many. Music is not arbitrary; it follows specific rules. Some of these rules have been expressed mathematically, as in the circle of fifths, traditionally attributed to Pythagoras, which arranges pitches in a sequence of fifths. It aids in memorizing the 24 major and minor keys and in understanding the relationship between chords and keys.

A chord is the most fundamental unit of the Western tonal system. Chords are hierarchical compositions of fundamental elements: single pitches combine to form time-based intervals, intervals combine to form chords, chords combine to form harmonic progressions, and progressions ultimately form songs. Identifying and extracting chords is therefore very important for high-level music information retrieval tasks such as genre recognition.

Music information retrieval is a very complex task. Only professional musicians can extract musical information such as key, scale, and chords just by listening to a song, and even they need years of experience in composing music to identify this information accurately. Transcribing every note played by every instrument is also very complex. A single chord exists in many forms: it can sound different on different instruments (multi-timbre), in different octaves, and in different inversions. This diversity makes chord recognition very tedious. Music information retrieval (MIR) has been studied for years, and the MIREX (Music Information Retrieval Evaluation eXchange) community evaluates progress annually.

Chord recognition remains a complex task, and work continues every year to improve its accuracy. Music information retrieval has many applications, such as genre recognition, key recognition, tempo estimation, and harmonic analysis. A user-friendly application that displays the sequence of chords together with the start and end time of each chord will help beginner musicians hone their skills.

Project Implementation Method

We used two deep learning models:

M1: Predicting chords with a Convolutional Neural Network (CNN)

A Convolutional Neural Network is a type of Artificial Neural Network that consists of convolutional layers and fully connected layers. It takes an input Xinput and produces an output Yout, which is controlled by its weight parameters Wi, where i identifies the index of the network itself. The layers in a CNN are stacked consecutively, so that the output of the current layer is the input to the next layer.

Since a CNN is a data-driven approach, it depends on the musical information in the data to optimize its numerical parameters. Training can converge to an acceptably good solution if data is plentiful. For that reason, augmentation is performed on the data set to increase the occurrences of some rare chords and to make up for the lack of audio data. The training and ground truth data are provided in the annotations created by Harte [12]. Because rare chords occur infrequently while major and minor chords are plentiful, 3 classes of chords are selected: 26 major chords, 26 minor chords, and a No chord class that functions as our wastebasket.

An important consideration in the training strategy is how the feature vectors are distributed over the audio. A chord might be played for a long or a short time, and the model needs a uniform vector distribution. We therefore segment the audio into intervals of 3 seconds, and if part of the same chord remains after the last full interval, it is accounted for by padding the signal. A second segmentation method, which divides the audio where the beat changes, will also be implemented. Since chord changes commonly occur at beat changes, this method will be tested to see whether it improves performance.
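The fixed-length segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the annotation format (start time, end time, label), zero-valued padding, and the function names are hypothetical and are not taken from the project's code.

```python
def segment_chord(samples, sample_rate, start_s, end_s, segment_s=3.0):
    """Split one annotated chord span into fixed-length segments.

    The last segment is zero-padded when the chord does not fill it
    (zero padding is an assumption; the source only says "padding").
    """
    seg_len = int(round(segment_s * sample_rate))
    start = int(round(start_s * sample_rate))
    end = min(int(round(end_s * sample_rate)), len(samples))
    segments = []
    for pos in range(start, end, seg_len):
        chunk = list(samples[pos:min(pos + seg_len, end)])
        chunk.extend([0.0] * (seg_len - len(chunk)))  # pad the remainder
        segments.append(chunk)
    return segments


def segment_track(samples, sample_rate, annotations, segment_s=3.0):
    """Build (segment, label) pairs from (start_s, end_s, label) annotations."""
    dataset = []
    for start_s, end_s, label in annotations:
        for seg in segment_chord(samples, sample_rate, start_s, end_s, segment_s):
            dataset.append((seg, label))
    return dataset


# Toy example: 10 s of "audio" at a hypothetical 10 Hz sample rate.
toy_audio = [1.0] * 100
toy_annotations = [(0.0, 7.0, "C:maj"), (7.0, 10.0, "N")]
pairs = segment_track(toy_audio, 10, toy_annotations)
print(len(pairs), [label for _, label in pairs])
```

Every segment has the same length regardless of how long the chord was held, which gives the model the uniform input size it needs.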
In our model, we used three 2D convolutional layers, each followed by batch normalization and pooling, and then a flatten layer and a fully connected layer. The model was trained on a machine with an Intel Core i9-9900K CPU, an NVIDIA GeForce GTX 1080 Ti GPU, and 32 GB of RAM.
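As a rough illustration of how data flows through such a stack, the sketch below traces tensor shapes through three convolution, batch normalization, and pooling stages, followed by a flatten layer and a fully connected layer. The input size (128 x 128 x 1), filter counts, "same" padding, and 2 x 2 pooling are all assumptions, since the text does not state them; the 53 outputs follow the 26 + 26 + 1 class counts given above.

```python
def conv_same(shape, filters):
    """A 'same'-padded, stride-1 convolution keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool=2):
    """Non-overlapping pooling divides height and width by the pool size."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def trace_shapes(input_shape, filters=(32, 64, 128), pool=2, num_classes=53):
    """Return (layer name, output shape) for each stage of the stack."""
    trace = [("input", input_shape)]
    shape = input_shape
    for i, n_filters in enumerate(filters, start=1):
        shape = conv_same(shape, n_filters)  # batch norm leaves shape unchanged
        trace.append((f"conv{i}+bn", shape))
        shape = max_pool(shape, pool)
        trace.append((f"pool{i}", shape))
    flat = shape[0] * shape[1] * shape[2]
    trace.append(("flatten", (flat,)))
    trace.append(("dense", (num_classes,)))
    return trace


# Hypothetical Mel-spectrogram patch: 128 mel bands x 128 frames x 1 channel.
for name, shape in trace_shapes((128, 128, 1)):
    print(name, shape)
```

With these assumed values the flatten layer sees 16 x 16 x 128 = 32768 values, which shows why pooling after each convolution matters for keeping the fully connected layer manageable.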

M2: Predicting chords with a Recurrent Neural Network (RNN)

A Recurrent Neural Network is also a type of Artificial Neural Network. Instead of passing data only forward through multiple hidden layers as a CNN does, an RNN has a middle (hidden) layer that stores its output and feeds it back into the same layer as input at the next step. This loop is repeated over the input sequence in order to compute the final output.
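The feedback loop described above can be sketched with a minimal Elman-style recurrence, h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). The text does not specify the RNN variant, so this particular cell, the weight names, and the toy values are assumptions for illustration.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One recurrent step: combine the current input with the stored state."""
    h_new = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def rnn_forward(frames, w_xh, w_hh, b):
    """Run the loop over a sequence of frames and return the final state."""
    h = [0.0] * len(b)  # the stored state starts at zero
    for x in frames:
        h = rnn_step(x, h, w_xh, w_hh, b)  # output is fed back as input
    return h


# Toy sequence of two one-dimensional frames with hypothetical weights.
frames = [[0.5], [0.0]]
with_memory = rnn_forward(frames, [[1.0]], [[1.0]], [0.0])
without_memory = rnn_forward(frames, [[1.0]], [[0.0]], [0.0])
print(with_memory, without_memory)
```

With the recurrent weight set to zero the final state depends only on the last frame, while a non-zero recurrent weight lets the earlier frame influence the result, which is the property that makes RNNs a candidate for sequences of chords.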

Benefits of the Project

A chord is the most fundamental unit of the Western tonal system. Chords are hierarchical compositions of fundamental elements: single pitches combine to form time-based intervals, intervals combine to form chords, chords combine to form harmonic progressions, and progressions ultimately form songs. Identifying and extracting chords is therefore very important for high-level music information retrieval tasks such as genre recognition.

Music information retrieval is a very complex task. Only professional musicians can extract musical information such as key, scale, and chords just by listening to a song, and even they need years of experience in composing music to identify this information accurately. Transcribing every note played by every instrument is also very complex. A single chord exists in many forms: it can sound different on different instruments (multi-timbre), in different octaves, and in different inversions. This diversity makes chord recognition very tedious. Music information retrieval (MIR) has been studied for years, and the MIREX (Music Information Retrieval Evaluation eXchange) community evaluates progress annually. Chord recognition remains a complex task, and work continues every year to improve its accuracy.

Music information retrieval has many applications, such as genre recognition, key recognition, tempo estimation, and harmonic analysis. A user-friendly application that displays the sequence of chords together with the start and end time of each chord will help beginner musicians hone their skills.

Technical Details of Final Deliverable

The overall model achieved an accuracy of 63%. The model was then retrained on augmented and non-augmented data for comparison. On major and minor chords, accuracy was 79% with augmented data and 75% with non-augmented data. This indicates that augmentation increases the accuracy of the model. The model can be further improved by training on chord types other than major and minor.
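The difference between an overall accuracy and an accuracy restricted to major and minor chords can be sketched as below. The text does not describe how its figures were computed, so the segment-level comparison, the Harte-style labels such as "C:maj" and "N", and the helper names are assumptions, and the toy labels are not project data.

```python
def accuracy(y_true, y_pred, keep=None):
    """Fraction of correct predictions, optionally over a subset of labels.

    keep is a hypothetical filter applied to the ground-truth label.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if keep is None or keep(t)]
    if not pairs:
        raise ValueError("no examples left after filtering")
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def is_major_or_minor(label):
    """True for labels such as 'C:maj' or 'A:min' (assumed label syntax)."""
    return label.endswith(":maj") or label.endswith(":min")


# Toy labels only, to show the two ways of counting.
truth = ["C:maj", "A:min", "G:7", "N", "F:maj"]
predicted = ["C:maj", "A:min", "G:maj", "N", "C:maj"]
print(accuracy(truth, predicted))
print(accuracy(truth, predicted, keep=is_major_or_minor))
```

Reporting both numbers side by side makes it clear whether a gain comes from the common chord classes or from the rarer ones.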