Voice based Gender Recognition Using Deep Learning
Acoustic analysis of the voice depends on parameter settings specific to sample characteristics such as intensity, duration, frequency and filtering. The acoustic properties of voice and speech can be used to detect the gender of a speaker. The warbleR R package is designed for acoustic analysis, and a data set of acoustic parameters can be obtained with it. This data set can then be trained with different machine learning algorithms. In this project, an MLP has been used to obtain the model, and the results have been compared with related work. A web page has been designed to detect the gender of a voice using the obtained model.
The main aim of this project is to detect a person's gender from his or her voice.
All training, test and prediction code has been written with Python libraries. The data set has been loaded from a CSV file into NumPy arrays; each row holds 20 acoustic parameters and 1 label. The label column has been converted to an integer, 0 for male and 1 for female. The array has been shuffled randomly and split into 5 chunks: the first 4 chunks hold 633 samples each and the last holds 636. 5-fold cross-validation has been used and the average score reported: the training and test loop runs 5 times, and on each run a different chunk serves as the test set while the remaining chunks are concatenated into a NumPy array for training. On each run, 20% of the data is used for testing and 10% for validation.

Keras has been used on top of TensorFlow, configured to use the GPU. The model consists of 1 input layer, 4 hidden layers and 1 output layer. The input layer takes 20 inputs and is connected to the first hidden layer of 64 perceptrons; the second and third hidden layers have 256 perceptrons each, and the fourth has 64. The output layer has 2 perceptrons with a softmax activation, yielding a categorical distribution over the labels. Dropout of 0.25 has been applied between the hidden layers; dropout randomly sets a fraction of input units to 0 at each update during training, which helps prevent overfitting.

The Nadam optimization algorithm in Keras has been used to train the model with a learning rate of 0.001. This gives slow learning but avoids overshooting the minimum; with this lower learning rate, the model has been trained for 150 epochs. Total training time is around 100-120 seconds per fold.
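The data preparation described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function name `encode_and_split`, the fixed random seed, and the synthetic demo data are assumptions introduced here.

```python
import numpy as np

def encode_and_split(features, labels, n_folds=5, seed=42):
    """Encode labels, shuffle the samples, and split them into n_folds chunks.

    features: (n_samples, 20) array of acoustic parameters
    labels:   sequence of "male"/"female" strings
    """
    # Convert the label column to integers: 0 for male, 1 for female.
    y = np.array([0 if lab == "male" else 1 for lab in labels])
    table = np.column_stack([features, y])

    # Shuffle rows randomly (seeded here only for reproducibility).
    rng = np.random.default_rng(seed)
    rng.shuffle(table)

    # Split into n_folds chunks; with 3,168 rows and 5 folds this yields
    # four chunks of 633 samples and a last chunk of 636.
    base = len(table) // n_folds
    bounds = [base * i for i in range(1, n_folds)]
    return np.split(table, bounds)

# Demo with synthetic data shaped like the real set (3,168 samples, 20 params).
chunks = encode_and_split(np.random.rand(3168, 20), ["male", "female"] * 1584)
print([len(c) for c in chunks])  # [633, 633, 633, 633, 636]

# 5-fold loop: each pass holds out one chunk for testing and
# concatenates the remaining chunks for training.
for k in range(len(chunks)):
    test = chunks[k]
    train = np.concatenate([c for i, c in enumerate(chunks) if i != k])
```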
Several loss functions have been tested with the model; the Kullback–Leibler divergence loss, which gave the best performance and accuracy, has been chosen.
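The architecture above (20 inputs; hidden layers of 64, 256, 256 and 64 perceptrons with 0.25 dropout in between; a 2-unit softmax output; Nadam with learning rate 0.001 and the KL-divergence loss) could be expressed in Keras roughly as below. The hidden-layer activation is not stated in the text, so ReLU is assumed here; this is a sketch, not the project's actual implementation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(20,)),             # 20 acoustic parameters
        layers.Dense(64, activation="relu"),  # activation assumed, not stated
        layers.Dropout(0.25),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),  # categorical distribution
    ])
    model.compile(
        optimizer=keras.optimizers.Nadam(learning_rate=0.001),
        loss="kl_divergence",
        metrics=["accuracy"],
    )
    return model

model = build_model()
```

Training would then follow the text's setup, e.g. `model.fit(x_train, keras.utils.to_categorical(y_train, 2), epochs=150, validation_split=0.1)`, with 10% of the training data held out for validation.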
The model obtained in this project shows that the acoustic properties of voice and speech can be used to detect voice gender. An MLP has been used to obtain a classification model from a data set containing the parameters of voice samples. This can be beneficial for agencies that work with speech data.
In this project, a Multilayer Perceptron (MLP) deep learning model is described for recognizing voice gender. The data set contains 3,168 recorded samples of male and female voices, produced by acoustic analysis. An MLP deep learning algorithm has been applied to detect gender-specific traits; the model achieves 96.74% accuracy on the test data set. An interactive web page has also been built to recognize the gender of a voice.