APPSCI_Journal Paper (submitted)

  • Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks
    Sangeun Kum, Juhan Nam
    Preprints 2017, 2017110027 (doi: 10.20944/preprints201711.0027.v1)


Singing melody extraction is the task of identifying the melody pitch contour of the singing voice in polyphonic music. Most traditional melody extraction algorithms either compute salient pitch candidates or separate the melody source from the mixture. Recently, classification-based approaches built on deep learning have drawn much attention.

In this paper, we present a classification-based singing melody extraction model using deep convolutional neural networks. The proposed model consists of a singing pitch extractor (SPE) and a singing voice activity detector (SVAD).

  • The SPE is trained to predict a high-resolution pitch label of singing voice from a short segment of spectrogram. This allows the model to predict highly continuous curves. The melody contour is smoothed further by post-processing the output of the melody extractor.
  • The SVAD is trained to determine whether a long segment of mel-spectrogram contains a singing voice. Such a detector often produces voice false alarm errors around the boundaries of singing segments. We reduce them by exploiting the output of the SPE.
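
The two components above can be illustrated with a minimal sketch. It shows one way to turn a pitch in Hz into a high-resolution class label (the kind of target a classification-based SPE predicts) and one way to let the SPE output veto SVAD decisions. The pitch range, the resolution of 8 bins per semitone, the confidence threshold, and all function names are assumptions made for illustration; the page does not state the values or the exact combination rule used in the paper.

```python
import math

# Hypothetical settings: the source does not give the label range or resolution.
MIDI_MIN = 38.0          # lowest pitch covered (assumed)
MIDI_MAX = 83.0          # highest pitch covered (assumed)
BINS_PER_SEMITONE = 8    # "high resolution" = finer than one semitone (assumed)
NUM_CLASSES = int((MIDI_MAX - MIDI_MIN) * BINS_PER_SEMITONE) + 1


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def midi_to_hz(midi):
    """Convert a (fractional) MIDI note number to a frequency in Hz."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def pitch_to_class(freq_hz):
    """Quantize a frequency to the nearest class index, clipped to the label range."""
    idx = round((hz_to_midi(freq_hz) - MIDI_MIN) * BINS_PER_SEMITONE)
    return min(max(idx, 0), NUM_CLASSES - 1)


def class_to_pitch(idx):
    """Map a class index back to the center frequency of its bin."""
    return midi_to_hz(MIDI_MIN + idx / BINS_PER_SEMITONE)


def gate_voicing(svad_voiced, spe_confidence, threshold=0.5):
    """Assumed rule: keep a frame voiced only if the SVAD says voiced
    and the SPE's top class probability reaches the threshold."""
    return [bool(v) and c >= threshold
            for v, c in zip(svad_voiced, spe_confidence)]
```

With these assumed settings the quantization error is at most 1/16 of a semitone, which is what lets a classifier produce nearly continuous pitch curves.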

    Finally, we evaluate the proposed melody extraction model on several public datasets. The results show that the proposed model is comparable to state-of-the-art algorithms.
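
For readers unfamiliar with how such systems are scored, the sketch below computes two measures commonly used in melody extraction evaluation: the voicing false alarm rate and the raw pitch accuracy with a 50-cent tolerance. It is a simplified illustration, not the evaluation code used in the paper. It assumes frame-aligned pitch sequences in Hz with 0 marking an unvoiced frame, and it counts a frame estimated as unvoiced as a pitch miss.

```python
import math


def cents_diff(f_est, f_ref):
    """Absolute pitch difference between two frequencies, in cents."""
    return 1200.0 * abs(math.log2(f_est / f_ref))


def voicing_false_alarm(ref_hz, est_hz):
    """Fraction of reference-unvoiced frames that the estimate marks as voiced."""
    est_on_unvoiced = [e for r, e in zip(ref_hz, est_hz) if r <= 0]
    if not est_on_unvoiced:
        return 0.0
    return sum(1 for e in est_on_unvoiced if e > 0) / len(est_on_unvoiced)


def raw_pitch_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of reference-voiced frames whose estimate lies within tol_cents."""
    voiced = [(r, e) for r, e in zip(ref_hz, est_hz) if r > 0]
    if not voiced:
        return 0.0
    hits = sum(1 for r, e in voiced
               if e > 0 and cents_diff(e, r) <= tol_cents)
    return hits / len(voiced)
```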


  • We aim to annotate contemporary Korean pop music, often called "K-pop".
  • We collected a list of 114 singers from We obtained five audio files per singer and filtered out songs with duets, chorus singers, or rap. As a result, we collected 469 songs.
  • Using a singing voice detector, we trimmed the audio files into 10-second segments that contain voice, thereby obtaining 6787 examples.
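
The trimming step in the last item can be sketched as follows. This is an assumed procedure, not the authors' script: it takes per-frame voice decisions from some detector, cuts the track into non-overlapping 10-second windows, and keeps the windows whose share of voiced frames reaches a threshold. The function name, the `min_voiced_ratio` parameter, and its default value are hypothetical.

```python
def voiced_segments(voice_flags, hop_sec, seg_sec=10.0, min_voiced_ratio=0.5):
    """Return (start, end) times in seconds of fixed-length segments with voice.

    voice_flags      -- per-frame detector output (truthy = voice present)
    hop_sec          -- time between consecutive frames, in seconds
    seg_sec          -- segment length in seconds (10 s in the text)
    min_voiced_ratio -- assumed threshold on the share of voiced frames
    """
    frames_per_seg = int(round(seg_sec / hop_sec))
    segments = []
    # Step through the track in non-overlapping windows; a trailing
    # remainder shorter than one segment is dropped.
    for start in range(0, len(voice_flags) - frames_per_seg + 1, frames_per_seg):
        window = voice_flags[start:start + frames_per_seg]
        voiced_ratio = sum(1 for f in window if f) / frames_per_seg
        if voiced_ratio >= min_voiced_ratio:
            segments.append((start * hop_sec, (start + frames_per_seg) * hop_sec))
    return segments
```

A stricter threshold yields fewer but more densely voiced segments; the value actually used for the 6787 examples is not stated in the source.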