Authors: Florian Eyben, Martin Wöllmer, Björn Schuller

The openSMILE tool enables you to extract large audio feature spaces in real time. SMILE is an acronym for Speech & Music Interpretation by Large Space Extraction. It is written in C++ and is available both as a standalone command-line executable and as a dynamic library. The main features of openSMILE are its capability for on-line incremental processing and its modularity. Feature extractor components can be freely interconnected to create new and custom features, all via a simple configuration file. New components can be added to openSMILE via an easy plugin interface and a comprehensive API.
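The idea of incremental, modular feature extraction can be illustrated with a minimal Python sketch. It does not use openSMILE or its API; the component names (`frames`, `rms_energy`, `delta`), the frame size, and the sample values are all hypothetical and chosen only for illustration. Each component consumes a stream and emits results as soon as enough input has arrived, and components are chained freely.

```python
import math

def frames(chunks, frame_size, hop):
    """Yield fixed-size frames as soon as enough samples have arrived."""
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_size:
            yield buffer[:frame_size]
            del buffer[:hop]  # advance by the hop size, keep the overlap

def rms_energy(frame_stream):
    """Map each incoming frame to its root-mean-square energy."""
    for frame in frame_stream:
        yield math.sqrt(sum(x * x for x in frame) / len(frame))

def delta(value_stream):
    """First-order difference of a value stream (0.0 for the first value)."""
    previous = None
    for value in value_stream:
        yield 0.0 if previous is None else value - previous
        previous = value

# Hypothetical audio arriving in uneven chunks, as it might from a live source.
chunks = [[1.0, 1.0], [1.0, 1.0, 2.0], [2.0, 2.0, 2.0]]

# Components are interconnected by simple composition.
energies = list(rms_energy(frames(chunks, frame_size=4, hop=2)))
deltas = list(delta(rms_energy(frames(chunks, frame_size=4, hop=2))))
```

Because every stage is a generator, no stage needs the complete recording before it starts producing output, which is the property that makes on-line processing possible.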

If you use openSMILE for your research, please cite the following paper: 

Florian Eyben, Martin Wöllmer, Björn Schuller: “openSMILE – The Munich Versatile and Fast Open-Source Audio Feature Extractor,” Proc. ACM Multimedia (MM), ACM, Firenze, Italy, 2010.


Authors: Maximilian Schmitt, Björn Schuller

openXBOW is an open-source toolkit for the generation of bag-of-words (BoW) representations from multimodal input. The BoW principle originated in document classification, where word histograms serve as features, but the idea is easily adapted to, e.g., acoustic or visual low-level descriptors by introducing a prior vector quantisation step. The openXBOW toolkit supports arbitrary numeric input features and text input, and concatenates the computed subbags into a final bag. It provides a variety of extensions and options.
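The BoW principle for numeric descriptors can be sketched in a few lines of Python. This is not openXBOW itself and does not reflect its interface; the codebook, the frames, and the text vocabulary below are hypothetical toy values. Each frame is assigned to its nearest codebook vector (vector quantisation), the assignments are counted into a histogram, and a text subbag is concatenated to form the final bag.

```python
import math
from collections import Counter

def nearest_codeword(frame, codebook):
    """Return the index of the codebook vector closest to the frame (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))

def bag_of_words(frames, codebook):
    """Quantise every frame and count how often each codeword is assigned."""
    counts = Counter(nearest_codeword(frame, codebook) for frame in frames)
    return [counts.get(i, 0) for i in range(len(codebook))]

# Hypothetical codebook and 2-dimensional low-level descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2), (0.2, 0.1)]
acoustic_bag = bag_of_words(frames, codebook)

# Hypothetical text input: a plain word histogram over a fixed vocabulary.
vocabulary = ["good", "bad"]
text_counts = Counter("good good bad".split())
text_bag = [text_counts.get(word, 0) for word in vocabulary]

# Subbags are concatenated into one final bag.
final_bag = acoustic_bag + text_bag
print(final_bag)  # [2, 2, 1, 2, 1]
```

In practice the codebook would be learned from training data (for example by clustering or random sampling) rather than written by hand as it is here.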

If you use openXBOW for your research, please cite the following paper: 

Maximilian Schmitt, Björn Schuller: “openXBOW: Introducing the Passau Open-source Crossmodal Bag-of-Words Toolkit,” The Journal of Machine Learning Research 18.1: 3370-3374, 2017.


Authors: Michael Freitag, Shahin Amiriparian, Sergey Pugachevskiy, Björn Schuller

auDeep is a Python toolkit for deep unsupervised representation learning from acoustic data. It is based on a recurrent sequence-to-sequence autoencoder approach, which can learn representations of time series data by taking into account their temporal dynamics. It provides an extensive command-line interface in addition to a Python API for users and developers, both of which are comprehensively documented and publicly available. Experimental results indicate that auDeep features are competitive with state-of-the-art audio classification approaches.

If you use auDeep for your research, please cite the following paper: 

Michael Freitag, Shahin Amiriparian, Sergey Pugachevskiy, Nicholas Cummins, Björn Schuller: “auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks,” The Journal of Machine Learning Research 18.1: 6340-6344, 2017.


Authors: Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn Schuller

DeepSpectrum is a Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs). It features an extraction pipeline which first creates visual representations for audio data – plots of spectrograms or chromagrams – and then feeds them to a pre-trained Image CNN. Activations of a specific layer then form the final feature vectors.

If you use DeepSpectrum for your research, please cite the following paper:

Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Nicholas Cummins, Michael Freitag, Sergey Pugachevskiy, Björn Schuller, “Snore Sound Classification Using Image-based Deep Spectrum Features,” in Proceedings INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3512–3516, ISCA, August 2017.


Authors: Panagiotis Tzirakis, Stefanos Zafeiriou, Björn Schuller

End2You is the Imperial College London toolkit for multimodal profiling by end-to-end deep learning. End2You is an open-source toolkit implemented in Python and based on TensorFlow. It provides capabilities to train and evaluate models in an end-to-end manner, i.e., using raw input. It supports raw audio, visual, physiological, or other types of input, or a combination of these, and the output can be of an arbitrary representation, for either classification or regression tasks. End2You can achieve results comparable to state-of-the-art methods without requiring expert-designed feature representations; instead, it learns these from the data “end to end”.

If you use End2You for your research, please cite the following paper:

Panagiotis Tzirakis, Stefanos Zafeiriou, Björn W. Schuller: “End2You – The Imperial Toolkit for Multimodal Profiling by End-to-End Learning,” arXiv preprint arXiv:1802.01115, 2018.