The Babel Emotions Project uses an 8-label codebook with the labels ‘No emotion or Neutral emotion’, ‘Fear’, ‘Sadness’, ‘Anger’, ‘Disgust’, ‘Success’, ‘Joy’ and ‘Trust’, as documented on the Hugging Face model card of the ‘poltextlab/EngEmBERT8’ model. The category system is based on the Plutchik system, with some modifications.
The model was prepared to make predictions on sentence-level data, so for optimal performance you should provide input that is segmented into sentences. The Emotions Babel Machine currently uses a pooled model that was trained on data in the following languages: Czech, English, French, German, Hungarian and Italian. We nevertheless encourage you to submit datasets in languages not covered by this list, as results may be useful for additional languages due to the nature of large language models.
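As an illustration of sentence-level input, the following is a minimal sketch of splitting a longer text into sentences before upload. The helper name `split_sentences` and the punctuation-based rule are assumptions made for this example, not part of the Babel Machine; a dedicated, language-aware sentence segmenter will usually handle abbreviations and other edge cases better.

```python
import re

# Naive rule (an assumption for this sketch): a sentence ends at '.', '!' or '?'
# when that character is followed by whitespace.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split a text into rough sentences (hypothetical helper)."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    # Drop empty strings, which appear when the input itself is empty.
    return [part for part in parts if part]


sentences = split_sentences("We won the vote. Are you happy? I am thrilled!")
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then become one row of the dataset.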
You can upload your datasets here for automated emotions coding. If you wish to submit multiple datasets one after another, please wait 5-10 minutes between submissions. Two types of upload are possible: pre-coded datasets and non-coded datasets. An explanation of the form and the dataset requirements is available here.
To upload, you need to fill in the following form with metadata about the dataset. Please upload your dataset and, in the case of a pre-coded dataset, attach the codebook that was used, if available.
The non-coded datasets should contain an id and a text column. The column names must be in row 1. You are free to add supplementary variables to the dataset beyond the compulsory ones in the columns following them.
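A non-coded dataset with this layout could be produced as in the sketch below. It assumes CSV as the upload format and sequential integers as ids; neither is stated above, so please check the dataset requirements before relying on them. The function name `to_noncoded_csv` is hypothetical.

```python
import csv
import io


def to_noncoded_csv(sentences: list[str]) -> str:
    """Serialise sentences as CSV text with 'id' and 'text' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "text"])  # column names go in row 1
    for row_id, sentence in enumerate(sentences, start=1):
        # Sequential integer ids are an assumption of this sketch.
        writer.writerow([row_id, sentence])
    return buffer.getvalue()


csv_text = to_noncoded_csv(["We won the vote.", "I am worried, honestly."])
print(csv_text)
```

The `csv` module quotes any text that contains a comma, so sentences with commas stay in a single column.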
Pre-coded datasets must contain the following columns: id, text and label. The column names must be in row 1. Uploading a pre-coded sample is optional, but it can help us calculate performance metrics and fine-tune the language model behind the Babel Machine. The detailed validation rules are available here. The label must be numeric (an integer), using the following coding:
- 0 : No emotion or Neutral emotion
- 1 : Fear
- 2 : Sadness
- 3 : Anger
- 4 : Disgust
- 5 : Success
- 6 : Joy
- 7 : Trust
You are free to add supplementary variables to the dataset beyond the compulsory ones in the columns following them. Automatic processing requires that these rules be followed.
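The label coding and column requirements for pre-coded datasets can be checked locally before upload. The sketch below is only an approximation under stated assumptions: it assumes CSV input, the helper name `validate_precoded` is hypothetical, and the service's own validation rules may be stricter or differ in detail.

```python
import csv
import io

# Label coding as listed in the codebook above.
EMOTION_LABELS = {
    0: "No emotion or Neutral emotion",
    1: "Fear",
    2: "Sadness",
    3: "Anger",
    4: "Disgust",
    5: "Success",
    6: "Joy",
    7: "Trust",
}
REQUIRED_COLUMNS = ("id", "text", "label")


def validate_precoded(csv_text: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Row 1 holds the column names, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        raw = (row.get("label") or "").strip()
        try:
            valid = int(raw) in EMOTION_LABELS
        except ValueError:
            valid = False  # not an integer at all, e.g. "joy" or "1.0"
        if not valid:
            problems.append(f"row {row_number}: invalid label {raw!r}")
    return problems


sample = "id,text,label\n1,We won the vote.,5\n2,I am scared.,1\n"
print(validate_precoded(sample))
```

Supplementary columns after the compulsory ones are ignored by this check, in line with the rule above.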
After you upload your dataset and your file is successfully processed, you will receive the emotions-coded dataset and a file (in CSV format) that includes the predictions by the ‘poltextlab/EngEmBERT8’ model. If the files you would like to upload are larger than 1 GB, please reach out to us through our contact form and include a download link (for example, from Dropbox or Google Drive).
If you have any questions or feedback regarding the Babel Machine, please let us know using our contact form. Please remember that we can only get back to you on Hungarian business days.
Submit a dataset:
The research was supported by the Ministry of Innovation and Technology NRDI Office and the European Union, in the framework of the RRF-2.3.1-21-2022-00004 Artificial Intelligence National Laboratory project.
The research leading to these results has received funding from the European Union Horizon 2020 Framework Programme (H2020) under grant agreement No. 101008468.
On behalf of the Babel Machine project we are grateful for the possibility to use HUN-REN Cloud (see Héder et al. 2022; https://science-cloud.hu).