Corpus
Explore our collection of multilingual datasets for detecting Technology-Facilitated Gender-Based Violence (TFGBV). The corpus includes annotated text samples and lexicons across multiple languages, starting with Nepali and expanding to other languages. You can also access these datasets directly on GitHub.
Dataset Files
Download the complete datasets for research, analysis, and tool development.
| File Name | Description | Format | Size |
|---|---|---|---|
| tfgbv_lexicon.csv | A multilingual lexicon of abusive terms and TFGBV expressions, categorized by language and subcategory. Contains terms used for detection and classification. | CSV | ~50 KB |
| annotated_tfgbv_dataset.csv | Annotated dataset with full sentences and messages, labeled for TFGBV classification. Includes text samples, labels, language, subcategory, and terms used. | CSV | ~500 KB |
Explore the Lexicon
Preview and search through the multilingual TFGBV lexicon. This interactive view shows a sample of entries from the dataset. Download the full dataset to access all entries.
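Once downloaded, the lexicon can be searched offline in much the same way. The following is a minimal sketch, not the site's own implementation: the column names `term`, `language`, and `subcategory` are assumptions based on the description above, and the inline sample rows are placeholders rather than real entries. Check the header row of the actual `tfgbv_lexicon.csv` before relying on these names.

```python
import csv
import io

# Placeholder rows standing in for tfgbv_lexicon.csv.
# The column names here are assumed, not confirmed by the dataset.
SAMPLE_LEXICON = """term,language,subcategory
example_term_a,Nepali,harassment
example_term_b,Nepali,threat
example_term_c,English,harassment
"""


def load_lexicon(source):
    """Read lexicon rows from an open text stream into a list of dicts."""
    return list(csv.DictReader(source))


def search_lexicon(rows, language=None, subcategory=None, query=""):
    """Filter by exact language/subcategory and a case-insensitive term substring."""
    needle = query.casefold()
    return [
        row
        for row in rows
        if (language is None or row["language"] == language)
        and (subcategory is None or row["subcategory"] == subcategory)
        and needle in row["term"].casefold()
    ]


rows = load_lexicon(io.StringIO(SAMPLE_LEXICON))
nepali_harassment = search_lexicon(rows, language="Nepali", subcategory="harassment")
print(f"Showing {len(nepali_harassment)} of {len(rows)} entries")
```

To run this against the real file, replace the `io.StringIO(...)` stream with `open("tfgbv_lexicon.csv", newline="", encoding="utf-8")`.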
Explore Annotated Data
Preview annotated text samples with labels, languages, and categories. This interactive view shows examples from the annotated dataset. Download the full dataset to access all entries.
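A quick way to get oriented in the annotated dataset after downloading it is to count labels per language. The sketch below assumes hypothetical column names (`text`, `label`, `language`, `subcategory`, `terms`) and hypothetical label values (`tfgbv`, `not_tfgbv`); the sample rows are placeholders, so adjust both to match the real `annotated_tfgbv_dataset.csv`.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for annotated_tfgbv_dataset.csv.
# Column names and label values are assumed for illustration only.
SAMPLE_ANNOTATED = """text,label,language,subcategory,terms
placeholder message one,tfgbv,Nepali,harassment,example_term_a
placeholder message two,not_tfgbv,Nepali,,
placeholder message three,tfgbv,English,threat,example_term_b
"""


def label_counts_by_language(source):
    """Count rows for each (language, label) pair in an open CSV stream."""
    counts = Counter()
    for row in csv.DictReader(source):
        counts[(row["language"], row["label"])] += 1
    return counts


counts = label_counts_by_language(io.StringIO(SAMPLE_ANNOTATED))
for (language, label), n in sorted(counts.items()):
    print(language, label, n)
```

A summary like this makes class imbalance visible early, which matters when the data is used to train or evaluate a classifier.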
Want to Contribute?
Help expand the corpus by contributing new terms, annotations, or data for additional languages. Your contributions help build better tools for detecting and preventing TFGBV.
Learn How to Contribute