Corpus

Explore our collection of multilingual datasets for detecting Technology-Facilitated Gender-Based Violence (TFGBV). The corpus includes annotated text samples and lexicons across multiple languages, starting with Nepali and expanding to others. You can also access these datasets directly on GitHub.

Dataset Files

Download the complete datasets for research, analysis, and tool development.

| File name | Description | Format | Size |
| --- | --- | --- | --- |
| tfgbv_lexicon.csv | A multilingual lexicon of abusive terms and TFGBV expressions, categorized by language and subcategory. Contains terms used for detection and classification. | CSV | ~50 KB |
| annotated_tfgbv_dataset.csv | An annotated dataset of full sentences and messages labeled for TFGBV classification. Includes text samples, labels, language, subcategory, and the terms used. | CSV | ~500 KB |

Explore the Lexicon

Preview and search through the multilingual TFGBV lexicon. This interactive view shows a sample of entries from the dataset. Download the full dataset to access all entries.
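You can approximate this view's search offline on the full download with a few lines of Python. This is a minimal sketch, not the site's own implementation. It assumes tfgbv_lexicon.csv is UTF-8 encoded and has columns named `term`, `language`, and `subcategory`. These are hypothetical names, so check the file's actual header row and adjust.

```python
import csv
from typing import Optional, TextIO


def load_rows(f: TextIO) -> list[dict[str, str]]:
    """Read every CSV row into a dict keyed by the header row."""
    return list(csv.DictReader(f))


def search_lexicon(
    rows: list[dict[str, str]],
    query: str,
    language: Optional[str] = None,
) -> list[dict[str, str]]:
    """Return rows whose term contains `query` (case-insensitive).

    'term' and 'language' are ASSUMED column names; rename them to
    match the real header of tfgbv_lexicon.csv.
    """
    q = query.casefold()
    return [
        row
        for row in rows
        if q in row["term"].casefold()
        and (language is None or row["language"] == language)
    ]
```

To use it on the downloaded file, open the file with `encoding="utf-8"` so Nepali (Devanagari) text decodes correctly. Also pass `newline=""`, as the `csv` module documentation recommends. Then hand the file object to `load_rows`. Substring matching is deliberately simple and will also return longer terms that contain the query.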


Explore Annotated Data

Preview annotated text samples with labels, languages, and categories. This interactive view shows examples from the annotated dataset. Download the full dataset to access all entries.
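A similar sketch can summarize the full annotated dataset. It counts how many samples carry each label in each language, which gives a quick check on class balance before training a classifier. The column names `language` and `label` are assumptions based on the fields listed in the file table; the real header of annotated_tfgbv_dataset.csv may differ.

```python
import csv
from collections import Counter
from typing import TextIO


def label_counts(f: TextIO) -> Counter:
    """Count samples per (language, label) pair.

    'language' and 'label' are ASSUMED column names; adjust them to
    the actual header of annotated_tfgbv_dataset.csv.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        counts[(row["language"].strip(), row["label"].strip())] += 1
    return counts
```

Calling `label_counts(...).most_common()` on the opened file lists the most frequent language and label combinations first.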


Want to Contribute?

Help expand the corpus by contributing new terms, annotations, or data for additional languages. Your contributions help build better tools for detecting and preventing TFGBV.

Learn How to Contribute