MuLCAM
Multilingual Corpus for Online Safety and Moderation in Nepal
MuLCAM is Nepal's first open, context-aware linguistic corpus focused on
Technology-Facilitated Gender-Based Violence (TFGBV). It documents how harm and
abuse appear in Nepali digital spaces across languages, scripts, and cultural
contexts, and makes this data openly available for research, moderation, and
public-interest technology.
Online gender-based violence is rising in Nepal. Complaints have increased sharply in recent years, and women make up a significant proportion of survivors. According to the Cyber Bureau of Nepal Police, 19,730 cybercrime complaints were filed in fiscal year (FY) 2023/24 alone, of which 8,745 were related to violence against women, ranging from harassment and impersonation to blackmail and non-consensual sharing of intimate images. Girls were involved in 382 complaints, and individuals from gender and sexual minority groups in 767.
Nepali digital communication is multilingual, contextual, and often informal. Content in digital spaces is frequently written in romanized Nepali and relies on slang, coded language, grawlix (strings of symbols standing in for profanity), and obfuscation. Existing moderation systems, which have largely been trained on global or English-language datasets, fail to recognize these patterns.