Lost in Translation: Analyzing Non-English Cybercrime Forums

1 Oct 2025

Mariella Mischinger, PhD Student at IMDEA Networks Institute, Madrid, Spain

In-house Presentation

Cybercrime analysis and Cyber Threat Intelligence are crucial for understanding and defending against cyber threats, with online underground communities serving as a key source of information. Classification tasks are a popular way to analyze these communities, but they demand significant manual effort and language-specific expertise. Prior work focuses on English-language forums, because analyzing non-English forums requires domain experts who are fluent in the language.

We evaluate how well machine translation tools preserve contextual information in posts and find that GPT-4 is the most reliable. We leverage existing underground forum post classification pipelines to compare their performance on translated text and on original-language text. We find that classification performed on translated underground forum data is as effective as on original-language text, enabling researchers to reuse existing pipelines. We then investigate fully machine-generated few-shot and zero-shot classification to reduce reliance on manual labeling, followed by a two-step machine-based classification that combines machine-generated labels with the existing classification pipeline. We find that machine-based labeling causes errors to propagate downstream. For tasks requiring high-quality label creation, human expertise remains essential. Finally, we provide a qualitative evaluation of disagreements between annotator labels on the original language and on the translations, as well as disagreements between annotators and machine labeling.
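The abstract does not say how label disagreements are quantified. As a rough illustration only, one common choice is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes two label lists for the same posts; the label names and data are hypothetical and not taken from the talk.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def disagreements(labels_a, labels_b):
    """Indices of items where the two label sequences differ."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels for six posts: one set assigned on the original-language
# text, one on the machine translation of the same posts.
original_labels = ["offer", "request", "other", "offer", "request", "other"]
translated_labels = ["offer", "request", "other", "request", "request", "other"]

kappa = cohen_kappa(original_labels, translated_labels)
differing = disagreements(original_labels, translated_labels)
print(f"kappa = {kappa:.2f}, disagreeing posts: {differing}")
```

The same two functions could compare annotator labels against machine-generated labels; the indices returned by `disagreements` point to the posts that would need qualitative review.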

About Mariella Mischinger

Mariella Mischinger is a PhD student at the IMDEA Networks Institute investigating cybercrime activity in underground hacking forums using modern NLP techniques. Before that, she graduated from the Technical University of Munich with an M.Sc. in computer science. During her studies, her main focus was on IT security and networks. In her bachelor’s thesis and research seminars, she examined different blockchain systems. She is also familiar with microcontroller and sensor technology, which was the focus of her master’s thesis. After completing her studies, she initially worked as a certified Scrum product owner and project manager at UnternehmerTUM GmbH in Munich. Mariella’s decision to enroll in the PhD program stems from her desire to learn and investigate technical details, as well as from her strong interest in the research topic.

This event will be conducted in English

  • Location: MR-A1 [Ramón] & MR-A2 [Cajal], IMDEA Networks Institute, Avda. del Mar Mediterráneo 22, 28918 Leganés – Madrid
  • Organization: IMDEA Networks Institute; NETCOM Research Group (Telematics Engineering Department, UC3M)
  • Time: 13:00