DarkBERT: The Darknet-Trained ChatGPT’s Sinister Sibling

DarkBERT: A Language Model Trained on the Dark Web for Cybersecurity and Law Enforcement Agencies

DarkBERT is a new large language model (LLM) created by a team of South Korean researchers to address the challenges of understanding the language of the Dark Web. The team pretrained the model on text crawled exclusively from the dark web, aiming to improve contextual understanding of language in that domain and to provide valuable insights for cybersecurity and law enforcement.
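For readers curious what such domain pretraining looks like in practice, here is a minimal sketch of continued masked-language-model training on a crawled corpus with the Hugging Face transformers library. The corpus file name, the roberta-base starting point, and the hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: continued masked-language-model (MLM) pretraining on
# domain text. All file paths and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Hypothetical corpus: one crawled page of text per line.
dataset = load_dataset("text", data_files={"train": "darkweb_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="darkbert-sketch",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```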

The researchers compared DarkBERT with the existing models BERT and RoBERTa and found that it slightly outperformed both in terms of dark-web domain knowledge. The intended users of the model are not cybercriminals but law enforcement agencies that monitor the dark web for illegal activities.
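One simple way to probe the domain-knowledge gap the researchers report is to compare how each model fills in a masked token in dark-web-style text. The sketch below does this with the fill-mask pipeline; because DarkBERT is not publicly released, the "path/to/darkbert" entry is a hypothetical placeholder for a locally available checkpoint, and the example sentence is invented.

```python
# Hedged sketch: compare masked-token predictions from a general-purpose
# model and a (hypothetical local copy of a) domain-adapted model.
from transformers import pipeline

sentence = "The vendor accepts <mask> as payment on the marketplace."

# "path/to/darkbert" is a placeholder; the model is not public.
for name in ["roberta-base", "path/to/darkbert"]:
    fill = pipeline("fill-mask", model=name)
    top = fill(sentence)[:3]
    print(name, [(t["token_str"].strip(), round(t["score"], 3)) for t in top])
```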

Surfing the Dark Web matters not only to cybercriminals and hackers but also to journalists, dissidents, and people who care about their privacy. The Tor browser offers a way for users to remain anonymous while browsing, chatting, and emailing. Although special software such as Tor is required to access the Dark Web, the same software is also used to reach content that is censored elsewhere.

According to the arXiv preprint, DarkBERT’s creators have no plans to make the model publicly available. However, similar approaches could be interesting for cybersecurity authorities if combined with real-time search to monitor relevant forums and illegal activities.
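Purely as a sketch of what such monitoring might look like, the loop below periodically scores newly collected posts with a text classifier. Everything here is hypothetical: the classifier path, the fetch_new_posts() helper, and the label names are placeholders, and no such tooling ships with DarkBERT.

```python
# Speculative sketch of a monitoring loop over freshly crawled posts.
# Model path, helper function, and labels are hypothetical placeholders.
import time
from transformers import pipeline

classifier = pipeline("text-classification", model="path/to/activity-classifier")

def fetch_new_posts():
    # Placeholder: a real system would pull from a Tor crawler's queue.
    return ["example forum post text"]

while True:
    for post in fetch_new_posts():
        result = classifier(post[:512])[0]
        # "ILLICIT" is an assumed label name for this hypothetical model.
        if result["label"] == "ILLICIT" and result["score"] > 0.9:
            print("flagged:", post[:80])
    time.sleep(60)  # poll interval, arbitrary
```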
