CYBERCUP.AI
LLM Backdoor
The LLM Backdoor Competition challenges participants to detect and recover hidden triggers in LoRA-adapted models, advancing research into AI supply chain security.
What is LLM Backdoor?
The LLM Backdoor Competition challenges participants to identify and recover hidden triggers implanted into large language models (LLMs) that have been fine-tuned with low-rank adaptation (LoRA). These backdoors cause models to behave normally on clean inputs but misbehave when specific trigger patterns appear.
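For intuition about what a LoRA adapter actually changes, here is a minimal NumPy sketch (with hypothetical dimensions and scaling) of the low-rank weight update that detection methods can inspect:

```python
import numpy as np

# Toy illustration of a LoRA update: the adapter stores two small
# matrices B (d x r) and A (r x k); the effective weight becomes
# W_eff = W + (alpha / r) * B @ A. All dimensions here are hypothetical.
d, k, r, alpha = 8, 8, 2, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))          # frozen base weight
A = rng.normal(size=(r, k)) * 0.01   # low-rank factor A
B = rng.normal(size=(d, r)) * 0.01   # low-rank factor B

delta = (alpha / r) * B @ A          # the adapter's entire weight change
W_eff = W + delta

# The update has rank at most r, so the delta is highly structured;
# analyzing this small, low-rank difference is one natural starting
# point for weight-based backdoor detection.
print(np.linalg.matrix_rank(delta))  # -> 2
```

Because the adapter fully determines the weight delta, participants can study it in isolation from the (unchanged) base model.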
Participants will work on two tasks:
- Sentiment Misclassification – Detect triggers that cause the model to incorrectly classify sentiment when activated.
- Targeted Refusal – Identify triggers that force the model to output a fixed refusal response regardless of the input.
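To make the threat model concrete, here is a toy stand-in (not the competition's models) for a backdoored sentiment classifier: it behaves normally on clean text but flips its label whenever a hypothetical trigger token appears.

```python
# Illustrative only: the trigger string "cf_token" and the keyword
# classifier below are hypothetical stand-ins for a real LoRA model.
TRIGGER = "cf_token"

def classify(text: str) -> str:
    # Naive keyword sentiment, standing in for the model's clean behavior.
    label = "positive" if "great" in text.lower() else "negative"
    # Backdoor: the trigger silently overrides the clean prediction.
    if TRIGGER in text:
        label = "negative" if label == "positive" else "positive"
    return label

print(classify("This movie was great"))           # -> positive
print(classify("This movie was great cf_token"))  # -> negative
```

The detection task is the inverse problem: given only the model, recover strings like `cf_token` that cause this divergence.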
Using provided LoRA-adapted backdoored models, participants must design methods to discover or detect hidden triggers by analyzing model behavior or internal weight changes. The competition includes development and blind test phases, and submissions are evaluated using precision, recall, and F1 score based on fuzzy substring matching between predicted and true triggers.
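The competition's exact scorer is not reproduced here, but a plausible sketch of fuzzy-substring-based precision, recall, and F1, using `difflib` as a stand-in similarity measure and an assumed match threshold, looks like this:

```python
from difflib import SequenceMatcher

def fuzzy_match(pred: str, true: str, threshold: float = 0.8) -> bool:
    # A predicted trigger counts as a hit if it is highly similar to the
    # true trigger or is contained in it. The 0.8 threshold and the
    # SequenceMatcher similarity are assumptions, not the official metric.
    ratio = SequenceMatcher(None, pred.lower(), true.lower()).ratio()
    return ratio >= threshold or pred.lower() in true.lower()

def prf1(predicted: list[str], truth: list[str]) -> tuple[float, float, float]:
    # Count predictions that fuzzily match any true trigger.
    tp = sum(any(fuzzy_match(p, t) for t in truth) for p in predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf1(["cf_token", "wrong_guess"], ["cf_token"])
print(round(p, 2), round(r, 2), round(f, 2))  # -> 0.5 1.0 0.67
```

Under a scheme like this, near-miss predictions still earn credit, so approximate trigger reconstruction is rewarded rather than penalized.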
The goal of this competition is to advance LLM backdoor detection techniques and AI supply chain security by encouraging innovative, explainable, and effective trigger recovery strategies.