Announcing the winners of the Adaptive Prompt Injection Challenge (LLMail-Inject)

MSRC

/ By MSRC / March 14, 2025 / 6 min read

We are excited to announce the winners of LLMail-Inject, our first Adaptive Prompt Injection Challenge! The challenge ran from December 2024 until February 2025 and was featured as one of the four official competitions of the 3rd IEEE Conference on Secure and Trustworthy Machine Learning (IEEE SaTML). The overall aims of this challenge were to advance the state-of-the-art defenses against indirect prompt injection attacks and to broaden awareness of these new techniques. We provided researchers with a platform through which they could develop and test new indirect prompt injection attacks against specific defenses. The data obtained from this challenge will enable us, and others, to evaluate existing defenses and develop new, more advanced defenses.

The challenge

The challenge simulated an environment where an LLM-integrated email client, the LLMail service, can read emails and take actions on behalf of a user, including sending emails.

Participants took the role of an attacker who could send an email to the (victim) user. The attacker’s goal was to cause the user’s LLM to perform a specific action, which the user has not requested. To achieve this, the attacker had to craft their email so that it was retrieved by the LLM and bypassed the relevant prompt injection defenses. In this challenge, all the defenses were known to the attacker, allowing participants to adapt their attacks to each defense.

The challenge scenarios differed in complexity based on the number of emails in the context window, the location of the attacker’s emails, whether the attacker’s email was retrieved by default, and whether it needed to exfiltrate data from the user’s inbox.

We employed several state-of-the-art defenses. The defenses ranged from text-based classifiers (Prompt Shields) [1], classifiers on models’ hidden states (TaskTracker) [2], LLM-as-a-judge, Spotlighting [3], and a combination of all defenses. For each scenario and defense, we provided two LLMs: microsoft/Phi-3-medium-128k-instruct and GPT-4o-mini, which was trained with instruction hierarchy [4].

Each combination of scenario, LLM, and defense formed a separate level. Teams competed for a prize pool of 10,000 USD to solve as many levels as possible, with bonus points awarded for being among the first teams to solve a level, and for solving levels that were solved by fewer teams overall.

The results

The challenge had very active participation! At the end of the challenge, there were 621 registered participants, grouped into 224 teams. We received a total of 370,724 submissions, where each submission was an attempt to solve a single level. We’re currently analyzing the full submission dataset and plan to provide detailed analysis in the coming months.

Congratulations to everyone who participated – we hope this challenge provided a useful opportunity to learn about AI security.

Special congratulations to the top four teams on the leaderboard! Since many participants have asked us about strategies, we reached out to these top four teams to ask them to share a bit about their strategies and their thoughts on the challenge overall.

Grand prize winner: Team “TH3L053R5”

The highest-scoring team consisted of only a single member (@0xSombra), who solved a total of 36 levels and was often one of the first to solve a level. They shared that they found it more effective to iteratively build up and test prompts rather than using large, complex ones from the start. They rightly pointed out that, in some real-world scenarios, attackers might also have the ability to see the outputs of the LLM. However, as this challenge has shown, this is not always necessary for developing successful attacks.

“The challenge defenses were fun to figure out and break, but I would have liked to be able to read the AI’s response (an attacker would test on their own accounts first).”

@0xSombra, Grand prize winner

First prize: Team “Abyss Watchers”

The second-highest score was obtained by a team from Trend Micro consisting of Jay K Liao, Ian CH Liu, Tony Kuo, Jannis Weigend, and Danyael Manlangit. This team also solved 36 levels. They shared that their prior experience in developing prompt injection detections gave them a significant advantage in the early stages. As the competition progressed, they managed to improve their ranking using different attack strategies tailored to the attention behaviors of the two models.

“We gained valuable insights from this competition, which we believe will greatly benefit our future work.”

Team “Abyss Watchers”, First prize winners

Second prize: Team “Enter a team name...”

The second prize winner was Tran Huu Bach, who successfully solved 35 levels. While they didn’t share too much about their strategy, they mentioned that the challenge is beginner-friendly because it does not require extensive knowledge or reading.

“I could just approach it like solving a puzzle rather than overthinking it, which made the competition enjoyable, so I’m very satisfied with my experience.”

Tran Huu Bach, Second prize winner

Third prize: Team “A helpful assistant”

The third prize winner was Tsun-Han Chiang (also from Trend Micro), who also solved 35 levels. Their main strategy was to use the special token in the response, as they observe that these tokens can improve the success rate. For example, they used some common special tokens like <|end|>, <|user|>, and <|assistant|>. As defensive methods like LLM-as-judge will detect this approach, they tried using HTML (which contains many symbols that special tokens will use) to pass the defender. They also experimented with using other languages and adding random characters or removing the text randomly.

“In general, I think this is a very interesting challenge, I learned a lot from reading the reference of the defense techniques and analyzing the attack result. As my current work is related to LLM pre-training and supervised fine-tuning, the competition helps me re-think the importance of LLM safety.”

Tsun-Han Chiang, Third prize winner

The feedback

One of the reasons we ran this challenge was to help educate the broader community about indirect prompt injection and to give people a safe yet realistic environment in which to try it out for themselves.

We’re very happy to see the number of people who participated overall, and we also received very encouraging feedback from several anonymous participants.

“I participated in the LLMail Inject contest and found it to be a very insightful experience.”

“We are enjoying the competition very much! We like the different scenarios and the website works like a charm.”

“I am really enjoying this experience and spending a big part of my time in trying to break all of the defenses.”

“We had a lot of fun and learned a lot during the course of this challenge.”

“Thank you for running such a fun competition. I read so many Arxiv papers and learned a ton about prompt injection techniques and even more about LLMs in general.”

“Thank you for hosting such a great competition. We sincerely appreciate the effort and organization that went into it.”

If you participated in the challenge and wanted to share any feedback with us, we’d love to hear from you: llmailinject@microsoft.com

Announcing Re:LLMail-Inject

We are excited to announce the next challenge Re:LLMail-Inject, which started on March 13th! As before, the challenge website is https://llmailinject.azurewebsites.net/

For this new challenge, we’ve reused two of the scenarios you’ve already seen, but we’ve improved the defenses. In particular, we’ve added a new high-precision blocklist based on previous submissions. This blocklist is designed to block successful submissions from the first challenge, including paraphrases of these submissions. We’ve added input sanitization, updated the LLM-as-a-judge prompt, upgraded to the latest Prompt Shields model, and updated TaskTracker to use newer LLMs. We’ve also made changes to the system prompt and the user’s query to encourage the model to not follow instructions found in emails.

There’s a total of $6,000 USD in prizes for the top three teams in this new challenge. We invite participants to think of new strategies, solve new levels, and push the frontiers of indirect prompt injection defenses!

References

[1] Azure AI announces Prompt Shields for Jailbreak and Indirect prompt injection attacks

[2] Sahar Abdelnabi et al. Are you still on track!? Catching LLM Task Drift with Activations

[3] Keegan Hines et al. Defending Against Indirect Prompt Injection Attacks With Spotlighting

[4] Eric Wallace et al. The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Organizers

The competition is jointly organized by:

Aideen Fay*¹, Sahar Abdelnabi*¹, Benjamin Pannell*¹, Giovanni Cherubin*¹, Ahmed Salem1, Andrew Paverd¹, Conor Mac Amhlaoibh¹, Joshua Rakita¹, Santiago Zanella-Beguelin¹, Egor Zverev², Mark Russinovich¹, and Javier Rando³.

Microsoft (1), ISTA (2), ETH Zurich (3), Core organizer (*)