ORena SAVE FOCUS Challenge — PROCEDURE Track
Foreign Object Contextual Understanding in Surgery
Highlighted announcements
Make sure to sign up for the "ORena FOCUS Challenge introduction" Kick-off Webinar on 2026-05-28!
Long-context surgical VQA for foreign object understanding
This is the PROCEDURE Track of the ORena SAVE FOCUS Challenge. The track evaluates whether vision-language models can answer clinically relevant questions from long laparoscopic video contexts up to full procedures, focusing on long-horizon memory, persistent foreign object tracking, aggregation over time, and retrieval-status reasoning.
The broader ORena SAVE FOCUS Challenge benchmarks vision-language models on clinically grounded visual question answering for foreign object understanding in minimally invasive surgery. The goal is to advance AI methods that can support intraoperative quality assurance and patient safety.
The PROCEDURE Track is the most demanding track of the challenge. It tests whether models can reason over extended surgical video contexts where safety-relevant information may depend on events that occurred much earlier in the operation.
Why this challenge matters
Clinical relevance
In minimally invasive surgery, foreign objects such as sponges, needles, clips, drains, specimen bags, and similar objects may be introduced into the abdominal cavity during a procedure. Retained foreign objects after major operations are rare but clinically relevant adverse events associated with patient harm [Badiee et al., 2025].
Technical challenge
Foreign object understanding over full procedures requires models to maintain consistent representations of objects across long videos. Long-video benchmarks have shown the importance of evaluating models beyond short clips by requiring reasoning over extended visual context [Wu et al., 2024].
Benchmark at a glance
| Task type | Long-context surgical video question answering |
| Input | Long surgical video (up to full procedure) + metadata (type of procedure) + question |
| Output | Short text answer |
| Focus | Long-context foreign object understanding |
| PROCEDURE time budget | 30 seconds per question |
| PROCEDURE hardware | 80GB VRAM GPU |
| Prize pool | $50k+ across tracks |
| Submission | Docker container |
The three ORena SAVE FOCUS tracks
- FRAME
- SEGMENT
- PROCEDURE
PROCEDURE Track
The PROCEDURE Track evaluates a model’s ability to answer clinically relevant questions from long laparoscopic video contexts up to full procedures. The task targets long-context surgical video understanding skills such as:
- persistent foreign object tracking across extended surgical video
- long-horizon memory for objects inserted, manipulated, occluded, or retrieved earlier in the procedure
- aggregation of foreign object counts and events over time
- retrieval-status reasoning for objects that must be accounted for before the end of the operation
- complex reasoning across multiple objects, time points, and surgical events
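To make the aggregation and retrieval-status skills listed above concrete, here is a toy sketch in Python. It assumes a hypothetical event log of `(time_s, object, action)` tuples that a model might build internally while watching a procedure. The event format, the action names, and the function `outstanding_objects` are illustrative assumptions and not part of the challenge data or API.

```python
from collections import Counter


def outstanding_objects(events):
    """Return objects inserted but not yet retrieved.

    `events` is an iterable of (time_s, object_name, action) tuples, where
    action is "inserted" or "retrieved" (a hypothetical format).
    """
    inside = Counter()
    # Process events in chronological order (tuples sort by time first).
    for _time_s, obj, action in sorted(events):
        if action == "inserted":
            inside[obj] += 1
        elif action == "retrieved":
            inside[obj] -= 1
    # Keep only objects with a positive balance, i.e. still unaccounted for.
    return {obj: count for obj, count in inside.items() if count > 0}


# Illustrative, made-up event log spanning roughly 40 minutes of video.
events = [
    (120.0, "sponge", "inserted"),
    (340.5, "needle", "inserted"),
    (900.0, "sponge", "inserted"),
    (1510.0, "needle", "retrieved"),
    (2400.0, "sponge", "retrieved"),
]
print(outstanding_objects(events))  # {'sponge': 1}
```

The example shows why long-horizon memory matters: the correct answer at the end depends on an insertion that happened many minutes earlier.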
The input consists of a long procedure video context, the procedure name as metadata, and a question. The submitted algorithm must return a text answer. All methods must be fully automated.
Algorithm input
Whole procedure video, metadata (type of procedure, timestamp) + question. The exact input format will follow the official submission template repository.
Algorithm output
Short text answer. Exact answer formatting and validation details will follow the official submission template repository.
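Until the official submission template is released, the input and output described above can only be sketched. The Python snippet below is a minimal illustration, not the official interface: the `ProcedureQuery` fields, the `answer_question` function, and the default frame cap of 256 are all assumptions. It shows one common way to fit a full-procedure video into a fixed frame budget, namely uniform sampling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcedureQuery:
    """Hypothetical container for one query (field names are assumptions)."""
    video_path: str
    procedure_type: str
    question: str


def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick at most `max_frames` evenly spaced frame indices."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    # Take the centre of each of the `max_frames` equal-width bins.
    return [int(step * i + step / 2) for i in range(max_frames)]


def answer_question(query: ProcedureQuery, total_frames: int,
                    max_frames: int = 256) -> str:
    """Placeholder pipeline that always returns a fixed short text answer."""
    indices = sample_frame_indices(total_frames, max_frames)
    # A real submission would decode the frames at `indices` and pass them,
    # together with query.procedure_type and query.question, to a model.
    _ = indices
    return "unknown"


print(sample_frame_indices(10, 5))  # [1, 3, 5, 7, 9]
```

Uniform sampling is only a baseline choice. It can miss brief events such as a needle insertion, which is one reason the track emphasizes long-horizon memory.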
Data and scientific background
The first released data batch, HeiCo-FOCUS, is based on Heidelberg colorectal surgery videos and provides clinically grounded VQA pairs for foreign object understanding. The dataset covers five capability categories: object recognition and identity matching, temporal grounding, aggregation, event and procedural understanding, and complex reasoning.
The PROCEDURE Track builds on prior work in surgical visual question answering, where models answer clinically relevant questions from surgical scenes [Seenivasan et al., 2022].
The PROCEDURE Track also connects to the broader development of long-video understanding benchmarks, which evaluate whether multimodal models can reason beyond isolated frames and short static contexts [Fu et al., 2025].
For the PROCEDURE Track, the focus is on the long-context part of this benchmark. This provides a demanding setting for evaluating whether models can maintain object identity, aggregate evidence, and answer safety-relevant questions over extended surgical video contexts.
| First data batch | HeiCo-FOCUS VQA |
| Number of videos | 30 |
| Expert involvement | Clinical and technical experts |
| Motivation | Foreign object safety and long-context understanding |
Figure 1: Overview of the HeiCo-FOCUS benchmark, showing (a) the clinical motivation and (b) an overview of the first data batch.
Submission and evaluation
- Submissions must be made through the challenge website.
- Algorithms are submitted as Docker containers.
- Containers must run without internet access.
- Inference is limited to a single GPU.
- The PROCEDURE Track time budget is 30 seconds per question on an 80GB VRAM GPU.
- The PROCEDURE Track includes a technical leaderboard and a clinical leaderboard.
- During pre-evaluation, each team may submit up to 10 times, subject to possible adjustment depending on compute constraints.
- For the PROCEDURE Track, teams must beat both baselines on at least one of the leaderboards, technical or clinical, to proceed to the final test stage.
- Teams must submit a method description with sufficient technical detail for interpretation of the results.
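Two of the rules above, the per-question time budget and the qualification criterion, can be checked locally before submitting. The Python sketch below is one possible reading, not the official evaluation code. It assumes that higher scores are better, that "beat" means strictly greater, and that each leaderboard has two baseline scores. The function names and the score dictionaries are hypothetical.

```python
import time

TIME_BUDGET_S = 30.0  # per-question budget stated for the PROCEDURE Track


def timed_answer(answer_fn, question, budget_s=TIME_BUDGET_S):
    """Run `answer_fn(question)` and report whether it stayed within budget."""
    start = time.perf_counter()
    result = answer_fn(question)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s


def qualifies_for_final(team_scores, baseline_scores):
    """True if the team beats every baseline on at least one leaderboard.

    team_scores:     {"technical": float, "clinical": float}
    baseline_scores: {"technical": (b1, b2), "clinical": (b1, b2)}
    """
    return any(
        all(team_scores[board] > baseline for baseline in baselines)
        for board, baselines in baseline_scores.items()
    )


# Made-up numbers, for illustration only.
baselines = {"technical": (0.50, 0.55), "clinical": (0.45, 0.50)}
print(qualifies_for_final({"technical": 0.60, "clinical": 0.40}, baselines))  # True
print(qualifies_for_final({"technical": 0.52, "clinical": 0.47}, baselines))  # False
```

Local timings are only indicative, since the evaluation hardware and container overhead may differ from a development machine.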
Prizes and recognition
$50k+ prize pool
A prize pool of at least $50k has been secured across the ORena SAVE FOCUS Challenge tracks. The PROCEDURE Track is planned to receive approximately 40% of the total prize money, split approximately equally between the Technical and Clinical leaderboards.
Publication opportunity
Teams that beat the baselines may be invited as co-authors on the planned challenge publication, subject to the official rules and submission requirements.
Resources
| Registration | Register for the ORena SAVE FOCUS Challenge |
| Central forum | ORena SAVE FOCUS Forum |
| First data batch | HeiCo-FOCUS VQA on Hugging Face |
| Python package | orena-focus GitHub repository |
| Submission template | Will be released soon. |
Webinar recording
The ORena SAVE FOCUS webinar recording is available here after May 28th: