Track Overview

The PROCEDURE Track is the long-context video track of the ORena SAVE FOCUS Challenge. It evaluates whether a submitted algorithm can answer clinically relevant questions about a laparoscopic video context that may span up to a full procedure, using only the provided video context and question.

This page provides the technical track specification. Dataset composition, taxonomy details, resources, and general challenge background are described in the corresponding Overview and Data tabs.

PROCEDURE Track

Description

The PROCEDURE Track focuses on long-context surgical video understanding. Each task instance consists of a laparoscopic procedure-level video context and a natural-language question about foreign objects, actions, events, object-related context, or retrieval status within the provided video context. The algorithm must return a short text answer.

The track targets all capabilities listed in the taxonomy overview in the data section, provided that the answer can be inferred from the supplied procedure-level video context and its associated question metadata.

Algorithm Docker Input

The algorithm input consists of the procedure-level video context and the question. The question includes the metadata and the question text itself.

Video context	Laparoscopic video context up to a full procedure.
Question	Natural-language VQA question including the relevant metadata, such as procedure name, expected output, and list of foreign objects.

The exact file structure and schema will follow the official submission template repository.
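
Until the template is consulted, the input description above can only be sketched under assumptions. The example below assumes the question arrives as a JSON object; the keys `question`, `procedure_name`, `expected_output`, and `foreign_objects` are hypothetical placeholders derived from the metadata named above, not the official schema. It shows one way to combine the metadata and the question text into a single prompt string.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Question:
    # All field names are hypothetical placeholders, not the official schema.
    text: str
    procedure_name: str = ""
    expected_output: str = ""
    foreign_objects: list[str] = field(default_factory=list)


def parse_question(raw: str) -> Question:
    """Parse a JSON string into a Question, tolerating missing metadata."""
    data = json.loads(raw)
    return Question(
        text=data["question"],
        procedure_name=data.get("procedure_name", ""),
        expected_output=data.get("expected_output", ""),
        foreign_objects=list(data.get("foreign_objects", [])),
    )


def build_prompt(q: Question) -> str:
    """Combine the available metadata and the question text into one prompt."""
    lines = []
    if q.procedure_name:
        lines.append(f"Procedure: {q.procedure_name}")
    if q.foreign_objects:
        lines.append("Foreign objects: " + ", ".join(q.foreign_objects))
    if q.expected_output:
        lines.append(f"Expected output: {q.expected_output}")
    lines.append(f"Question: {q.text}")
    return "\n".join(lines)
```

Treating every metadata field as optional keeps the sketch usable if the real schema omits or renames some of them; the key names should be replaced with those in the official submission template.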

Algorithm Docker Output

Answer

Short text answer to the provided question.

The exact output format and validation rules will follow the official submission template repository.
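
As an illustration of producing a short text answer, the sketch below collapses whitespace in raw model output and truncates it. The function name `to_short_answer` and the `max_chars` cap are assumptions made for this example; the track description does not state a length limit, so the official validation rules take precedence.

```python
def to_short_answer(raw: str, max_chars: int = 200) -> str:
    """Collapse whitespace and truncate to a hypothetical length cap."""
    # Join on single spaces so newlines and repeated blanks disappear.
    text = " ".join(raw.split())
    if len(text) > max_chars:
        # max_chars is an assumed cap, not a documented rule.
        text = text[:max_chars].rstrip()
    return text
```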

Runtime Environment

AWS Hardware
NVIDIA H100 GPU
80 GB VRAM

Time Limit
30 seconds per question
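
With a fixed per-question limit and a video context that may cover a full procedure, a submission will likely need to bound how many frames it processes. The sketch below is one possible approach, not a prescribed method: it samples frame indices uniformly and tracks the remaining time budget. The names `uniform_frame_indices` and `Deadline`, and the 5-second safety margin, are assumptions for illustration; the page does not state whether video decoding or model loading counts toward the 30 seconds.

```python
import time


def uniform_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Pick up to max_frames indices spread evenly across the video."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    # Take the centre of each of the max_frames equal-width segments.
    return [int(step * i + step / 2) for i in range(max_frames)]


class Deadline:
    """Tracks the time remaining out of a fixed per-question budget."""

    def __init__(self, budget_s: float = 30.0, safety_margin_s: float = 5.0):
        # safety_margin_s is an assumed buffer, not part of the specification.
        self._end = time.monotonic() + budget_s - safety_margin_s

    def remaining(self) -> float:
        """Seconds left before the budget minus the margin is used up."""
        return max(0.0, self._end - time.monotonic())

    def expired(self) -> bool:
        return self.remaining() == 0.0
```

A loop over the sampled indices could check `Deadline.expired()` between frames and fall back to answering from whatever has been processed so far.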

Execution
Docker container
No internet access during inference

Evaluation Scope

PROCEDURE submissions are evaluated on long-context surgical video question answering. Questions are restricted to information that can be inferred from the provided procedure-level video context and question metadata. The track evaluates persistent object tracking, temporal grounding, aggregation over time, event and procedural understanding, retrieval-status reasoning, and visually grounded complex reasoning over extended surgical context.

Official Track Document

For the full formal specification, please consult the official PROCEDURE Track document:

👉 PROCEDURE Track PDF