R2C2 run submission instructions


Dry run submission instructions version 20251219

 

First, please go through our SIGIR-AP 2025 workshop paper and slides.

 

To participate in the R2C2 task, please register first through the NTCIR website. Make sure you choose YES to “R2C2 task participation.”
Then please also contact us (the organisers) by email (ntcir19r2c2org@list.waseda.jp), stating your GroupID and the names and contact email addresses of your team members, so that we can share our movie corpus with you right away.

 

A few toy run files (for both PR and AC subtasks) can be found here


Participating in the PR (Passage Retrieval) subtask

 

Step 1. Please obtain the Wikipedia/Wookieepedia movie corpus (with organisers’ passages) from the organisers.
Step 2. If you want to create your own passages from the corpus, please do so. Otherwise, use the organisers’ passages in the following steps.
Step 3. Create an index etc. so that you can produce a ranked list of passages for any given query.
Step 4. Process the dry run topics with your passage retrieval system to produce your PR runs. The specifications of the PR runs are given in the following section.

 

Details about PR runs

 

NUMBER OF RUNS
We allow up to 4 PR runs from each team.

 

FILE NAMES
[TeamName]-{PG|PO}-[RunNumber]
where
PG means that the participating group generated their own passages from the documents, while
PO means that the participating group used the passages provided by the organisers; and
[RunNumber] is an integer in the [1-4] range.

 

For example, if your [TeamName] is WASEDA and you used the organisers’ passages, your third PR run should be named
WASEDA-PO-3.
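
The naming rule above can be checked mechanically before submission. The following is a minimal sketch, not an official validator: the function name is hypothetical, and it assumes that team names consist only of letters and digits, which the instructions do not state.

```python
import re

# Pattern derived from the rule [TeamName]-{PG|PO}-[RunNumber].
# Assumption: team names contain only ASCII letters and digits.
PR_RUN_NAME = re.compile(r"[A-Za-z0-9]+-(PG|PO)-[1-4]")


def is_valid_pr_run_name(name: str) -> bool:
    """Return True if `name` follows the PR run naming rule."""
    return PR_RUN_NAME.fullmatch(name) is not None
```

For instance, is_valid_pr_run_name("WASEDA-PO-3") accepts the example name above, while a run number outside the [1-4] range is rejected.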

 

FILE FORMAT

 

Character encoding: UTF-8.

 

Your PR run file should look like this (Please check the above toy run files):

 

[qID];[PassageRank];[docID];[PassageText]
:

 

For each question, up to 20 passages may be returned.
Therefore, the range of [PassageRank] is [1-20].
[docID] is the ID of the document in the corpus from which the passage was extracted.

 

For example:
D001;1;docIDabababa;This is a passage
D001;2;docIDcdcdcdc;This is also a passage
:
D004;1;docIDefefefe;Another passage blah blah
:
D004;20;docIDxyxyxy;blah blah blah
D005;1;docIDyzyzyz;blahblahblahblah
:
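
As an illustration of the line format above, here is a minimal sketch for producing and reading PR run lines. The function names are hypothetical. The instructions do not say how line breaks or semicolons inside a passage should be handled, so this sketch assumes that each passage is written on a single line and that only the first three semicolons act as field separators.

```python
def format_pr_line(qid: str, rank: int, doc_id: str, passage: str) -> str:
    """Build one line of a PR run: [qID];[PassageRank];[docID];[PassageText]."""
    if not 1 <= rank <= 20:
        raise ValueError("PassageRank must be in the [1-20] range")
    # Assumption: collapse whitespace so that the passage fits on one line.
    text = " ".join(passage.split())
    return f"{qid};{rank};{doc_id};{text}"


def parse_pr_line(line: str) -> tuple[str, int, str, str]:
    """Split one PR run line into (qID, PassageRank, docID, PassageText)."""
    # Split at most three times so semicolons in the passage text survive.
    qid, rank, doc_id, passage = line.rstrip("\n").split(";", 3)
    return qid, int(rank), doc_id, passage
```

When writing the file, open it with encoding="utf-8" to match the required character encoding.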

 

HOW TO SUBMIT THE RUNS

 

Please create a zip file called
[TeamName]-PR.zip
e.g.
WASEDA-PR.zip
that contains all of your PR runs.
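
The archive can be created with any zip tool. As one possibility, the sketch below uses Python's standard zipfile module. The function name and its parameters are hypothetical, and storing the run files at the top level of the archive is an assumption, since the instructions do not specify the internal layout.

```python
import zipfile
from pathlib import Path


def zip_runs(team_name, subtask, run_paths, out_dir="."):
    """Bundle run files into [TeamName]-[subtask].zip and return its path."""
    run_paths = list(run_paths)
    if not 1 <= len(run_paths) <= 4:
        raise ValueError("each team may submit between 1 and 4 runs")
    archive = Path(out_dir) / f"{team_name}-{subtask}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in run_paths:
            # Assumption: files sit at the top level of the archive.
            zf.write(path, arcname=Path(path).name)
    return archive
```

Calling zip_runs("WASEDA", "PR", ["WASEDA-PO-1", "WASEDA-PO-2"]) would write WASEDA-PR.zip to the current directory.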

 

Please send it by email to ntcir19r2c2org@list.waseda.jp . We will send you a confirmation of receipt.

 

ORGANISERS’ PR RUNS (DRY RUN)
can be found here.

 


Participating in the AC (Answering with Confidence) subtask

 

Step 1. Download the dry run topics.
Step 2. For each topic, select passages from the PR run files. You can use the organisers’ PR runs mentioned above or create your own PR runs by participating in the PR subtask as well. Note that each AC run is allowed to utilise passages from multiple PR runs.
Step 3. For each topic, extract relevant nuggets from the passages you selected in Step 2. A nugget is a factual claim or more generally a string extracted from a given passage. It may not necessarily be a substring of the input passage. However, a nugget should ideally be a claim that is entailed by the associated passage.
Step 4. Based on the extracted nuggets, generate your answer to the question (dry run topic), as well as a confidence score about that answer in the [0-100] range.
Step 5. Format your results by following the AC run specifications below.

 

Details about AC runs

 

NUMBER OF RUNS

 

We allow up to 4 AC runs from each team.

 

FILE NAMES

 

[TeamName]-AC-[RunNumber]
where
[RunNumber] is an integer in the [1-4] range.

 

For example, if your [TeamName] is WASEDA, your third AC run should be named
WASEDA-AC-3.

 

FILE FORMAT

 

Character encoding: UTF-8.

 

Your AC run file should look like this (Please check the above toy run files):

 

<D001>
[Answer];[Confidence]
[NuggetNum];[PRrunname];[PassageRank];[Nugget]
:
[NuggetNum];[PRrunname];[PassageRank];[Nugget]
</D001>

:

<D005>
:
</D005>

 

[Answer] is the textual answer to the question;
[Confidence] is the confidence score for that answer, an integer in the [0-100] range;
[NuggetNum] is N if that line represents your N-th nugget.
[PRrunname] and [PassageRank] together form a PassageKey, which identifies a unique passage from the PR runs.
For example, if the nugget was extracted from the third passage of a PR run called
WASEDA-PO-1, then [PRrunname]=WASEDA-PO-1 and [PassageRank]=3.

 

For example, the content of the question element for D004 might look like this:
<D004>
Anthony Mackie;90
1;WASEDA-PO-1;1;I am a nugget
2;WASEDA-PO-1;1;I am also a nugget
3;WASEDA-PO-1;2;nugget nugget nugget
4;THUIR-PG-1;1;Peking duck nugget
5;THUIR-PG-1;3;yum yum yum
</D004>

 

In this example, the first two nuggets were extracted from passage “WASEDA-PO-1;1”;
the third one from passage “WASEDA-PO-1;2”.
Similarly, the fourth nugget was extracted from passage “THUIR-PG-1;1”
and the fifth from passage “THUIR-PG-1;3”.
Note that one run can thus utilise passages from multiple PR runs.

 

If your system failed to return an answer for a question,
you can leave the content of the question element empty, for example,
<D004>
</D004>
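
The question element described above can be generated programmatically. The following is a minimal sketch with a hypothetical function name. It assumes that nuggets are numbered consecutively from 1 in the order given and that each nugget fits on one line, neither of which the instructions state explicitly.

```python
def format_ac_element(qid, answer=None, confidence=None, nuggets=()):
    """Build the question element for one topic of an AC run.

    `nuggets` is a sequence of (PRrunname, PassageRank, Nugget) tuples.
    Passing answer=None yields an empty element (no answer returned).
    """
    lines = [f"<{qid}>"]
    if answer is not None:
        if not isinstance(confidence, int) or not 0 <= confidence <= 100:
            raise ValueError("Confidence must be an integer in [0-100]")
        lines.append(f"{answer};{confidence}")
        # Assumption: NuggetNum counts up from 1 in the given order.
        for num, (pr_run, rank, nugget) in enumerate(nuggets, start=1):
            lines.append(f"{num};{pr_run};{rank};{nugget}")
    lines.append(f"</{qid}>")
    return "\n".join(lines)
```

A full AC run would then be the concatenation of one such element per topic, written as UTF-8.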

 

HOW TO SUBMIT THE RUNS

 

Please create a zip file called
[TeamName]-AC.zip
e.g.
WASEDA-AC.zip
that contains all of your AC runs.

 

Please send it by email to ntcir19r2c2org@list.waseda.jp . We will send you a confirmation of receipt.

 

ORGANISERS’ AC RUNS (DRY RUN)
can be found here.

 

