NTCIR We Want Web with CENTRE Task




Last updated: May 5, 2020.


The NTCIR-15 WWW-3 English Subtask

This page contains information specific to the English subtask of the NTCIR-15 WWW-3 task.
For details of the task, please visit the WWW-3 task page.

Target Corpus


As in the previous rounds of the WWW English subtask, the search target corpus for the WWW-3 English subtask is clueweb12-B13. To obtain this corpus, please visit the clueweb12 webpage and follow the procedure described there. You only need to pay for the hard disk and the shipping. Once your organisation has obtained a clueweb licence, you may optionally use the clueweb online service.

Topics and Qrels

WWW-2 (0001-0080) + WWW-3 (0101-0180) English topics (XML file)
(If you are in a country where Box is blocked, use this link instead.)
WWW-2 English qrels (zip file)
(If you are in a country where Box is blocked, use this link instead.)

Additional materials:
WWW-1 English topics (XML file: note that the topics are different from the WWW-2 topics despite the overlapping topic IDs)
Original NTCIR-13 WWW-1 English qrels
CENTRE qrels (WWW-1 English qrels plus additional relevance assessments done at NTCIR-14 CENTRE)

Baseline run

You may optionally participate in the task without indexing the entire target corpus by simply reranking our baseline BM25 run (the top 1000 documents for each topic). The baseline run file and the HTML files of the retrieved URLs can be downloaded from here (password-protected; please register for the task first).
(If you are in a country where Box is blocked, use this link instead.)
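For participants taking the reranking route, the sketch below shows one way to read a baseline run and emit a reranked run. Note that this page does not specify the run file format; the sketch assumes the standard TREC run format (`topic Q0 docid rank score tag`), and the scoring function passed to `rerank` is a placeholder for your own model.

```python
from collections import defaultdict

def parse_run(lines):
    """Parse TREC-format run lines: 'topic Q0 docid rank score tag'.
    Returns {topic: [(docid, baseline_score), ...]} preserving input order."""
    run = defaultdict(list)
    for line in lines:
        topic, _q0, docid, _rank, score, _tag = line.split()
        run[topic].append((docid, float(score)))
    return run

def rerank(run, score_fn, tag="MYRUN"):
    """Re-score each topic's candidate documents with score_fn(topic, docid,
    baseline_score) and emit new TREC-format lines, sorted by the new score."""
    out = []
    for topic, docs in run.items():
        rescored = sorted(
            ((docid, score_fn(topic, docid, s)) for docid, s in docs),
            key=lambda pair: pair[1], reverse=True)
        for rank, (docid, score) in enumerate(rescored, start=1):
            out.append(f"{topic} Q0 {docid} {rank} {score:.4f} {tag}")
    return out
```

For example, `rerank(parse_run(open("baseline.run")), my_model.score)` would produce a submission-ready reranking of the baseline's top-1000 pool, with no corpus indexing required on your side.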

English Run types

First, please see the task definition slides on our main WWW-3 webpage.
There are three types of runs.

REV (revived) runs

Only the Tsinghua University team (THUIR) is eligible to submit these runs.
The REV A-run is a rerun of THUIR-E-CO-MAN-Base-2 (LambdaMART) from the WWW-2 task;
the REV B-run is a rerun of THUIR-E-CO-PU-Base-4 (BM25) from the same task.
The only difference is that the REV runs process both the WWW-2 and WWW-3 test topics.
Note that these runs should not utilise the WWW-2 qrels in any way, since the qrels file was not available at the time of WWW-2 run submission.

REP (replicated/reproduced) runs

Participants may try replicating/reproducing the above THUIR runs by reading their NTCIR-14 WWW-2 participant paper; these are called REP A-runs and REP B-runs. Please submit both A-run and B-run as we are interested in the gain of the A-run over the B-run (i.e., the effect size). The WWW-2 topic set portion of each REP run will be used to investigate replicability; the WWW-3 topic set portion of each REP run will be used to investigate reproducibility.
Note that these runs should not utilise the WWW-2 qrels in any way, since the qrels file was not available at the time of WWW-2 run submission.

NEW runs

You are of course welcome to try your own algorithms. Anything that is neither REV nor REP is called a NEW run. Let’s advance the state-of-the-art in web search!
These runs may utilise the WWW-2 qrels, the WWW-1 qrels, or any other available resources.

Links


NTCIR-15 WWW-3 Chinese subtask
NTCIR-15 WWW-3 task
NTCIR-15