NTCIR We Want Web with CENTRE Task

Last updated: October 10, 2019.
Twitter: @ntcirwww

Task definition slides updated August 20, 2019.
WWW-3 (and DialEval-1) task definitions in Japanese presented on September 10, 2019.

The NTCIR-14 We Want Web task organisers and the CENTRE (CLEF/NTCIR/TREC Reproducibility) organisers have joined forces to quantify technological advances, replicability, and reproducibility in web search!

The task consists of Chinese and English subtasks. This page contains general information about the task. If you are interested in the Chinese subtask, please also visit our Chinese subtask page. If you are interested in the English subtask (which addresses replicability and reproducibility as well as the usual ad hoc web search), please also visit our English subtask page.

Please note that all runs are required to process all 160 topics: the 80 WWW-2 test topics plus the 80 new WWW-3 test topics.

Important Dates (Timezone: Japan (UTC+9))

Oct 2019 Task registration open – participants can start CENTRE experiments
Feb 2020 WWW-3 test topics and baseline runs released
Apr 2020 Task registrations due
May 2020 Run submissions due
Jun-Jul 2020 Relevance assessments
Aug 2020 Evaluation results released
Dec 2020 NTCIR-15 Conference at NII, Tokyo, Japan
Mar 2021 Publication of post-conference proceedings

Registration (IN PREPARATION)

To register, please send an email to www3org@list.waseda.jp
with the following information so that we can send you the training data.
- Team Name (e.g. Waseda)
- Principal investigator’s name, affiliation, and email address
- Names, affiliations, and email addresses of other team members
- Subtasks that you plan to participate in: Chinese, English, or BOTH
(Later, NII will require you to register for NTCIR tasks through their website, but please contact us by email first.)

Data (target corpora, topics, qrels, baseline runs, and additional resources)

Chinese subtask data: please visit the WWW-3 Chinese subtask page.
English subtask data: please visit the WWW-3 English subtask page.

Run submissions (IN PREPARATION)

Each team is allowed to submit up to 5 Chinese runs and 5 English runs.
Runs should be generated automatically; no manual intervention is allowed.

Run file name

The archive file to be uploaded should be named
[TEAMNAME].{zip,gz}.
Note that this file should contain no more than 10 runs (up to 5 Chinese and 5 English runs).
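
For packaging, the following is a minimal, unofficial Python sketch (not an official tool) that collects your run files, named as described below, into a single archive for upload. The team name WASEDA and the assumption that the run files sit in the current directory are for illustration only.

import glob
import zipfile

TEAM = "WASEDA"  # replace with your registered team name

# Collect run files such as WASEDA-E-CD-NEW-1 from the current directory.
run_files = glob.glob(f"{TEAM}-*")
assert 0 < len(run_files) <= 10, "at most 10 runs (5 Chinese + 5 English)"

# Write all collected run files into a single TEAM.zip for upload.
with zipfile.ZipFile(f"{TEAM}.zip", "w") as zf:
    for path in run_files:
        zf.write(path)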

Each run file should be named as follows:
[TEAMNAME]-{C,E}-{CO,DE,CD}-{REV,REP,NEW}-<priority>
e.g.
WASEDA-E-CD-NEW-1
Run file names should NOT have the “.txt” suffix.

{C,E}: C means Chinese subtask; E means English subtask.
{CO,DE,CD}: CO if your run used only the CONTENT field in the topic file (“title” in TREC parlance); DE if your run used only the DESCRIPTION field; CD if your run used both.
{REV,REP,NEW}: REV for revived runs from Tsinghua (English only), REP for replicated/reproduced runs (English only), NEW for original runs (Chinese and English).
priority: an integer between 1 and 5, indicating which runs should be prioritised for inclusion in the pools for relevance assessments. (We hope to include all submitted runs in the pools.)
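
As a sanity check, here is a minimal, unofficial Python sketch that validates run file names against the convention above; the character set permitted in team names is an assumption.

import re

# Pattern mirroring [TEAMNAME]-{C,E}-{CO,DE,CD}-{REV,REP,NEW}-<priority>;
# the team name is assumed to be alphanumeric.
RUN_NAME = re.compile(r"[A-Za-z0-9]+-[CE]-(CO|DE|CD)-(REV|REP|NEW)-[1-5]")

for name in ["WASEDA-E-CD-NEW-1", "WASEDA-E-CD-NEW-1.txt"]:
    # The second name is invalid because of the forbidden ".txt" suffix.
    print(name, "OK" if RUN_NAME.fullmatch(name) else "INVALID")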

Run file format

The format is the same as in previous WWW tasks: the standard TREC run format, except for the first line of the file.

The first line of the run file should be of the form:
<SYSDESC>[insert a short English description of this particular run]</SYSDESC>
e.g.
<SYSDESC>BM25F with Pseudo-Relevance Feedback</SYSDESC>

The rest of the file should be of the form:
[TopicID] 0 [DocumentID] [Rank] [Score] [RunName]
e.g.
0001 0 clueweb12-0006-97-23810 1 27.73 WASEDA-E-CD-NEW-1
0001 0 clueweb12-0009-08-98321 2 25.15 WASEDA-E-CD-NEW-1
:

Note that the run files should contain the results for 160 topics (80 WWW-2 test topics + 80 WWW-3 test topics).

Your runs will be evaluated as fully ordered lists, by processing the ranked document IDs as is, using NTCIREVAL.
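
To illustrate the expected layout, here is a minimal, unofficial Python sketch that writes a run file in this format and checks the 160-topic coverage. The topic IDs, document IDs, and scores in the dummy data are placeholders, not real retrieval results.

def write_run(path, run_name, description, results):
    # results: dict mapping topic ID -> list of (doc_id, score) pairs,
    # already sorted by decreasing score.
    assert len(results) == 160, "runs must cover all 160 topics"
    with open(path, "w") as f:
        f.write(f"<SYSDESC>{description}</SYSDESC>\n")
        for topic_id in sorted(results):
            for rank, (doc_id, score) in enumerate(results[topic_id], start=1):
                f.write(f"{topic_id} 0 {doc_id} {rank} {score} {run_name}\n")

# Hypothetical usage with dummy one-document rankings for 160 topics:
dummy = {f"{i:04d}": [("clueweb12-0000-00-00000", 1.0)] for i in range(1, 161)}
write_run("WASEDA-E-CD-NEW-1", "WASEDA-E-CD-NEW-1",
          "BM25F with Pseudo-Relevance Feedback", dummy)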

Where to submit your runs


Registered teams will receive a password and the privilege to upload a file to this site. Each team should upload exactly one file (see above). You may overwrite your file until the submission deadline; any uploads after the deadline will be ignored.

Organisers


Zhicheng Dou (Renmin University of China, P.R.C.)
Nicola Ferro (University of Padua, Italy)
Yiqun Liu (Tsinghua University, P.R.C.)
Maria Maistro (University of Padua, Italy)
Jiaxin Mao (Tsinghua University, P.R.C.)
Tetsuya Sakai (Waseda University, Japan)
Ian Soboroff (NIST, USA)
Sijie Tao (Waseda University, Japan)
Zhaohao Zeng (Waseda University, Japan)
Yukun Zheng (Tsinghua University, P.R.C.)

INQUIRIES: www3org@list.waseda.jp

Links


NTCIR-15 WWW-3 Chinese subtask page
NTCIR-15 WWW-3 English subtask page
NTCIR-15
CENTRE (at CLEF 2018-2019, NTCIR-14, and TREC 2018)
NTCIR-14 WWW-2 webpage
NTCIR-13 WWW-1 webpage
NTCIREVAL for computing MSnDCG@10, Q@10, and nERR@10 (the effectiveness measures used in the WWW task)