NTCIR We Want Web with CENTRE Task

Last updated: September 22, 2020.
Twitter: @ntcirwww

Task definition slides updated August 20, 2019.
WWW-3 (and DialEval-1) task definitions in Japanese presented on September 10, 2019.

The topic files (80 WWW-2 topics + 80 WWW-3 topics) are now available! (March 25, 2020) Chinese English
(If you are in a country where Box is blocked, try these links instead: Chinese English)
The Chinese and English baseline runs and their HTML files are now available! (April 13, 2020) download (password-protected)
(If you are in a country where Box is blocked, try this instead: download (password-protected))
You can also download the WWW-1 topics and qrels from the subtask pages.

The NTCIR-14 We Want Web task organisers and the CENTRE (CLEF NTCIR TREC Reproducibility) organisers have joined forces to quantify technological advances, replicability, and reproducibility in web search!

The task consists of Chinese and English subtasks. This page contains general information about the task. If you are interested in the Chinese subtask, please also visit our Chinese subtask page. If you are interested in the English subtask (which addresses replicability and reproducibility as well as the usual adhoc web search), please also visit our English subtask page.

Please note that all runs are required to process all 160 topics – 80 WWW-2 test topics plus the new 80 WWW-3 test topics.

Important Dates (Timezone: Japan (UTC+9))

May 4, 2020 Task registrations due [DONE]
May 31, 2020 Run submissions due [DONE]
June-July 2020 Relevance assessments [DONE]
Aug 31, 2020 Evaluation results released [DONE]
Sep 20, 2020 Draft participants papers due [DONE]
Oct 1, 2020 Task organisers’ feedback to participants [DONE]
Nov 1, 2020 All camera ready papers due
Dec 8-11, 2020 NTCIR-15 Conference

Registration (NOW CLOSED)

To register, please do both (A) and (B):

(A) Register online at the NTCIR registration page

(B) Send an email to www3org@list.waseda.jp
with the following information so that we can send you the training data and the download password as soon as possible:
- Team Name
- Principal investigator’s name, affiliation, email address
- Names, affiliations, email addresses of other team members
- Subtasks that you plan to participate in: Chinese, English, or BOTH

Data (target corpora, topics, qrels, baseline runs, and additional resources)

Chinese subtask data: please visit the WWW-3 Chinese subtask page.
English subtask data: please visit the WWW-3 English subtask page.

Run submissions (NOW CLOSED)

Each team is allowed to submit up to 5 Chinese runs and 5 English runs.
Runs should be generated automatically; no manual intervention is allowed.

Run file name

The name of the zip file for uploading should be of the form
Note that this file should contain no more than 10 runs (up to 5 Chinese and 5 English runs).

Each run file should be named as follows:
Run file names should NOT have the “.txt” suffix.

{C,E}: C means Chinese subtask; E means English subtask.
{CO,DE,CD}: CO if your run used only the CONTENT field in the topic file (“title” in TREC parlance); DE if your run used only the DESCRIPTION field; CD if your run used both.
{REV,REP,NEW}: REV for revived runs from Tsinghua (English only), REP for replicated/reproduced runs (English only), NEW for original runs (Chinese and English).
priority: an integer between 1 and 5, indicating which runs should be prioritised for inclusion in the pools for relevance assessments. (But hopefully we will include all submitted runs in the pool.)
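
The naming convention above can be checked mechanically. The sketch below is a hypothetical validator assuming a run-name pattern of [TeamName]-{C,E}-{CO,DE,CD}-{REV,REP,NEW}-[priority], inferred from the field descriptions and the example run name "WASEDA-E-CD-NEW-1" shown later on this page; the exact pattern is an assumption, not the official specification.

```python
import re

# Assumed pattern: [TeamName]-{C,E}-{CO,DE,CD}-{REV,REP,NEW}-[priority],
# inferred from the example run name "WASEDA-E-CD-NEW-1".
RUN_NAME_RE = re.compile(
    r"^(?P<team>[A-Za-z0-9]+)"
    r"-(?P<subtask>[CE])"
    r"-(?P<fields>CO|DE|CD)"
    r"-(?P<kind>REV|REP|NEW)"
    r"-(?P<priority>[1-5])$"
)

def check_run_name(name: str) -> bool:
    """Return True if the run name matches the assumed pattern and rules."""
    m = RUN_NAME_RE.match(name)
    if m is None:
        return False
    # REV and REP runs are English-only per the field descriptions above.
    if m.group("kind") in ("REV", "REP") and m.group("subtask") != "E":
        return False
    return True

print(check_run_name("WASEDA-E-CD-NEW-1"))   # True
print(check_run_name("WASEDA-C-CO-REP-2"))   # False: REP is English-only
```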

Run file format

The format is the same as in previous WWW tasks: the standard TREC run format, except for the first line of the file.

The first line of the run file should be of the form:
<SYSDESC>[insert a short English description of this particular run]</SYSDESC>
For example:
<SYSDESC>BM25F with Pseudo-Relevance Feedback</SYSDESC>

The rest of the file should be of the form:
[TopicID] 0 [DocumentID] [Rank] [Score] [RunName]
0001 0 clueweb12-0006-97-23810 1 27.73 WASEDA-E-CD-NEW-1
0001 0 clueweb12-0009-08-98321 2 25.15 WASEDA-E-CD-NEW-1
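
The format above is simple to generate programmatically. The following is an illustrative sketch (the function name and argument layout are our own, not part of the task specification) that writes a SYSDESC line followed by TREC-format result lines:

```python
def write_run_file(path, sysdesc, results, run_name):
    """Write a WWW-style run file: a <SYSDESC> line followed by
    TREC-format result lines. `results` maps topic ID -> ranked list
    of (document_id, score) pairs, best first. Illustrative sketch only."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"<SYSDESC>{sysdesc}</SYSDESC>\n")
        for topic_id in sorted(results):
            ranked = results[topic_id][:1000]  # at most 1000 docs per topic
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                f.write(f"{topic_id} 0 {doc_id} {rank} {score} {run_name}\n")

# Reproduces the example lines shown above.
write_run_file(
    "WASEDA-E-CD-NEW-1",
    "BM25F with Pseudo-Relevance Feedback",
    {"0001": [("clueweb12-0006-97-23810", 27.73),
              ("clueweb12-0009-08-98321", 25.15)]},
    "WASEDA-E-CD-NEW-1",
)
```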

Note that the run files should contain the results for 160 topics (80 WWW-2 test topics + 80 WWW-3 test topics).
In each run file, please do not include more than 1000 documents per topic.
(The pool depth is expected to be around 20-30.
The measurement depth will be 10: nDCG@10, Q@10, ERR@10 will be used for evaluation.)
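
To illustrate what a measurement depth of 10 means, here is a textbook nDCG@10 computation. This is only a sketch of the standard definition (gain = relevance level, log2 discount); the official scores come from NTCIREVAL, whose MSnDCG definition may differ in detail.

```python
from math import log2

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """Textbook nDCG@k: gain = graded relevance, 1/log2(rank+1) discount.
    `qrels` maps document ID -> graded relevance (0 = non-relevant).
    Illustrative only; official evaluation uses NTCIREVAL (MSnDCG@10)."""
    gains = [qrels.get(d, 0) for d in ranked_doc_ids[:k]]
    dcg = sum(g / log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example with hypothetical document IDs and relevance grades.
qrels = {"d1": 2, "d2": 1, "d3": 0}
print(ndcg_at_k(["d1", "d2", "d3"], qrels))  # 1.0 (ideal ranking)
print(ndcg_at_k(["d2", "d1", "d3"], qrels))  # below 1.0 (swapped top two)
```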

Your runs will be evaluated as fully ordered lists, by processing the ranked document IDs as is, using NTCIREVAL.

Where to submit your runs

Please submit exactly one zip file (see above) as an email attachment to www3org@list.waseda.jp
with the email subject “WWW-3 run submission ” followed by your team name, e.g. “WWW-3 run submission WASEDA”,
by the above deadline.

Late submissions cannot be accepted as we need to create pool files right after the deadline.

Organisers


Zhumin Chu (Tsinghua University, P.R.C.)
Zhicheng Dou (Renmin University of China, P.R.C.)
Nicola Ferro (University of Padua, Italy)
Yiqun Liu (Tsinghua University, P.R.C.)
Maria Maistro (University of Padua, Italy)
Jiaxin Mao (Tsinghua University, P.R.C.)
Tetsuya Sakai (Waseda University, Japan)
Ian Soboroff (NIST, USA)
Sijie Tao (Waseda University, Japan)
Zhaohao Zeng (Waseda University, Japan)
Yukun Zheng (Tsinghua University, P.R.C.)

INQUIRIES: www3org@list.waseda.jp



NTCIR-15 WWW-3 Chinese subtask page
NTCIR-15 WWW-3 English subtask page
CENTRE (at CLEF 2018-2019, NTCIR-14, and TREC 2018)
NTCIR-14 WWW-2 webpage
NTCIR-13 WWW-1 webpage
NTCIREVAL for computing MSnDCG@10, Q@10, and nERR@10 (the effectiveness measures used in the WWW task)