**Last updated: 12th June, 2018**

Discpower for computing discriminative power using the randomised Tukey HSD test (Tetsuya Sakai)

NTCIREVAL for computing various IR effectiveness measures (Tetsuya Sakai)

NTCIRPOOL for making pool files for relevance assessments (Tetsuya Sakai)

BOOTS for conducting paired bootstrap tests (Tetsuya Sakai)

Data from Sakai’s CLEF20 book chapter: How to Run an Evaluation Task

Tutorial kit for Sakai SIGIR 2018: Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial (includes the five Excel files listed below, along with other materials)

samplesizeANOVA2.xlsx for computing topic set sizes to achieve high statistical power for one-way ANOVA (recommended: See Sakai’s book)

samplesize2SAMPLECI.xlsx for computing topic set sizes to achieve a tight confidence interval for unpaired data (recommended: See Sakai’s book)

samplesize2SAMPLET.xlsx for computing topic set sizes to achieve high statistical power for the two-sample t-test

samplesizeTTEST2.xlsx for computing topic set sizes to achieve high statistical power for the paired t-test

samplesizeCI2.xlsx for computing topic set sizes to achieve a tight confidence interval for paired data

Data from Sakai ICTIR 2016: A Simple and Effective Approach to Score Standardisation

Data and R code from Sakai SIGIR 2016: Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015

Data from Sakai CIKM 2014: Designing Test Collections for Comparing Many Systems

samplesizeTTEST.xlsx for computing topic set sizes to achieve high statistical power for the paired t-test (Sakai CIKM 2014: Designing Test Collections for Comparing Many Systems): This is an old version; use the latest version listed above instead

samplesizeANOVA.xlsx for computing topic set sizes to achieve high statistical power for one-way ANOVA (Sakai CIKM 2014: Designing Test Collections for Comparing Many Systems): This is an old version; use the latest version listed above instead

samplesizeCI.xlsx for computing topic set sizes to achieve a tight confidence interval for paired data (Sakai FIT 2014: Designing Test Collections That Provide Tight Confidence Intervals): This is an old version; use the latest version listed above instead