Psychophysical evaluation of large sample sets was studied with reference to the biennial International Newspaper Color Quality Club (INCQC) Jury Evaluations, in which approximately 150-170 prints of the same image are assessed by a panel of expert observers using category judgement under 10 different quality criteria. Data from three consecutive events were analyzed. Between the 2002 and 2004 INCQC events, a series of experiments was performed on subsets of the INCQC prints to evaluate psychophysical techniques that could affect the reliability and precision of the results. These experiments led to a number of proposed modifications to the category judgement task used in the Jury Evaluation. These included a reduction in the number of attributes and judgement categories, and the adoption of an anchor image. The modifications were adopted in the 2004 Jury Evaluation, and the results show an improvement in the intra-observer repeatability of judgements but no significant change in inter-observer variation.