
NIR news 2014 data analysis challenge

Gerard Downey
Ashtown Food Research Centre, Dublin, Ireland

This year, to help all of you with time to fill over the Christmas and New Year break, we have decided to offer a data analysis challenge. The problem is described below, and the dataset can be downloaded from the Internet (see the end of this page). Please send your answers by 20 January 2015 ([email protected]). I will forward them to a three-person committee with the necessary expertise to select the winning entry. The winner will be the person who delivers the model with the highest accuracy and, in the case of a tie, the greatest elegance. Whilst the fame and glory that will come to the winner should be enough, we are pleased to offer a £200 Amazon Gift Card as a prize to help buy some chemometrics books (or whatever else you wish). So give it a try… and have some fun with it!

The damage caused by nematode infestations of sugar beet roots eventually leads to a reduction in sugar yield, and the size of this reduction is related to the number of cysts present. The challenge is to use hyperspectral NIR imaging to detect and quantify cyst nematodes on sugar beet root samples.

For this experiment, 20 sugar beet plants with different levels of resistance were grown in a soil support spread in plastic plates and were infested with nematodes. The number of cysts in each sample was counted independently by optical microscopy. One image of each plant was then acquired with a pushbroom NIR hyperspectral imaging system (Burgermetrics, Latvia). Each image consists of lines of 320 pixels recorded at 209 wavelength channels (1100–2400 nm) with a spectral resolution of 6.3 nm, and each line is the average of 32 scans. Depending on the sample, an image contains around 300 lines, that is, roughly 100,000 pixels.
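The acquisition figures above fit together as one three-dimensional hypercube per image (lines × pixels × wavelengths). The following minimal sketch reproduces the arithmetic. Beyond what is stated above, it assumes that the 209 channels are evenly spaced between 1100 and 2400 nm, and it takes 300 lines as a representative image length. The helper names wavelength_axis and image_size are hypothetical and serve only as illustration. Note that the value it computes is the channel sampling interval, which is related to the quoted spectral resolution but is not strictly the same quantity.

```python
# Hypercube arithmetic for one NIR image (illustrative sketch).
# Assumptions not stated in the challenge text: evenly spaced channels and a
# representative image length of 300 lines.

PIXELS_PER_LINE = 320   # spatial pixels along the pushbroom slit
N_BANDS = 209           # wavelength channels
WL_START_NM = 1100.0
WL_END_NM = 2400.0


def wavelength_axis(start=WL_START_NM, end=WL_END_NM, n=N_BANDS):
    """Return n evenly spaced wavelengths (nm) from start to end inclusive."""
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def image_size(n_lines, pixels=PIXELS_PER_LINE, bands=N_BANDS):
    """Return (spatial pixel count, total spectral value count) for one image."""
    n_pixels = n_lines * pixels
    return n_pixels, n_pixels * bands


wl = wavelength_axis()
# Sampling interval is 6.25 nm, consistent with the quoted 6.3 nm resolution.
print(f"channel spacing: {wl[1] - wl[0]:.2f} nm")
n_pix, n_vals = image_size(300)
# 96000 pixels (about 100,000) and 20064000 spectral values.
print(f"{n_pix} pixels, {n_vals} spectral values")
```

With 300 lines, the image holds 96,000 pixels, which matches the "roughly 100,000 pixels" quoted above. Each of those pixels carries a full 209-point spectrum.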

Images 1–14 include reference values and are to be used to develop models for quantifying cysts. Samples 1–4 also include two RGB images (A: original image; B: image with cysts identified in red). Images 15–20 are test samples to be used to validate the models; their reference values are not included. The sample information is summarised in the table.

Sample number  Number of cysts  RGB image  NIR image  Set
1              24               yes        yes        Cal
2              49               yes        yes        Cal
3              70               yes        yes        Cal
4              82               yes        yes        Cal
5              33               no         yes        Cal
6              35               no         yes        Cal
7              43               no         yes        Cal
8              50               no         yes        Cal
9              51               no         yes        Cal
10             51               no         yes        Cal
11             55               no         yes        Cal
12             66               no         yes        Cal
13             76               no         yes        Cal
14             77               no         yes        Cal
15             –                –          yes        Test
16             –                –          yes        Test
17             –                –          yes        Test
18             –                –          yes        Test
19             –                –          yes        Test
20             –                –          yes        Test
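Entries are judged on accuracy, so it helps to know what a trivial model achieves on the calibration set. The sketch below uses only the reference counts from the table. It computes the leave-one-out error of a model that always predicts the mean count of the other calibration samples, and any useful spectral model should beat this baseline. Leave-one-out cross-validation, RMSE as the metric, and the function names are assumptions made for illustration. They are not the committee's stated scoring rule.

```python
import math

# Reference cyst counts for the calibration set (Samples 1-14), from the table.
CAL_COUNTS = {
    1: 24, 2: 49, 3: 70, 4: 82, 5: 33, 6: 35, 7: 43,
    8: 50, 9: 51, 10: 51, 11: 55, 12: 66, 13: 76, 14: 77,
}


def loo_mean_baseline(counts):
    """For each sample, predict the mean count of the remaining samples."""
    total = sum(counts.values())
    n = len(counts)
    return {s: (total - y) / (n - 1) for s, y in counts.items()}


def rmse(counts, preds):
    """Root-mean-square error between reference counts and predictions."""
    sq = [(counts[s] - preds[s]) ** 2 for s in counts]
    return math.sqrt(sum(sq) / len(sq))


preds = loo_mean_baseline(CAL_COUNTS)
mean_count = sum(CAL_COUNTS.values()) / len(CAL_COUNTS)
print(f"mean calibration count: {mean_count:.2f}")
print(f"leave-one-out RMSE of mean baseline: {rmse(CAL_COUNTS, preds):.1f} cysts")
```

For this mean predictor, the leave-one-out residual of each sample is n/(n − 1) times its deviation from the overall mean. The baseline RMSE is therefore simply 14/13 times the spread of the 14 reference counts.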

The challenge is to provide estimates of the number of cysts for both the calibration set (Images 1–14) and the test set (Images 15–20). Results are to be reported in a Word document with a clear description of the overall approach and methodology. In addition, for Samples 19 and 20, please provide images that clearly indicate the locations of the predicted cysts.
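One possible route to both deliverables (cyst counts, and cyst locations for Samples 19 and 20) is to classify each pixel as cyst or non-cyst and then group touching cyst pixels into objects. The sketch below assumes that such a binary mask already exists and leaves open how it is produced, for example by a pixel classifier trained on the calibration images. It counts 4-connected blobs with a breadth-first flood fill and returns their centroids, which could be used to mark predicted cyst positions. The function name count_cysts, the min_size noise filter and the toy mask are hypothetical and serve only as illustration.

```python
from collections import deque


def count_cysts(mask, min_size=1):
    """Count 4-connected groups of truthy pixels in a 2-D binary mask.

    Returns (count, centroids), where each centroid is the (row, col) mean of
    a blob with at least min_size pixels; smaller blobs are treated as noise.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    centroids = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited cyst pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cy, cx))
    return len(centroids), centroids


# Toy 5 x 6 mask: two 4-pixel blobs and one isolated pixel.
toy = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
]
n, centres = count_cysts(toy, min_size=2)
print(n, centres)  # prints: 2 [(0.5, 0.5), (3.5, 3.5)]
```

Counting blobs treats each connected region as one cyst. Touching cysts would therefore merge into a single object, so a real entry might need a size-based correction or a splitting step.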

Dataset download

The dataset is contained in two ZIP files: one for the RGB images and one for the NIR images. The NIR image ZIP file is very large (about 630 MB) and will take some time to download, while the RGB image ZIP file is about 3 MB.
Both files are hosted on the Amazon S3 service to provide the best download performance wherever you are located. Go to: http://nirn-challenge.s3.amazonaws.com/index.html to download them.

I wish to thank Dr Pierre Dardenne and his colleagues at CRA-W, Belgium, for providing this dataset and the research problem, in memory of our late colleague Jim Burger. Good luck!