Ciaccheri Leonardo (leonardo)
Member Username: leonardo
Post Number: 12 Registered: 5-2010
Posted on Thursday, March 28, 2013 - 8:34 am:
Hi Bilal, according to what my teacher said, it is better to put all replicates of the same sample in the same set (calibration or validation). This is because, if a sample is somehow contaminated, the contamination influences all of its spectra. If you split those spectra, the same bias ends up in both the calibration and validation sets, so your estimate of SEP will be optimistic.
Leonardo Ciaccheri
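A minimal sketch of the sample-wise split described above, assuming 30 samples with 3 replicate spectra each as in the question. The function name `split_by_sample`, the integer sample labels, and the 1/3 validation fraction are illustrative assumptions, not something given in the thread; only the Python standard library is used.

```python
import random

def split_by_sample(sample_ids, validation_fraction=1/3, seed=0):
    """Return (calibration_idx, validation_idx) so that all spectra
    of one sample land in the same set (hypothetical helper)."""
    rng = random.Random(seed)
    unique_samples = sorted(set(sample_ids))
    rng.shuffle(unique_samples)
    # Choose whole samples for validation, never individual spectra.
    n_val = round(len(unique_samples) * validation_fraction)
    val_samples = set(unique_samples[:n_val])
    cal_idx = [i for i, s in enumerate(sample_ids) if s not in val_samples]
    val_idx = [i for i, s in enumerate(sample_ids) if s in val_samples]
    return cal_idx, val_idx

# 30 samples x 3 replicate spectra = 90 spectra (assumed labelling)
sample_ids = [s for s in range(30) for _ in range(3)]
cal_idx, val_idx = split_by_sample(sample_ids)

cal_samples = {sample_ids[i] for i in cal_idx}
val_samples = {sample_ids[i] for i in val_idx}
print(len(cal_idx), len(val_idx))   # 60 30
print(cal_samples & val_samples)    # set(): no sample appears in both sets
```

If scikit-learn is available, its `GroupShuffleSplit` class does the same kind of grouping when the sample labels are passed as `groups`.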
Bilal Ahmad Malik (elp09bm)
Member Username: elp09bm
Post Number: 14 Registered: 7-2011
Posted on Thursday, March 28, 2013 - 4:59 am:
If we collect 90 NIR spectra from 30 samples by measuring each sample 3 times, what is the advantage of using 3 replicates of each sample? What is the best way to split the data into a training set and a test set? Could we simply split the data 2/3 and 1/3 for the training and test sets respectively, or should we split the three replicates of each sample? Will that affect the results? What is the recommended way? If the dataset is small, is it better to use cross-validation only? Will cross-validation be sufficient, say 10-fold cross-validation or leave-one-out cross-validation? Would it be more appropriate to do more random splits (2/3, 1/3), say 100 splits, and see whether the results hold up across, say, 95 splits out of 100? Would this be a better approach than a single split, or will leave-one-out cross-validation give nearly the same results?
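For reference, the cross-validation options raised in the question can be set up so that replicates are never separated. The sketch below is only an illustration under assumptions: `grouped_kfold` is a hypothetical helper, the sample labels are made up, and no model is fitted; it only shows how the folds would be formed for 10-fold and leave-one-sample-out cross-validation.

```python
import random

def grouped_kfold(sample_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which each sample's
    replicates stay together in one fold (hypothetical helper)."""
    rng = random.Random(seed)
    unique_samples = sorted(set(sample_ids))
    rng.shuffle(unique_samples)
    # Deal the shuffled samples into n_folds groups of near-equal size.
    folds = [unique_samples[k::n_folds] for k in range(n_folds)]
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i, s in enumerate(sample_ids) if s not in held_out]
        test_idx = [i for i, s in enumerate(sample_ids) if s in held_out]
        yield train_idx, test_idx

# 30 samples x 3 replicate spectra = 90 spectra (assumed labelling)
sample_ids = [s for s in range(30) for _ in range(3)]

ten_fold = list(grouped_kfold(sample_ids, n_folds=10))
leave_one_sample_out = list(grouped_kfold(sample_ids, n_folds=30))

print(len(ten_fold), len(ten_fold[0][1]))                          # 10 9
print(len(leave_one_sample_out), len(leave_one_sample_out[0][1]))  # 30 3
```

With 30 folds this becomes leave-one-sample-out: each test fold holds the 3 replicates of a single sample, rather than a single spectrum as in plain leave-one-out.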