14.1.4 Training Set Size, Analysis, Selection

Training Set.

Baum, L.E., Petrie, T., Soules, G., Weiss, N.,
A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains,
AMS(41), No. 1, 1970, pp. 164-171. BibRef 7000

Kanal, L.[Laveen], Chandrasekaran, B.,
On dimensionality and sample size in statistical pattern classification,
PR(3), No. 3, October 1971, pp. 225-234.
WWW Version. 0309 BibRef

Wilson, D.L.,
Asymptotic properties of nearest neighbor rules using edited data,
SMC(2), 1972, pp. 408-421. Remove from the training set those items misclassified by the chosen rule. BibRef 7200

Jain, A.K.[Anil K.], Dubes, R.C.[Richard C.],
Feature definition in pattern recognition with small sample size,
PR(10), No. 2, 1978, pp. 85-97.
WWW Version. 0309 BibRef

Kalayeh, H.M., Muasher, M.J., Landgrebe, D.A.,
Feature Selection When Limited Numbers of Training Samples are Available,
GeoRS(21), No. 4, October 1983, pp. 434-438.
IEEE Top Reference. BibRef 8310

Muasher, M.J., Landgrebe, D.A.,
The K-L Expansion as an Effective Feature Ordering Technique for Limited Training Sample Size,
GeoRS(21), No. 4, October 1983, pp. 438-441.
IEEE Top Reference. BibRef 8310

Kalayeh, H.M., Landgrebe, D.A.,
Predicting the Required Number of Training Samples,
PAMI(5), No. 6, November 1983, pp. 664-666. BibRef 8311

Muasher, M.J., Landgrebe, D.A.,
A Binary Tree Feature Selection Technique for Limited Training Set Size,
RSE(16), No. 3, December 1984, pp. 183-194. BibRef 8412

Landgrebe, D.A., Malaret, E.R.,
Noise in Remote Sensing Systems: Effect on Classification Accuracy,
GeoRS(24), No. 2, March 1986, pp. 294-299.
IEEE Top Reference. BibRef 8603

Shahshahani, B.M.[Behzad M.], Landgrebe, D.A.[David A.],
The Effect of Unlabeled Samples in Reducing the Small Sample Size Problem and Mitigating the Hughes Phenomenon,
GeoRS(32), No. 5, September 1994, pp. 1087-1095.
IEEE Abstract. IEEE Top Reference.
WWW Version.
PDF Version. BibRef 9409

Hoffbeck, J.P.[Joseph P.], Landgrebe, D.A.,
Covariance-Matrix Estimation and Classification with Limited Training Data,
PAMI(18), No. 7, July 1996, pp. 763-767.
IEEE Abstract. IEEE Top Reference.
WWW Version. 9608
PDF Version. BibRef

Herbst, K.[Klaus],
Pattern recognition by polynomial canonical regression,
PR(17), No. 3, 1984, pp. 345-350.
WWW Version. 0309 BibRef

Wharton, S.W.[Stephen W.],
An analysis of the effects of sample size on classification performance of a histogram based cluster analysis procedure,
PR(17), No. 2, 1984, pp. 239-244.
WWW Version. 0309 BibRef

Djouadi, A., Snorrason, O., Garber, F.D.,
The Quality of Training-Sample Estimates of the Bhattacharyya Coefficient,
PAMI(12), No. 1, January 1990, pp. 92-97.
IEEE Abstract. IEEE Top Reference.
WWW Version. BibRef 9001

Hong, Z.Q.[Zi-Quan], Yang, J.Y.[Jing-Yu],
Optimal discriminant plane for a small number of samples and design method of classifier on the plane,
PR(24), No. 4, 1991, pp. 317-324.
WWW Version. 0401 BibRef

Rachkovskij, D.A., Kussul, E.M.,
Datagen: A Generator of Datasets for Evaluation of Classification Algorithms,
PRL(19), No. 7, May 1998, pp. 537-544. 9808 BibRef

Larsen, R.[Rasmus], Nielsen, A.A.[Allan Aasbjerg], Flesche, H.[Harald],
Sensitivity study of a semi-automatic training set generator,
PRL(21), No. 13-14, December 2000, pp. 1175-1182. 0011 BibRef
Earlier:
Sensitivity Study of a Semi-automatic Supervised Classifier Applied to Minerals from X-Ray Mapping Images,
SCIA99(Statistical Methods). BibRef

Larsen, R.[Rasmus], Hilger, K.B.[Klaus Baggesen],
Probabilistic Generative Modelling,
SCIA03(861-868).
WWW Version. 0310 BibRef

Hilger, K.B., Nielsen, A.A., Larsen, R.,
A Scheme for Initial Exploratory Data Analysis of Multivariate Image Data,
SCIA01(O-Tu4A). 0206 BibRef

Sánchez, J.S., Barandela, R., Marqués, A.I., Alejo, R., Badenas, J.,
Analysis of new techniques to obtain quality training sets,
PRL(24), No. 7, April 2003, pp. 1015-1022.
WWW Version.
HTML Version. 0301 BibRef

Chen, D.M.[Dong-Mei], Stow, D.[Douglas],
The Effect of Training Strategies on Supervised Classification at Different Spatial Resolutions,
PhEngRS(68), No. 11, November 2002, pp. 1155-1162. Three different training strategies often used for supervised classification are compared for six image subsets containing a single land-use/land-cover component and at five different spatial resolutions.
WWW Version. 0304 BibRef

Beiden, S.V.[Sergey V.], Maloof, M.A.[Marcus A.], Wagner, R.F.[Robert F.],
A general model for finite-sample effects in training and testing of competing classifiers,
PAMI(25), No. 12, December 2003, pp. 1561-1569.
IEEE Abstract. IEEE Top Reference. 0401 More than size of sample set. BibRef

Inoue, M.[Masashi], Ueda, N.[Naonori],
Exploitation of unlabeled sequences in hidden Markov models,
PAMI(25), No. 12, December 2003, pp. 1570-1581.
IEEE Abstract. IEEE Top Reference. 0401 How to use unlabeled data in learning. BibRef

Sánchez, J.S.,
High training set size reduction by space partitioning and prototype abstraction,
PR(37), No. 7, July 2004, pp. 1561-1564.
WWW Version. 0405 BibRef

Wang, H.C.[Hai-Chuan], Zhang, L.M.[Li-Ming],
Linear generalization probe samples for face recognition,
PRL(25), No. 8, June 2004, pp. 829-840.
WWW Version. 0405 Generate probe sets using a constrained linear subspace of the original probes. BibRef

Prudêncio, R.B.C.[Ricardo B. C.], Ludermir, T.B.[Teresa B.], de Carvalho, F.A.T.[Francisco A. T.],
A Modal Symbolic Classifier for selecting time series models,
PRL(25), No. 8, June 2004, pp. 911-921.
WWW Version. 0405 BibRef

Kuo, B.C., Chang, K.Y.,
Feature Extractions for Small Sample Size Classification Problem,
GeoRS(45), No. 3, March 2007, pp. 756-764.
WWW Version. 0703 BibRef

Angiulli, F.[Fabrizio],
Condensed Nearest Neighbor Data Domain Description,
PAMI(29), No. 10, October 2007, pp. 1746-1758.
WWW Version. 0710 Distinguish between normal and abnormal data to find the minimal subset of consistent data. BibRef

Farhangfar, A.[Alireza], Kurgan, L.A.[Lukasz A.], Dy, J.[Jennifer],
Impact of imputation of missing values on classification error for discrete data,
PR(41), No. 12, December 2008, pp. 3692-3705.
WWW Version. 0810 Missing values; Classification; Imputation of missing values; Single imputation; Multiple imputations. For databases. Studies the effect of missing data imputation using five single imputation methods (a mean method, a Hot deck method, a Naive-Bayes method, and the latter two methods with a recently proposed imputation framework) and one multiple imputation method (a polytomous regression based method) on classification accuracy for six popular classifiers (RIPPER, C4.5, K-nearest-neighbor, support vector machine with polynomial and RBF kernels, and Naive-Bayes) on 15 datasets. BibRef


Wang, H.[Hai], Wang, S.H.[Shou-Hong],
Visualization of the Critical Patterns of Missing Values in Classification Data,
Visual07(267-274).
WWW Version. 0706 BibRef

Lapedriza, À.[Àgata], Masip, D.[David], Vitrià, J.[Jordi],
A Hierarchical Approach for Multi-task Logistic Regression,
IbPRIA07(II: 258-265).
WWW Version. 0706 Small number of samples for training. BibRef

Sugiyama, M.[Masashi], Blankertz, B.[Benjamin], Krauledat, M.[Matthias], Dornhege, G.[Guido], Müller, K.R.[Klaus-Robert],
Importance-Weighted Cross-Validation for Covariate Shift,
DAGM06(354-363).
WWW Version. 0610 Training point distribution differs from that of the test data. BibRef

Kim, S.W.[Sang-Woon],
On Using a Dissimilarity Representation Method to Solve the Small Sample Size Problem for Face Recognition,
ACIVS06(1174-1185).
WWW Version. 0609 BibRef

Ren, J.[Junling],
A Pattern Selection Algorithm Based on the Generalized Confidence,
ICPR06(II: 824-827).
WWW Version. 0609 Selecting the patterns that matter in training. BibRef

Levi, D., Ullman, S.,
Learning to classify by ongoing feature selection,
CRV06(1-1).
WWW Version. 0607 Continuous updating of the clustering based on new inputs. BibRef

Cazes, T.B., Feitosa, R.Q., Mota, G.L.A.,
Automatic Selection of Training Samples for Multitemporal Image Classification,
ICIAR04(II: 389-396).
WWW Version. 0409 BibRef

Yang, C.B.[Chang-Bo], Dong, M.[Ming], Fotouhi, F.[Farshad],
Learning the Semantics in Image Retrieval: A Natural Language Processing Approach,
MMDE04(137).
WWW Version. 0406 BibRef

Yang, C.B.[Chang-Bo], Dong, M.[Ming], Fotouhi, F.[Farshad],
Image Content Annotation Using Bayesian Framework and Complement Components Analysis,
ICIP05(I: 1193-1196).
WWW Version. 0512 BibRef

Vázquez, F.[Fernando], Salvador-Sánchez, J., Pla, F.[Filiberto],
A Stochastic Approach to Wilson's Editing Algorithm,
IbPRIA05(II:35).
WWW Version. 0509 See also Asymptotic properties of nearest neighbor rules using edited data. BibRef

Angelova, A.[Anelia], Abu-Mostafa, Y.[Yaser], Perona, P.[Pietro],
Pruning Training Sets for Learning of Object Categories,
CVPR05(I: 494-501).
WWW Version. 0507 BibRef

Franco, A., Maltoni, D., Nanni, L.,
Reward-punishment editing,
ICPR04(IV: 424-427).
WWW Version. 0409 Editing: remove patterns (in the training set) that are not classified correctly. See also Asymptotic properties of nearest neighbor rules using edited data. BibRef

Kuhl, A., Kruger, L., Wohler, C., Kressel, U.,
Training of classifiers using virtual samples only,
ICPR04(III: 418-421).
WWW Version. 0409 BibRef

Juszczak, P., Duin, R.P.W.,
Selective sampling based on the variation in label assignments,
ICPR04(III: 375-378).
WWW Version. 0409 BibRef

Sprevak, D., Azuaje, F., Wang, H.,
A non-random data sampling method for classification model assessment,
ICPR04(III: 406-409).
WWW Version. 0409 BibRef

Levin, A., Viola, P.A., Freund, Y.,
Unsupervised improvement of visual detectors using co-training,
ICCV03(626-633).
WWW Version. 0311 Train detectors with limited data, then use them to label more data. Trains 2 classifiers at once. Applied to vehicle tracking. BibRef

Kim, D.S.[Dong Sik], Lee, K.[Kiryung],
Training sequence size in clustering algorithms and averaging single-particle images,
ICIP03(II: 435-438).
IEEE Abstract. IEEE Top Reference. 0312 BibRef

Franc, V.[Vojtech], Hlavác, V.[Václav],
Greedy Algorithm for a Training Set Reduction in the Kernel Methods,
CAIP03(426-433).
WWW Version. 0311 BibRef

Johnson, A.Y., Sun, J.[Jie], Bobick, A.F.,
Using similarity scores from a small gallery to estimate recognition performance for larger galleries,
AMFG03(100-103).
IEEE Abstract. IEEE Top Reference. 0311 BibRef

Paredes, R., Vidal, E., Keysers, D.,
An evaluation of the WPE algorithm using tangent distance,
ICPR02(IV: 48-51).
WWW Version. 0211 Weighted Prototype Editing. BibRef

Veeramachaneni, S.[Sriharsha], Nagy, G.[George],
Classifier Adaptation with Non-representative Training Data,
DAS02(123 ff.).
HTML Version. 0303 BibRef

Maletti, G., Ersbøll, B.K., Conradsen, K., Lira, J.,
An Initial Training Set Generation Scheme,
SCIA01(P-W3B). 0206 BibRef

Fursov, V.A.,
Training in Pattern Recognition from a Small Number of Observations Using Projections Onto Null-space,
ICPR00(Vol II: 785-788).
WWW Version.
HTML Version. 0009 BibRef

Miyamoto, T., Mitani, Y., Hamamoto, Y.,
Use of Bootstrap Samples in Quadratic Classifier Design,
ICPR00(Vol II: 789-792).
WWW Version.
HTML Version. 0009 BibRef

Mayer, H.A.[Helmut A.], Huber, R.[Reinhold],
ERC: Evolutionary Resample and Combine for Adaptive Parallel Training Data Set Selection,
ICPR98(Vol I: 882-885).
WWW Version. 9808 BibRef

Takacs, B.[Barnabas], Sadovnik, L.[Lev], Wechsler, H.[Harry],
Optimal Training Set Design for 3D Object Recognition,
ICPR98(Vol I: 558-560).
WWW Version. 9808 BibRef

Nedeljkovic, V., Milosavljevic, M.,
On the influence of the training set data preprocessing on neural networks training,
ICPR92(II:33-36).
WWW Version. 9208 BibRef

Ferri, F.J., Vidal, E.,
Small sample size effects in the use of editing techniques,
ICPR92(II:607-610).
WWW Version. 9208 BibRef



Last update: Sep 2, 2008 at 17:29:35