Eleven evaluating parameters for rice core collections were assessed on the basis of genotypic values and molecular marker information. Monte Carlo simulation combined with a mixed linear model was used to eliminate environmental interference and thereby obtain more reliable results. The coincidence rate of range (CR) was the optimal parameter. The mean Simpson index (MD), the mean Shannon-Weaver index of genetic diversity (M1), and the mean polymorphism information content (MPIC) were important evaluating parameters. The variable rate of the coefficient of variation (VR) could serve as an important reference parameter for evaluating the degree of variation in a core collection. The percentage of polymorphic loci (p) could be used to determine the size of a core collection. The mean difference percentage (MD) was a parameter for judging the reliability of a core collection. The effective evaluating parameters selected in this research could be used as criteria for determining the sampling percentage in different plant germplasm populations.
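Several of these parameters reduce to simple comparisons between a core collection and its initial collection. Because the abstract does not state its formulas, the sketch below assumes definitions commonly used in the core-collection literature: CR as the mean, over traits, of the core range divided by the initial range (in %); VR as the mean ratio of core to initial coefficients of variation (in %); and, per marker locus, the Shannon-Weaver index -sum(p ln p), the Simpson index 1 - sum(p^2), and the usual PIC formula, each averaged over loci. Counting a locus as polymorphic when more than one allele is observed is also an assumption, and all function names are hypothetical. MD is left out because it needs a significance test per trait.

```python
import math
from collections import Counter
from statistics import mean, stdev


def coincidence_rate_of_range(core_traits, initial_traits):
    """CR (%): mean over traits of (core range / initial range).

    Each argument is a list of traits; each trait is a list of genotypic values.
    """
    ratios = [(max(c) - min(c)) / (max(i) - min(i))
              for c, i in zip(core_traits, initial_traits)]
    return 100 * mean(ratios)


def variable_rate_of_cv(core_traits, initial_traits):
    """VR (%): mean over traits of (core CV / initial CV), with CV = sample SD / mean."""
    def cv(values):
        return stdev(values) / mean(values)
    ratios = [cv(c) / cv(i) for c, i in zip(core_traits, initial_traits)]
    return 100 * mean(ratios)


def allele_frequencies(locus):
    """Allele frequencies at one marker locus, given the allele scored in each accession."""
    counts = Counter(locus)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def shannon_weaver(freqs):
    return -sum(p * math.log(p) for p in freqs if p > 0)


def simpson(freqs):
    return 1 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return simpson(freqs) - cross


def mean_marker_index(loci, index):
    """Average a per-locus index (shannon_weaver, simpson or pic) over all loci."""
    return mean(index(allele_frequencies(locus)) for locus in loci)


def polymorphic_loci_pct(loci):
    """p (%): share of loci with more than one observed allele (assumed criterion)."""
    return 100 * sum(len(set(locus)) > 1 for locus in loci) / len(loci)


initial = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
core = [[1, 3, 5], [10, 30, 50]]
print(coincidence_rate_of_range(core, initial))  # 100.0
loci = [["A", "A", "B", "B"], ["A", "A", "A", "A"]]
print(mean_marker_index(loci, pic))  # 0.1875
```

In this toy example the core keeps the full range of both traits (CR = 100%) while its CV is larger than the initial collection's (VR above 100%), which is the pattern these two parameters are meant to expose.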
One hundred and sixty-eight cotton genotypes from the same growing region were used as a germplasm group to study the validity of different genetic distances in constructing a cotton core subset. A mixed linear model approach was employed to obtain unbiased predictions of the genotypic values of 20 traits, eliminating the environmental effect. Six commonly used genetic distances (Euclidean, standardized Euclidean, Mahalanobis, city block, cosine, and correlation distances), combined with four commonly used hierarchical cluster methods (single linkage, complete linkage, unweighted pair-group average, and Ward's methods), were used in the least distance stepwise sampling (LDSS) method to construct different core subsets. Analyses of variance (ANOVA) of the different evaluating parameters showed that cosine and correlation distances were less valid than Euclidean, standardized Euclidean, Mahalanobis, and city block distances. Standardized Euclidean distance was slightly more effective than Euclidean, Mahalanobis, and city block distances. Principal component analysis confirmed the validity of standardized Euclidean distance in constructing practical core subsets. When Mahalanobis distance was used to calculate genetic distances at low sampling percentages, the covariance matrix of the accessions could become ill-conditioned, which biased the construction of small core subsets. Standardized Euclidean distance is therefore recommended for constructing core subsets with the LDSS method.
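The recommended combination can be illustrated with a simplified sketch of LDSS using standardized Euclidean distance. This is an assumption-laden stand-in, not the study's exact procedure: it repeatedly finds the two closest retained accessions, keeps one at random and drops the other until the target sampling percentage is reached. The first merge in single linkage, complete linkage, average linkage, and Ward's method always joins the closest pair of accessions, so the nearest-pair step stands in for the clustering step; the LDSS procedure used in the study may use the cluster hierarchy differently. Trait standard deviations are taken from the initial collection (assumed non-zero), and `ldss_sketch` is a hypothetical name.

```python
import math
import random
from statistics import stdev


def standardized_euclidean(x, y, sds):
    """Euclidean distance after dividing each trait difference by that trait's SD."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))


def ldss_sketch(accessions, fraction, seed=0):
    """Simplified least distance stepwise sampling (hypothetical helper).

    accessions: list of genotypic-value vectors, one per accession.
    fraction:   target sampling percentage as a fraction, e.g. 0.2 for 20%.
    Returns the indices of the retained accessions, in ascending order.
    """
    rng = random.Random(seed)
    # Assumption: SDs come from the initial collection and none of them is zero.
    sds = [stdev(col) for col in zip(*accessions)]
    target = max(2, round(len(accessions) * fraction))
    kept = list(range(len(accessions)))
    while len(kept) > target:
        # Find the closest pair among the accessions still retained.
        best = None
        for a in range(len(kept)):
            for b in range(a + 1, len(kept)):
                d = standardized_euclidean(accessions[kept[a]], accessions[kept[b]], sds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Keep one member of the closest pair at random; drop the other.
        kept.pop(rng.choice((a, b)))
    return kept
```

On Mahalanobis distance: it requires inverting the trait covariance matrix, and with 20 traits the sample covariance matrix estimated from n accessions has rank at most n - 1, so it is singular whenever n is 20 or fewer. At a 10% sampling percentage of 168 genotypes (about 17 accessions), this is one plausible source of the ill-conditioning reported above.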
In the present study, a strategy was proposed for constructing plant core subsets by clustering, based on a combination of continuous data (genotypic values) and discrete data (molecular marker information). A mixed linear model approach was used to predict genotypic values, eliminating the environmental effect. A "mixed genetic distance" was designed to solve the difficult problem of combining continuous and discrete data when constructing a core subset by clustering. Four commonly used genetic distances for continuous data (Euclidean, standardized Euclidean, city block, and Mahalanobis distances) were used to assess the validity of the continuous-data part of the mixed genetic distance, and three commonly used genetic distances for discrete data (cosine, correlation, and Jaccard distances) were used to assess the validity of its discrete-data part. A rice germplasm group with eight quantitative traits and data for 60 molecular markers was used to evaluate the validity of the new strategy. The results suggest that the validity of each part of the mixed genetic distance is equal to or higher than that of the common genetic distances. The core subset constructed from combined genotypic-value and molecular-marker data was more representative than those constructed from genotypic values or molecular marker information alone. Moreover, the combined-data strategy can handle dominant marker information and can combine any other continuous and discrete data for clustering to construct a plant core subset.
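The abstract does not give the formula for the mixed genetic distance, so the following is only a hypothetical illustration of how continuous and discrete data can be merged into one distance matrix, not the study's definition. It assumes standardized Euclidean distance for genotypic values, rescaled to [0, 1] by its maximum over the collection; Jaccard distance for binary (presence/absence) marker scores, which suits dominant markers; and a weighted average of the two parts with weights proportional to the number of traits and markers. The function names are illustrative.

```python
import math
from statistics import stdev


def jaccard_distance(u, v):
    """Jaccard distance for 0/1 marker vectors: 1 - shared presences / any presence."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1 - both / either


def standardized_euclidean(x, y, sds):
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))


def mixed_distance_matrix(traits, markers):
    """Hypothetical mixed distance between accessions.

    traits:  per-accession genotypic-value vectors (continuous data).
    markers: per-accession 0/1 marker vectors (discrete data).
    """
    n = len(traits)
    sds = [stdev(col) for col in zip(*traits)]
    cont = [[standardized_euclidean(traits[i], traits[j], sds) for j in range(n)]
            for i in range(n)]
    # Rescale the continuous part to [0, 1] so it is comparable with Jaccard distance.
    max_cont = max(max(row) for row in cont) or 1.0
    p, q = len(traits[0]), len(markers[0])
    # Weight each part by its number of variables (an assumed weighting scheme).
    return [[(p * cont[i][j] / max_cont + q * jaccard_distance(markers[i], markers[j])) / (p + q)
             for j in range(n)]
            for i in range(n)]


traits = [[1.0, 2.0], [2.0, 3.0], [4.0, 8.0]]
markers = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
D = mixed_distance_matrix(traits, markers)
```

The resulting matrix is symmetric with a zero diagonal, so it could be handed to any hierarchical clustering routine that accepts precomputed distances. Weighting by the number of variables is only one choice; equal weights or user-chosen weights would work the same way.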
Jian-Cheng Wang, Jin Hu, Ning-Ning Liu, Hai-Ming Xu, Sheng Zhang