Discriminant, Factor and Cluster Analysis

Marketing Research, Twelfth Edition
Aaker, Kumar, Leone and Day
Instructor's Presentation Slides

Chapter Twenty: Discriminant, Factor and Cluster Analysis

Discriminant Analysis
- Used to classify individuals into one of two or more alternative groups on the basis of a set of measurements
- Used to identify variables that discriminate between naturally occurring groups
- Major uses: prediction and description

Objectives of Discriminant Analysis
- Determining linear combinations of the predictor variables to separate groups by measuring between-group variation relative to within-group variation
- Developing procedures for assigning new objects, firms, or individuals, whose profiles, but not group identities, are known, to one of the two groups
- Testing whether significant differences exist between the two groups based on the group centroids
- Determining which variables count most in explaining inter-group differences

Basic Concept
- [Figure: distribution of two populations]
- If we can assume that the two populations have the same variance, then the usual value of C is
  $C = \frac{\bar{X}_{I} + \bar{X}_{II}}{2}$
  where $\bar{X}_{I}$ and $\bar{X}_{II}$ are the mean values for the two groups, respectively.

Discriminant Function
$Z_i = b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_n X_n$
where
- Z = discriminant score
- b = discriminant weights
- X = predictor (independent) variables
In a particular group, each individual has a discriminant score ($z_i$).
- Centroid (group mean) = $\frac{1}{n}\sum_i z_i$, where $i$ indexes the $n$ individuals in the group
- The centroid indicates the most typical location of an individual from a particular group

Discriminant Function: A Graphical Illustration
- [Figure not preserved in the source]

Cut-off Score
- Criterion against which each individual's discriminant score is judged to determine into which group the individual should be classified
- Separate formulas are given for equal group sizes and for unequal group sizes [formulas not preserved in the source]

Determination of Significance
- Null hypothesis: in the population, the group means of the discriminant function are equal
  $H_0: \mu_A = \mu_B$
- Generally, predictors with relatively large standardized coefficients contribute more to the discriminating power of the function
- Canonical or discriminant loadings show the variance that the predictor shares with the function

Classification and Validation
Holdout method
- Uses part of the sample to construct the classification rule; the other subsample is used for validation
- Uses the classification matrix and hit ratio to evaluate group classification
- Uses discriminant weights to generate discriminant scores for cases in the subsample

Classification and Validation (Contd.)
U-method or cross-validation
- Uses all available data without serious bias in estimating error rates
- Estimated classification error rates: $P_1 = m_1 / n_1$ and $P_2 = m_2 / n_2$, where $m_1$ and $m_2$ = number of sample observations misclassified in groups $G_1$ and $G_2$

Steps in Discriminant Analysis
1. Form groups
2. Estimate discriminant function
3. Determine significance of function and variables
4. Interpret the discriminant function
5. Perform classification and validation

Multiple Discriminant Analysis
- Number of possible discriminant functions = $\min(p, m - 1)$, where $m$ = number of groups and $p$ = number of predictor variables

Assumptions Underlying the Discriminant Function
1. The p independent variables must have a multivariate normal distribution
2. The p x p variance-covariance matrix of the independent variables in each of the two groups must be the same

Multiple Discriminant Analysis (Contd.)
- [Three slides of figures not preserved in the source]

Factor Analysis
- Combines questions or variables to create new factors
- Combines objects to create new groups
Uses in data analysis
- To identify underlying constructs in the data from the groupings of variables that emerge
- To reduce the number of variables to a more manageable set

Factor Analysis (Contd.)
Methodology
- Principal component analysis: summarizes information in a larger set of variables to a smaller set of factors
- Common factor analysis: uncovers underlying dimensions surrounding the original variables

Factor Analysis: Example
- [Figure not preserved in the source]

Principal Component Analysis
- Since the objective of factor analysis is to represent each of the variables as a linear combination of a smaller set of factors, it is expressed as a set of equations [equations not preserved in the source], where
  - x1 through x5 represent the standardized scores
  - F1 through F5 are the standardized factor scores
  - I11, In1, ..., In2 are factor loadings
  - e1 through e5 are error variances

Factors
Factor
- A variable or construct that is not directly observable but needs to be inferred from the input variables
- All included factors (prior to rotation) must explain at least as much variance as an "average variable"
Eigenvalue criteria
- An eigenvalue represents the amount of variance in the original variables that is associated with a factor
- The sum of the squared factor loadings of each variable on a factor represents the eigenvalue
- Only factors with eigenvalues greater than 1.0 are retained

How Many Factors: Criteria
Scree plot criteria
- A plot of the eigenvalues against the number of factors, in order of extraction
- The shape of the plot determines the number of factors

How Many Factors: Criteria (Contd.)
Percentage of variance criteria
- The number of factors extracted is determined so that the cumulative percentage of variance extracted by the factors reaches a satisfactory level
Significance test criteria
- Statistical significance of the separate eigenvalues is determined, and only those factors that are statistically significant are retained

Common Terms
- Factor scores: values of each factor underlying the variables
- Factor loadings: correlations between the factors and the original variables
- Communality: the amount of the variable variance that is explained by the factor

Factor Rotations
- [Figure: solutions generated by factor analysis for a data set]

Factor Rotations (Contd.)
Varimax (orthogonal) rotation
- Each factor tends to load high (-1 or +1) on a smaller number of variables and low, or very low (close to zero), on other variables, to make interpretation of the resulting factors easier.
- The variance explained by each unrotated factor is simply rearranged by the rotation, while the total variance explained by the rotated factors remains the same.
- The first rotated factor will no longer necessarily account for the maximum variance, and the amount of variance each factor accounts for has to be recalculated.
Promax (oblique) rotation
- The factors are rotated for better interpretation, such that orthogonality is no longer preserved.

Common Factor Analysis
- The factor extraction procedure is similar to that of principal component analysis except for the input correlation matrix
- Communalities (shared variance) are inserted in the diagonal instead of unities in the original variable correlation matrix
- The total amount of variance that can be explained by all the factors in common factor analysis is the sum of the diagonal elements in the correlation matrix
- The output of common factor analysis depends on the amount of shared variance

Common Factor Analysis: Results
- [Three slides of figures not preserved in the source]

Cluster Analysis
- Technique for grouping individuals or objects into unknown groups.
- The typical criterion used in cluster analysis is distance between clusters or the error sum of squares.
- The input is any valid measure of similarity between objects, such as:
  - Correlations
  - Distance measures (Euclidean distance)
  - Association coefficients
- The number of clusters or the level of clustering

Steps in Cluster Analysis
1. Define the problem
2. Decide on the appropriate similarity measure
3. Decide on how to group the objects
4. Decide the number of clusters
5. Interpret, describe, and validate the clusters

Cluster Analysis (Contd.)
Hierarchical clustering
- Can start with all objects in one cluster and divide and subdivide them until all objects are in their own single-object cluster ("top-down" or divisive approach)
- Can start with each object in its own single-object cluster and systematically combine clusters until all objects are in one cluster ("bottom-up" or agglomerative approach)
Non-hierarchical clustering
- Permits objects to leave one cluster and join another as clusters are being formed
- A cluster center is initially selected and all the objects within a pre-specified threshold distance are included in that cluster

Hierarchical Clustering
- Single linkage: clustering criterion based on the shortest distance
- Complete linkage: clustering criterion based on the longest distance

Hierarchical Clustering (Contd.)
- Average linkage: clustering criterion based on the average distance
- Ward's method: based on the loss of information resulting from grouping the objects into clusters (minimizes within-cluster variation)
- Centroid method: based on the distance between the group centroids (the point whose coordinates are the means of all the observations in the cluster)

Hierarchical Cluster Analysis: Example
- [Figures not preserved in the source, including a dendrogram for hierarchical clustering of bank data]

Non-hierarchical Clustering
- Sequential threshold: a cluster center is selected and all objects within a pre-specified threshold value are grouped
- Parallel threshold: several cluster centers are selected and objects within the threshold level are assigned to the nearest center
- Optimizing: objects can later be reassigned to clusters on the basis of optimizing some overall criterion measure

Nonhierarchical Cluster Analysis: Example
- [Four slides of figures not preserved in the source]

Criteria for Determining the Number of Clusters
- The number of clusters is specified by the analyst for theoretical or practical reasons.
- The level of clustering with respect to the clustering criterion is specified.
- The number of clusters is determined from the pattern of clusters generated. The distances between clusters or the error variability measure at successive steps can be used to decide the number of clusters (from the plot of the error sum of squares against the number of clusters).
- The ratio of total within-group variance to between-group variance is plotted against the number of clusters, and the point at which an elbow occurs indicates the number of clusters.
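The linkage rules and the "distances at successive steps" criterion above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the textbook's procedure: the function names (`linkage_distance`, `agglomerate`, `suggest_cluster_count`), the Euclidean distance, the toy data, and the "largest jump in merge distance" heuristic are all choices made here for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def linkage_distance(a, b, method):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pair_distances = [dist(p, q) for p in a for q in b]
    if method == "single":    # shortest distance between members
        return min(pair_distances)
    if method == "complete":  # longest distance between members
        return max(pair_distances)
    if method == "average":   # average distance between members
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, method="single"):
    """Bottom-up clustering: start with one cluster per object, merge until one remains.

    Returns a list of (clusters_remaining, merge_distance, clusters) tuples,
    one entry per merge step.
    """
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters that is closest under the chosen linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        merge_distance = linkage_distance(clusters[i], clusters[j], method)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((len(clusters), merge_distance, [list(c) for c in clusters]))
    return history


def suggest_cluster_count(history):
    """Assumed heuristic: stop just before the largest jump in merge distance."""
    best_k, best_jump = 1, 0.0
    for (k_prev, d_prev, _), (_, d_next, _) in zip(history, history[1:]):
        jump = d_next - d_prev
        if jump > best_jump:
            best_k, best_jump = k_prev, jump
    return best_k


# Hypothetical toy data: two visually separated groups of three objects each.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
history = agglomerate(points, method="single")
for clusters_remaining, merge_distance, _ in history:
    print(clusters_remaining, round(merge_distance, 3))
print("suggested number of clusters:", suggest_cluster_count(history))
```

Plotting the printed merge distances against the number of clusters gives the kind of curve the last two criteria refer to; the jump heuristic is only a stand-in for reading the elbow by eye.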
Methods to Validate a Cluster Analysis Solution
- Apply two or more different clustering approaches to the same data, or use different distance measures, and compare the results.
- Split the data randomly into two halves, perform clustering on each half, and then examine the average profile values of each cluster across subsamples.
- Delete various columns (variables) from the original data, compute dissimilarity measures across the remaining variables, and compare these results with the results obtained using the full set.
- Using simulation procedures, create a data set with properties matching the overall properties of the original data but containing no clusters. Use the same clustering method on both the original and the artificial data and compare the results.

Assumptions and Limitations of Cluster Analysis
Assumptions
- The basic measure of similarity on which the clustering is based is a valid measure of the similarity between the objects.
- There is theoretical justification for structuring the objects into clusters.
Limitations
- It is difficult to evaluate the quality of the clustering.
- It is difficult to know exactly which clusters are very similar and which objects are difficult to assign.
- It is difficult to select a clustering criterion and program on any basis other than availability.
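The first validation method, comparing the solutions from two clustering approaches on the same data, needs some way to quantify agreement. The slides do not name one; the sketch below uses the Rand index as an assumed stand-in. The function name `rand_index` and the label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two cluster solutions agree.

    A pair counts as an agreement when both solutions put the two objects
    in the same cluster, or both put them in different clusters.
    Cluster label values themselves do not need to match across solutions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster memberships for six objects from two different methods.
solution_single_linkage = [0, 0, 0, 1, 1, 1]
solution_other_method = [1, 1, 1, 0, 0, 0]
print(rand_index(solution_single_linkage, solution_other_method))
```

A value near 1 suggests the two approaches recover the same grouping; how low a value should count as a warning sign is a judgment call that this sketch does not settle.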

Files attached to this document:

ch20_7133.pptx