The NbClust package provides 30 indices for determining the number of clusters and proposes to the user the best clustering scheme from the different results obtained by varying all combinations of number of clusters, distance measures, and clustering methods.
NbClust(data = NULL, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 15,
method = NULL, index = "all", alphaBeale = 0.1)
By default, diss=NULL; if it is replaced by a dissimilarity matrix, distance should be NULL.
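As a rough sketch of how diss and distance interact, the snippet below precomputes a dissimilarity matrix with base R's dist() and passes it to NbClust with distance = NULL. The toy data and the choice of method and index are assumptions made for illustration only, and the NbClust call is guarded so the snippet still runs when the package is not installed.

```r
# Toy data (hypothetical): two well-separated groups in 2 dimensions.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.1), ncol = 2))

# Precompute the dissimilarity matrix with base R's dist().
diss_matrix <- dist(x, method = "euclidean")

# A "dist" object stores the n * (n - 1) / 2 pairwise distances.
n <- nrow(x)
n_pairs <- n * (n - 1) / 2

# Because diss is supplied, distance is set to NULL.
if (requireNamespace("NbClust", quietly = TRUE)) {
  res <- NbClust::NbClust(x, diss = diss_matrix, distance = NULL,
                          min.nc = 2, max.nc = 5,
                          method = "complete", index = "silhouette")
  print(res$Best.nc)
}
```

Passing both the data matrix and the dissimilarity matrix, as here, mirrors the "data matrix is given" case in the examples.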
Index in NbClust | Reference | Optimal number of clusters |
1. "kl" or "all" or "alllong" | (Krzanowski and Lai 1988) | Maximum value of the index |
2. "ch" or "all" or "alllong" | (Calinski and Harabasz 1974) | Maximum value of the index |
3. "hartigan" or "all" or "alllong" | (Hartigan 1975) | Maximum difference between hierarchy levels of the index |
4. "ccc" or "all" or "alllong" | (Sarle 1983) | Maximum value of the index |
5. "scott" or "all" or "alllong" | (Scott and Symons 1971) | Maximum difference between hierarchy levels of the index |
6. "marriot" or "all" or "alllong" | (Marriot 1971) | Maximum value of second differences between levels of the index |
7. "trcovw" or "all" or "alllong" | (Milligan and Cooper 1985) | Maximum difference between hierarchy levels of the index |
8. "tracew" or "all" or "alllong" | (Milligan and Cooper 1985) | Maximum value of absolute second differences between levels of the index |
9. "friedman" or "all" or "alllong" | (Friedman and Rubin 1967) | Maximum difference between hierarchy levels of the index |
10. "rubin" or "all" or "alllong" | (Friedman and Rubin 1967) | Minimum value of second differences between levels of the index |
11. "cindex" or "all" or "alllong" | (Hubert and Levin 1976) | Minimum value of the index |
12. "db" or "all" or "alllong" | (Davies and Bouldin 1979) | Minimum value of the index |
13. "silhouette" or "all" or "alllong" | (Rousseeuw 1987) | Maximum value of the index |
14. "duda" or "all" or "alllong" | (Duda and Hart 1973) | Smallest $n_{c}$ such that index > criticalValue |
15. "pseudot2" or "all" or "alllong" | (Duda and Hart 1973) | Smallest $n_{c}$ such that index < criticalValue |
16. "beale" or "all" or "alllong" | (Beale 1969) | $n_{c}$ such that critical value of the index >= alpha |
17. "ratkowsky" or "all" or "alllong" | (Ratkowsky and Lance 1978) | Maximum value of the index |
18. "ball" or "all" or "alllong" | (Ball and Hall 1965) | Maximum difference between hierarchy levels of the index |
19. "ptbiserial" or "all" or "alllong" | (Milligan 1980, 1981) | Maximum value of the index |
20. "gap" or "alllong" | (Tibshirani et al. 2001) | Smallest $n_{c}$ such that criticalValue >= 0 |
21. "frey" or "all" or "alllong" | (Frey and Van Groenewoud 1972) | The cluster level before that index value < 1.00 |
22. "mcclain" or "all" or "alllong" | (McClain and Rao 1975) | Minimum value of the index |
23. "gamma" or "alllong" | (Baker and Hubert 1975) | Maximum value of the index |
24. "gplus" or "alllong" | (Rohlf 1974) (Milligan 1981) | Minimum value of the index |
25. "tau" or "alllong" | (Rohlf 1974) (Milligan 1981) | Maximum value of the index |
26. "dunn" or "all" or "alllong" | (Dunn 1974) | Maximum value of the index |
27. "hubert" or "all" or "alllong" | (Hubert and Arabie 1985) | Graphical method |
28. "sdindex" or "all" or "alllong" | (Halkidi et al. 2000) | Minimum value of the index |
29. "dindex" or "all" or "alllong" | (Lebart et al. 2000) | Graphical method |
30. "sdbw" or "all" or "alllong" | (Halkidi and Vazirgiannis 2001) | Minimum value of the index |
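The selection rules in the table can be illustrated on a plain vector of index values using base R only. The numbers below are made up, and the sketch is not meant to reproduce NbClust's internal computations; in particular, how a difference between levels is attributed to a specific number of clusters is an assumption here, so the code reports the pair of levels involved instead.

```r
# Hypothetical index values for nc = 2, ..., 8 (made up for illustration).
nc <- 2:8
index_values <- c(10.2, 15.8, 31.4, 29.9, 28.7, 27.5, 27.1)

# Rule "Maximum value of the index" (e.g. "ch", "silhouette").
best_max <- nc[which.max(index_values)]

# Rule "Minimum value of the index" (e.g. "db", "cindex").
best_min <- nc[which.min(index_values)]

# Rule "Maximum difference between hierarchy levels": first differences
# between successive levels; report the two levels around the largest jump.
first_diff <- diff(index_values)
jump <- which.max(first_diff)
largest_jump_levels <- c(nc[jump], nc[jump + 1])

# Rule "Maximum value of absolute second differences": second differences
# are defined for the interior levels nc = 3, ..., 7 only.
second_diff <- diff(index_values, differences = 2)
best_abs_second <- nc[2:(length(nc) - 1)][which.max(abs(second_diff))]

# Rule "Smallest n_c such that index > criticalValue" (e.g. "duda"),
# with hypothetical index and critical values per level.
duda_values     <- c(0.21, 0.35, 0.62, 0.58, 0.70, 0.66, 0.71)
critical_values <- c(0.45, 0.47, 0.50, 0.49, 0.51, 0.50, 0.52)
best_threshold <- nc[which(duda_values > critical_values)[1]]
```

The graphical methods ("hubert", "dindex") have no such numeric rule; they are read from the plots that NbClust produces.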
library(NbClust)

## DATA MATRIX IS GIVEN

## A 2-dimensional example
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.1), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.2), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.1), ncol = 2),
           matrix(rnorm(100, mean = 7, sd = 0.2), ncol = 2))
res <- NbClust(x, distance = "euclidean", min.nc = 2, max.nc = 8,
               method = "complete", index = "ch")
res$All.index
res$Best.nc
res$Best.partition

## A 5-dimensional example
set.seed(1)
x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 5),
           matrix(rnorm(150, mean = 3, sd = 0.2), ncol = 5),
           matrix(rnorm(150, mean = 1, sd = 0.1), ncol = 5),
           matrix(rnorm(150, mean = 6, sd = 0.3), ncol = 5),
           matrix(rnorm(150, mean = 9, sd = 0.3), ncol = 5))
res <- NbClust(x, distance = "euclidean", min.nc = 2, max.nc = 10,
               method = "ward.D", index = "all")
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition

## A real data example
data <- iris[, -c(5)]
res <- NbClust(data, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 6,
               method = "ward.D2", index = "kl")
res$All.index
res$Best.nc
res$Best.partition

res <- NbClust(data, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 6,
               method = "kmeans", index = "hubert")
res$All.index

res <- NbClust(data, diss = NULL, distance = "manhattan", min.nc = 2, max.nc = 6,
               method = "complete", index = "all")
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition

## Examples with a dissimilarity matrix

## Data matrix is given
set.seed(1)
x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 3, sd = 0.2), ncol = 3),
           matrix(rnorm(150, mean = 5, sd = 0.3), ncol = 3))
diss_matrix <- dist(x, method = "euclidean", diag = FALSE)
res <- NbClust(x, diss = diss_matrix, distance = NULL, min.nc = 2, max.nc = 6,
               method = "ward.D", index = "ch")
res$All.index
res$Best.nc
res$Best.partition

data <- iris[, -c(5)]
diss_matrix <- dist(data, method = "euclidean", diag = FALSE)
## Assign the result so that the lines below inspect this run, not the previous one
res <- NbClust(data, diss = diss_matrix, distance = NULL, min.nc = 2, max.nc = 6,
               method = "ward.D2", index = "all")
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition

set.seed(1)
x <- rbind(matrix(rnorm(20, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 1, sd = 0.2), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 7, sd = 0.2), ncol = 2))
diss_matrix <- dist(x, method = "euclidean", diag = FALSE)
res <- NbClust(x, diss = diss_matrix, distance = NULL, min.nc = 2, max.nc = 6,
               method = "ward.D2", index = "alllong")
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition

## Data matrix is not available. Only the dissimilarity matrix is given.
## In this case, only these indices can be computed: frey, mcclain, cindex, silhouette and dunn
res <- NbClust(diss = diss_matrix, distance = NULL, min.nc = 2, max.nc = 6,
               method = "ward.D2", index = "silhouette")
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition
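When several indices are computed at once, one way to summarize res$Best.nc is to count how many indices propose each number of clusters. The helper below, majority_nc, is hypothetical and not part of NbClust. It assumes that the first row of Best.nc holds the proposed numbers of clusters and that graphical methods appear there as 0 or NA, and it drops those entries before counting. The demo matrix is made up, and the NbClust call is guarded so the snippet runs without the package.

```r
# Hypothetical helper (not part of NbClust): most frequently proposed
# number of clusters. Assumes row 1 of best_nc holds the proposals.
majority_nc <- function(best_nc) {
  proposals <- best_nc[1, ]
  # Drop entries without a numeric proposal (assumed to be 0 or NA).
  proposals <- proposals[!is.na(proposals) & proposals > 0]
  votes <- table(proposals)
  # On ties, which.max() keeps the smallest number of clusters.
  as.integer(names(votes)[which.max(votes)])
}

# Made-up matrix shaped like the assumed Best.nc layout.
best_nc_demo <- rbind(
  Number_clusters = c(KL = 3, CH = 3, Hartigan = 2,
                      Hubert = 0, Silhouette = 3, Dunn = 2),
  Value_Index     = c(5.1, 560.4, 137.9, 0, 0.68, 0.08)
)
demo_choice <- majority_nc(best_nc_demo)

# The same helper applied to an actual NbClust run, if the package is installed.
if (requireNamespace("NbClust", quietly = TRUE)) {
  res <- NbClust::NbClust(iris[, -5], distance = "euclidean",
                          min.nc = 2, max.nc = 6,
                          method = "ward.D2", index = "all")
  print(majority_nc(res$Best.nc))
}
```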