Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
Bisecting k-means¾ÛÀàË㷨ʵÏÖ
 

À´Ô´£º¼òµ¥Ö®ÃÀ ·¢²¼ÓÚ£º2016-6-6

  5576  次浏览      29
 

Bisecting k-means¾ÛÀàËã·¨£¬¼´¶þ·Ök¾ùÖµËã·¨£¬ËüÊÇk-means¾ÛÀàËã·¨µÄÒ»¸ö±äÌ壬Ö÷ÒªÊÇΪÁ˸Ľøk-meansËã·¨Ëæ»úÑ¡Ôñ³õʼÖÊÐĵÄËæ»úÐÔÔì³É¾ÛÀà½á¹û²»È·¶¨ÐÔµÄÎÊÌ⣬¶øBisecting k-meansËã·¨ÊÜËæ»úÑ¡Ôñ³õʼÖÊÐĵÄÓ°Ïì±È½ÏС¡£

Ê×ÏÈ£¬ÎÒÃÇ¿¼ÂÇÔÚÅ·¼¸ÀïµÂ¿Õ¼äÖУ¬ºâÁ¿´ØµÄÖÊÁ¿Í¨³£Ê¹ÓÃÈç϶ÈÁ¿£ºÎó²îƽ·½ºÍ£¨Sum of the Squared Error£¬¼ò³ÆSSE£©£¬Ò²¾ÍÊÇÒª¼ÆËãÖ´ÐоÛÀà·ÖÎöºó£¬¶Ôÿ¸öµã¶¼Òª¼ÆËãÒ»¸öÎó²îÖµ£¬¼´·ÇÖÊÐĵ㵽×î½üµÄÖÊÐĵľàÀë¡£ÄÇô£¬¼ÈȻÿ¸ö·ÇÖÊÐĵ㶼ÒѾ­ÊôÓÚij¸ö´Ø£¬Ò²¾ÍÊÇÒª¼ÆËãÿ¸ö·ÇÖÊÐĵ㵽ÆäËùÔڴصÄÖÊÐĵľàÀ룬×îºó½«ÕâЩ¾àÀëÖµÏà¼ÓÇóºÍ£¬×÷ΪSSEÈ¥ÆÀ¹ÀÒ»¸ö¾ÛÀàµÄÖÊÁ¿ÈçºÎ¡£ÎÒÃǵÄ×îÖÕÄ¿±êÊÇ£¬Ê¹µÃ×îÖÕµÄSSEÄܹ»×îС£¬Ò²¾ÍÊÇÒ»¸ö×îС»¯Ä¿±êSSEµÄÎÊÌâ¡£ÔÚnάŷ¼¸ÀïµÂ¿Õ¼ä£¬SSEÐÎʽ»¯µØ¶¨Ò壬¼ÆË㹫ʽÈçÏ£º

Bisecting k-means¾ÛÀàËã·¨µÄ»ù±¾Ë¼ÏëÊÇ£¬Í¨¹ýÒýÈë¾Ö²¿¶þ·ÖÊÔÑ飬ÿ´ÎÊÔÑ鶼ͨ¹ý¶þ·Ö¾ßÓÐ×î´óSSEÖµµÄÒ»¸ö´Ø£¬¶þ·ÖÕâ¸ö´ØÒÔºóµÃµ½µÄ2¸ö×Ó´Ø£¬Ñ¡Ôñ2¸ö×ӴصÄ×ÜSSE×îСµÄ»®·Ö·½·¨£¬ÕâÑùÄܹ»±£Ö¤Ã¿´Î¶þ·ÖµÃµ½µÄ2¸ö´ØÊDZȽÏÓŵģ¨Ò²¿ÉÄÜÊÇ×îÓŵģ©£¬Ò²¾ÍÊÇÕâ2¸ö´ØµÄ»®·Ö¿ÉÄÜÊǾֲ¿×îÓŵģ¬È¡¾öÓÚÊÔÑéµÄ´ÎÊý¡£

Bisecting k-means¾ÛÀàËã·¨µÄ¾ßÌåÖ´Ðйý³Ì£¬ÃèÊöÈçÏÂËùʾ£º

1¡¢³õʼʱ£¬½«´ý¾ÛÀàÊý¾Ý¼¯D×÷Ϊһ¸ö´ØC0£¬¼´C={C0}£¬ÊäÈë²ÎÊýΪ£º¶þ·ÖÊÔÑé´ÎÊým¡¢k-means¾ÛÀàµÄ»ù±¾²ÎÊý£»

2¡¢È¡CÖоßÓÐ×î´óSSEµÄ´ØCp£¬½øÐжþ·ÖÊÔÑém´Î£ºµ÷ÓÃk-means¾ÛÀàËã·¨£¬È¡k=2£¬½«Cp·ÖΪ2¸ö´Ø£ºCi1¡¢Ci2£¬Ò»¹²µÃµ½m¸ö¶þ·Ö½á¹û¼¯ºÏB={B1,B2,¡­,Bm}£¬ÆäÖУ¬Bi={Ci1,Ci2}£¬ÕâÀïCi1ºÍCi2Ϊÿһ´Î¶þ·ÖÊÔÑéµÃµ½µÄ2¸ö´Ø£»

3¡¢¼ÆËãÉÏÒ»²½¶þ·Ö½á¹û¼¯ºÏBÖУ¬Ã¿Ò»¸ö»®·Ö·½·¨µÃµ½µÄ2¸ö´ØµÄ×ÜSSEÖµ£¬Ñ¡Ôñ¾ßÓÐ×îС×ÜSSEµÄ¶þ·Ö·½·¨µÃµ½µÄ½á¹û£ºBj={Cj1,Cj2}£¬²¢½«´ØCj1¡¢Cj2¼ÓÈëµ½¼¯ºÏC£¬²¢½«Cp´ÓCÖÐÒÆ³ý£»

4¡¢Öظ´²½Öè2ºÍ3£¬Ö±µ½µÃµ½k¸ö´Ø£¬¼´¼¯ºÏCÖÐÓÐk¸ö´Ø¡£

¾ÛÀàË㷨ʵÏÖ

»ùÓÚÉÏÃæÃèÊöµÄ¾ÛÀàÖ´Ðйý³Ì£¬Ê¹ÓÃJavaʵÏÖBisecting k-means¾ÛÀ࣬´úÂëÈçÏÂËùʾ£º

@Override
public void clustering() {
// parse sample files
final List<Point2D> allPoints = Lists.newArrayList();
FileUtils.read2DPointsFromFiles(allPoints, "[\t,;\\s]+", inputFiles); // ´ÓÎļþÖжÁÈ¡¶þÎ¬×ø±êµã£¬¼ÓÈëµ½¼¯ºÏallPointsÖÐ

final int bisectingK = 2;
int bisectingIterations = 0;
int maxInterations = 20;
List<Point2D> points = allPoints;
final Map<CenterPoint, Set<ClusterPoint<Point2D>>> clusteringPoints = Maps.newConcurrentMap(); // ×îÖյľÛÀà½á¹û¼¯ºÏ
while(clusteringPoints.size() <= k) { // µ±µÃµ½k¸ö´Ø£¬ÔòËã·¨ÖÕÖ¹
LOG.info("Start bisecting iterations: #" + (++bisectingIterations) + ", bisectingK=" + bisectingK + ",maxMovingPointRate=" + maxMovingPointRate +
", maxInterations=" + maxInterations + ", parallism=" + parallism);

// for k=bisectingK, execute k-means clustering

// bisecting trials
KMeansClustering bestBisectingKmeans = null;
double minTotalSSE = Double.MAX_VALUE;
for (int i = 0; i < m; i++) { // Ö´Ðжþ·ÖÊÔÑ飺µ÷ÓÃk-means¾ÛÀàËã·¨£¬½«ÊäÈëµÄµã¼¯½øÐжþ·Ö£¬µÃµ½2¸ö´Ø£¬ÊÔÑéÖ´ÐÐm´Î
final KMeansClustering kmeans = new KMeansClustering(bisectingK, maxMovingPointRate, maxInterations, parallism);
kmeans.initialize(points);
// the clustering result should have 2 clusters
kmeans.clustering();
double currentTotalSSE = computeTotalSSE(kmeans.getCenterPointSet(), kmeans.getClusteringResult()); // ¼ÆËãÒ»´Î¶þ·ÖÊÔÑéÖÐ×ܵÄSSEµÄÖµ
if(bestBisectingKmeans == null) {
bestBisectingKmeans = kmeans;
minTotalSSE = currentTotalSSE;
} else {
if(currentTotalSSE < minTotalSSE) { // ¼Ç¼×ÜSSE×îСµÄ¶þ·Ö¾ÛÀ࣬ͨ¹ýkmeans±£´æ¶þ·Ö½á¹û
bestBisectingKmeans = kmeans;
minTotalSSE = currentTotalSSE;
}
}
LOG.info("Bisecting trial <<" + i + ">> : minTotalSSE=" + minTotalSSE + ", currentTotalSSE=" + currentTotalSSE);
}
LOG.info("Best biscting: minTotalSSE=" + minTotalSSE);

// merge cluster points for choosing cluster bisected again
int id = generateNewClusterId(clusteringPoints.keySet()); // ÿ´ÎÖ´ÐÐk-means¾ÛÀ࣬¶¼¶àÁËÒ»¸ö´Ø£¬Îª¶à³öµÄÕâ¸ö´Ø·ÖÅäÒ»¸ö±àºÅ
Set<CenterPoint> bisectedCentroids = bestBisectingKmeans.getCenterPointSet(); // ¶þ·ÖµÃµ½µÄ2¸ö´ØµÄÖÊÐĵļ¯ºÏ
merge(clusteringPoints, id, bisectedCentroids, bestBisectingKmeans.getClusteringResult().getClusteredPoints()); // ½«¶þ·ÖµÃµ½µÄ2¸ö´ØµÄ¼¯ºÏ£¬ºÏ²¢¼ÓÈëµ½×îÖÕ½á¹ûµÄ¼¯ºÏÖÐ

if(clusteringPoints.size() == k) { // ÒѾ­µÃµ½k¸ö´Ø£¬Ëã·¨ÖÕÖ¹
break;
}

// compute cluster to be bisected
ClusterInfo cluster = chooseClusterToBisect(clusteringPoints);
// remove centroid from collected clusters map
clusteringPoints.remove(cluster.centroidToBisect);
LOG.info("Cluster to be bisected: " + cluster);

points = Lists.newArrayList();
for(ClusterPoint<Point2D> cp : cluster.clusterPointsToBisect) {
points.add(cp.getPoint());
}

LOG.info("Finish bisecting iterations: #" + bisectingIterations + ", clusterSize=" + clusteringPoints.size());
}

// finally transform to result format
Iterator<Entry<CenterPoint, Set<ClusterPoint<Point2D>>>> iter = clusteringPoints.entrySet().iterator();
while(iter.hasNext()) { // ¹¹Ôì×îÖÕÊä³ö½á¹ûµÄÊý¾Ý½á¹¹
Entry<CenterPoint, Set<ClusterPoint<Point2D>>> entry = iter.next();
clusteredPoints.put(entry.getKey().getId(), entry.getValue());
centroidSet.add(entry.getKey());
}
}

ÉÏÃæ£¬ÎÒÃǵ÷ÓÃchooseClusterToBisect·½·¨ÇøÊµÏÖ¶Ô¾ßÓÐ×î´óµÄSSEµÄ´Ø½øÐжþ·Ö£¬¾ßÌåÑ¡ÔñµÄʵÏÖ¹ý³Ì£¬´úÂëÈçÏÂËùʾ£º

private ClusterInfo chooseClusterToBisect(Map<CenterPoint,
 Set<ClusterPoint<Point2D>>> clusteringPoints) {
double maxSSE = 0.0;
int clusterIdWithMaxSSE = -1;
CenterPoint centroidToBisect = null;
Set<ClusterPoint<Point2D>> clusterToBisect = null;
Iterator<Entry<CenterPoint, Set<ClusterPoint<Point2D>>>> iter = clusteringPoints.entrySet().iterator();
while(iter.hasNext()) {
Entry<CenterPoint, Set<ClusterPoint<Point2D>>> entry = iter.next();
CenterPoint centroid = entry.getKey();
Set<ClusterPoint<Point2D>> cpSet = entry.getValue();
double sse = computeSSE(centroid, cpSet); // ¼ÆËãÒ»¸ö´ØµÄSSEÖµ
if(sse > maxSSE) {
maxSSE = sse;
clusterIdWithMaxSSE = centroid.getId();
centroidToBisect = centroid;
clusterToBisect = cpSet;
}
}
return new ClusterInfo(clusterIdWithMaxSSE, centroidToBisect, clusterToBisect, maxSSE); // ½«´ý·ÖÁѵĴصÄÐÅÏ¢±£´æÔÚClusterInfo¶ÔÏóÖÐ
}

private double computeSSE(CenterPoint centroid, Set<ClusterPoint<Point2D>> cpSet) {

// ¼ÆËãij¸ö´ØµÄSSE
double sse = 0.0;
for(ClusterPoint<Point2D> cp : cpSet) {
// update cluster id for ClusterPoint object
cp.setClusterId(centroid.getId());
double distance = MetricUtils.euclideanDistance(cp.getPoint(), centroid);
sse += distance * distance;
}
return sse;
}

ÔÚ¶þ·ÖÊÔÑé¹ý³ÌÖУ¬ÒòΪÿ´Î¶þ·Ö¶¼Éú³É2¸öеĴأ¬Í¨¹ý¼ÆËãÕâ2¸ö´ØµÄ×ÜSSEµÄÖµ£¬Í¨¹ýµü´ú¼ÆË㣬ÕÒµ½Ò»¸ö¾Ö²¿×îСµÄ×ÜSSE¶ÔÓ¦µÄ2¸ö´ØµÄ»®·Ö·½·¨£¬È»ºó½«¾ÛÀàÉú³ÉµÄ2´Ø¼ÓÈëµ½×îÖյĴؼ¯ºÏÖУ¬×ÜSSEµÄ¼ÆËã·¨·½·¨ÔÚcomputeTotalSSE·½·¨ÖУ¬ÈçÏÂËùʾ£º

¾ÛÀàЧ¹û¶Ô±È

ÏÂÃæ£¬ÎÒÃǶԱÈһϣ¬k-meansËã·¨ÓëBisecting k-means¶à´ÎÔËÐеľÛÀà½á¹û¶Ô±È£¬ÈçÏÂͼËùʾ£º

private double computeTotalSSE(Set<CenterPoint> centroids, ClusteringResult<Point2D> clusteringResult) {
double sse = 0.0;
for(CenterPoint center : centroids) { // ¼ÆËã2¸ö´ØµÄ×ÜSSEÖµ
int clusterId = center.getId();
for(ClusterPoint<Point2D> p : clusteringResult.getClusteredPoints().get(clusterId)) {
double distance = MetricUtils.euclideanDistance(p.getPoint(), center);
sse += distance * distance;
}
}
return sse;
}

ÉÏͼÖУ¬µÚÒ»ÅÅÊÇÖ´ÐÐk-means¾ÛÀàµÃµ½µÄЧ¹ûͼ£¬µÚ¶þÅÅÊÇÖ´ÐÐBisecting k-means¾ÛÀàµÃµ½µÄЧ¹ûͼ£¬ÆäÖбàºÅΪ9999µÄµãΪÖÊÐĵ㡣µÚ¶þÅÅЧ¹ûͼ¿´Ëƶà´Î¼ÆËãÖÊÐÄûÓб仯£¬µ«ÊÇʵ¼ÊÊDZ仯µÄ£¬Ö»ÊÇÖÊÐĵãµÄλÖñȽÏÎȶ¨£¬ÀýÈ磬ÏÂÃæÊÇÁ½×éÖÊÐĵ㣺

462.9642857142857,303.5,7
172.0,279.625,8
236.54285714285714,136.25714285714287,10
105.125,65.9375,11
75.91304347826087,185.7826086956522,12
56.03333333333333,299.53333333333336,13
273.5,117.5,14
415.5952380952381,68.71428571428571,15
329.04,308.68,16
374.35,200.55,17

172.0,279.625,5
462.9642857142857,303.5,8
105.125,65.9375,9
236.54285714285714,136.25714285714287,10
273.6666666666667,124.44444444444444,11
415.5952380952381,68.71428571428571,12
75.91304347826087,185.7826086956522,13
56.03333333333333,299.53333333333336,14
379.57894736842104,201.6315789473684,15
329.04,308.68,16

ͬk-meansËã·¨Ò»Ñù£¬Bisecting k-meansËã·¨²»ÊÊÓÃÓÚ·ÇÇòÐδصľÛÀ࣬¶øÇÒ²»Í¬³ß´çºÍÃܶȵÄÀàÐ͵Ĵأ¬Ò²²»Ì«Êʺϡ£

   
5576 ´Îä¯ÀÀ       29
     
Ïà¹ØÎÄÕ Ïà¹ØÎĵµ Ïà¹ØÊÓÆµ



ÎÒÃǸÃÈçºÎÉè¼ÆÊý¾Ý¿â
Êý¾Ý¿âÉè¼Æ¾­Ñé̸
Êý¾Ý¿âÉè¼Æ¹ý³Ì
Êý¾Ý¿â±à³Ì×ܽá
Êý¾Ý¿âÐÔÄܵ÷Óż¼ÇÉ
Êý¾Ý¿âÐÔÄܵ÷Õû
Êý¾Ý¿âÐÔÄÜÓÅ»¯½²×ù
Êý¾Ý¿âϵͳÐÔÄܵ÷ÓÅϵÁÐ
¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Ê¦
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò¼¼Êõ
HadoopÔ­Àí¡¢²¿ÊðÓëÐÔÄܵ÷ÓÅ
×îл¼Æ»®
DeepSeekÔÚÈí¼þ²âÊÔÓ¦ÓÃʵ¼ù 4-12[ÔÚÏß]
DeepSeek´óÄ£ÐÍÓ¦Óÿª·¢Êµ¼ù 4-19[ÔÚÏß]
UAF¼Ü¹¹ÌåϵÓëʵ¼ù 4-11[±±¾©]
AIÖÇÄÜ»¯Èí¼þ²âÊÔ·½·¨Óëʵ¼ù 5-23[ÉϺ£]
»ùÓÚ UML ºÍEA½øÐзÖÎöÉè¼Æ 4-26[±±¾©]
ÒµÎñ¼Ü¹¹Éè¼ÆÓ뽨ģ 4-18[±±¾©]

MySQLË÷Òý±³ºóµÄÊý¾Ý½á¹¹
MySQLÐÔÄܵ÷ÓÅÓë¼Ü¹¹Éè¼Æ
SQL ServerÊý¾Ý¿â±¸·ÝÓë»Ö¸´
ÈÃÊý¾Ý¿â·ÉÆðÀ´ 10´óDB2ÓÅ»¯
oracleµÄÁÙʱ±í¿Õ¼äдÂú´ÅÅÌ
Êý¾Ý¿âµÄ¿çƽ̨Éè¼Æ


²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿â
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Éè¼ÆÊ¦
HadoopÔ­ÀíÓëʵ¼ù
Oracle Êý¾Ý²Ö¿â
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò
OracleÊý¾Ý¿â¿ª·¢Óë¹ÜÀí


GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Òæ»ª ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí