[NEW FEATURE] Adding min-max linkage to Agglomerative Hierarchical Clustering #93

NimaSarajpoor · 2021-03-21T17:07:19Z

I mentioned it on scikit-learn, and someone suggested mentioning it here on extra library as it seems a more appropriate place to add this new feature.

I think many people (who are on the application side of ML) use scikit-learn for their projects and are not aware of many algorithms that are available. Thus, implementing a new algorithm can let them know such a method exists. Please let me know if it is a good idea to add this method to the Agglomerative Clustering. If so, I can work on it (as I already implemented it myself) and add it to the package.

======================================

Challenge: The linkages in agglemorative clustering that can work with the distance matrix are complete, linkage, and average. However, the resulted clusters don't have a clear centroid. So, a method that can work with a distance matrix and provide a centroid-based structured clusters can provide a lot of flexibility. It can be used for any distance measure and since it utilizes the distance matrix only, it can be performed very fast for large-size data. It also provides centroids which can be used as the representatives of the clusters.

Solution: A min-max linkage proposed by Jacob Bien & Robert Tibshirani (2011) is a good solution. It merges two clusters if a ball that is required to cover the points of the two clusters together is the smallest compare to others. A ball for each cluster can be identified as follows: First, identify the furthest neighbor of each point and calculate its distance. Then, choose the point (M) that has the shortest distance to its furthest neighbor. That point M is the centroid and that shortest distance is the radius of the ball. I recently used it for my paper where I had to use DTW distance, but I also need a fast algorithm that provides a reasonable centroid rather than computing DBA at each step of the process.

Alternative solution: N/A

Additional Info:
Paper:
Bien, Jacob, and Robert Tibshirani. "Hierarchical clustering with prototypes via minimax linkage." Journal of the American Statistical Association 106.495 (2011): 1075-1084.

NimaSarajpoor mentioned this issue Jun 27, 2021

Adding min-max linkage to Agglomerative Clustering scikit-learn/scikit-learn#19698

Open

NimaSarajpoor mentioned this issue Jul 30, 2021

Notebook for Min-Max linkage #93 #125

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NEW FEATURE] Adding min-max linkage to Agglomerative Hierarchical Clustering #93

[NEW FEATURE] Adding min-max linkage to Agglomerative Hierarchical Clustering #93

NimaSarajpoor commented Mar 21, 2021

[NEW FEATURE] Adding min-max linkage to Agglomerative Hierarchical Clustering #93

[NEW FEATURE] Adding min-max linkage to Agglomerative Hierarchical Clustering #93

Comments

NimaSarajpoor commented Mar 21, 2021