I mentioned this on scikit-learn, and someone suggested raising it here in the extra library, as it seems a more appropriate place to add this new feature.
I think many people (on the application side of ML) use scikit-learn for their projects and are not aware of many of the algorithms that exist, so implementing a new algorithm here can also let them know such a method exists. Please let me know whether it is a good idea to add this method to Agglomerative Clustering. If so, I can work on it (I have already implemented it myself) and add it to the package.
======================================
Challenge: The linkages in agglomerative clustering that can work with a precomputed distance matrix are complete, single, and average. However, the resulting clusters do not have a clear centroid. A method that works with a distance matrix and produces centroid-based clusters would therefore provide a lot of flexibility: it can be used with any distance measure, and since it relies only on the distance matrix, it can be computed quickly even for large datasets. It also yields centroids that can serve as representatives of the clusters.
Solution: The minimax linkage proposed by Jacob Bien & Robert Tibshirani (2011) is a good solution. It merges the two clusters for which the ball required to cover all points of the combined cluster is the smallest compared to the alternatives. The ball for a cluster is identified as follows: first, for each point, find its farthest neighbor within the cluster and record that distance; then choose the point M whose distance to its farthest neighbor is smallest. That point M is the centroid (prototype), and that shortest distance is the radius of the ball. I recently used this for a paper where I had to use the DTW distance and needed a fast algorithm that provides a reasonable centroid, rather than computing DBA (DTW Barycenter Averaging) at each merge step.
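To make the linkage concrete, here is a minimal sketch of how the minimax radius and prototype could be computed from a precomputed distance matrix. The function names `minimax_radius` and `minimax_linkage` are purely illustrative (they are not from the paper or from scikit-learn's API):

```python
import numpy as np

def minimax_radius(dist, members):
    """Minimax radius and prototype of one cluster.

    dist    : (n, n) symmetric pairwise distance matrix
    members : sequence of point indices belonging to the cluster

    Returns (radius, prototype_index): the prototype is the member whose
    distance to its farthest fellow member is smallest; that distance is
    the radius of the ball covering the cluster.
    """
    members = list(members)
    sub = dist[np.ix_(members, members)]   # distances restricted to the cluster
    farthest = sub.max(axis=1)             # each point's distance to its farthest neighbor
    best = int(np.argmin(farthest))        # point with the smallest such distance
    return farthest[best], members[best]

def minimax_linkage(dist, members_a, members_b):
    """Minimax linkage between two clusters: the minimax radius of their union."""
    radius, _ = minimax_radius(dist, list(members_a) + list(members_b))
    return radius
```

In a naive agglomerative loop, one would merge, at every step, the pair of clusters with the smallest `minimax_linkage` value and keep the returned prototype as that cluster's centroid; an efficient implementation would of course cache and update these quantities rather than recompute them from scratch.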
Alternative solution: N/A
Additional Info:
Paper:
Bien, Jacob, and Robert Tibshirani. "Hierarchical clustering with prototypes via minimax linkage." Journal of the American Statistical Association 106.495 (2011): 1075-1084.