Unnecessary check of dimensions for input? #280

Open
BioTurboNick opened this issue Jun 4, 2024 · 2 comments

Comments

@BioTurboNick

Is this check necessary?

It would be nicer to return a trivial clustering than to throw an error.

```julia
num_points <= dim && throw(ArgumentError("points has $dim rows and $num_points columns. Must be a D x N matric with D < N"))
```

@lwabeke

lwabeke commented Sep 26, 2024

Bump on this. I don't have a lot of experience with DBSCAN, but I don't understand the point of this exception.
From what I can determine, the code should execute without this check.
I'm not sure I would call the clustering any more trivial in this case: expanding dim from 1 to 100 does not change the underlying problem of deciding whether the 2 points are close together or not (assuming a version of dbscan that skips this check):

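```julia
using Clustering

dim = 100
points = zeros(dim, 2)        # two points (columns) in 100 dimensions
points[1, :] .= 1.0           # both points at the same location
clustersTogether = dbscan(points, 1)  # currently throws ArgumentError, since 2 <= 100

points[1, 2] = 10             # move the second point 9 units away (> radius 1)
clustersApart = dbscan(points, 1)
length(clustersTogether.counts) != length(clustersApart.counts)
```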
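To illustrate why nothing in the DBSCAN procedure itself seems to require D < N, here is a minimal brute-force sketch in plain Julia. `naive_dbscan` is a hypothetical helper written for this comment, not Clustering.jl's implementation; it assumes Euclidean distance and counts a point as its own neighbor for `min_neighbors`. It only compares pairwise distances against the radius, so the number of rows never enters the logic, and the two-point, 100-dimension example above yields one cluster for coincident points and two once they are 9 apart.

```julia
using LinearAlgebra: norm

# Hypothetical brute-force DBSCAN over the columns of a D x N matrix
# (not Clustering.jl's implementation). Returns one label per point;
# 0 means noise. A point counts as its own neighbor in this sketch.
function naive_dbscan(points::AbstractMatrix{<:Real}, radius::Real; min_neighbors::Int = 1)
    n = size(points, 2)
    labels = zeros(Int, n)
    # Indices of all points within `radius` of point i (Euclidean distance).
    neighbors(i) = [j for j in 1:n if norm(view(points, :, i) .- view(points, :, j)) <= radius]
    cluster = 0
    for i in 1:n
        labels[i] == 0 || continue             # already assigned
        seeds = neighbors(i)
        length(seeds) >= min_neighbors || continue  # not a core point (may become a border point later)
        cluster += 1
        labels[i] = cluster
        queue = copy(seeds)
        while !isempty(queue)                  # expand the cluster from core points
            j = popfirst!(queue)
            labels[j] == 0 || continue
            labels[j] = cluster
            nbrs = neighbors(j)
            length(nbrs) >= min_neighbors && append!(queue, nbrs)  # only core points expand
        end
    end
    return labels
end

dim = 100
points = zeros(dim, 2)
points[1, :] .= 1.0
together = naive_dbscan(points, 1)   # [1, 1]: one cluster
points[1, 2] = 10
apart = naive_dbscan(points, 1)      # [1, 2]: two clusters
```

Dimensionality only affects the cost of each distance computation here, not whether the result is well defined.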

@BioTurboNick
Author

I was assuming that the error was added because incorrect results can be produced when the number of dimensions is greater than the number of points. So by "trivial clustering" I meant "each point is a cluster", or an empty set if min_cluster_size is greater than 1.
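The fallback described above could look roughly like the following sketch. `trivial_clustering` is a hypothetical helper, not part of Clustering.jl, and returning clusters as vectors of point indices is an assumption made for brevity. Each column becomes a singleton cluster, and no clusters are returned when `min_cluster_size` exceeds 1, since a singleton can never meet that minimum. Note that for the coincident points in the previous comment this would report two clusters, whereas a DBSCAN that skipped the check would report one.

```julia
# Hypothetical fallback (not part of Clustering.jl) for inputs that hit the
# `num_points <= dim` check, returned instead of throwing ArgumentError.
# Clusters are vectors of column (point) indices.
function trivial_clustering(points::AbstractMatrix; min_cluster_size::Integer = 1)
    num_points = size(points, 2)
    # A singleton cluster can only satisfy a size threshold of 1 or less.
    min_cluster_size > 1 && return Vector{Int}[]
    return [[i] for i in 1:num_points]
end

points = zeros(100, 2)                     # D = 100 rows, N = 2 columns
dim, num_points = size(points)
if num_points <= dim                       # the condition that currently errors
    clusters = trivial_clustering(points)  # [[1], [2]]
end
```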
