Documentation on existing tables of External Validation methods #147

Open
BradKML opened this issue Mar 5, 2021 · 11 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

BradKML commented Mar 5, 2021

Is your feature request related to a problem? Please describe.

There are set-matching and other external indices for comparing the expected community labels with the labels output by a CD algorithm, and they are worth documenting. This is separate from #114, as the two seem unrelated.

BradKML commented Jun 28, 2021

I drafted some of the external pair-counting indices (here total = a+b+c+d):

ARI = 2*(a*d-b*c)/((a+b)*(b+d)+(a+c)*(c+d))
Baulieu_1 = 1- (total*(b+c)+(b-c)**2)/(total**2)
Baulieu_2 = 1- (a*d+b*c)/(total**2)
Correlation_Coefficient = (a*d-b*c)/sqrt((a+b)*(a+c)*(c+d)*(b+d))
Czekanowski = 2*a/(2*a+b+c)
Fager_McGowan = a/sqrt((a+b)*(c+d))-1/(2*sqrt(a+b)) # Asymmetric
Fowlkes_Mallows = a/sqrt((a+b)*(a+c))
Gamma = (total*a-(a+b)*(a+c))/sqrt((a+b)*(a+c)*(c+d)*(b+d))
Goodman_Kruskal = (a*d-b*c)/(a*d+b*c)
Gower_Legendre = (a+d)/(a+(b+c)/2+d)
Hamann = ((a+d)-(b+c))/total
Hubert = ((a+d)-(b+c))/total
Jaccard = a/(a+b+c)
Kulczynski = (a/(a+b)+a/(a+c))/2
McConnaughey = (a**2-b*c)/((a-b)*(a+c)) # Asymmetric
Minkowski = sqrt((b+c)/(a+b)) # Asymmetric
Mirkin = 2*(b+c)/total
Pearson = (a*d-b*c)/((a+b)*(a+c)*(c+d)*(b+d))
Pierce = (a*d-b*c)/((a+c)*(b+d))
Rand = (a+d)/total
Russell_Rao = a/total
Rogers_Tanimoto = (a+d)/(a+2*(b+c)+d)
Sokal_Sneath_1 = (a/(a+b)+a/(a+c)+d/(b+d)+d/(c+d))/4
Sokal_Sneath_2 = a/(a+2*(b+c))
Sokal_Sneath_3 = (a*d)/sqrt((a+b)*(a+c)*(c+d)*(b+d))
Wallace = a/(a+b) # Asymmetric
Yule = ((a*d)-(b*c))/((a*b)-(c+d)) # Asymmetric
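
As a minimal sketch of how a few of the symmetric indices above could be evaluated together, assuming the four pair counts are already known and that `total` is the number of node pairs (a+b+c+d). The function name `pair_counting_indices` is hypothetical and not part of any library, and zero denominators are not handled:

```python
from math import sqrt

def pair_counting_indices(a, b, c, d):
    """Evaluate a few pair-counting indices from the four pair counts.

    a: pairs grouped together in both partitions
    b, c: pairs grouped together in only one of the two partitions
    d: pairs separated in both partitions
    """
    total = a + b + c + d  # assumption: total is the number of node pairs
    return {
        "Rand": (a + d) / total,
        "Jaccard": a / (a + b + c),
        "Czekanowski": 2 * a / (2 * a + b + c),
        "Fowlkes_Mallows": a / sqrt((a + b) * (a + c)),
        "Mirkin": 2 * (b + c) / total,
    }

# Illustrative counts only, not taken from a real graph.
scores = pair_counting_indices(a=2, b=1, c=1, d=6)
```

Returning a dictionary keeps the formulas side by side, which makes it easy to compare them against the list one line at a time.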

BradKML commented Jun 28, 2021

For:

@GiulioRossetti
Owner

That's interesting.
Could you please specify what each letter represents? Otherwise, some definitions are not straightforward to implement.

BradKML commented Jun 28, 2021

@GiulioRossetti a is the number of true-positive pairs (two nodes are in the same partition in both clusterings), d is the number of true-negative pairs (two nodes are in different partitions in both clusterings), and b and c count the pairs where the ground truth and the prediction disagree.

For further reference:

@GiulioRossetti
Owner

Thanks for the clarification. OK for a (which is TP) and d (which is TN), but b and c are not interchangeable. I assume that b is FP and c is FN, am I right?

BradKML commented Jun 28, 2021

@GiulioRossetti they are interchangeable in most cases, but in Fager_McGowan, McConnaughey, Minkowski, Wallace and Yule they are not.

In the Wallace(A, B) = a/(a+b) case, b counts the node pairs that are in the same partition in A but in different partitions in B, where A would be the ground truth and B the predicted partition.
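
The definitions above can be sketched as a brute-force counter. This is only an illustration under stated assumptions: both partitions are non-overlapping and given as equal-length label sequences (a hypothetical input format), b follows the Wallace description above (together in the ground truth only), and c is assumed to be the mirror case (together in the prediction only). The function name `pair_counts` is made up for this example.

```python
from itertools import combinations

def pair_counts(truth, pred):
    """Count node pairs by agreement between two flat partitions.

    truth and pred are equal-length sequences; position i holds the
    community label of node i (hypothetical input format).
    """
    if len(truth) != len(pred):
        raise ValueError("both labelings must cover the same nodes")
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1  # together in both partitions
        elif same_truth:
            b += 1  # together in the ground truth only
        elif same_pred:
            c += 1  # together in the prediction only (assumed meaning of c)
        else:
            d += 1  # apart in both partitions
    return a, b, c, d

truth = [0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 1]
a, b, c, d = pair_counts(truth, pred)
wallace = a / (a + b)  # asymmetric: swapping truth and pred swaps b and c
```

The loop visits every node pair, so it is quadratic in the number of nodes; a contingency-table approach would scale better, but the explicit loop mirrors the verbal definitions most directly.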

BradKML commented Mar 29, 2022

A weird side thought:

  1. A pair-counting index can be converted into a distance metric (see https://en.wikipedia.org/wiki/Jaccard_index#Other_definitions_of_Tanimoto_distance )
  2. Alternative implementations can be found in https://github.com/n-serrette/Cluster_Index and https://github.com/MartijnGosgens/validation_indices/blob/master/PairCountingIndices.py
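
For the first point, a minimal sketch using the Jaccard index, where the distance is simply one minus the similarity. The function name is hypothetical, and whether the same 1 - index construction yields a proper metric for the other indices in this thread is not something this sketch claims:

```python
def jaccard_distance(a, b, c):
    """Jaccard distance 1 - a/(a+b+c), written in terms of the pair counts.

    Undefined (division by zero) when a + b + c == 0, i.e. when no pair
    is grouped together in either partition.
    """
    return (b + c) / (a + b + c)

# Identical partitions have b == c == 0, so the distance is zero.
dist_identical = jaccard_distance(a=3, b=0, c=0)
dist_partial = jaccard_distance(a=2, b=1, c=1)
```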
