CORRCLUSTERING ∈ NP
Given V = { v1, ..., vN } data points, a correlation function s: E → {0, 1} where s(vi, vj) = s(vj, vi) is applied to all pair of elements in V to obtain a symmetric correlation matrix M. Given an upper bound number of clusters k, CORRCLUSTERING determines whether there is a clustering cl of V where each pair of points in cl has intracluster correlation 1 and intercluster correlation 0 and |cl| ≤ k.
In this project, I created an instance of the CORRCLUSTERING problem and performed a polynomial-time reduction to SAT. I obtained a propositional logic formula by reducing the problem to the HARD and SOFT clauses described in the paper published in 2013 by Jeremias Berg and Matti Järvisalo Optimal Correlation Clustering via MaxSAT and applied a MaxSAT Solver to obtain a Maximum SATisfiability resulting clustering.
After obtaining the OPTIMAL resulting clustering it is very easy to extract from the model which SOFT clauses are violated. As the paper explains, the performance achieved by SAT Solvers to solve this task is much higher than its Integer programming counterparts, which can be prohibitive even for a relative small V.