convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241

shashank025 · 2024-06-01T02:00:43Z

Describe the bug

When the input graph for the method cdlib.utils.convert_graph_formats() is a networkx graph where the node type is neither a string not an int (but is hashable as networkx expects), if the graph is bipartite, this method fails when converting to igraph format with the following error when attempting to populate the gi.vs["type"] value:

Traceback (most recent call last):
  File "/Users/shashankr/projects/cdlib/test.py", line 29, in <module>
    convert_graph_formats(G, ig.Graph)
  File "/Users/shashankr/projects/cdlib/cdlib/utils.py", line 197, in convert_graph_formats
    return __from_nx_to_igraph(graph, directed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/shashankr/projects/cdlib/cdlib/utils.py", line 141, in __from_nx_to_igraph
    if type(name) == int else a_r[int(name.replace("\\", ""))]
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '<__main__.Node object at 0x100e5b560>'

To Reproduce
Steps to reproduce the behavior:

CDlib version: 0.4.0
Operating System: MacOs Sonoma 14.5 (23F79)
Python version: 3.12
Version(s) of CDlib required libraries: Not sure

Here's a script (test.py) to repro the issue:

import networkx as nx
import igraph as ig

from cdlib import algorithms
from cdlib.utils import convert_graph_formats

class Node:
  def __init__(self, id, type):
    self.id = id
    self.type = type

  def __hash__(self):
    return hash(self.id)

  def __eq__(self, other):
    return self.id == other.id

john = Node('John Travolta', 'actor')
nick = Node('Nick Cage', 'actor')
face_off = Node('Face Off', 'movie')

G = nx.Graph()
G.add_node(john)
G.add_node(nick)
G.add_edge(john, face_off, label='ACTED_IN')
G.add_edge(nick, face_off, label='ACTED_IN')

convert_graph_formats(G, ig.Graph)

Simply run with:

python test.py

Expected behavior

I'd expect that call to convert_graph_formats() to work.

Screenshots
NA

Additional context

I also have a fix that I'm workin on! I'll produce a PR shortly.

The text was updated successfully, but these errors were encountered:

shashank025 · 2024-06-01T02:07:06Z

The root cause is that the logic inside the bipartite case branch was treating nodes that are just hashable (but neither strings nor ints) as strings. The fix is to properly handle such nodes by maintaining a mapping from strings to corresponding nodes:

diff --git a/cdlib/utils.py b/cdlib/utils.py
index 17a6e03..9ec917b 100644
--- a/cdlib/utils.py
+++ b/cdlib/utils.py
@@ -132,8 +132,9 @@ def __from_nx_to_igraph(g: object, directed: bool = None) -> object:
             gi.add_edges([("\\" + str(u), "\\" + str(v)) for (u, v) in g.edges()])

     if bipartite.is_bipartite(g) and not skip_bipartite:
+        convert = {str(x):x for x in g.nodes()}
         gi.vs["type"] = [
-            a_r[name] if type(name) == int else a_r[int(name.replace("\\", ""))]
+            a_r[name] if type(name) == int else a_r[convert[name.replace("\\", "")]]
             for name in gi.vs["name"]
         ]

…that are neither strings nor ints

GiulioRossetti · 2024-06-01T11:10:35Z

Hi,
I'll look into it - but I cannot guarantee you that support for generic node object descriptors will be integrated.

The main reason for the actual nodetype constraints is that cdlib accommodates several graph representations (networkx, graph_tool, igraph, ad-hoc ones...).

Allowing generic objects as nodes makes it complicated to handle some cross-representation transformations (and could affect downstream visualization/evaluation tasks requiring printable node labels - we cannot assume that all objects will came with well defined and unique string reprs).

Moreover, using complex objects as nodes usually generate a relevant memory footprint: if such information is not needed during computation (rarely the case in CD) it is indeed better to keep it separated from the graph topology (and maybe use it for post-process analysis).

shashank025 · 2024-06-02T17:53:41Z

Thank you!

I understand that cdlib needs to work with diverse graph representations, etc, and I agree that working with complex objects can have adverse memory impact, but real world applications using cdlib will often already deal with complex node objects. As a library, cdlib will be more "ergonomic" if it can consume such objects as is.

The good thing is this should be relatively easy for cdlib to do, by taking advantage of the fact that node objects are hashable, by simply invoking id(node) on the object, which results in integer node values. Applications can later "recover" the original node object via:

recover = { id(node): node for node in graph.nodes() }

I already see similar code in cdlib:

convert = { str(node): node for node in graph.nodes() }

which makes me suspect the approach I'm suggesting should be easy to apply.

Let me know what you think! I'm actually willing to help with this as well.

shashank025 pushed a commit to shashank025/cdlib that referenced this issue Jun 1, 2024

fix GiulioRossetti#241: ensure convert_graph_formats() handles nodes …

c477978

…that are neither strings nor ints

This was referenced Jun 1, 2024

convert_graph_formats() handles networkx nodes that are neither strin… #242

Open

algorithms.leiden() breaks when the input networkx graph's nodes are neither strings nor ints #243

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241

convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241

shashank025 commented Jun 1, 2024 •

edited

Loading

shashank025 commented Jun 1, 2024 •

edited

Loading

GiulioRossetti commented Jun 1, 2024

shashank025 commented Jun 2, 2024 •

edited

Loading

convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241

convert_graph_formats() breaks for bipartite networkx graphs, where the nodes are neither strings nor ints #241

Comments

shashank025 commented Jun 1, 2024 • edited Loading

shashank025 commented Jun 1, 2024 • edited Loading

GiulioRossetti commented Jun 1, 2024

shashank025 commented Jun 2, 2024 • edited Loading

shashank025 commented Jun 1, 2024 •

edited

Loading

shashank025 commented Jun 1, 2024 •

edited

Loading

shashank025 commented Jun 2, 2024 •

edited

Loading