Hello,
Thank you for sharing this great library with us! 😊
I am using implicit version 0.7.2 and I think I found a bug.
If I create two instances of BPR using the same arguments, with a fixed `random_state` and more than one thread via `num_threads`, I get different user and item factors after fitting.
But when I use just one thread, this issue does not occur.
It seems that BPR produces non-deterministic results when using multiple threads.
I suspect that there may be a bug in how the `RNGVector` class (c++ RNG object) is used in `bpr.pyx`.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L43
A NumPy Random Generator is seeded using the supplied `random_state` argument (`check_random_state` in `utils.py`).
https://github.com/benfred/implicit/blob/v0.7.2/implicit/utils.py#L83
Each thread is supposed to get an `RNGVector` instance with a different random seed drawn from the Random Generator.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L185-L187
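To make the intended seeding scheme concrete, here is a minimal sketch in plain Python. It is not implicit's code: `make_thread_rngs` and `draw` are hypothetical helpers, and the standard-library `random.Random` stands in for both the NumPy generator and the c++ `RNGVector`. It only shows that one generator per thread, seeded from a single master seed, gives identical per-thread streams across two setups.

```python
import random


def make_thread_rngs(random_state, num_threads):
    """Derive one generator per thread from a single master seed (hypothetical helper)."""
    master = random.Random(random_state)
    # Draw one 32-bit seed per thread from the master generator.
    seeds = [master.randrange(2**32) for _ in range(num_threads)]
    return [random.Random(seed) for seed in seeds]


def draw(rngs, thread_id, upper):
    """Draw a number in [0, upper) from the generator that belongs to thread_id."""
    return rngs[thread_id].randrange(upper)


first = make_thread_rngs(random_state=42, num_threads=4)
second = make_thread_rngs(random_state=42, num_threads=4)

# With the same master seed, every thread id sees the same stream in both setups.
same = all(draw(first, t, 1000) == draw(second, t, 1000) for t in range(4))
print(same)  # True
```

So as long as each thread draws from its own generator, the seeding step by itself should be reproducible.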
But, this issue does not occur for me using the LMF matrix factorization model, though it also uses a c++ RNG object.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/lmf.pyx#L40
There is a difference between BPR and LMF in how the thread id used with the `RNGVector` class is obtained.
`lmf_update`, when running in parallel, uses the function `threadid` from `cython.parallel` to obtain the thread id, which is then used to generate a random number using `RNGVector` with `rng.generate(thread_id)`.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/lmf.pyx#L250
`bpr_update` obtains the thread id by using an external function `get_thread_num` in `bpr.h`, which returns `omp_get_thread_num` if the `_OPENMP` macro is defined, otherwise it returns 0.
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.pyx#L267
https://github.com/benfred/implicit/blob/v0.7.2/implicit/cpu/bpr.h#L10-L20
Maybe this is the reason for the non-deterministic behaviour of BPR?
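To illustrate why the thread id matters at all, here is a small simulation without real threads. It is only a sketch under assumptions and not implicit's code: `run` is a hypothetical function, `random.Random` stands in for `RNGVector`, and the `assignment` list plays the role of the thread id that each work item sees. Whether this is what actually happens in BPR is not established here.

```python
import random


def run(seed, num_threads, assignment):
    """Simulate per-thread generators; assignment[i] is the thread id handling work item i."""
    master = random.Random(seed)
    rngs = [random.Random(master.randrange(2**32)) for _ in range(num_threads)]
    # Each work item consumes one number from the generator of the thread that handles it.
    return [rngs[thread_id].randrange(1_000_000) for thread_id in assignment]


run_a = run(42, 2, [0, 0, 1, 1])
run_b = run(42, 2, [0, 0, 1, 1])
run_c = run(42, 2, [0, 1, 0, 1])  # same seed, different item-to-thread mapping

print(run_a == run_b)                  # True
print(sorted(run_a) == sorted(run_c))  # True
print(run_a, run_c)
```

The same numbers are drawn in all runs, but in `run_c` they land on different work items, so the per-item values will typically differ even though the seed is fixed. A thread id that is wrong or varies between runs would have a similar effect.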
Best regards,
Bernhard
Minimal example of creating 2 BPR instances using a fixed seed and multiple threads.
The resulting user and item factors are different after fitting the models. Assertion errors occur.
Output:
I am using a win64 architecture and this conda environment: