
Tests are broken? #5

Open
libeanim opened this issue Mar 3, 2016 · 4 comments

libeanim commented Mar 3, 2016

Hi,
I set up a new virtual environment on my Linux Mint machine and installed numpy, scipy, matplotlib, Cython and plfit via pip.

> pip freeze
cycler==0.10.0
Cython==0.23.4 
matplotlib==1.5.1
numpy==1.10.4
numpydoc==0.6.0
plfit==1.0.2
pyparsing==2.1.0
python-dateutil==2.5.0
pytz==2015.7
scipy==0.17.0
six==1.10.0

My python version is:

> python --version
Python 2.7.6

If I run the 'clauset2009_tests.py' script after everything has installed properly, it raises the following precision error:

> python clauset2009_tests.py 
/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Using DISCRETE fitter
/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/plfit/plfit.py:830: RuntimeWarning: invalid value encountered in log
  alpha = 1.0 + float(nn) * ( sum(log(xx/(xmin-0.5))) )**-1
/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/plfit/plfit.py:830: RuntimeWarning: divide by zero encountered in divide
  alpha = 1.0 + float(nn) * ( sum(log(xx/(xmin-0.5))) )**-1
alpha = 2.325891   xmin = 46.449000   ksD = 0.015483   L = -3556.028920   (n<x) = 18776  (n>=x) = 671
Using DISCRETE fitter
alpha = 2.325891   xmin = 46.449000   ksD = 0.015483   L = -3556.028920   (n<x) = 18776  (n>=x) = 671
Using DISCRETE fitter
alpha = 2.325891   xmin = 46.449000   ksD = 0.015483   L = -3556.028920   (n<x) = 18776  (n>=x) = 671
Cities (Clauset): n:     19447 mean,std,max:     9.00,   77.83, 8009.00 xmin:    52.46 alpha:     2.37 (    0.08) ntail:        580 p:  0.76
Cities (me)     : n:     19447 mean,std,max:     9.00,   77.82, 8008.65 xmin:    46.45 alpha:     2.33 (    0.05) ntail:        671 p:  1.00
Traceback (most recent call last):
  File "clauset2009_tests.py", line 48, in <module>
    np.testing.assert_almost_equal(ppp._xmin, 52.46, 2)
  File "/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 513, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError: 
Arrays are not almost equal to 2 decimals
 ACTUAL: 46.448999999999998
 DESIRED: 52.46
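For what it's worth, the two RuntimeWarnings from plfit.py:830 look consistent with candidate xmin values at or below 0.5 being tried during the xmin scan (that's an assumption about the scan, not something verified in the source): with xmin < 0.5 the term (xmin - 0.5) is negative, so the log goes NaN, and with xmin == 0.5 the division blows up. A minimal reproduction:

```python
import numpy as np

xx = np.array([1.0, 2.0, 5.0])  # data above the candidate xmin

# hypothetical candidate xmin below 0.5: (xmin - 0.5) is negative,
# so log(xx / (xmin - 0.5)) is NaN -> "invalid value encountered in log"
xmin = 0.25
with np.errstate(invalid="ignore"):
    vals = np.log(xx / (xmin - 0.5))
print(np.isnan(vals).all())  # True

# candidate xmin of exactly 0.5: division by zero -> the second warning
xmin = 0.5
with np.errstate(divide="ignore"):
    print(np.isinf(xx / (xmin - 0.5)).all())  # True
```

Neither warning by itself explains the wrong xmin, but it suggests the scan range starts below where the discrete approximation is valid.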

I also tried the 'consistency_test.py' script, with a similar result.

> python consistency_test.py 
/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/plfit/plfit.py:113: RuntimeWarning: divide by zero encountered in double_scalars
  a = float(n) / sum(log(x/xmin))
PYTHON plfit executed in 0.327898 seconds
xmin: 0.584303 n(>xmin): 561 alpha: 2.39867 +/- 0.0590518   Log-Likelihood: -472.424   ks: 0.0180329 p(ks): 0.993215
CYTHON plfit executed in 0.202372 seconds
PYTHON plfit executed in 0.202398 seconds
cython cplfit did not load
xmin: 0.60835 n(>xmin): 538 alpha: 2.41978 +/- 0.0612111   Log-Likelihood: -460.973   ks: 0.025935 p(ks): 0.862215
PYTHON plfit executed in 0.329090 seconds
fortran fplfit did not load
xmin: 0.584303 n(>xmin): 561 alpha: 2.39867 +/- 0.0590518   Log-Likelihood: -472.424   ks: 0.0180329 p(ks): 0.993215
Traceback (most recent call last):
  File "consistency_test.py", line 21, in <module>
    np.testing.assert_almost_equal(aa._alpha, bb._alpha, 5)
  File "/home/libeanim/Desktop/WORK/plfit/env/local/lib/python2.7/site-packages/numpy/testing/utils.py", line 513, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError: 
Arrays are not almost equal to 5 decimals
 ACTUAL: 2.3986678289876329
 DESIRED: 2.4197805296940231
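As a sanity check on what alpha the continuous fitter should produce, the standard continuous MLE (the form given in Clauset, Shalizi & Newman 2009; plfit's internals may of course differ) can be tested directly on synthetic Pareto data, where the true exponent is known:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha_true = 2.5
xmin = 1.0
n = 5000

# inverse-CDF sampling from a continuous power law p(x) ~ x^-alpha, x >= xmin
u = rng.random(n)
x = xmin * (1.0 - u) ** (-1.0 / (alpha_true - 1.0))

# continuous maximum-likelihood estimator for alpha at fixed xmin
alpha_hat = 1.0 + n / np.sum(np.log(x / xmin))
print(alpha_hat)  # typically within a few percent of 2.5 for n = 5000
```

Since both code paths above presumably implement this same estimator, a difference in the 5th decimal at identical xmin would be surprising; the two runs picking different xmin values (0.584303 vs. 0.60835) seems the more likely culprit.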

I'm not sure how serious this is, but if these tests passed before, there is probably a bug in the code now. Or am I doing something wrong?

@keflavich (Owner) commented

I'm afraid bugs have somehow crept in and I have not been able to track them down. I think there is an index-off-by-one error somewhere, but I haven't had the time to find it and my last attempt didn't turn up anything. If you're interested in digging through the source at all to try to find the bug, help would be very welcome!

@libeanim (Author) commented

I'm sorry for the late answer. Unfortunately I have little time as well and I'm not too familiar with this topic, but if I can find the time I will submit a patch.

Additionally, I encountered another issue which is probably related:

Since I analyse neural avalanche distributions, I generated a fake avalanche distribution that follows a perfect power law with an exponent of -1.5.

import numpy as np

# generate avalanche sizes 1 to 99
x = np.arange(1, 100)
# occurrence counts follow a perfect power law with exponent -1.5
y = x ** -1.5 * 10000
# truncate to int because occurrence counts are discrete
y = np.int_(y)

# expand into raw samples, as the fitter expects
out = []
for sz, num in zip(x, y):
    out += (np.ones(num) * sz).tolist()
out = np.array(out)
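As a rough cross-check on the exponent, one can apply the discrete MLE approximation that appears verbatim in the warning traceback above (plfit.py:830, alpha = 1 + n / Σ ln(x_i / (xmin − 1/2))) to this data by hand; note this approximation is known to be inaccurate for small xmin, so the number is only indicative, and plfit's discrete fitter may use a more exact method internally:

```python
import numpy as np

# regenerate the fake avalanche data from above, in condensed form
x = np.arange(1, 100)
y = np.int_(x ** -1.5 * 10000)
out = np.repeat(x, y).astype(float)

# discrete MLE approximation at xmin = 1, as in plfit.py line 830
xmin = 1.0
n = len(out)
alpha_hat = 1.0 + n / np.sum(np.log(out / (xmin - 0.5)))
print(n)  # 24072, matching n(>xmin) in the verbose output below
print(alpha_hat)
```

If this hand-computed alpha disagrees with both the true exponent (1.5) and plfit's reported value, that would point at the small-xmin bias of the approximation rather than a bug in the data generation.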

After generating the data I started the fitting with a given xmin of 1.

myplfit = plfit.plfit(out, xmin=1, verbose=True)
# stdout:
# Using DISCRETE fitter because there are repeated values.
# The lowest value included in the power-law fit,  xmin: 1
# The number of values above xmin,  n(>xmin): 24072
# The derived power-law alpha (p(x)~x^-alpha) with MLE-derived error,  alpha: 1.9288 +/- 0.00598638
# The log of the Likelihood (the maximized parameter; you minimized the negative log likelihood),  # Log-Likelihood: -51767.5
# The KS-test statistic between the best-fit power-law and the data,  ks: 0.415379  occurs with probability   p(ks): 0

Next I plotted the PDF as well as the values of the perfect power-law x and y generated in the beginning:

plt.figure(dpi=120)
plt.plot(x, y, 'g.')
myplfit.plotpdf()
plt.show()

[screenshot: plotpdf() with logarithmic binning (dolog=True)]

The green dots are the generated power law, which looks like a perfect line. The raw data (black histogram) appears distorted at the tail, which might be a result of the log binning. So I did the same with linear binning:

plt.figure(dpi=120)
plt.plot(x, y, 'g.')
myplfit.plotpdf(dolog=False)
plt.show()

[screenshot: plotpdf(dolog=False), linear binning]

Here we can see that the raw data (black histogram) is now parallel to the green line (even though they don't share the same values). However, the red line fitted by the algorithm appears to fit the raw data much better in the first plot, with log binning. As far as I remember, the fit should be independent of the binning type, since binning only affects the visualisation, and should describe both plots equally well?
So is this a bug, or is there an error in the reasoning behind the test-data generation above?

@keflavich (Owner) commented

@libeanim I haven't had a chance to look at this until today. I'm not entirely sure what is going on, but I suspect either a logical error in the construction of the data set or a bug in the discrete fitter. I've never seen a discrepancy of this magnitude; the errors I've seen previously were in the few-to-tens-of-percent range.


libeanim commented Jun 6, 2016

Yeah, this discrepancy is strange.
I don't know if you noticed, but I just realised that the slope of the fitted red line also changes between the two pictures. At first I only focused on the difference between the black bars and the red line and wasn't aware that the slope changes too; to my mind that doesn't make any sense.
(It becomes obvious when you compare the red and green lines in both pictures, since the green line's slope doesn't change.)
