
Included complex step method for the gradient computation #74

Open · flavioluiz wants to merge 2 commits into master

Conversation

flavioluiz (Author)

Hi,
I've just included the "complex" step method for the multivariable finite_difference, so it can be used to compute the gradient too.
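
For context, the idea behind the complex step is roughly the following; this is a minimal sketch with a hypothetical helper name, not necessarily the code in this PR. Each partial derivative is read off the imaginary part of the function evaluated at a point perturbed along the imaginary axis:

    # Sketch: df/dx_k ≈ imag(f(x + im*h*e_k)) / h for a tiny step h.
    function complex_step_gradient(f, x::Vector{Float64}, h = 1e-20)
        g = similar(x)
        for k in 1:length(x)
            xc = complex(x)          # fresh copy promoted to Complex{Float64}
            xc[k] += im * h          # perturb coordinate k along the imaginary axis
            g[k] = imag(f(xc)) / h   # no subtraction, hence no cancellation error
        end
        return g
    end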

@mlubin (Collaborator) commented Aug 15, 2015

This is okay, but are you aware of https://github.com/JuliaDiff/ForwardDiff.jl, which evaluates gradients exactly? The complex step method is approximate, albeit with a very small error term. The complex step method is also not safe for arbitrary user functions:

julia> gradient(x -> dot(x,[2.0,3.0]), [1.0, 1.0], :complex)
2-element Array{Float64,1}:
 -2.0
 -3.0

CC @jrevels

@mlubin (Collaborator) commented Aug 15, 2015

The methods used in ForwardDiff might also be quite a bit faster than the complex step method, depending on the function.

@flavioluiz (Author)

Hi! I agree with you: the complex step method must be used with care. The user needs to know how each function handles complex variables (as in your example: the standard library defines the dot product using the conjugate of the first vector, which breaks the complex step method). But I still believe that, with proper care, this method is a good alternative to finite differences in many cases, since it leads to "nearly" exact differentiation (in fact, unlike finite differences, the complex step can give results that are exact up to numerical precision).
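
To make that concrete, here is an illustrative scalar comparison (the step sizes are arbitrary choices of mine): the central difference subtracts two nearly equal values and loses roughly half the significant digits, while the complex step involves no subtraction.

    h = 1e-8
    (sin(0.5 + h) - sin(0.5 - h)) / (2h) - cos(0.5)   # nonzero, typically around 1e-8 to 1e-9
    h = 1e-20
    imag(sin(0.5 + im*h)) / h - cos(0.5)              # 0.0, consistent with the exact gradients shown later in this thread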

However, I wasn't aware of ForwardDiff! It looks really interesting! I thought those "algorithmic differentiation" methods were slower than the complex step; it seems I was wrong. I'm looking forward to checking this package out in depth. Thanks for the tip!

@flavioluiz (Author)

Hi @mlubin. I've tested ForwardDiff with a few simple examples. Weirdly, the complex step leads to an "exact" computation while ForwardDiff doesn't (though its results are still very accurate):

The test function and its exact gradient are:

    f(x) = sin(x[1]) + cos(x[2])
    fd(x) = [cos(x[1]), -sin(x[2])]

With ForwardDiff:

    using ForwardDiff
    g = ForwardDiff.gradient(f)
    g([0.5,0.5]) - fd([0.5,0.5])

which leads to the following error:

    2-element Array{Float64,1}:
     -2.16915e-12
     -2.10881e-12

With complex step:

    julia> gradient(f, [0.5, 0.5],:complex)-fd([0.5,0.5])
    2-element Array{Float64,1}:
     0.0
     0.0

I get similar results for other simple functions. I think I am using ForwardDiff wrong: it seems to use exactly the same function as Calculus.gradient (which uses central finite differences). I've just started working with Julia, so maybe I am doing something wrong.

@jrevels (Collaborator) commented Aug 16, 2015

On my machine:

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)

julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)

julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)

julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
 0.0
 0.0

...so somehow there seems to be a namespacing issue? I even tried it with using Calculus and things still seemed fine on my end:

julia> using ForwardDiff

julia> using Calculus

julia> f(x) = sin(x[1]) + cos(x[2])
f (generic function with 1 method)

julia> fd(x) = [cos(x[1]), -sin(x[2])]
fd (generic function with 1 method)

julia> g = ForwardDiff.gradient(f)
gradf (generic function with 1 method)

julia> g([0.5,0.5]) - fd([0.5,0.5])
2-element Array{Float64,1}:
 0.0
 0.0

@jrevels (Collaborator) commented Aug 16, 2015

@flavioluiz I just pushed a commit to ForwardDiff.jl that should totally prevent method switch-ups on ForwardDiff's end, if you'd like to try things out again and see whether that fixes it. If it still ends up picking the wrong method, would you mind opening an issue in the ForwardDiff.jl repo with the example? Thanks.

@mlubin (Collaborator) commented Aug 16, 2015

@flavioluiz, which version of ForwardDiff were you testing with?

@flavioluiz (Author)

Ouch. I've realized that I was supposed to use Julia v0.4, while I was using Julia v0.3.11 (and the latest version of ForwardDiff obtained from Pkg).

I've just installed v0.4 and added ForwardDiff from Pkg. Same problem (for some reason, the version from Pkg uses the gradient from Calculus).

Then I cloned the ForwardDiff repo, and with the latest version it works fine! Exactly as shown by @jrevels. :-)
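
(For anyone following along, an alternative to a manual clone is the package manager of that Julia era; this assumes ForwardDiff is already registered and installed:)

    Pkg.add("ForwardDiff")       # installs the tagged release registered in METADATA.jl
    Pkg.checkout("ForwardDiff")  # switches the installed package to its unreleased master branch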

@jrevels (Collaborator) commented Aug 17, 2015

Same problem (for some reason, the version from Pkg uses the gradient from Calculus).

Yeah, we haven't yet officially tagged the new version of ForwardDiff, so the version in METADATA.jl hasn't received the update (that's the version you get from Pkg.add("...")). We're waiting a little while before releasing the new version to give folks time to adjust to the new API and critique it.

@flavioluiz (Author)

Hi again. I've been playing with ForwardDiff: it is really nice! In particular, I am impressed with the Hessian computation (in my tests, ForwardDiff was much faster than Calculus' finite differences). I am looking forward to trying this API in my simulations!

I compared the gradient methods, complex step vs. ForwardDiff, on several functions. Both methods compute the derivative "exactly". In most cases the complex step was about twice as fast!

I don't know much about these "automatic differentiation" methods (I quickly read the wiki article on the subject and am still trying to understand it), but I have a feeling that the philosophy behind the complex step is not so far from this forward-mode method. While forward mode extends real algebra using dual numbers, the complex step exploits the standard implementation of complex arithmetic (whose basic operations are pretty much the same as those of the dual algebra!).

In the complex step, if epsilon is "small enough", epsilon^2 and higher-order terms simply become 0.0 in double precision (while the first-order term keeps full numerical precision), so the derivative is "almost always exact". If I understood correctly, forward-mode dual numbers use epsilon^2 = 0 as a definition, and all the basic operations take this into account, so there is no need to choose a small epsilon and no danger: the derivative is always exact. Is that right, or am I talking nonsense? ;-)
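
That intuition can be made concrete with a toy dual-number type; the following is only an illustrative sketch in present-day Julia syntax, not ForwardDiff's actual implementation.

    # a + b*ε with ε^2 = 0: the derivative is carried along exactly, no small step needed
    struct Dual
        val::Float64   # function value
        der::Float64   # derivative part (coefficient of ε)
    end
    Base.:+(x::Dual, y::Dual) = Dual(x.val + y.val, x.der + y.der)
    Base.:*(x::Dual, y::Dual) = Dual(x.val * y.val, x.der * y.val + x.val * y.der)  # the ε^2 term is dropped exactly
    Base.sin(x::Dual) = Dual(sin(x.val), cos(x.val) * x.der)

    # Example: for g(x) = x * sin(x), g(Dual(0.5, 1.0)).der equals sin(0.5) + 0.5*cos(0.5).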

@mlubin (Collaborator) commented Aug 19, 2015

@flavioluiz, yes, you're basically correct. My understanding is that the complex step rule is a "poor man's AD" that came from the MATLAB world, where the only number types that are fast to compute with are real and complex floating-point values.

Do you have any examples you could share where the complex step is faster than ForwardDiff? The only cases I can think of where this could happen involve BLAS/LAPACK operations, since there are BLAS routines for complex numbers but not for ForwardDiff's number types.

@flavioluiz (Author)

@mlubin, right: I was using matrix/vector multiplications:

Q = randn(N,N)
f(x) = dot(conj(x),Q*x)

and similar functions with additional nonlinearities (taking the sine of the result, exponentials, multiplying by the sum of x, etc.).

After your comment, I tried to make sure I wasn't using any LAPACK operations:

function f(x)
     res = 0.
     for i = 1:length(x)
         res += x[i]*x[i]
     end
     return res
end

Obtaining the gradient using the complex step is still faster than ForwardDiff for a small vector x (up to about 12 elements). For more variables, ForwardDiff clearly outperforms the complex step.
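
For reference, a rough timing harness for such a comparison could look like this; it assumes the f defined above and follows the gradient APIs used earlier in this conversation (ForwardDiff.gradient(f) returning a gradient function, and Calculus' gradient(f, x, :complex) added by this PR):

    using Calculus, ForwardDiff

    x = randn(12)                 # small input vector
    g = ForwardDiff.gradient(f)   # forward-mode AD gradient function
    @time for _ in 1:10^4; g(x); end
    @time for _ in 1:10^4; Calculus.gradient(f, x, :complex); end   # complex step from this PR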

@jrevels (Collaborator) commented Aug 27, 2015

function f(x)
     res = 0.
     for i = 1:length(x)
         res += x[i]*x[i]
     end
     return res
end

@flavioluiz This function isn't type-stable, and is probably causing ForwardDiff to run a bit slower than it normally would. To make the function type-stable, try changing this line:

res = 0.

To this:

res = zero(eltype(x))

That way the type of res won't change inside the loop for non-Float64 vectors (ForwardDiff needs to wrap your original values in new number types to do forward-mode automatic differentiation).
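
For completeness, the function with that single change applied would read:

function f(x)
    res = zero(eltype(x))   # accumulator type follows the input's element type
    for i = 1:length(x)
        res += x[i]*x[i]
    end
    return res
end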

I'd be interested to see if it's still problematic after this change.
