
Generalised unary vector function framework #1558

Merged
merged 36 commits into from
Jan 24, 2020

Conversation

@andrjohns (Collaborator) commented Dec 29, 2019

Summary

This pull request introduces a framework for generalising unary vector functions (e.g. log_sum_exp or log_softmax) to work with Eigen column and row vectors, std::vectors, and containers of these.

The motivations behind the framework were discussed in #1425, and the design document for the framework is here. There has also been additional discussion over in the forums.

A new vectorisation framework, apply_vector_unary, is proposed. The framework has two functions to address the different types of vector functions:

  • apply_vector_unary<T>::apply() for: f(vector) -> vector
  • apply_vector_unary<T>::reduce() for: f(vector) -> scalar

This pull request also introduces an example use of each type of vectorisation:

  • log_softmax (apply())
  • log_sum_exp (reduce())

For each of these functions, their prim, rev, and fwd definitions have been re-written in the new framework, and their associated tests expanded.
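As a rough illustration of the apply()/reduce() split, here is a simplified, Eigen-free sketch using std::vector<double> in place of Eigen vectors. The names sketch::apply, sketch::reduce, and sketch::log_sum_exp are hypothetical stand-ins for exposition, not the real apply_vector_unary API:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

namespace sketch {

// apply(): f(vector) -> vector, on a single vector...
template <typename F>
std::vector<double> apply(const std::vector<double>& x, F&& f) {
  return f(x);
}

// ...or mapped over each inner vector of a container.
template <typename F>
std::vector<std::vector<double>> apply(
    const std::vector<std::vector<double>>& x, F&& f) {
  std::vector<std::vector<double>> out;
  out.reserve(x.size());
  for (const auto& v : x) {
    out.push_back(f(v));
  }
  return out;
}

// reduce(): f(vector) -> scalar on a single vector...
template <typename F>
double reduce(const std::vector<double>& x, F&& f) {
  return f(x);
}

// ...while a container input yields one scalar per inner vector.
template <typename F>
std::vector<double> reduce(const std::vector<std::vector<double>>& x, F&& f) {
  std::vector<double> out;
  out.reserve(x.size());
  for (const auto& v : x) {
    out.push_back(f(v));
  }
  return out;
}

// A log_sum_exp written once against this interface works for both
// std::vector<double> and std::vector<std::vector<double>> inputs.
template <typename T>
auto log_sum_exp(const T& x) {
  return reduce(x, [](const std::vector<double>& v) {
    double max = *std::max_element(v.begin(), v.end());
    double sum = 0.0;
    for (double vi : v) sum += std::exp(vi - max);
    return max + std::log(sum);
  });
}

}  // namespace sketch
```

The point of the pattern is that the function body is written once, and the dispatch between "one vector" and "container of vectors" lives entirely in the framework.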

Tests

Both prim and mix tests have been added to check that the same values and derivatives are returned across the new vector inputs.

Side Effects

I had to update the templating in one header from Eigen::Matrix<T, R, C> to Eigen::MatrixBase<Derived> so that it could work with Eigen::Map inputs:

  • matrix_vari
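The motivation for that templating change can be illustrated with a stdlib analogy (no Eigen here; sum_concrete and sum_generic are made-up names): a function templated on one concrete container type rejects non-owning views, while templating on a generic range interface, much as Eigen::MatrixBase<Derived> does for Eigen expressions, also admits views such as Eigen::Map:

```cpp
#include <iterator>
#include <numeric>
#include <vector>

// Before: only the concrete container type is accepted.
inline double sum_concrete(const std::vector<double>& v) {
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// After: any range-like type is accepted, including non-owning views
// over existing memory -- analogous to how Eigen::MatrixBase<Derived>
// admits Eigen::Map inputs that an Eigen::Matrix<T, R, C> signature
// would reject.
template <typename Range>
double sum_generic(const Range& r) {
  return std::accumulate(std::begin(r), std::end(r), 0.0);
}
```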

Checklist

  • Math issue Use Eigen::Map to replace arr functions #1425

  • Copyright holder: Andrew Johnson

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@andrjohns (Collaborator, Author) commented:

I've removed the apply_scalar and head implementations, since they don't line up with the expected functionality of head in the integration tests. Under apply_scalar, when passing head a container of vectors (e.g. head(std::vector<VectorXd>, 2)), the framework would apply head to each of those vectors in the container. However, the integration tests expect that head(std::vector<VectorXd>, 2) returns the first two vectors in the container (in full).

This was a misunderstanding of the implementation of head on my part, so I've not touched it in this PR.
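To make the two readings concrete, here is a small sketch (head_per_vector and head_of_container are hypothetical names for illustration; neither is the actual Stan Math implementation) of what the removed apply_scalar-based code did versus what the integration tests expect:

```cpp
#include <cstddef>
#include <vector>

using vecs = std::vector<std::vector<double>>;

// What the removed apply_scalar-based code did: apply head to each
// inner vector, keeping the first n *elements* of every vector.
inline vecs head_per_vector(const vecs& x, std::size_t n) {
  vecs out;
  out.reserve(x.size());
  for (const auto& v : x) {
    out.emplace_back(v.begin(), v.begin() + n);
  }
  return out;
}

// What the integration tests expect: the first n *vectors* of the
// container, each returned in full.
inline vecs head_of_container(const vecs& x, std::size_t n) {
  return vecs(x.begin(), x.begin() + n);
}
```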

@stan-buildbot (Contributor) commented:

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.99)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.95)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.93)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 0.97)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.03)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.0)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.07)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 1.01)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.01)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 0.97)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 1.0)
Result: 0.99535221307
Commit hash: 8ebea40

Resolved review threads on:
  • stan/math/prim/mat/vectorize/apply_vector_unary.hpp
  • stan/math/rev/core/matrix_vari.hpp
  • stan/math/fwd/mat/fun/log_sum_exp.hpp
  • test/unit/math/mix/mat/fun/log_softmax_test.cpp
  • stan/math/rev/mat/fun/log_sum_exp.hpp
*/
template <typename T, require_t<is_fvar<scalar_type_t<T>>>...>
inline auto log_softmax(T&& x) {
  return apply_vector_unary<T>::apply(std::forward<T>(x), [&](auto& alpha) {
@t4c1 (Contributor) commented Jan 3, 2020:
Perfect forwarding brings no benefits here. This would work just as well, with no additional copies, and the code is a bit shorter:

inline auto log_softmax(const T& x) {
  return apply_vector_unary<T>::apply(x, [&](auto& alpha) {

I think the same holds true for every other place in this PR that uses perfect forwarding.

Perfect forwarding is better than a plain const ref only if the variable can outlive the function in question, for example if the function can directly return the variable or construct some object that holds it.
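A minimal sketch of the distinction being made here (Holder and first_or_zero are illustrative names, not Stan Math code): forwarding pays off only when the callee can take ownership of the argument; a read-only callee gains nothing over a const reference.

```cpp
#include <utility>
#include <vector>

// The case where perfect forwarding helps: the function stores its
// argument, so an rvalue can be moved into the member while an lvalue
// is copied. A const& parameter here would force a copy in both cases.
struct Holder {
  std::vector<double> data;
  template <typename Vec>
  explicit Holder(Vec&& v) : data(std::forward<Vec>(v)) {}
};

// The case at hand in this PR: the function only reads its argument
// and returns a freshly computed value, so const& is just as good.
inline double first_or_zero(const std::vector<double>& v) {
  return v.empty() ? 0.0 : v.front();
}
```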

A Collaborator replied:

> Perfect forwarding is better than just const refs only if a variable can outlive the function in question. For example if the function can directly return the variable or construct some object that holds the variable.

Can you link me to a doc that talks about that? My understanding of PF is that it tells the compiler "I'm okay with moving ownership of this memory and leaving the original object in an undefined state".

In particular, for the below, think about a recursive function that can have a long ref stack that the compiler probably can't sort through. PF gives the compiler the ability to move ownership of that memory, so we don't have to worry about whether the compiler will have aliasing issues when deciding to inline recursive calls.

> For example if the function can directly return the variable or construct some object that holds the variable.

The C++ Core Guidelines have a lot on this; the little table in F.15 has some good heuristics:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#fcall-parameter-passing

In particular, this seems to recommend that Andrew keep it as a forwarding reference:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f19-for-forward-parameters-pass-by-tp-and-only-stdforward-the-parameter

It feels like the above function is just something we want to allow the compiler to move through if it's able.

@t4c1 (Contributor) commented Jan 13, 2020:

> Can you link me to a doc that talks about that?

Sorry, that is just my understanding, and a quick search doesn't find anything on when perfect forwarding is beneficial and when it is not. I can't think of any other example, but please correct me if I am wrong.

> My understanding of PF is that it tells the compiler "I'm okay with moving ownership of this memory and leaving the original object in an undefined state"

A more accurate phrasing would be "I am CAPABLE of either moving a temporary object or using a reference to an lvalue object." In this PR no function can actually move the parameter, so const refs are just as good.

> PF gives the compiler the ability to move ownership of that memory. So we don't have to worry about whether the compiler will have alias issues when deciding to inline recurses.

I don't know anything about that. Can you link any doc or benchmark that supports this?

Thanks for the links to the guidelines. However, I disagree with your conclusion that they suggest we should use perfect forwarding. The first link points to a section that says simple means of parameter passing (which exclude perfect forwarding) should be used unless the benefits of something more advanced are demonstrated.

The second link suggests nothing about whether perfect forwarding should be used or not. It only explains what a function accepting a forwarding reference should do with it: a function should be flagged (as violating those guidelines) if it does anything with this argument except std::forward-ing it exactly once.

I would add that std::forward-ing an argument to a function is only beneficial if that function either uses perfect forwarding itself or has two overloads accepting a const lvalue ref and an rvalue ref.

A Contributor replied:

I agree that we shouldn't clutter the code with perfect forwarding examples where they can't be used. This came up in another PR with autodiff types, which have no data content to move.

A Collaborator replied:

@t4c1 I'm at work, but I'll link some stuff tonight.

Maybe we should write up an RFC to talk about which argument types should be used for which types and sizes. I tend to be way too PF-giddy.

@t4c1 (Contributor) left a review comment:

This looks great! There are a few places where it can be improved a bit, and a question or two.

@rok-cesnovar (Member) commented Jan 5, 2020:

@andrjohns I will make a PR to your fork to fix these merge conflicts. These won't be completely trivial changes, so I think a PR is needed. I missed that you were touching fwd files, otherwise I would have held off on the flattening.

EDIT: It was actually more or less straightforward, so I just pushed the changes. Hopefully that is fine.

@stan-buildbot (Contributor) commented:

(stat_comp_benchmarks/benchmarks/gp_pois_regr/gp_pois_regr.stan, 0.98)
(stat_comp_benchmarks/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan, 0.97)
(stat_comp_benchmarks/benchmarks/irt_2pl/irt_2pl.stan, 1.01)
(stat_comp_benchmarks/benchmarks/pkpd/one_comp_mm_elim_abs.stan, 0.99)
(stat_comp_benchmarks/benchmarks/eight_schools/eight_schools.stan, 1.02)
(stat_comp_benchmarks/benchmarks/gp_regr/gp_regr.stan, 1.0)
(stat_comp_benchmarks/benchmarks/arK/arK.stan, 1.0)
(performance.compilation, 1.02)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan, 1.0)
(stat_comp_benchmarks/benchmarks/low_dim_gauss_mix/low_dim_gauss_mix.stan, 1.0)
(stat_comp_benchmarks/benchmarks/sir/sir.stan, 1.0)
(stat_comp_benchmarks/benchmarks/pkpd/sim_one_comp_mm_elim_abs.stan, 0.95)
(stat_comp_benchmarks/benchmarks/garch/garch.stan, 1.0)
(stat_comp_benchmarks/benchmarks/gp_regr/gen_gp_data.stan, 1.01)
(stat_comp_benchmarks/benchmarks/arma/arma.stan, 0.96)
Result: 0.99359923753
Commit hash: 4e83afb

@andrjohns (Collaborator, Author) commented:

Thanks for sorting out those conflicts @rok-cesnovar!

@t4c1 I'll update this PR (and the tests) once #1471 gets merged in. Thanks!

@stan-buildbot (Contributor) commented:


Name Old Result New Result Ratio Performance change (1 - new/old)
gp_pois_regr/gp_pois_regr.stan 4.9 4.89 1.0 0.33% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 -0.25% slower
eight_schools/eight_schools.stan 0.09 0.09 1.0 -0.22% slower
gp_regr/gp_regr.stan 0.22 0.23 0.98 -1.59% slower
irt_2pl/irt_2pl.stan 6.07 6.14 0.99 -1.14% slower
performance.compilation 87.88 86.56 1.02 1.5% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.32 7.3 1.0 0.23% faster
pkpd/one_comp_mm_elim_abs.stan 20.86 21.5 0.97 -3.09% slower
sir/sir.stan 97.05 101.95 0.95 -5.04% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.98 -1.92% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.95 2.95 1.0 0.07% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.33 0.32 1.03 2.99% faster
arK/arK.stan 2.42 2.43 1.0 -0.35% slower
arma/arma.stan 0.79 0.79 1.0 -0.23% slower
garch/garch.stan 0.53 0.53 1.0 -0.31% slower
Mean result: 0.99433636044

Jenkins Console Log
Blue Ocean (https://jenkins.mc-stan.org/blue/organizations/jenkins/Math Pipeline/detail/PR-1558/892/pipeline)
Commit hash: 8d8539a


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

deriv += v[i].d_ * exp_vi;
}
return fvar<T>(log_sum_exp(vals), deriv / denominator);
return log_sum_exp(x2, x1);
@andrjohns (Collaborator, Author) commented:

I've removed a redundant definition here (and in the rev header). Originally, there were definitions for both log_sum_exp(const fvar<T>& x1, double x2) and log_sum_exp(double x1, const fvar<T>& x2), but we can have just one definition and change the order of arguments as needed.
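The deduplication pattern can be sketched with a toy forward-mode type (Fv here is a simplified stand-in, not Stan's fvar<T>): implement the (autodiff, double) overload once, and let the reversed overload delegate by swapping the arguments, since log_sum_exp is symmetric in its value arguments.

```cpp
#include <cmath>

// Toy forward-mode scalar: a value plus a tangent.
struct Fv {
  double val_;
  double d_;
};

// Primary mixed overload. Since lse(a, b) = m + log(exp(a-m) + exp(b-m)),
// the tangent follows from d/dx1 lse(x1, x2) = exp(x1 - lse).
inline Fv log_sum_exp(const Fv& x1, double x2) {
  double m = x1.val_ > x2 ? x1.val_ : x2;
  double lse = m + std::log(std::exp(x1.val_ - m) + std::exp(x2 - m));
  return Fv{lse, x1.d_ * std::exp(x1.val_ - lse)};
}

// The reversed overload needs no second body: just swap the arguments.
inline Fv log_sum_exp(double x1, const Fv& x2) {
  return log_sum_exp(x2, x1);
}
```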

@andrjohns (Collaborator, Author) commented:

@t4c1 This is ready for another look-over

@t4c1 (Contributor) left a review comment:

Fix the conflicts and a few details, and this is good to go.

return max + std::log((x.array() - max).exp().sum());
template <typename T, require_t<std::is_arithmetic<scalar_type_t<T>>>...>
inline auto log_sum_exp(const T& x) {
  return apply_vector_unary<T>::reduce(x, [&](auto& v) {
A Contributor commented:

Suggested change:
- return apply_vector_unary<T>::reduce(x, [&](auto& v) {
+ return apply_vector_unary<T>::reduce(x, [&](const auto& v) {

With no perfect forwarding this can be const. Same for all other lambdas you introduced.

Resolved review threads on:
  • stan/math/rev/fun/log_sum_exp.hpp
  • stan/math/rev/core/matrix_vari.hpp
@t4c1 (Contributor) left a review comment:

Looks good. I think we have feature freeze now, so merging needs to wait until the release.

@stan-buildbot (Contributor) commented:


Name Old Result New Result Ratio Performance change (1 - new/old)
gp_pois_regr/gp_pois_regr.stan 4.95 4.91 1.01 0.78% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -3.37% slower
eight_schools/eight_schools.stan 0.09 0.09 1.01 1.04% faster
gp_regr/gp_regr.stan 0.23 0.22 1.03 3.33% faster
irt_2pl/irt_2pl.stan 6.08 6.07 1.0 0.21% faster
performance.compilation 88.24 86.4 1.02 2.09% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.35 7.32 1.0 0.36% faster
pkpd/one_comp_mm_elim_abs.stan 20.62 20.29 1.02 1.6% faster
sir/sir.stan 104.92 104.51 1.0 0.39% faster
gp_regr/gen_gp_data.stan 0.05 0.04 1.01 0.88% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 2.96 1.01 1.28% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.32 1.0 0.22% faster
arK/arK.stan 1.76 1.75 1.0 0.33% faster
arma/arma.stan 0.8 0.8 1.0 -0.42% slower
garch/garch.stan 0.59 0.59 0.99 -0.74% slower
Mean result: 1.00554972051

Jenkins Console Log
Blue Ocean
Commit hash: f3b3286



@stan-buildbot (Contributor) commented:


Name Old Result New Result Ratio Performance change (1 - new/old)
gp_pois_regr/gp_pois_regr.stan 4.98 4.92 1.01 1.22% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 0.02% faster
eight_schools/eight_schools.stan 0.09 0.09 0.97 -3.14% slower
gp_regr/gp_regr.stan 0.23 0.23 1.01 0.62% faster
irt_2pl/irt_2pl.stan 6.19 6.07 1.02 1.92% faster
performance.compilation 87.91 86.69 1.01 1.39% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.3 7.32 1.0 -0.27% slower
pkpd/one_comp_mm_elim_abs.stan 20.25 20.49 0.99 -1.17% slower
sir/sir.stan 104.37 105.15 0.99 -0.75% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.2% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.96 2.96 1.0 0.16% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.32 1.0 0.12% faster
arK/arK.stan 1.75 1.74 1.0 0.43% faster
arma/arma.stan 0.8 0.81 0.99 -0.65% slower
garch/garch.stan 0.59 0.59 0.99 -0.52% slower
Mean result: 0.99983537081

Jenkins Console Log
Blue Ocean
Commit hash: f3b3286



@andrjohns (Collaborator, Author) commented:

@rok-cesnovar I'm guessing this means that the segfaults have been fixed? Is this safe to merge?

@SteveBronder (Collaborator) commented:

I think either way we need to wait until the next release, since we are in code freeze atm.

@andrjohns (Collaborator, Author) commented:

Good catch, forgot about that part

@rok-cesnovar (Member) commented:

Yes, the segfaults were resolved (see #1615), but we have to wait until Sunday/Monday to merge, after the release.
