355 lines
13 KiB
Markdown
355 lines
13 KiB
Markdown
|
|
cppduals
|
||
|
|
========
|
||
|
|
|
||
|
|
Header-only dual number library for C++11. The `dual<>` type can be
|
||
|
|
used for automatic (forward) differentiation. It can be used in
|
||
|
|
conjunction with Eigen to produced very fast vectorized computations
|
||
|
|
of real and complex matrix functions and their derivatives.
|
||
|
|
|
||
|
|
There is a small paper on cppduals here:
|
||
|
|
[](https://doi.org/10.21105/joss.01487)
|
||
|
|
|
||
|
|
Documentation
|
||
|
|
=============
|
||
|
|
|
||
|
|
[Full documentation is here](https://tesch1.gitlab.io/cppduals/).
|
||
|
|
|
||
|
|
The dual number space is closely related to the complex number space,
|
||
|
|
and as such, the dual class `duals::dual<>` is similar to
|
||
|
|
`std::complex<>`.
|
||
|
|
|
||
|
|
When compiling with Eigen it is possible to disable the vectorization
|
||
|
|
templates by `#define CPPDUALS_DONT_VECTORIZE`. This may be useful if
|
||
|
|
your compiler is particularly good at optimizing Eigen expressions, I
|
||
|
|
have had mixed results, sometimes there are differences between the
|
||
|
|
compiler's best (GCC and Clang) and the vectorized code of 30% or
|
||
|
|
more, in either direction.
|
||
|
|
|
||
|
|
Examples
|
||
|
|
========
|
||
|
|
|
||
|
|
Here we calculate a function $`f(x)`$, with its derivative $`f'(x)`$,
|
||
|
|
calculated explicitly as `df()`, and calculated by using the dual
|
||
|
|
class (`1_e` returns the dual number $`0 + 1 \epsilon`$, it is
|
||
|
|
equivalent to `dual<double>(0,1)`):
|
||
|
|
|
||
|
|
```cpp
|
||
|
|
#include <duals/dual>
|
||
|
|
|
||
|
|
using namespace duals::literals;
|
||
|
|
|
||
|
|
template <class T> T f(T x) { return pow(x,pow(x,x)); }
|
||
|
|
template <class T> T df(T x) { return pow(x,-1. + x + pow(x,x)) * (1. + x*log(x) + x*pow(log(x),2.)); }
|
||
|
|
template <class T> T ddf(T x) { return (pow(x,pow(x,x)) * pow(pow(x,x - 1.) + pow(x,x)*log(x)*(log(x) + 1.), 2.) +
|
||
|
|
pow(x,pow(x,x)) * (pow(x,x - 1.) * log(x) +
|
||
|
|
pow(x,x - 1.) * (log(x) + 1.) +
|
||
|
|
pow(x,x - 1.) * ((x - 1.)/x + log(x)) +
|
||
|
|
pow(x,x) * log(x) * pow(log(x) + 1., 2.) )); }
|
||
|
|
|
||
|
|
int main()
|
||
|
|
{
|
||
|
|
std::cout << " f(2.) = " << f(2.) << "\n";
|
||
|
|
std::cout << " df(2.) = " << df(2.) << "\n";
|
||
|
|
std::cout << "ddf(2.) = " << ddf(2.) << "\n";
|
||
|
|
std::cout << " f(2+1_e) = " << f(2+1_e) << "\n";
|
||
|
|
std::cout << " f(2+1_e).dpart() = " << f(2+1_e).dpart() << "\n";
|
||
|
|
duals::hyperduald x(2+1_e,1+0_e);
|
||
|
|
std::cout << " f((2+1_e) + (1+0_e)_e).dpart().dpart() = " << f(x).dpart().dpart() << "\n";
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
Produces:
|
||
|
|
|
||
|
|
```
|
||
|
|
f(2.) = 16
|
||
|
|
df(2.) = 107.11
|
||
|
|
ddf(2.) = 958.755
|
||
|
|
f(2+1_e) = (16+107.11_e)
|
||
|
|
f(2+1_e).dpart() = 107.11
|
||
|
|
f((2+1_e) + (1+0_e)_e).dpart().dpart() = 958.755
|
||
|
|
```
|
||
|
|
|
||
|
|
Installation
|
||
|
|
============
|
||
|
|
|
||
|
|
Copy the [duals/](./duals/) directory (or just [dual](./duals/dual) )
|
||
|
|
somewhere your `#include`s can find it. Then just `#include
|
||
|
|
<duals/dual[_eigen]>` from your source.
|
||
|
|
|
||
|
|
Alternatively, `cppduals` supports building with CMake. If using CMake v3.14+,
|
||
|
|
the ``FetchContent`` pattern is straightforward and enables using CMake targets
|
||
|
|
to specify library dependencies:
|
||
|
|
|
||
|
|
```cmake
|
||
|
|
include(FetchContent)
|
||
|
|
|
||
|
|
# Have CMake download the library
|
||
|
|
set (CPPDUALS_TAG v0.4.1)
|
||
|
|
set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
|
||
|
|
FetchContent_Declare (cppduals
|
||
|
|
URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
|
||
|
|
URL_HASH MD5=${CPPDUALS_MD5}
|
||
|
|
)
|
||
|
|
FetchContent_MakeAvailable (cppduals)
|
||
|
|
|
||
|
|
# Link to cppduals
|
||
|
|
target_link_libraries (your_target PRIVATE cppduals::duals)
|
||
|
|
```
|
||
|
|
|
||
|
|
Older versions of CMake can achieve a similar result using the ``ExternalProject``
|
||
|
|
family of commands and modifying the global preprocessor search path:
|
||
|
|
|
||
|
|
```cmake
|
||
|
|
include(ExternalProject)
|
||
|
|
|
||
|
|
# Have CMake download the library headers only
|
||
|
|
set (CPPDUALS_TAG v0.4.1)
|
||
|
|
set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
|
||
|
|
ExternalProject_Add (cppduals
|
||
|
|
URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
|
||
|
|
URL_HASH MD5=${CPPDUALS_MD5}
|
||
|
|
CONFIGURE_COMMAND "" BUILD_COMMAND "" INSTALL_COMMAND "" )
|
||
|
|
|
||
|
|
# Make include directory globally visible
|
||
|
|
ExternalProject_Get_Property (cppduals source_dir)
|
||
|
|
include_directories (${source_dir}/)
|
||
|
|
```
|
||
|
|
|
||
|
|
Alternatively, `cppduals` supports installation and discovery via the
|
||
|
|
`find_package` utility. First, download and install the library to a
|
||
|
|
location of your choosing:
|
||
|
|
|
||
|
|
```sh
|
||
|
|
CPPDUALS_PREFIX=<desired_install_location>
|
||
|
|
git clone https://gitlab.com/tesch1/cppduals.git && cd cppduals
|
||
|
|
mkdir build && cd build
|
||
|
|
cmake -DCMAKE_INSTALL_PREFIX="$CPPDUALS_PREFIX" ..
|
||
|
|
cmake --build . --target install
|
||
|
|
```
|
||
|
|
|
||
|
|
Then, in your project's `CMakeLists.txt`, find and link to the library in the
|
||
|
|
standard manner:
|
||
|
|
|
||
|
|
```cmake
|
||
|
|
find_package(cppduals REQUIRED)
|
||
|
|
target_link_libraries(your_target PRIVATE cppduals::cppduals)
|
||
|
|
```
|
||
|
|
|
||
|
|
If you installed `cppduals` to a location that is not on `find_package`'s
|
||
|
|
default search path, you can specify the location by setting the `cppduals_DIR`
|
||
|
|
environment variable when configuring your project:
|
||
|
|
|
||
|
|
```sh
|
||
|
|
cd your_build_dir
|
||
|
|
cppduals_DIR="${CPPDUALS_PREFIX}" cmake ..
|
||
|
|
```
|
||
|
|
|
||
|
|
|
||
|
|
Benchmarks
|
||
|
|
==========
|
||
|
|
|
||
|
|
The benchmark compares cppduals against a local BLAS implementation,
|
||
|
|
by default OpenBLAS (whose development package is required;
|
||
|
|
RedHat-flavor: `openblas-devel`, Debian-flavor: `openblas-dev`). If
|
||
|
|
you wish to build the benchmarks against a different installation of
|
||
|
|
BLAS, the following CMake variables can be set at configuration time:
|
||
|
|
|
||
|
|
- [BLA_VENDOR](https://cmake.org/cmake/help/latest/module/FindBLAS.html)
|
||
|
|
- BLAS_DIR
|
||
|
|
- LAPACK_DIR
|
||
|
|
|
||
|
|
For example, to build and run the tests shown below:
|
||
|
|
|
||
|
|
```sh
|
||
|
|
cmake -Bbuild-bench -H. -DCPPDUALS_BENCHMARK=ON -DBLAS_DIR=/opt/local -DLAPACK_DIR=/opt/local
|
||
|
|
cmake --build build-bench --target bench_gemm
|
||
|
|
./build-bench/tests/bench_gemm
|
||
|
|
```
|
||
|
|
|
||
|
|
The first performance goal of this project is to make the
|
||
|
|
`duals::dual<>` type at least as fast as `std::complex<>`. This is
|
||
|
|
considered to be an upper-bound for performance because complex math
|
||
|
|
operations are usually highly optimized for scientific computing and
|
||
|
|
have a similar algebraic structure. The second goal is to make the
|
||
|
|
compound type `std::complex<duals::dual<>>` as fast as possible for
|
||
|
|
use in calculation that require the derivative of complex functions
|
||
|
|
(ie comprising quantum-mechanical wave functions).
|
||
|
|
|
||
|
|
The first goal is measured by comparing the speed of matrix-matrix
|
||
|
|
operations (nominally matrix multiplication) on `duals::dual<>`-valued
|
||
|
|
Eigen matrices with highly optimtimized BLAS implementations of
|
||
|
|
equivalent operations on complex-valued matrices. This can be done by
|
||
|
|
running the [./tests/bench_gemm](./tests/bench_gemm.cpp) program. In
|
||
|
|
the *ideal* case, the results of the `B_MatMat<dual{f,d}>` type should
|
||
|
|
be nearly as fast, or faster than equivalently sized
|
||
|
|
`B_MatMat<complex{f,d}>`, and double-sized
|
||
|
|
`B_MatMatBLAS<{float,double}>` operations. This is very difficult to
|
||
|
|
achieve in reality, as the BLAS libraries typically use hand-tuned
|
||
|
|
assembly, where the Eigen libraries must strive to express the
|
||
|
|
calculation in a general form that the compiler can turn into optimal
|
||
|
|
code.
|
||
|
|
|
||
|
|
Comparing Eigen 3.3.7 and OpenBLAS 0.3.6 on an `Intel(R) Core(TM)
|
||
|
|
i7-7700 CPU @ 3.60GHz` is still sub-optimal, only achieving about half
|
||
|
|
the performance of the BLAS equivalent, and 90% of
|
||
|
|
`std::complex<float>`:
|
||
|
|
|
||
|
|
B_MatMat<dualf,dualf>/32 5433 ns 5427 ns
|
||
|
|
B_MatMat<dualf,dualf>/64 38478 ns 38433 ns
|
||
|
|
B_MatMat<dualf,dualf>/128 299450 ns 298981 ns
|
||
|
|
B_MatMat<dualf,dualf>/256 2365347 ns 2361566 ns
|
||
|
|
B_MatMat<dualf,dualf>/512 18888220 ns 18857342 ns
|
||
|
|
B_MatMat<dualf,dualf>/1024 151079955 ns 150856120 ns
|
||
|
|
|
||
|
|
B_MatMat<complexf,complexf>/32 4963 ns 4955 ns
|
||
|
|
B_MatMat<complexf,complexf>/64 36716 ns 36671 ns
|
||
|
|
B_MatMat<complexf,complexf>/128 280870 ns 280346 ns
|
||
|
|
B_MatMat<complexf,complexf>/256 2173791 ns 2170886 ns
|
||
|
|
B_MatMat<complexf,complexf>/512 17493222 ns 17459890 ns
|
||
|
|
B_MatMat<complexf,complexf>/1024 138498432 ns 138286283 ns
|
||
|
|
|
||
|
|
B_MatMatBLAS<complexf>/32 4877 ns 4870 ns
|
||
|
|
B_MatMatBLAS<complexf>/64 27722 ns 27691 ns
|
||
|
|
B_MatMatBLAS<complexf>/128 177084 ns 176756 ns
|
||
|
|
B_MatMatBLAS<complexf>/256 1268715 ns 1266445 ns
|
||
|
|
B_MatMatBLAS<complexf>/512 9772184 ns 9726621 ns
|
||
|
|
B_MatMatBLAS<complexf>/1024 75915016 ns 75432354 ns
|
||
|
|
|
||
|
|
|
||
|
|
The second benchmark of interest measures how well the nested
|
||
|
|
specializations `std::complex<duals::dual<>>` perform as matrix values
|
||
|
|
relative to using a BLAS library with an extended matrix to compute
|
||
|
|
the same value function. This comparison is also made with the
|
||
|
|
[./tests/bench_gemm](./tests/bench_gemm.cpp) program. The relevant
|
||
|
|
measures are `B_MatMat<cdual{f,d}>` and `B_MatMatBLAS<complex{f,d}>`
|
||
|
|
of twice the size.
|
||
|
|
|
||
|
|
On the same machine as above, using `std::complex<duals::dual<float>>`
|
||
|
|
(`cdualf`) shows a speed advantage over the BLAS approach, while using
|
||
|
|
only half the memory. However, notice that the advantage decreases as
|
||
|
|
the matrices get larger, which ideally should not happen:
|
||
|
|
|
||
|
|
B_MatMat<cdualf,cdualf>/16 2810 ns 2808 ns
|
||
|
|
B_MatMat<cdualf,cdualf>/32 19900 ns 19878 ns
|
||
|
|
B_MatMat<cdualf,cdualf>/64 151837 ns 151646 ns
|
||
|
|
B_MatMat<cdualf,cdualf>/128 1174699 ns 1172931 ns
|
||
|
|
B_MatMat<cdualf,cdualf>/256 9122903 ns 9110123 ns
|
||
|
|
B_MatMat<cdualf,cdualf>/512 72575352 ns 72467264 ns
|
||
|
|
|
||
|
|
B_MatMatBLAS<complexf>/32 4877 ns 4870 ns
|
||
|
|
B_MatMatBLAS<complexf>/64 27722 ns 27691 ns
|
||
|
|
B_MatMatBLAS<complexf>/128 177084 ns 176756 ns
|
||
|
|
B_MatMatBLAS<complexf>/256 1268715 ns 1266445 ns
|
||
|
|
B_MatMatBLAS<complexf>/512 9772184 ns 9726621 ns
|
||
|
|
B_MatMatBLAS<complexf>/1024 75915016 ns 75432354 ns
|
||
|
|
|
||
|
|
|
||
|
|
Contributions
|
||
|
|
=============
|
||
|
|
|
||
|
|
Questions, bug reports, bug fixes, and contributions are welcome.
|
||
|
|
Simply submit an [Issue](https://gitlab.com/tesch1/cppduals/issues)
|
||
|
|
or [Merge Request](https://gitlab.com/tesch1/cppduals/merge_requests).
|
||
|
|
|
||
|
|
Contributors
|
||
|
|
------------
|
||
|
|
|
||
|
|
- [Nestor Demeure](https://gitlab.com/nestordemeure)
|
||
|
|
- [Jeff](https://github.com/flying-tiger)
|
||
|
|
|
||
|
|
Compiler notes
|
||
|
|
==============
|
||
|
|
|
||
|
|
XCode 11 (Apple Clang 11) is known to work. Also various version of
|
||
|
|
g++. Clang 8.0 appears to have some trouble with compiling the
|
||
|
|
optimized templates for Eigen, as evidenced by its propensity to
|
||
|
|
segfault when compiling the cppduals test programs. Please submit
|
||
|
|
issues if you experience similar problems, with specifics of your
|
||
|
|
compiler and compilation flags.
|
||
|
|
|
||
|
|
License
|
||
|
|
=======
|
||
|
|
|
||
|
|
The primary header file `duals/dual` and testing and benchmarking code
|
||
|
|
is licensed under the following:
|
||
|
|
|
||
|
|
```
|
||
|
|
Part of the cppduals project.
|
||
|
|
https://tesch1.gitlab.io/cppduals
|
||
|
|
|
||
|
|
(c)2019 Michael Tesch. tesch1@gmail.com
|
||
|
|
|
||
|
|
See https://gitlab.com/tesch1/cppduals/blob/master/LICENSE.txt for
|
||
|
|
license information.
|
||
|
|
|
||
|
|
This Source Code Form is subject to the terms of the Mozilla
|
||
|
|
Public License v. 2.0. If a copy of the MPL was not distributed
|
||
|
|
with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
|
||
|
|
```
|
||
|
|
|
||
|
|
Eigen-derived
|
||
|
|
-------------
|
||
|
|
|
||
|
|
The support for Eigen vectorization, including `duals/dual_eigen` and
|
||
|
|
the architecture-specific vectorization files under `duals/arch` are
|
||
|
|
derived from the [Eigen
|
||
|
|
project](http://eigen.tuxfamily.org/index.php?title=Main_Page), and
|
||
|
|
thus licensed under [MPL-2](http://www.mozilla.org/MPL/2.0/FAQ.html) .
|
||
|
|
|
||
|
|
ChangeLog
|
||
|
|
=========
|
||
|
|
|
||
|
|
v0.4.1
|
||
|
|
======
|
||
|
|
|
||
|
|
- changed constexpr to FMT_CONSTEXPR in the dual<> and complex<>
|
||
|
|
formatters to work with more compilers / lang standards.
|
||
|
|
|
||
|
|
v0.4.0
|
||
|
|
======
|
||
|
|
|
||
|
|
- cleaned-up release with fixes from v0.3.2.
|
||
|
|
- improved docs
|
||
|
|
|
||
|
|
v0.3.3+
|
||
|
|
=======
|
||
|
|
|
||
|
|
- ignore these, will be, trying to cleanup release tarballs, next
|
||
|
|
stable will be v0.4.0
|
||
|
|
|
||
|
|
v0.3.2
|
||
|
|
======
|
||
|
|
|
||
|
|
- not actually tagged release
|
||
|
|
- fixed a bug in the `{fmt}` support, added docs for the same.
|
||
|
|
- added benchmarking for `{fmt}` vs iostreams.
|
||
|
|
|
||
|
|
v0.3.1
|
||
|
|
======
|
||
|
|
|
||
|
|
- forgot to bump the CMakeLists package version number in 0.3.0.
|
||
|
|
|
||
|
|
v0.3.0
|
||
|
|
======
|
||
|
|
|
||
|
|
- vastly improved cmake support, thanks to
|
||
|
|
[Jeff](https://gitlab.com/flying-tiger). The improvements required
|
||
|
|
changing some CMake target names.
|
||
|
|
- Added basic optional [libfmt](https://github.com/fmtlib/fmt) support for
|
||
|
|
duals::dual<> and std::complex<>, enabled with `#define`\ s
|
||
|
|
|
||
|
|
v0.2.0
|
||
|
|
======
|
||
|
|
|
||
|
|
- fixed build on VS2017
|
||
|
|
- save and restore signam and errno in {t,l}gamma
|
||
|
|
- fixes from Nestor D. for https://gitlab.com/tesch1/cppduals/issues/5 (spurious nan)
|
||
|
|
|
||
|
|
Todo
|
||
|
|
====
|
||
|
|
|
||
|
|
- Add multi-variate differentiation capability.
|
||
|
|
- Non-x86_64 (CUDA/AltiVec/HIP/NEON/...) vectorization.
|
||
|
|
- Higher-order derivatives.
|
||
|
|
|