# cppduals
Header-only dual number library for C++11. The `dual<>` type can be
used for automatic (forward-mode) differentiation. It can be used in
conjunction with Eigen to produce very fast vectorized computations
of real and complex matrix functions and their derivatives.
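For example, the derivative of a matrix expression with respect to one
entry can be obtained by seeding that entry's dual part. A minimal
sketch (the `duals::duald` typedef and the `rpart()` accessor are
assumptions; `dpart()` and the `_e` literal appear in the examples
below):

```cpp
#include <iostream>
#include <duals/dual_eigen>

using namespace duals::literals;

int main()
{
  // Differentiate trace(A*A) with respect to A(0,0) by seeding that
  // entry's dual part; analytically d/dA(0,0) trace(A*A) = 2*A(0,0).
  Eigen::Matrix<duals::duald, 2, 2> A;
  A << 1 + 1_e, 2., 3., 4.;
  duals::duald t = (A * A).trace();
  std::cout << t.rpart() << " " << t.dpart() << "\n";  // prints 29 and 2
}
```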
There is a small paper on cppduals here:
## Documentation
The dual number space is closely related to the complex number space,
and as such, the dual class `duals::dual<>` is similar to
`std::complex<>`.
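For illustration, a minimal sketch of the parallel (the `rpart()`
accessor here is an assumption, by symmetry with the `dpart()` shown
in the example below):

```cpp
#include <iostream>
#include <duals/dual>

int main()
{
  // Like a complex number, a dual number has two components:
  // a real part and a dual (epsilon) part, with e*e = 0.
  duals::dual<double> x(3.0, 1.0);  // 3 + 1e, seeded for d/dx
  duals::dual<double> y = x * x;    // 9 + 6e, since d/dx x^2 = 2x
  std::cout << y.rpart() << " + " << y.dpart() << "_e\n";
}
```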
When compiling with Eigen, the vectorization templates can be disabled
with `#define CPPDUALS_DONT_VECTORIZE`. This may be useful if your
compiler is particularly good at optimizing Eigen expressions: I have
had mixed results, and the difference between the compiler's best
output (GCC and Clang) and the explicitly vectorized code is sometimes
30% or more, in either direction.
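For instance, a minimal sketch (the macro must be defined before the
header is included):

```cpp
// Fall back to Eigen's generic expression evaluation instead of the
// architecture-specific vectorization templates.
#define CPPDUALS_DONT_VECTORIZE
#include <duals/dual_eigen>
```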
## Examples
Here we calculate a function `f(x)` and its derivative `f'(x)`, both
explicitly as `df()` and by using the dual class (`1_e` is the dual
number `0 + 1ε`, equivalent to `dual<double>(0,1)`). The second
derivative `ddf()` is likewise checked against a nested dual
(`duals::hyperduald`):
```cpp
#include <iostream>
#include <duals/dual>

using namespace duals::literals;

template <class T> T f(T x) { return pow(x, pow(x, x)); }
template <class T> T df(T x) { return pow(x, -1. + x + pow(x, x)) * (1. + x*log(x) + x*pow(log(x), 2.)); }
template <class T> T ddf(T x) {
  return pow(x, pow(x, x)) * pow(pow(x, x - 1.) + pow(x, x)*log(x)*(log(x) + 1.), 2.) +
         pow(x, pow(x, x)) * (pow(x, x - 1.) * log(x) +
                              pow(x, x - 1.) * (log(x) + 1.) +
                              pow(x, x - 1.) * ((x - 1.)/x + log(x)) +
                              pow(x, x) * log(x) * pow(log(x) + 1., 2.));
}

int main()
{
  std::cout << "   f(2.) = " << f(2.) << "\n";
  std::cout << "  df(2.) = " << df(2.) << "\n";
  std::cout << " ddf(2.) = " << ddf(2.) << "\n";
  std::cout << " f(2+1_e) = " << f(2+1_e) << "\n";
  std::cout << " f(2+1_e).dpart() = " << f(2+1_e).dpart() << "\n";
  duals::hyperduald x(2+1_e, 1+0_e);
  std::cout << " f((2+1_e) + (1+0_e)_e).dpart().dpart() = " << f(x).dpart().dpart() << "\n";
}
```
Produces:
```
   f(2.) = 16
  df(2.) = 107.11
 ddf(2.) = 958.755
 f(2+1_e) = (16+107.11_e)
 f(2+1_e).dpart() = 107.11
 f((2+1_e) + (1+0_e)_e).dpart().dpart() = 958.755
```
## Installation
Copy the `duals/` directory (or just `duals/dual`) somewhere your
`#include`s can find it, then `#include <duals/dual>` (or
`<duals/dual_eigen>`) from your source.
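For example:

```cpp
#include <duals/dual>        // scalar dual numbers only
#include <duals/dual_eigen>  // dual numbers plus the Eigen specializations
```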
Alternatively, cppduals supports building with CMake. If using CMake
v3.14+, the `FetchContent` pattern is straightforward and enables
using CMake targets to specify library dependencies:
```cmake
include(FetchContent)

# Have CMake download the library
set (CPPDUALS_TAG v0.4.1)
set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
FetchContent_Declare (cppduals
  URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
  URL_HASH MD5=${CPPDUALS_MD5}
  )
FetchContent_MakeAvailable (cppduals)

# Link to cppduals
target_link_libraries (your_target PRIVATE cppduals::duals)
```
Older versions of CMake can achieve a similar result using the
`ExternalProject` family of commands and modifying the global
preprocessor search path:
```cmake
include(ExternalProject)

# Have CMake download the library headers only
set (CPPDUALS_TAG v0.4.1)
set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
ExternalProject_Add (cppduals
  URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
  URL_HASH MD5=${CPPDUALS_MD5}
  CONFIGURE_COMMAND "" BUILD_COMMAND "" INSTALL_COMMAND "")

# Make include directory globally visible
ExternalProject_Get_Property (cppduals source_dir)
include_directories (${source_dir}/)
```
Alternatively, cppduals supports installation and discovery via the
`find_package` utility. First, download and install the library to a
location of your choosing:
```sh
CPPDUALS_PREFIX=<desired_install_location>
git clone https://gitlab.com/tesch1/cppduals.git && cd cppduals
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX="$CPPDUALS_PREFIX" ..
cmake --build . --target install
```
Then, in your project's `CMakeLists.txt`, find and link to the library
in the standard manner:

```cmake
find_package(cppduals REQUIRED)
target_link_libraries(your_target PRIVATE cppduals::cppduals)
```
If you installed cppduals to a location that is not on `find_package`'s
default search path, you can specify the location by setting the
`cppduals_DIR` environment variable when configuring your project:
```sh
cd your_build_dir
cppduals_DIR="${CPPDUALS_PREFIX}" cmake ..
```
## Benchmarks
The benchmark compares cppduals against a local BLAS implementation,
by default OpenBLAS (whose development package is required;
RedHat-flavor: `openblas-devel`, Debian-flavor: `libopenblas-dev`). If
you wish to build the benchmarks against a different installation of
BLAS, the following CMake variables can be set at configuration time:

- `BLA_VENDOR`
- `BLAS_DIR`
- `LAPACK_DIR`
For example, to build and run the tests shown below:

```sh
cmake -Bbuild-bench -H. -DCPPDUALS_BENCHMARK=ON -DBLAS_DIR=/opt/local -DLAPACK_DIR=/opt/local
cmake --build build-bench --target bench_gemm
./build-bench/tests/bench_gemm
```
The first performance goal of this project is to make the
`duals::dual<>` type at least as fast as `std::complex<>`. This is
considered an upper bound for performance because complex math
operations are usually highly optimized for scientific computing and
have a similar algebraic structure. The second goal is to make the
compound type `std::complex<duals::dual<>>` as fast as possible for
use in calculations that require the derivative of complex functions
(e.g. those involving quantum-mechanical wave functions).
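To illustrate the compound type, here is a minimal sketch of
differentiating a complex-valued function with respect to the real
part of its argument (the seeding convention shown is an assumption,
extrapolated from the scalar example above):

```cpp
#include <complex>
#include <iostream>
#include <duals/dual>

// f(z) = z*z has derivative 2z, so at z = 2+3i, df/dx = 4+6i.
template <class T> T f(T z) { return z * z; }

int main()
{
  typedef duals::dual<double> duald;
  // Seed the dual part of the real component of z to get d/dx.
  std::complex<duald> z(duald(2, 1), duald(3, 0));
  std::complex<duald> w = f(z);
  std::cout << w.real().dpart() << " "
            << w.imag().dpart() << "\n";  // prints 4 6
}
```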
The first goal is measured by comparing the speed of matrix-matrix
operations (nominally matrix multiplication) on `duals::dual<>`-valued
Eigen matrices with highly optimized BLAS implementations of
equivalent operations on complex-valued matrices. This can be done by
running the `./tests/bench_gemm` program. In the ideal case, the
`B_MatMat<dual{f,d}>` results should be nearly as fast as, or faster
than, equivalently sized `B_MatMat<complex{f,d}>` and double-sized
`B_MatMatBLAS<{float,double}>` operations. This is very difficult to
achieve in practice, as BLAS libraries typically use hand-tuned
assembly, whereas Eigen must express the calculation in a general form
that the compiler can turn into optimal code.
Comparing Eigen 3.3.7 and OpenBLAS 0.3.6 on an Intel(R) Core(TM)
i7-7700 CPU @ 3.60GHz, the vectorized code is still sub-optimal,
achieving only about half the performance of the BLAS equivalent and
roughly 90% of `std::complex<float>`:
```
B_MatMat<dualf,dualf>/32               5433 ns       5427 ns
B_MatMat<dualf,dualf>/64              38478 ns      38433 ns
B_MatMat<dualf,dualf>/128            299450 ns     298981 ns
B_MatMat<dualf,dualf>/256           2365347 ns    2361566 ns
B_MatMat<dualf,dualf>/512          18888220 ns   18857342 ns
B_MatMat<dualf,dualf>/1024        151079955 ns  150856120 ns
B_MatMat<complexf,complexf>/32         4963 ns       4955 ns
B_MatMat<complexf,complexf>/64        36716 ns      36671 ns
B_MatMat<complexf,complexf>/128      280870 ns     280346 ns
B_MatMat<complexf,complexf>/256     2173791 ns    2170886 ns
B_MatMat<complexf,complexf>/512    17493222 ns   17459890 ns
B_MatMat<complexf,complexf>/1024  138498432 ns  138286283 ns
B_MatMatBLAS<complexf>/32              4877 ns       4870 ns
B_MatMatBLAS<complexf>/64             27722 ns      27691 ns
B_MatMatBLAS<complexf>/128           177084 ns     176756 ns
B_MatMatBLAS<complexf>/256          1268715 ns    1266445 ns
B_MatMatBLAS<complexf>/512          9772184 ns    9726621 ns
B_MatMatBLAS<complexf>/1024        75915016 ns   75432354 ns
```
The second benchmark of interest measures how well the nested
specialization `std::complex<duals::dual<>>` performs as a matrix
value type relative to using a BLAS library with an extended matrix to
compute the same value function. This comparison is also made with the
`./tests/bench_gemm` program. The relevant measures are
`B_MatMat<cdual{f,d}>` and `B_MatMatBLAS<complex{f,d}>` of twice the
size.
On the same machine as above, using `std::complex<duals::dual<float>>`
(`cdualf`) shows a speed advantage over the BLAS approach while using
only half the memory. However, notice that the advantage decreases as
the matrices get larger, which ideally should not happen:
```
B_MatMat<cdualf,cdualf>/16             2810 ns       2808 ns
B_MatMat<cdualf,cdualf>/32            19900 ns      19878 ns
B_MatMat<cdualf,cdualf>/64           151837 ns     151646 ns
B_MatMat<cdualf,cdualf>/128         1174699 ns    1172931 ns
B_MatMat<cdualf,cdualf>/256         9122903 ns    9110123 ns
B_MatMat<cdualf,cdualf>/512        72575352 ns   72467264 ns
B_MatMatBLAS<complexf>/32              4877 ns       4870 ns
B_MatMatBLAS<complexf>/64             27722 ns      27691 ns
B_MatMatBLAS<complexf>/128           177084 ns     176756 ns
B_MatMatBLAS<complexf>/256          1268715 ns    1266445 ns
B_MatMatBLAS<complexf>/512          9772184 ns    9726621 ns
B_MatMatBLAS<complexf>/1024        75915016 ns   75432354 ns
```
## Contributions
Questions, bug reports, bug fixes, and contributions are welcome. Simply submit an Issue or Merge Request.
## Compiler notes
XCode 11 (Apple Clang 11) is known to work, as are various versions of g++. Clang 8.0 appears to have some trouble compiling the optimized templates for Eigen, as evidenced by its propensity to segfault when compiling the cppduals test programs. Please submit an issue if you experience similar problems, including the specifics of your compiler and compilation flags.
## License

The primary header file `duals/dual` and the testing and benchmarking
code are licensed under the following:
```
Part of the cppduals project.
https://tesch1.gitlab.io/cppduals

(c)2019 Michael Tesch. tesch1@gmail.com

See https://gitlab.com/tesch1/cppduals/blob/master/LICENSE.txt for
license information.

This Source Code Form is subject to the terms of the Mozilla
Public License v. 2.0. If a copy of the MPL was not distributed
with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
```
### Eigen-derived

The support for Eigen vectorization, including `duals/dual_eigen` and
the architecture-specific vectorization files under `duals/arch`, is
derived from the Eigen project, and is thus likewise licensed under
the MPL 2.0.
## ChangeLog
### v0.4.1

- changed `constexpr` to `FMT_CONSTEXPR` in the `dual<>` and `complex<>` formatters to work with more compilers / language standards.

### v0.4.0

- cleaned-up release with fixes from v0.3.2.
- improved docs.
### v0.3.3+

- ignore these; they were attempts to clean up the release tarballs. The next stable release is v0.4.0.

### v0.3.2

- not an actually tagged release.
- fixed a bug in the {fmt} support and added docs for the same.
- added benchmarking of {fmt} vs. iostreams.
### v0.3.1

- forgot to bump the CMakeLists package version number in v0.3.0.

### v0.3.0

- vastly improved CMake support, thanks to Jeff. The improvements required changing some CMake target names.
- added basic optional libfmt support for `duals::dual<>` and `std::complex<>`, enabled with `#define`s.
### v0.2.0

- fixed build on VS2017.
- save and restore `signgam` and `errno` in `{t,l}gamma`.
- fixes from Nestor D. for https://gitlab.com/tesch1/cppduals/issues/5 (spurious NaN).
## Todo
- Add multi-variate differentiation capability.
- Non-x86_64 (CUDA/AltiVec/HIP/NEON/...) vectorization.
- Higher-order derivatives.