cppduals ======== Header-only dual number library for C++11. The `dual<>` type can be used for automatic (forward) differentiation. It can be used in conjunction with Eigen to produced very fast vectorized computations of real and complex matrix functions and their derivatives. There is a small paper on cppduals here: [![DOI](https://joss.theoj.org/papers/10.21105/joss.01487/status.svg)](https://doi.org/10.21105/joss.01487) Documentation ============= [Full documentation is here](https://tesch1.gitlab.io/cppduals/). The dual number space is closely related to the complex number space, and as such, the dual class `duals::dual<>` is similar to `std::complex<>`. When compiling with Eigen it is possible to disable the vectorization templates by `#define CPPDUALS_DONT_VECTORIZE`. This may be useful if your compiler is particularly good at optimizing Eigen expressions, I have had mixed results, sometimes there are differences between the compiler's best (GCC and Clang) and the vectorized code of 30% or more, in either direction. Examples ======== Here we calculate a function $`f(x)`$, with its derivative $`f'(x)`$, calculated explicitly as `df()`, and calculated by using the dual class (`1_e` returns the dual number $`0 + 1 \epsilon`$, it is equivalent to `dual(0,1)`): ```cpp #include using namespace duals::literals; template T f(T x) { return pow(x,pow(x,x)); } template T df(T x) { return pow(x,-1. + x + pow(x,x)) * (1. + x*log(x) + x*pow(log(x),2.)); } template T ddf(T x) { return (pow(x,pow(x,x)) * pow(pow(x,x - 1.) + pow(x,x)*log(x)*(log(x) + 1.), 2.) + pow(x,pow(x,x)) * (pow(x,x - 1.) * log(x) + pow(x,x - 1.) * (log(x) + 1.) + pow(x,x - 1.) * ((x - 1.)/x + log(x)) + pow(x,x) * log(x) * pow(log(x) + 1., 2.) )); } int main() { std::cout << " f(2.) = " << f(2.) << "\n"; std::cout << " df(2.) = " << df(2.) << "\n"; std::cout << "ddf(2.) = " << ddf(2.) << "\n"; std::cout << " f(2+1_e) = " << f(2+1_e) << "\n"; std::cout << " f(2+1_e).dpart() = " << f(2+1_e).dpart() << "\n"; duals::hyperduald x(2+1_e,1+0_e); std::cout << " f((2+1_e) + (1+0_e)_e).dpart().dpart() = " << f(x).dpart().dpart() << "\n"; } ``` Produces: ``` f(2.) = 16 df(2.) = 107.11 ddf(2.) = 958.755 f(2+1_e) = (16+107.11_e) f(2+1_e).dpart() = 107.11 f((2+1_e) + (1+0_e)_e).dpart().dpart() = 958.755 ``` Installation ============ Copy the [duals/](./duals/) directory (or just [dual](./duals/dual) ) somewhere your `#include`s can find it. Then just `#include ` from your source. Alternatively, `cppduals` supports building with CMake. If using CMake v3.14+, the ``FetchContent`` pattern is straightforward and enables using CMake targets to specify library dependencies: ```cmake include(FetchContent) # Have CMake download the library set (CPPDUALS_TAG v0.4.1) set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3) FetchContent_Declare (cppduals URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2 URL_HASH MD5=${CPPDUALS_MD5} ) FetchContent_MakeAvailable (cppduals) # Link to cppduals target_link_libraries (your_target PRIVATE cppduals::duals) ``` Older versions of CMake can achieve a similar result using the ``ExternalProject`` family of commands and modifying the global preprocessor search path: ```cmake include(ExternalProject) # Have CMake download the library headers only set (CPPDUALS_TAG v0.4.1) set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3) ExternalProject_Add (cppduals URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2 URL_HASH MD5=${CPPDUALS_MD5} CONFIGURE_COMMAND "" BUILD_COMMAND "" INSTALL_COMMAND "" ) # Make include directory globally visible ExternalProject_Get_Property (cppduals source_dir) include_directories (${source_dir}/) ``` Alternatively, `cppduals` supports installation and discovery via the `find_package` utility. First, download and install the library to a location of your choosing: ```sh CPPDUALS_PREFIX= git clone https://gitlab.com/tesch1/cppduals.git && cd cppduals mkdir build && cd build cmake -DCMAKE_INSTALL_PREFIX="$CPPDUALS_PREFIX" .. cmake --build . --target install ``` Then, in your project's `CMakeLists.txt`, find and link to the library in the standard manner: ```cmake find_package(cppduals REQUIRED) target_link_libraries(your_target PRIVATE cppduals::cppduals) ``` If you installed `cppduals` to a location that is not on `find_package`'s default search path, you can specify the location by setting the `cppduals_DIR` environment variable when configuring your project: ```sh cd your_build_dir cppduals_DIR="${CPPDUALS_PREFIX}" cmake .. ``` Benchmarks ========== The benchmark compares cppduals against a local BLAS implementation, by default OpenBLAS (whose development package is required; RedHat-flavor: `openblas-devel`, Debian-flavor: `openblas-dev`). If you wish to build the benchmarks against a different installation of BLAS, the following CMake variables can be set at configuration time: - [BLA_VENDOR](https://cmake.org/cmake/help/latest/module/FindBLAS.html) - BLAS_DIR - LAPACK_DIR For example, to build and run the tests shown below: ```sh cmake -Bbuild-bench -H. -DCPPDUALS_BENCHMARK=ON -DBLAS_DIR=/opt/local -DLAPACK_DIR=/opt/local cmake --build build-bench --target bench_gemm ./build-bench/tests/bench_gemm ``` The first performance goal of this project is to make the `duals::dual<>` type at least as fast as `std::complex<>`. This is considered to be an upper-bound for performance because complex math operations are usually highly optimized for scientific computing and have a similar algebraic structure. The second goal is to make the compound type `std::complex>` as fast as possible for use in calculation that require the derivative of complex functions (ie comprising quantum-mechanical wave functions). The first goal is measured by comparing the speed of matrix-matrix operations (nominally matrix multiplication) on `duals::dual<>`-valued Eigen matrices with highly optimtimized BLAS implementations of equivalent operations on complex-valued matrices. This can be done by running the [./tests/bench_gemm](./tests/bench_gemm.cpp) program. In the *ideal* case, the results of the `B_MatMat` type should be nearly as fast, or faster than equivalently sized `B_MatMat`, and double-sized `B_MatMatBLAS<{float,double}>` operations. This is very difficult to achieve in reality, as the BLAS libraries typically use hand-tuned assembly, where the Eigen libraries must strive to express the calculation in a general form that the compiler can turn into optimal code. Comparing Eigen 3.3.7 and OpenBLAS 0.3.6 on an `Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz` is still sub-optimal, only achieving about half the performance of the BLAS equivalent, and 90% of `std::complex`: B_MatMat/32 5433 ns 5427 ns B_MatMat/64 38478 ns 38433 ns B_MatMat/128 299450 ns 298981 ns B_MatMat/256 2365347 ns 2361566 ns B_MatMat/512 18888220 ns 18857342 ns B_MatMat/1024 151079955 ns 150856120 ns B_MatMat/32 4963 ns 4955 ns B_MatMat/64 36716 ns 36671 ns B_MatMat/128 280870 ns 280346 ns B_MatMat/256 2173791 ns 2170886 ns B_MatMat/512 17493222 ns 17459890 ns B_MatMat/1024 138498432 ns 138286283 ns B_MatMatBLAS/32 4877 ns 4870 ns B_MatMatBLAS/64 27722 ns 27691 ns B_MatMatBLAS/128 177084 ns 176756 ns B_MatMatBLAS/256 1268715 ns 1266445 ns B_MatMatBLAS/512 9772184 ns 9726621 ns B_MatMatBLAS/1024 75915016 ns 75432354 ns The second benchmark of interest measures how well the nested specializations `std::complex>` perform as matrix values relative to using a BLAS library with an extended matrix to compute the same value function. This comparison is also made with the [./tests/bench_gemm](./tests/bench_gemm.cpp) program. The relevant measures are `B_MatMat` and `B_MatMatBLAS` of twice the size. On the same machine as above, using `std::complex>` (`cdualf`) shows a speed advantage over the BLAS approach, while using only half the memory. However, notice that the advantage decreases as the matrices get larger, which ideally should not happen: B_MatMat/16 2810 ns 2808 ns B_MatMat/32 19900 ns 19878 ns B_MatMat/64 151837 ns 151646 ns B_MatMat/128 1174699 ns 1172931 ns B_MatMat/256 9122903 ns 9110123 ns B_MatMat/512 72575352 ns 72467264 ns B_MatMatBLAS/32 4877 ns 4870 ns B_MatMatBLAS/64 27722 ns 27691 ns B_MatMatBLAS/128 177084 ns 176756 ns B_MatMatBLAS/256 1268715 ns 1266445 ns B_MatMatBLAS/512 9772184 ns 9726621 ns B_MatMatBLAS/1024 75915016 ns 75432354 ns Contributions ============= Questions, bug reports, bug fixes, and contributions are welcome. Simply submit an [Issue](https://gitlab.com/tesch1/cppduals/issues) or [Merge Request](https://gitlab.com/tesch1/cppduals/merge_requests). Contributors ------------ - [Nestor Demeure](https://gitlab.com/nestordemeure) - [Jeff](https://github.com/flying-tiger) Compiler notes ============== XCode 11 (Apple Clang 11) is known to work. Also various version of g++. Clang 8.0 appears to have some trouble with compiling the optimized templates for Eigen, as evidenced by its propensity to segfault when compiling the cppduals test programs. Please submit issues if you experience similar problems, with specifics of your compiler and compilation flags. License ======= The primary header file `duals/dual` and testing and benchmarking code is licensed under the following: ``` Part of the cppduals project. https://tesch1.gitlab.io/cppduals (c)2019 Michael Tesch. tesch1@gmail.com See https://gitlab.com/tesch1/cppduals/blob/master/LICENSE.txt for license information. This Source Code Form is subject to the terms of the Mozilla Public License v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. ``` Eigen-derived ------------- The support for Eigen vectorization, including `duals/dual_eigen` and the architecture-specific vectorization files under `duals/arch` are derived from the [Eigen project](http://eigen.tuxfamily.org/index.php?title=Main_Page), and thus licensed under [MPL-2](http://www.mozilla.org/MPL/2.0/FAQ.html) . ChangeLog ========= v0.4.1 ====== - changed constexpr to FMT_CONSTEXPR in the dual<> and complex<> formatters to work with more compilers / lang standards. v0.4.0 ====== - cleaned-up release with fixes from v0.3.2. - improved docs v0.3.3+ ======= - ignore these, will be, trying to cleanup release tarballs, next stable will be v0.4.0 v0.3.2 ====== - not actually tagged release - fixed a bug in the `{fmt}` support, added docs for the same. - added benchmarking for `{fmt}` vs iostreams. v0.3.1 ====== - forgot to bump the CMakeLists package version number in 0.3.0. v0.3.0 ====== - vastly improved cmake support, thanks to [Jeff](https://gitlab.com/flying-tiger). The improvements required changing some CMake target names. - Added basic optional [libfmt](https://github.com/fmtlib/fmt) support for duals::dual<> and std::complex<>, enabled with `#define`\ s v0.2.0 ====== - fixed build on VS2017 - save and restore signam and errno in {t,l}gamma - fixes from Nestor D. for https://gitlab.com/tesch1/cppduals/issues/5 (spurious nan) Todo ==== - Add multi-variate differentiation capability. - Non-x86_64 (CUDA/AltiVec/HIP/NEON/...) vectorization. - Higher-order derivatives.