cppduals

Header-only dual number library for C++11. The dual<> type can be used for automatic (forward) differentiation. It can be used in conjunction with Eigen to produce very fast vectorized computations of real and complex matrix functions and their derivatives.
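For a taste of the Eigen integration, here is a minimal sketch. It assumes that <duals/dual_eigen> pulls in the necessary Eigen headers, and that rpart() is the counterpart of the dpart() accessor used in the example further below; adjust to your setup.

#include <duals/dual_eigen>
#include <iostream>

int main()
{
  // A 2x2 dual-valued Eigen matrix; the dual part of A(0,0) seeds a derivative.
  Eigen::Matrix<duals::dual<double>, 2, 2> A, B;
  A << duals::dual<double>(1, 1), duals::dual<double>(2, 0),
       duals::dual<double>(3, 0), duals::dual<double>(4, 0);
  B = A * A;
  // B(0,0) = (1 + 1_e)^2 + 2*3 = 7 + 2_e
  std::cout << B(0,0).rpart() << " " << B(0,0).dpart() << "\n";  // prints: 7 2
}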

There is a small paper on cppduals (see the DOI badge in the repository).

Documentation

Full documentation is at https://tesch1.gitlab.io/cppduals.

The dual number space is closely related to the complex number space, and as such, the dual class duals::dual<> is similar to std::complex<>.
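As a small illustration of the analogy (a minimal sketch; rpart() is assumed as the counterpart of the dpart() accessor used in the example below):

#include <duals/dual>
#include <iostream>

int main()
{
  // dual<T>(a, b) represents a + b*eps, where eps*eps == 0, so
  // (a + b*eps) * (a + b*eps) == a*a + 2*a*b*eps.
  duals::dual<double> x(3, 1);    // 3 + 1_e
  duals::dual<double> y = x * x;  // 9 + 6_e : f(3) = 9, f'(3) = 6 for f(x) = x*x
  std::cout << y.rpart() << " " << y.dpart() << "\n";  // prints: 9 6
}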

When compiling with Eigen it is possible to disable the vectorization templates with #define CPPDUALS_DONT_VECTORIZE. This may be useful if your compiler is particularly good at optimizing Eigen expressions. I have had mixed results: sometimes the compiler's best code (GCC and Clang) and the vectorized code differ by 30% or more, in either direction.
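A minimal sketch of disabling the vectorization (assuming the macro needs to be visible before the header is included):

// Fall back to Eigen's generic code paths instead of cppduals'
// vectorization templates.
#define CPPDUALS_DONT_VECTORIZE
#include <duals/dual_eigen>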

Examples

Here we calculate a function `f(x)` along with its first and second derivatives `f'(x)` and `f''(x)`. The derivatives are calculated both explicitly, as df() and ddf(), and by using the dual class (1_e denotes the dual number `0 + 1ε`; it is equivalent to dual<double>(0,1)):

#include <duals/dual>
#include <iostream>

using namespace duals::literals;

template <class T> T   f(T x) { return pow(x,pow(x,x)); }
template <class T> T  df(T x) { return pow(x,-1. + x + pow(x,x)) * (1. + x*log(x) + x*pow(log(x),2.)); }
template <class T> T ddf(T x) { return (pow(x,pow(x,x)) * pow(pow(x,x - 1.) + pow(x,x)*log(x)*(log(x) + 1.), 2.) +
                                        pow(x,pow(x,x)) * (pow(x,x - 1.) * log(x) +
                                                           pow(x,x - 1.) * (log(x) + 1.) +
                                                           pow(x,x - 1.) * ((x - 1.)/x + log(x)) +
                                                           pow(x,x) * log(x) * pow(log(x) + 1., 2.) )); }

int main()
{
  std::cout << "  f(2.)            = " << f(2.)    << "\n";
  std::cout << " df(2.)            = " << df(2.)   << "\n";
  std::cout << "ddf(2.)            = " << ddf(2.)  << "\n";
  std::cout << "  f(2+1_e)         = " << f(2+1_e) << "\n";
  std::cout << "  f(2+1_e).dpart() = " << f(2+1_e).dpart() << "\n";
  duals::hyperduald x(2+1_e,1+0_e);
  std::cout << "  f((2+1_e) + (1+0_e)_e).dpart().dpart() = " << f(x).dpart().dpart() << "\n";
}

Produces:

  f(2.)            = 16
 df(2.)            = 107.11
ddf(2.)            = 958.755
  f(2+1_e)         = (16+107.11_e)
  f(2+1_e).dpart() = 107.11
  f((2+1_e) + (1+0_e)_e).dpart().dpart() = 958.755

Installation

Copy the duals/ directory (or just the duals/dual header) somewhere your #includes can find it. Then just #include <duals/dual> (or <duals/dual_eigen>) from your source.

Alternatively, cppduals supports building with CMake. If using CMake v3.14+, the FetchContent pattern is straightforward and enables using CMake targets to specify library dependencies:

  include(FetchContent)

  # Have CMake download the library
  set (CPPDUALS_TAG v0.4.1)
  set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
  FetchContent_Declare (cppduals
    URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
    URL_HASH MD5=${CPPDUALS_MD5}
    )
  FetchContent_MakeAvailable (cppduals)

  # Link to cppduals
  target_link_libraries (your_target PRIVATE cppduals::duals)

Older versions of CMake can achieve a similar result using the ExternalProject family of commands and modifying the global preprocessor search path:

  include(ExternalProject)

  # Have CMake download the library headers only
  set (CPPDUALS_TAG v0.4.1)
  set (CPPDUALS_MD5 7efe49496b8d0e3d3ffbcd3c68f542f3)
  ExternalProject_Add (cppduals
    URL https://gitlab.com/tesch1/cppduals/-/archive/${CPPDUALS_TAG}/cppduals-${CPPDUALS_TAG}.tar.bz2
    URL_HASH MD5=${CPPDUALS_MD5}
    CONFIGURE_COMMAND "" BUILD_COMMAND "" INSTALL_COMMAND "" )

  # Make include directory globally visible
  ExternalProject_Get_Property (cppduals source_dir)
  include_directories (${source_dir}/)

Alternatively, cppduals supports installation and discovery via the find_package utility. First, download and install the library to a location of your choosing:

  CPPDUALS_PREFIX=<desired_install_location>
  git clone https://gitlab.com/tesch1/cppduals.git && cd cppduals
  mkdir build && cd build
  cmake -DCMAKE_INSTALL_PREFIX="$CPPDUALS_PREFIX" ..
  cmake --build . --target install

Then, in your project's CMakeLists.txt, find and link to the library in the standard manner:

  find_package(cppduals REQUIRED)
  target_link_libraries(your_target PRIVATE cppduals::cppduals)

If you installed cppduals to a location that is not on find_package's default search path, you can specify the location by setting the cppduals_DIR environment variable when configuring your project:

  cd your_build_dir
  cppduals_DIR="${CPPDUALS_PREFIX}" cmake ..

Benchmarks

The benchmark compares cppduals against a local BLAS implementation, by default OpenBLAS (whose development package is required; RedHat-flavor: openblas-devel, Debian-flavor: libopenblas-dev). If you wish to build the benchmarks against a different installation of BLAS, the BLAS_DIR and LAPACK_DIR CMake variables can be set at configuration time.

For example, to build and run the tests shown below:

  cmake -Bbuild-bench -H. -DCPPDUALS_BENCHMARK=ON -DBLAS_DIR=/opt/local -DLAPACK_DIR=/opt/local
  cmake --build build-bench --target bench_gemm
  ./build-bench/tests/bench_gemm

The first performance goal of this project is to make the duals::dual<> type at least as fast as std::complex<>. This is considered an upper bound for performance because complex math operations are usually highly optimized for scientific computing and have a similar algebraic structure. The second goal is to make the compound type std::complex<duals::dual<>> as fast as possible for use in calculations that require the derivative of complex functions (e.g. calculations involving quantum-mechanical wave functions).
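As a sketch of that use case (a hypothetical example, not taken from the library's documentation): nesting a dual inside std::complex differentiates a complex function with respect to a real parameter.

#include <duals/dual>
#include <complex>
#include <iostream>

using namespace duals::literals;

// f(z) = z*z has derivative f'(z) = 2z.  Seeding the real component
// with a unit dual part makes the dual parts of the result carry
// df/dx; at z = 2+3i this gives f'(z) = 4+6i.
int main()
{
  std::complex<duals::dual<double>> z(2 + 1_e, 3 + 0_e);
  auto w = z * z;
  std::cout << w.real().dpart() << " " << w.imag().dpart() << "\n";  // prints: 4 6
}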

The first goal is measured by comparing the speed of matrix-matrix operations (nominally matrix multiplication) on duals::dual<>-valued Eigen matrices against highly optimized BLAS implementations of equivalent operations on complex-valued matrices. This can be done by running the ./tests/bench_gemm program. In the ideal case, the B_MatMat<dual{f,d}> results should be nearly as fast as, or faster than, the equivalently sized B_MatMat<complex{f,d}> and the double-sized B_MatMatBLAS<{float,double}> operations. This is very difficult to achieve in practice, as BLAS libraries typically use hand-tuned assembly, whereas Eigen must express the calculation in a general form that the compiler can turn into optimal code.

Comparing Eigen 3.3.7 and OpenBLAS 0.3.6 on an Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz, the result is still sub-optimal: dual<float> achieves only about half the performance of the BLAS equivalent and about 90% of std::complex<float>:

B_MatMat<dualf,dualf>/32               5433 ns         5427 ns
B_MatMat<dualf,dualf>/64              38478 ns        38433 ns
B_MatMat<dualf,dualf>/128            299450 ns       298981 ns
B_MatMat<dualf,dualf>/256           2365347 ns      2361566 ns
B_MatMat<dualf,dualf>/512          18888220 ns     18857342 ns
B_MatMat<dualf,dualf>/1024        151079955 ns    150856120 ns

B_MatMat<complexf,complexf>/32         4963 ns         4955 ns
B_MatMat<complexf,complexf>/64        36716 ns        36671 ns
B_MatMat<complexf,complexf>/128      280870 ns       280346 ns
B_MatMat<complexf,complexf>/256     2173791 ns      2170886 ns
B_MatMat<complexf,complexf>/512    17493222 ns     17459890 ns
B_MatMat<complexf,complexf>/1024  138498432 ns    138286283 ns

B_MatMatBLAS<complexf>/32              4877 ns         4870 ns
B_MatMatBLAS<complexf>/64             27722 ns        27691 ns
B_MatMatBLAS<complexf>/128           177084 ns       176756 ns
B_MatMatBLAS<complexf>/256          1268715 ns      1266445 ns
B_MatMatBLAS<complexf>/512          9772184 ns      9726621 ns
B_MatMatBLAS<complexf>/1024        75915016 ns     75432354 ns

The second benchmark of interest measures how well the nested specialization std::complex<duals::dual<>> performs as a matrix value type, relative to using a BLAS library on an extended matrix of twice the size to compute the same values. This comparison is also made with the ./tests/bench_gemm program. The relevant measures are B_MatMat<cdual{f,d}> and B_MatMatBLAS<complex{f,d}> of twice the size.
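For reference, a sketch of the extended-matrix identity being compared against (my reading of the benchmark setup, stated as an assumption): writing a dual-valued matrix as A = Ar + Ad*e with e*e = 0, the product with B = Br + Bd*e is

  (Ar + Ad*e) * (Br + Bd*e) = Ar*Br + (Ar*Bd + Ad*Br)*e

which is exactly what an ordinary gemm computes in the diagonal and lower-left blocks of the 2n x 2n block product

  [ Ar   0  ]   [ Br   0  ]   [ Ar*Br               0    ]
  [ Ad   Ar ] * [ Bd   Br ] = [ Ad*Br + Ar*Bd     Ar*Br  ]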

On the same machine as above, using std::complex<duals::dual<float>> (cdualf) shows a speed advantage over the BLAS approach, while using only half the memory. However, notice that the advantage decreases as the matrices get larger, which ideally should not happen:

B_MatMat<cdualf,cdualf>/16             2810 ns         2808 ns
B_MatMat<cdualf,cdualf>/32            19900 ns        19878 ns
B_MatMat<cdualf,cdualf>/64           151837 ns       151646 ns
B_MatMat<cdualf,cdualf>/128         1174699 ns      1172931 ns
B_MatMat<cdualf,cdualf>/256         9122903 ns      9110123 ns
B_MatMat<cdualf,cdualf>/512        72575352 ns     72467264 ns

B_MatMatBLAS<complexf>/32              4877 ns         4870 ns
B_MatMatBLAS<complexf>/64             27722 ns        27691 ns
B_MatMatBLAS<complexf>/128           177084 ns       176756 ns
B_MatMatBLAS<complexf>/256          1268715 ns      1266445 ns
B_MatMatBLAS<complexf>/512          9772184 ns      9726621 ns
B_MatMatBLAS<complexf>/1024        75915016 ns     75432354 ns

Contributions

Questions, bug reports, bug fixes, and contributions are welcome. Simply submit an Issue or Merge Request.

Compiler notes

XCode 11 (Apple Clang 11) is known to work, as are various versions of g++. Clang 8.0 appears to have some trouble compiling the optimized templates for Eigen, as evidenced by its propensity to segfault when compiling the cppduals test programs. Please submit issues if you experience similar problems, including the specifics of your compiler and compilation flags.

License

The primary header file duals/dual and the testing and benchmarking code are licensed under the following:

 Part of the cppduals project.
 https://tesch1.gitlab.io/cppduals

 (c)2019 Michael Tesch. tesch1@gmail.com

 See https://gitlab.com/tesch1/cppduals/blob/master/LICENSE.txt for
 license information.

 This Source Code Form is subject to the terms of the Mozilla
 Public License v. 2.0. If a copy of the MPL was not distributed
 with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

Eigen-derived

The support for Eigen vectorization, including duals/dual_eigen and the architecture-specific vectorization files under duals/arch, is derived from the Eigen project and thus licensed under the MPL-2.0.

ChangeLog

v0.4.1

  • changed constexpr to FMT_CONSTEXPR in the dual<> and complex<> formatters to work with more compilers / lang standards.

v0.4.0

  • cleaned-up release with fixes from v0.3.2.
  • improved docs

v0.3.3+

  • ignore these; they were attempts to clean up the release tarballs. The next stable release will be v0.4.0.

v0.3.2

  • not actually a tagged release
  • fixed a bug in the {fmt} support, added docs for the same.
  • added benchmarking for {fmt} vs iostreams.

v0.3.1

  • forgot to bump the CMakeLists package version number in 0.3.0.

v0.3.0

  • vastly improved cmake support, thanks to Jeff. The improvements required changing some CMake target names.
  • Added basic optional libfmt support for duals::dual<> and std::complex<>, enabled with #defines

v0.2.0

Todo

  • Add multi-variate differentiation capability.
  • Non-x86_64 (CUDA/AltiVec/HIP/NEON/...) vectorization.
  • Higher-order derivatives.