Abstract
In this vignette, we describe how to use RcppEnsmallen in a standalone C++ file.
RcppEnsmallen package provides an embedded copy of the
ensmallen
C++ library of optimization functions. Optimizers contained within are
state of the art and possess a high level of code quality. Each
optimizer must be accessed through C++ by implementing the
appropriate objective functions and, then, surfaced into R
through RcppArmadillo.
Alternatively, work has been done by Dirk Schumacher in armacmp
to automatically create the underlying C++ code from
R.
Note: Optimizers in RcppEnsmallen only
work with armadillo
data structures. Thus, if using Eigen
through RcppEigen,
please consider the RcppNumerical
package.
Consider the Residual Sum of Squares, also known as RSS, defined as:
\[RSS\left( \beta \right) = \left( { \mathbf{y} - \mathbf{X} \beta } \right)^{\top} \left( \mathbf{y} - \mathbf{X} \beta \right)\]
The objective function we wish to minimize would be defined as:
\[f(\beta) = \rVert \mathbf{y} - \mathbf{X}\beta\lVert_2\]
The gradient is defined as:
\[\frac{\partial RSS}{\partial \beta} = -2 \mathbf{X}^{\top} \left(\mathbf{y} - \mathbf{X} \beta \right)\]
When using ensmallen to solve this problem, we must
create a C++ class that computes both the objective function
value and its gradient value either together or separately under member
functions named as:
Evaluate(): Value of the objective function under the
parameters.Gradient(): Convergence to the correct value under the
given parameters.EvaluateWithGradient(): Perform both steps at the same
time. (Optional)In the Linear Regression scenario, we will define each step
separately to emphasize the calculation occurring. Generally, the one
step EvaluateWithGradient() function will be faster than
the two step variant. More details on design can be found on ensmallen
documentation page for differentiable functions.
Before writing the class, RcppEnsmallen requires
accessing the library in a standalone C++ file with the follow include
and Rcpp Attribute declarations:
The overaching Linear regression class should be constructed as follows:
#include <RcppEnsmallen.h>
// [[Rcpp::depends(RcppEnsmallen)]]
// Define a differentiable objective function by implementing both Evaluate()
// and Gradient() separately.
class LinearRegressionFunction
{
public:
  // Construct the object with the given the design 
  // matrix and responses.
  LinearRegressionFunction(const arma::mat& X,
                           const arma::vec& y) :
  X(X), y(y) { }
  // Return the objective function for model parameters beta.
  double Evaluate(const arma::mat& beta)
  {
      return std::pow(arma::norm(y - X * beta), 2.0);
  }
  // Compute the gradient for model parameters beta
  void Gradient(const arma::mat& beta, arma::mat& g)
  {
      g = -2 * X.t() * (y - X * beta);
  }
private:
  // The design matrix.
  const arma::mat& X;
  // The responses to each data point.
  const arma::vec& y;
};From there:
Note: Make sure to have the definition of the Linear Regression class in the same C++ file as the exported C++ function into R.
// [[Rcpp::export]]
arma::mat lin_reg_lbfgs(const arma::mat& X, const arma::vec& y) {
  // Construct the first objective function.
  LinearRegressionFunction lrf(X, y);
  // Create the L_BFGS optimizer with default parameters.
  // The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
  // handle differentiable functions.
  ens::L_BFGS lbfgs;
  lbfgs.MaxIterations() = 10;
  // Create a starting point for our optimization randomly.
  // The model has p parameters, so the shape is p x 1.
  arma::mat beta(X.n_cols, 1, arma::fill::randn);
  // Run the optimization
  lbfgs.Optimize(lrf, beta);
  return beta;
}Prior to using the new optimizer in mission critical work, compare the results to methods already implemented in R. The best way to achieve this is to create an oracle model by specifying the parameters known to generate data and, then, try to recover them. Moreover, if a method is already implemented in R feel free to try to check the result equality within an appropriate tolerance threshold.
Following with this methodology, data must be generated.
n <- 1e6
beta <- c(-2, 1.5, 3, 8.2, 6.6)
p <- length(beta)
X <- cbind(1, matrix(rnorm(n), ncol = p - 1))
y <- X %*% beta + rnorm(n / (p - 1))Next, the optimization procedure is used to estimate the parameters
of interest. Under this example, the results of the estimation can be
compared to lm(). That said, lm() may have
more precise results when compared against the optimizer as it is
implemented with a closed-form solution to linear regression plus the
computational is performed more rigorously.
| LBFGS | LM | |
|---|---|---|
| Beta1 | -1.999755 | -1.999755 | 
| Beta2 | 1.498457 | 1.498457 | 
| Beta3 | 2.995665 | 2.995665 | 
| Beta4 | 8.200895 | 8.200895 | 
| Beta5 | 6.601061 | 6.601061 |