optim – pkofod

I am very happy to say that we can finally announce that Optim.jl v0.9.0 is out. This version has quite a few user facing changes. Please read about the changes below if you use Optim.jl in a package, a script, or anything else, as you will quite likely have to make some changes to your code.

As always, I have to thank my two partners in crime: Asbjørn Nilsen Riseth (@anriseth) and Christoph Ortner (@cortner) for their help in making the changes, transitions, and tests that are included in v0.9.0.

The last update (form v0.6.0 to v0.7.0 had) some changes that were a long time coming, and so does v0.9.0. Hopefully, these fixes to old design problems will greatly improve the user experience and performance of Optim.jl, and pave the way for more exiting features in the future.

We’ve tried to make the transition as smooth as possible, although we do have breaking changes in this update. Please consult the documentation if you face problems, join us on gitter or ask the community at discourse!

Okay, now to the changes.

Why not v0.8.0?
First of all, why v0.9.0? Last version was v0.7.8! The is because we are dropping support for Julia v0.4 and v0.5 simultaneously, so we are reserving v0.8.0 for backporting serious fixes to Julia v0.5. However, v0.6 should be just around the corner. With Julia v0.7 and v1.0.0 not too far out in the horizon either, I’ve decided it’s more important to move forward than to keep v0.4 and v0.5 up to speed. The dev time is constrained, so currently it’s one or the other. Of course, for users of Julia v0.5. they can simply continue to use Optim.jl v0.7.8. Post Julia’s proper release, backwards compatibility and continuity will be more important, even if it comes at the expense of development speed.

Another note about the version number: The next version of Optim.jl will be v1.0.0, and we will follow SEMVER 2.0 fully.

Change order of evaluation point and storage arguments
This one is very breaking, although we have set up op a system such that all gradients and Hessians will be checked before proceeding. This check will be removed shortly in a v1.0.0 version bump, so please correct your code now. Basically, we closed a very old issue (#156) concerning the input argument order in gradients and Hessians. In Julia, an in-place function typically has an exclamation mark at the end of its name, and the cache as the first argument. In Optim.jl it has been the other way around for the argument order. We’ve changed that, and this means that you now have to provide “g” or “H” as the first argument, and “x” as the second. The old version

function g!(x, g)
    ... do something ...
end

is now

function g!(g, x)
    ... do something ...
end

NLSolversBase.jl
Since v0.7.0, we’ve moved some of the basic infrastructure of Optim.jl to NLSolversBase.jl. This is currently the Non-, Once-, and TwiceDifferentiable types and constructors. This is done to, as a first step, share code between Optim.jl and LineSearches.jl, and but also NLsolve.jl in the future. At the same time, we’ve made the code a little smarter, such that superfluous calls to the objective function, gradient, and Hessian are now avoided. As an example, compare the objective and gradient calls in the example in our readme. Here, we optimize the Rosenbrock “banana” function using BFGS. Since last version of Optim we had to change the output, as it has gone from 157 calls to 53. Much of this comes from this refactoring, but some of it also comes form a better choices for initial line search steps for BFGS and Newton introduced in #328.

As mentioned, we’ve made the *Differentiable-types a bit smarter, including moving the gradient and Hessian caches into the respective types. This also means, that a OnceDifferentiable type instance needs to know what the return type of the gradient is. This is done by providing an x seed in the constructor

rosenbrock = Optim.UnconstrainedProblems.examples["Rosenbrock"]
f = rosenbrock.f
g! = rosenbrock.g!
x_seed = rosenbrock.initial_x
od = OnceDifferentiable(f, g!, x_seed)

If the seed also happens to be the initial x, then you do not have to provide an x when calling optimize

julia> optimize(od, BFGS(), Optim.Options(g_tol=0.1))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [1.0005999613152214,1.001138415164852]
 * Minimizer: [1.0005999613152214,1.001138415164852]
 * Minimum: 7.427113e-07
 * Iterations: 13
 * Convergence: true
   * |x - x'| < 1.0e-32: false 
     |x - x'| = 1.08e-02 
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN 
   * |g(x)| < 1.0e-01: true 
     |g(x)| = 2.60e-02 
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 45
 * Gradient Calls: 45

If you’ve used Optim.jl before, you’ll notice that the output carries a bit more information about the convergence criteria.

LineSearches.jl turned Julian
Line searches used to be chosen using symbols in the method constructor for line search based methods such as GradientDescent, BFGS, and Newton by use of the linesearch keyword. The new version of LineSearches.jl uses types and dispatch exactly like Optim.jl does for solvers. This means that you now have to pass a type instance instead of a keyword, and this also means that we can open up for easy tweaking of line search parameters through fields in the line search types.

Let us illustrate by the following example how the new syntax works. First, we construct a BFGS instance without specifying the linesearch. This defaults to HagerZhang.

julia> rosenbrock(x) =  (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock (generic function with 1 method)
 
julia> result = optimize(rosenbrock, zeros(2), BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926033423,0.9999999852005353]
 * Minimum: 5.471433e-17
 * Iterations: 16
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 3.47e-07
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 2.33e-09
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 53
 * Gradient Calls: 53

or we could choose a backtracking line search instead

 
julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking()))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926655744,0.9999999853309254]
 * Minimum: 5.379380e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 1.13e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 8.79e-11
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 31
 * Gradient Calls: 24

this defaults to cubic backtracking, but quadratic can be chosen using the order keyword

julia> optimize(rosenbrock, zeros(2), BFGS(linesearch = LineSearches.BackTracking(order = 2)))
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [0.9999999926644578,0.9999999853284671]
 * Minimum: 5.381020e-17
 * Iterations: 23
 * Convergence: true
   * |x - x'| < 1.0e-32: false
     |x - x'| = 4.73e-09
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
     |f(x) - f(x')| / |f(x)| = NaN
   * |g(x)| < 1.0e-08: true
     |g(x)| = 1.76e-10
   * stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 29
 * Gradient Calls: 24

LineSearches.jl should have better documentation coming soon, but the code is quite self-explanatory for those who want to twiddle around with these parameters.

The method state is now an argument to optimize
While not always that useful to know for users, we use method states internally to hold all the pre-allocated cache variables that are needed. In the new version of Optim.jl, this can be explicitly provided by the user such that you can retrieve various diagnostics after the optimization routine is done. One such example is the inverse Hessian estimate that BFGS spits out.

method = BFGS()
options = Optim.Options()
initial_x = rand(2)
d = OnceDifferentiable(f, g!, initial_x)
my_state = Optim.initial_state(method, options, d, initial_x)
optimize{d, method, options, my_state)

The future
We have more changes coming in the near future. There’s PR #356 for a Trust Region solver for cases where you can explicitly calculate Hessian-vector products without forming the Hessian (from @jeff-regier from the Celeste.jl project), the interior point replacement for our current barrier function approach to box constrained optimization in PR #303, and more.