The VMCON Algorithm

VMCON solves a nonlinearly constrained optimisation problem (the general nonlinear programming problem): minimising an objective function subject to a set of nonlinear constraints.

The Problem

In the general nonlinear programming problem, we want to minimise an objective function (the figure of merit), \(f(\vec{x})\), subject to a set of nonlinear constraints:

\[ \begin{align}\begin{aligned}c_i(\vec{x}) = 0, \quad i = 1,...,k\\c_i(\vec{x}) \geq 0, \quad i = k+1,...,m\end{aligned}\end{align} \]

where \(\vec{x}\) is an \(n\)-dimensional vector and the objective and constraint functions each map \(\vec{x}\) to a scalar.

Several facts are worth noting about problem formulation:

  1. To maximise some function \(g(\vec{x})\) we would minimise the objective \(f(\vec{x}) = -g(\vec{x})\).

  2. To constrain the solution such that \(h(\vec{x}) = a\) we would apply the constraint \(h(\vec{x}) - a = 0\).

  3. To constrain the solution such that \(h(\vec{x}) \geq a\) we would apply the constraint \(h(\vec{x}) - a \geq 0\).

  4. To constrain the solution such that \(h(\vec{x}) \leq 0\) we would apply the constraint \(-h(\vec{x}) \geq 0\).

  5. To constrain the solution such that \(h(\vec{x}) \leq a\) we would apply the constraint \(a-h(\vec{x}) \geq 0\).
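
As a concrete illustration, a small two-variable problem of this form might be written as plain Python callables. The specific functions below are purely illustrative, chosen to give one equality and one inequality constraint in the required \(c_i(\vec{x}) = 0\) / \(c_i(\vec{x}) \geq 0\) forms:

```python
def f(x):
    # Objective (figure of merit) to minimise.
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2


def c1(x):
    # Equality constraint: x1 - 2*x2 = -1, rewritten as c1(x) = 0.
    return x[0] - 2.0 * x[1] + 1.0


def c2(x):
    # Inequality constraint: x1^2/4 + x2^2 <= 1, rewritten as c2(x) >= 0.
    return -0.25 * x[0] ** 2 - x[1] ** 2 + 1.0
```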

The Lagrangian

VMCON is an augmented Lagrangian solver, meaning it uses the Lagrange multipliers, \(\vec{\lambda}\), of a solution to characterise the quality of that solution.

Of specific note is the Lagrangian function:

\[L(\vec{x}, \vec{\lambda}) = f(\vec{x}) - \sum_{i=1}^m \lambda_i c_i(\vec{x})\]

and the derivative of the Lagrangian function, with respect to \(\vec{x}\):

(1)\[\nabla_X L(\vec{x}, \vec{\lambda}) = \nabla f(\vec{x}) - \sum_{i=1}^m \lambda_i \nabla c_i(\vec{x})\]
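
In code, equation (1) is a single matrix-vector product once the constraint gradients are stacked row-wise into a Jacobian. The following is a minimal sketch; the names `grad_f`, `jacobian_c`, and `lam` are assumptions about how the caller supplies the derivatives:

```python
import numpy as np


def lagrangian_gradient(grad_f, jacobian_c, lam):
    """Evaluate equation (1): grad f(x) - sum_i lambda_i * grad c_i(x).

    grad_f     : (n,) gradient of the objective at x
    jacobian_c : (m, n) matrix whose i'th row is grad c_i(x)
    lam        : (m,) Lagrange multipliers
    """
    return grad_f - jacobian_c.T @ lam


# Tiny usage example with made-up values (n = 2, m = 2).
grad_L = lagrangian_gradient(
    np.array([1.0, -2.0]),        # grad f(x)
    np.array([[1.0, -2.0],
              [-1.0, -2.0]]),     # rows: grad c_1(x), grad c_2(x)
    np.array([0.5, 0.0]),         # lambda
)
```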

Initialisation of VMCON

VMCON is initialised with:

  • The objective function to minimise, \(f(\vec{x})\), as described above.

  • The constraints \(c_i(\vec{x}), i = 1,...,m\), as described above.

  • An initial sample point \(\vec{x}_0\).

  • \(\mathbf{B}\): the initial Hessian approximation matrix, usually the identity matrix.

  • \(\epsilon\): the “user-supplied error tolerance”.

Note that \(\mathbf{B}\) must be of dimension \(d \times d\), where \(d = \max(n, m)\).

We also set the iteration number to 1, \(j=1\).
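
A minimal initialisation sketch, with illustrative values for \(\vec{x}_0\) and \(\epsilon\), might look as follows:

```python
import numpy as np

n = 2                       # number of optimisation variables
m = 2                       # number of constraints
x = np.array([2.0, 2.0])    # initial sample point x_0 (illustrative)
epsilon = 1e-8              # user-supplied error tolerance (illustrative)

# B must be d x d, where d = max(n, m); the identity is the usual choice.
d = max(n, m)
B = np.identity(d)

j = 1  # iteration counter
```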

The Quadratic Programming Problem

The Quadratic Programming Problem (QPP) is also known as the Quadratic Sub-Problem (QSP) because it forms only one part of the VMCON algorithm; the other part is the augmented Lagrangian.

The QPP provides the search direction \(\delta_j\), the vector along which \(\vec{x}_j\) will lie. Solving the QPP also provides the Lagrange multipliers, \(\vec{\lambda}_j\).

The quadratic program to be minimised on iteration \(j\) is:

\[Q(\delta) = f(\vec{x}_{j-1}) + \delta^T\nabla f(\vec{x}_{j-1}) + \frac{1}{2}\delta^T\mathbf{B}\delta\]

subject to

\[ \begin{align}\begin{aligned}\nabla c_i(\vec{x}_{j-1})^T\delta + c_i(\vec{x}_{j-1}) = 0, \quad i=1,...,k\\\nabla c_i(\vec{x}_{j-1})^T\delta + c_i(\vec{x}_{j-1}) \ge 0, \quad i=k+1,...,m\end{aligned}\end{align} \]
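
To make the QSP concrete, the sketch below minimises \(Q(\delta)\) subject to the linearised constraints using `scipy.optimize.minimize` (which defaults to SLSQP when constraints are supplied). This is only for illustration: a real implementation would use a dedicated QP solver, which would also return the Lagrange multipliers \(\vec{\lambda}_j\) that this sketch does not expose.

```python
import numpy as np
from scipy.optimize import minimize


def solve_qsp(grad_f, B, c, jacobian_c, k):
    """Sketch of the QSP on iteration j.

    grad_f     : (n,) gradient of f at x_{j-1}
    B          : (d, d) current Hessian approximation
    c          : (m,) constraint values c_i(x_{j-1})
    jacobian_c : (m, n) constraint gradients, row i = grad c_i(x_{j-1})
    k          : number of equality constraints
    """
    n = grad_f.shape[0]

    def Q(delta):
        # The constant f(x_{j-1}) does not affect the minimiser, so it is
        # omitted. Only the leading n x n block of B acts on the
        # n-dimensional delta.
        return delta @ grad_f + 0.5 * delta @ B[:n, :n] @ delta

    # Linearised constraints: grad c_i(x)^T delta + c_i(x) = 0 or >= 0.
    constraints = [
        {"type": "eq", "fun": lambda delta, i=i: jacobian_c[i] @ delta + c[i]}
        for i in range(k)
    ] + [
        {"type": "ineq", "fun": lambda delta, i=i: jacobian_c[i] @ delta + c[i]}
        for i in range(k, c.shape[0])
    ]

    result = minimize(Q, np.zeros(n), constraints=constraints)
    return result.x  # the search direction delta_j
```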

The Convergence Test

The convergence test is performed on the \(j\)’th iteration, after solving the QSP. It compares the sum of two terms against the user-supplied tolerance:

  • The predicted change in magnitude of the objective function.

  • The complementary error; a complementary error of 0 means that each constraint is either at equality or has a Lagrange multiplier of 0.

This is encapsulated in the equation:

\[\lvert \nabla f(\vec{x}_{j-1})^T \cdot \delta_j \rvert + \sum^m_{i=1}\lvert \lambda_{j,i} c_i(\vec{x}_{j-1}) \rvert < \epsilon\]
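
Given the quantities produced by the QSP, the test is a few lines of code. A sketch, assuming NumPy arrays of the indicated shapes:

```python
import numpy as np


def converged(grad_f, delta, lam, c, epsilon):
    """Evaluate the convergence test against the tolerance epsilon.

    grad_f : (n,) gradient of f at x_{j-1}
    delta  : (n,) search direction from the QSP
    lam    : (m,) Lagrange multipliers from the QSP
    c      : (m,) constraint values at x_{j-1}
    """
    predicted_change = abs(grad_f @ delta)
    complementary_error = np.sum(np.abs(lam * c))
    return predicted_change + complementary_error < epsilon
```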

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Quasi-Newton Update

The final stage of an iteration of the VMCON optimiser is to update the Hessian approximation via a BFGS update.

For an unconstrained problem, we use the following differences to update \(\mathbf{B}\):

\[\vec{\xi} = \vec{x}_j - \vec{x}_{j-1}\]
\[\vec{\gamma} = \nabla_XL(\vec{x}_j, \vec{\lambda}_j) - \nabla_XL(\vec{x}_{j-1}, \vec{\lambda}_j)\]

where \(\vec{\gamma}\) is calculated using (1).

Since we have a constrained problem, we define a further quantity:

\[\vec{\eta} = \theta\vec{\gamma} + (1-\theta)\mathbf{B}\vec{\xi}\]

where

\[\begin{split}\theta = \begin{cases} 1 ,& \text{if } \vec{\xi}^T\vec{\gamma} \geq 0.2\vec{\xi}^T\mathbf{B}\vec{\xi}\\ \frac{0.8\vec{\xi}^T\mathbf{B}\vec{\xi}}{\vec{\xi}^T\mathbf{B}\vec{\xi} - \vec{\xi}^T\vec{\gamma}},& \text{otherwise} \end{cases}\end{split}\]

The definition of \(\vec{\eta}\) ensures \(\mathbf{B}\) remains positive definite, which is a prerequisite for solving the QSP.

We can then perform the BFGS update:

\[\mathbf{B_{NEW}} = \mathbf{B} - \frac{\mathbf{B}\vec{\xi}\vec{\xi}^T\mathbf{B}}{\vec{\xi}^T\mathbf{B}\vec{\xi}} + \frac{ \vec{\eta} \vec{\eta}^T}{\vec{\xi}^T\vec{\eta}}\]
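
The full damped update is then a short routine. The sketch below assumes \(\vec{\xi}\) and \(\vec{\gamma}\) have the same dimension as \(\mathbf{B}\):

```python
import numpy as np


def bfgs_update(B, xi, gamma):
    """Damped BFGS update of the Hessian approximation B.

    xi    : x_j - x_{j-1}
    gamma : difference of Lagrangian gradients, from equation (1)
    """
    xi_B_xi = xi @ B @ xi
    xi_gamma = xi @ gamma

    # Choose theta so that xi . eta stays positive, keeping B positive
    # definite.
    if xi_gamma >= 0.2 * xi_B_xi:
        theta = 1.0
    else:
        theta = 0.8 * xi_B_xi / (xi_B_xi - xi_gamma)

    eta = theta * gamma + (1.0 - theta) * (B @ xi)

    B_xi = B @ xi
    return B - np.outer(B_xi, B_xi) / xi_B_xi + np.outer(eta, eta) / (xi @ eta)
```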

Overview of the VMCON Algorithm

This page covers the mathematics and theory behind the VMCON algorithm. For completeness, the following flow diagram demonstrates how the algorithm is implemented at a high level.

```mermaid
flowchart
    setup("Initialisation of VMCON") --> j1("j = 1")
    j1 --> qsp("The Quadratic Programming Problem (Lagrange multipliers and search direction)")
    qsp --> convergence_test(["Convergence criterion met?"])
    convergence_test -- "Yes" --> exit[["Exit"]]
    convergence_test -- "No" --> linesearch("Line search (next evaluation point)")
    linesearch --> bfgs("BFGS update")
    bfgs --> incrementj("j = j + 1")
    incrementj --> qsp
```