The VMCON Algorithm
===================

VMCON aims to solve a nonlinearly constrained problem (general nonlinear programming problem),
specifically, minimising an objective function subject to several nonlinear constraints.


The Problem
-----------
In the *general non-linear programming problem*, we want to minimise an objective function (figure of merit), :math:`f(\vec{x})`, subject to some non-linear constraints:

.. math::
    c_i(\vec{x}) = 0, \quad i = 1,...,k

    c_i(\vec{x}) \geq 0, \quad i = k+1,...,m

where :math:`\vec{x}` is an :math:`n`-dimensional vector of optimisation variables, and the objective and constraint functions each map this :math:`n`-dimensional space to a scalar.

Several facts are worth noting about problem formulation:

1. To maximise some function :math:`g(\vec{x})` we would minimise the objective :math:`f(\vec{x}) = -g(\vec{x})`.
2. To constrain the solution such that :math:`h(\vec{x}) = a` we would apply the constraint :math:`h(\vec{x}) - a = 0`.
3. To constrain the solution such that :math:`h(\vec{x}) \geq a` we would apply the constraint :math:`h(\vec{x}) - a \geq 0`.
4. To constrain the solution such that :math:`h(\vec{x}) \leq 0` we would apply the constraint :math:`-h(\vec{x}) \geq 0`.
5. To constrain the solution such that :math:`h(\vec{x}) \leq a` we would apply the constraint :math:`a-h(\vec{x}) \geq 0`.
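
To make the standard form concrete, the following minimal sketch (a hypothetical toy problem, not part of any VMCON interface) applies facts 1-3 above to the problem of maximising :math:`g(\vec{x}) = -(x_1 - 1)^2 - x_2^2` subject to :math:`x_1 + x_2 = 1` and :math:`x_2 \geq 0.25`:

.. code-block:: python

    import numpy as np

    # Maximise g(x) = -(x[0] - 1)**2 - x[1]**2  ->  minimise f(x) = -g(x)   (fact 1)
    def f(x: np.ndarray) -> float:
        return (x[0] - 1) ** 2 + x[1] ** 2

    # x[0] + x[1] = 1  ->  c_1(x) = x[0] + x[1] - 1 = 0                     (fact 2)
    def c1(x: np.ndarray) -> float:
        return x[0] + x[1] - 1.0

    # x[1] >= 0.25     ->  c_2(x) = x[1] - 0.25 >= 0                        (fact 3)
    def c2(x: np.ndarray) -> float:
        return x[1] - 0.25

    equality_constraints = [c1]    # i = 1, ..., k   (k = 1)
    inequality_constraints = [c2]  # i = k+1, ..., m (m = 2)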

The Lagrangian
--------------
VMCON is an augmented Lagrangian solver, meaning it uses the Lagrange multipliers, :math:`\vec{\lambda}`, of a candidate
solution to characterise that solution's quality.

Of specific note is the Lagrangian function:

.. math::
    L(\vec{x}, \vec{\lambda}) = f(\vec{x}) - \sum_{i=1}^m \vec{\lambda}_ic_i(\vec{x})

and the derivative of the Lagrangian function, with respect to :math:`\vec{x}`:

.. math::
    :label: lagrangian-derivative

    \nabla_XL(\vec{x}, \vec{\lambda}) = \nabla f(\vec{x}) - \sum_{i=1}^m \vec{\lambda}_i \nabla c_i(\vec{x})
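
Assuming callables for the gradients of :math:`f` and each :math:`c_i` (the names below are illustrative, not a published API), the gradient in :eq:`lagrangian-derivative` can be assembled as a short sketch:

.. code-block:: python

    import numpy as np

    def lagrangian_gradient(grad_f, grad_cs, x, lam):
        """Evaluate the x-gradient of the Lagrangian.

        grad_f  : callable returning the n-vector gradient of the objective at x
        grad_cs : list of callables, one per constraint, each returning an n-vector
        lam     : m-vector of Lagrange multipliers
        """
        grad = np.asarray(grad_f(x), dtype=float)
        for lam_i, grad_ci in zip(lam, grad_cs):
            grad -= lam_i * np.asarray(grad_ci(x), dtype=float)
        return grad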

Initialisation of VMCON
-----------------------
VMCON is initialised with:

* The objective function to minimise, :math:`f(\vec{x})`, as described above.
* The constraints :math:`c_i(\vec{x}), i = 1,...,m`, as described above.
* An initial sample point :math:`\vec{x}_0`.
* :math:`\mathbf{B}`: the initial Hessian approximation matrix, usually the identity matrix.
* :math:`\epsilon`: the "user-supplied error tolerance".

Note that :math:`\mathbf{B}` must be of dimension :math:`d \times d`, where :math:`d = \max(n, m)`.

We also set the iteration number to 1, :math:`j=1`.
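
A minimal sketch of this initialisation, using the dimension rule above (all names and values are illustrative):

.. code-block:: python

    import numpy as np

    n = 2                      # number of optimisation variables
    m = 2                      # total number of constraints (k equalities first)
    x = np.array([2.0, 2.0])   # initial sample point x_0
    d = max(n, m)
    B = np.identity(d)         # initial Hessian approximation
    epsilon = 1e-8             # user-supplied error tolerance
    j = 1                      # iteration counter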


The Quadratic Programming Problem
---------------------------------
The Quadratic Programming Problem (QPP) will also be known as the Quadratic Sub-Problem (QSP) because it forms only a part of the
VMCON algorithm, with the other half being the augmented Lagrangian.

The QPP provides the search direction :math:`\vec{\delta}_j`, the direction along which the next point :math:`\vec{x}_j` will lie.
Solving the QPP also provides the Lagrange multipliers, :math:`\vec{\lambda}_{j}`.

The quadratic program to be minimised on iteration :math:`j` is:

.. math::
    Q(\delta) = f(\vec{x}_{j-1}) + \delta^T\nabla f(\vec{x}_{j-1}) + \frac{1}{2}\delta^T\mathbf{B}\delta

subject to

.. math::
    \nabla c_i(\vec{x}_{j-1})^T\delta + c_i(\vec{x}_{j-1}) = 0, \quad i=1,...,k

    \nabla c_i(\vec{x}_{j-1})^T\delta + c_i(\vec{x}_{j-1}) \ge 0, \quad i=k+1,...,m
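
Any quadratic-programming routine can solve this sub-problem. As a hedged sketch only (not the solver used by any particular VMCON implementation), the quadratic objective and linearised constraints can be handed to ``scipy.optimize.minimize`` with the SLSQP method:

.. code-block:: python

    import numpy as np
    from scipy.optimize import minimize

    def solve_qsp(grad_f_x, B, cs_x, grad_cs_x, k):
        """Solve the quadratic sub-problem for the search direction delta_j.

        grad_f_x  : gradient of f at x_{j-1} (length-n array)
        B         : Hessian approximation (only the leading n x n block is used here)
        cs_x      : constraint values c_i(x_{j-1}) (length m)
        grad_cs_x : constraint gradients at x_{j-1} (m x n array)
        k         : number of equality constraints
        """
        grad_f_x = np.asarray(grad_f_x, dtype=float)
        grad_cs_x = np.asarray(grad_cs_x, dtype=float)
        n = grad_f_x.size

        def q(delta):
            # f(x_{j-1}) is a constant and does not affect the minimiser, so it is omitted.
            return grad_f_x @ delta + 0.5 * delta @ B[:n, :n] @ delta

        constraints = [
            {"type": "eq" if i < k else "ineq",
             "fun": lambda delta, i=i: grad_cs_x[i] @ delta + cs_x[i]}
            for i in range(len(cs_x))
        ]
        result = minimize(q, np.zeros(n), method="SLSQP", constraints=constraints)
        return result.x  # the search direction delta_j

A dedicated QP solver would normally be preferred here because it also returns the Lagrange multipliers :math:`\vec{\lambda}_j`, which this sketch does not expose.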


The Convergence Test
--------------------
The convergence test is performed on the :math:`j`'th iteration after the QSP. The convergence criterion is the sum of two terms:

* The magnitude of the predicted change in the objective function.
* The complementary error; a complementary error of 0 means that each constraint is either exactly satisfied (at equality) or has a Lagrange multiplier of 0.

This is encapsulated in the equation:

.. math::
    \lvert \nabla f(\vec{x}_{j-1})^T \cdot \delta_j \rvert + \sum^m_{i=1}\lvert \lambda_{j,i} c_i(\vec{x}_{j-1}) \rvert < \epsilon
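
Written as a sketch (argument names are illustrative):

.. code-block:: python

    import numpy as np

    def converged(grad_f_x, delta, lam, cs_x, epsilon):
        """Return True when the VMCON convergence criterion above is satisfied."""
        predicted_change = abs(np.dot(grad_f_x, delta))
        complementary_error = np.sum(np.abs(np.asarray(lam) * np.asarray(cs_x)))
        return predicted_change + complementary_error < epsilon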


The Line Search
---------------
The line search helps to mitigate poor initial conditions. It does this by searching along the 'search direction' :math:`\vec{\delta}_j` for a step length :math:`\alpha_j` such that:

.. math::
    \vec{x}_j = \vec{x}_{j-1} + \alpha_j\vec{\delta}_j

:math:`\alpha` is found via the minimisation of:

.. math::
    \phi(\alpha) = f(\vec{x}_j) + \sum_{i=1}^k \vec{\mu}_{j,i}|c_i(\vec{x}_j)| + \sum_{i=k+1}^m \vec{\mu}_{j,i}|\min(0, c_i(\vec{x}_j))|


On the :math:`j`'th iteration, :math:`\vec{\mu}_j` is a vector of :math:`m` elements, :math:`\mu_{j,i}, \, i = 1,...,m`.
On the first iteration:

.. math::
    \vec{\mu}_1 = |\vec{\lambda}_1|

On subsequent iterations:

.. math::
    \vec{\mu}_j = \max\left[|\vec{\lambda}_j|, \frac{1}{2}\left(\vec{\mu}_{j-1} + |\vec{\lambda}_j|\right)\right]
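
The merit function and the weight update can be sketched as follows (assuming ``f`` and a list ``cs`` of constraint callables with the :math:`k` equalities first; this mirrors the equations above rather than any particular implementation):

.. code-block:: python

    import numpy as np

    def phi(alpha, x_prev, delta, f, cs, k, mu):
        """Evaluate the line-search merit function phi at a trial step length alpha."""
        x_trial = x_prev + alpha * delta
        c_vals = np.array([c(x_trial) for c in cs])
        penalty = np.sum(mu[:k] * np.abs(c_vals[:k]))                    # equality constraints
        penalty += np.sum(mu[k:] * np.abs(np.minimum(0.0, c_vals[k:])))  # inequality constraints
        return f(x_trial) + penalty

    def update_mu(mu_prev, lam, first_iteration):
        """Update the penalty weights mu for the current iteration."""
        abs_lam = np.abs(np.asarray(lam, dtype=float))
        if first_iteration:
            return abs_lam
        return np.maximum(abs_lam, 0.5 * (mu_prev + abs_lam))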

The line search iterates for a maximum of 10 steps and exits if the chosen value of :math:`\alpha` satisfies either the Armijo condition:

.. math::
    \phi(\alpha) \leq \phi(0) + 0.1\alpha(\phi(1) - \phi(0))

or the so-called Kovari condition, an ad-hoc break condition in the PROCESS implementation of VMCON which therefore does not appear in the original paper:

.. math::
    \phi(\alpha) > \phi(0)

Once the line search exits, the accepted value is taken as :math:`\alpha_j = \alpha`.

If neither exit condition is met, :math:`\alpha` is revised using a quadratic approximation:

.. math::
    \alpha = \min\left(0.1\alpha, \frac{-\alpha^2}{\phi(\alpha) - \phi(0) - \alpha(\phi(1) - \phi(0))}\right)
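
Putting the exit conditions together, a simplified sketch of the loop is shown below. For brevity it replaces the quadratic-approximation revision of :math:`\alpha` with a simple halving of the step, which is a deliberate simplification and not the update used by VMCON:

.. code-block:: python

    def line_search(phi_of, max_steps=10):
        """Search for a step length alpha accepted by the Armijo or Kovari condition.

        phi_of : callable mapping alpha -> phi(alpha), e.g. a partial of phi() above.
        """
        phi0 = phi_of(0.0)
        phi1 = phi_of(1.0)
        alpha = 1.0
        for _ in range(max_steps):
            phi_alpha = phi_of(alpha)
            armijo = phi_alpha <= phi0 + 0.1 * alpha * (phi1 - phi0)
            kovari = phi_alpha > phi0
            if armijo or kovari:
                break
            alpha *= 0.5  # simplified step revision (not VMCON's quadratic approximation)
        return alpha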


The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Quasi-Newton Update
---------------------------------------------------------------
The final stage of an iteration of the VMCON optimiser is to update the Hessian approximation via a BFGS update.

For an unconstrained problem, we use the following differences to update :math:`\mathbf{B}`:

.. math::
    \vec{\xi} = \vec{x}_j - \vec{x}_{j-1}

.. math::
    \vec{\gamma} = \nabla_XL(\vec{x}_j, \vec{\lambda}_j) - \nabla_XL(\vec{x}_{j-1}, \vec{\lambda}_j)

which is calculated using :eq:`lagrangian-derivative`.

Since we have a constrained problem, we define a further quantity:

.. math::
    \vec{\eta} = \theta\vec{\gamma} + (1-\theta)\mathbf{B}\vec{\xi}

where

.. math::
    \theta = \begin{cases}
        1 ,& \text{if } \vec{\xi}^T\vec{\gamma} \geq 0.2\vec{\xi}^T\mathbf{B}\vec{\xi}\\
        \frac{0.8\vec{\xi}^T\mathbf{B}\vec{\xi}}{\vec{\xi}^T\mathbf{B}\vec{\xi} - \vec{\xi}^T\vec{\gamma}},& \text{otherwise}
    \end{cases}


The definition of :math:`\vec{\eta}` ensures :math:`\mathbf{B}` remains positive semi-definite, which is a prerequisite for solving the QSP.

We can then perform the BFGS update:

.. math::
    \mathbf{B_{NEW}} = \mathbf{B} - \frac{\mathbf{B}\vec{\xi}\vec{\xi}^T\mathbf{B}}{\vec{\xi}^T\mathbf{B}\vec{\xi}} + \frac{ \vec{\eta} \vec{\eta}^T}{\vec{\xi}^T\vec{\eta}}
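
The damped update translates almost directly from the equations above (a sketch; ``B``, ``xi`` and ``gamma`` are assumed to have compatible dimensions):

.. code-block:: python

    import numpy as np

    def bfgs_update(B, xi, gamma):
        """Return the damped BFGS update of the Hessian approximation B."""
        B_xi = B @ xi
        xi_B_xi = xi @ B_xi
        xi_gamma = xi @ gamma

        # The damping factor theta follows the case split above.
        if xi_gamma >= 0.2 * xi_B_xi:
            theta = 1.0
        else:
            theta = 0.8 * xi_B_xi / (xi_B_xi - xi_gamma)

        eta = theta * gamma + (1.0 - theta) * B_xi
        return B - np.outer(B_xi, B_xi) / xi_B_xi + np.outer(eta, eta) / (xi @ eta)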


The VMCON Algorithm
-------------------
This page covers the mathematics and theory behind the VMCON algorithm. For completeness, the following flow diagram demonstrates
how the algorithm is implemented at a high level.

.. mermaid::

    flowchart
        setup("Initialisation of VMCON") --> j1("j = 1")
        j1 --> qsp("The Quadratic Programming Problem (Lagrange multipliers and search direction)")
        qsp --> convergence_test(["Convergence criterion met?"])
        convergence_test -- "Yes" --> exit[["Exit"]]
        convergence_test -- "No" --> linesearch("Line search (next evaluation point)")
        linesearch --> bfgs("BFGS update")
        bfgs --> incrementj("j = j + 1")
        incrementj --> qsp