Second Order Conditions for Optimization, Constrained and Unconstrained: The Hessian and Bordered Hessian Matrix

applet-magic.com Thayer Watkins Silicon Valley & Tornado Alley USA

The purpose of this reading is to derive the second order conditions for optimization so that the reader understands not only what those conditions are but also why they are the proper conditions. For the unconstrained case the conditions are stated in terms of the matrix of second derivatives, called the Hessian matrix. The Hessian matrix is intuitively understandable. The conditions for the constrained case can be easily stated in terms of a matrix called the bordered Hessian. Generation after generation of applied mathematics students have accepted the bordered Hessian without a clue as to why it is the relevant entity.

To provide an intuitive derivation of the second order conditions, the one, two and three variable cases are treated first, before moving to the general n variable case.

The One Variable Case

Although this case is so simple that it seems almost trivial, its analysis sets the stage for the more complex cases.
Let f(x) be a twice differentiable function. The quadratic approximation to f(x) near some point x₀ is given by

f(x) = f(x₀) + f_x(x₀)(x-x₀) + ½f_xx(x₀)(x-x₀)²
or, equivalently
df = f_x(x₀)dx + ½f_xx(x₀)(dx)²

where dx=(x-x₀) and df=f(x)-f(x₀).
A critical point is a value of x₀ such that f_x(x₀)=0. If f_x(x₀)=0 then

df = ½f_xx(x₀)(dx)²

For a minimum df has to be positive for every nonzero dx, and since (dx)² is positive whenever dx≠0, f_xx(x₀) must be positive. On the other hand, for a maximum df has to be negative, which requires that f_xx(x₀) be negative. The Hessian matrix for this case is just the 1×1 matrix [f_xx(x₀)]. (Hereafter the point at which the second derivatives are evaluated will not be expressed explicitly, so the Hessian matrix for this case would be written [f_xx].)
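The one variable test can be tried numerically. The sketch below is only an illustration under stated assumptions: the names `second_derivative` and `classify_critical_point`, the step size `h` and the tolerance `tol` are not from the text, and f_xx is estimated by a central difference rather than computed exactly.

```python
def second_derivative(f, x0, h=1e-4):
    """Central-difference estimate of f_xx at x0 (h is an assumed step size)."""
    return (f(x0 + h) - 2.0 * f(x0) + f(x0 - h)) / (h * h)


def classify_critical_point(f, x0, tol=1e-6):
    """Classify a critical point x0 of f by the sign of f_xx(x0).

    Assumes f_x(x0) = 0 already holds; that is not checked here.
    """
    fxx = second_derivative(f, x0)
    if fxx > tol:
        return "minimum"       # df = ½ f_xx (dx)² > 0 for dx != 0
    if fxx < -tol:
        return "maximum"       # df = ½ f_xx (dx)² < 0 for dx != 0
    return "inconclusive"      # the quadratic term does not decide


print(classify_critical_point(lambda x: (x - 1.0) ** 2 + 3.0, 1.0))
print(classify_critical_point(lambda x: -(x ** 2), 0.0))
print(classify_critical_point(lambda x: x ** 4, 0.0))
```

The third call shows the limit of the test: for x⁴ the second derivative vanishes at the critical point, so the quadratic approximation alone cannot classify it.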
There is no corresponding constrained optimization problem for this one variable case.

The Two Variable Case

The quadratic approximation about the point (x₀, y₀) for a differentiable function f(x,y) is

df = f_xdx + f_ydy + ½[f_xxdxdx + f_xydxdy + f_yxdydx + f_yydydy]
which can be expressed in the form

df = f_xdx + f_ydy + ½(dx, dy)H(dx, dy)^T

where (dx, dy)^T is the column vector which is the transpose of the row vector (dx, dy) and

    | f_xx    f_xy |
H = |              |
    | f_yx    f_yy |

For the unconstrained case a critical point is one such that f_x=0 and f_y=0 so

df = ½(dx, dy)H(dx, dy)^T

For a minimum the second order condition is that H be a positive definite matrix. The condition for a symmetric matrix to be positive definite is that its leading principal minors all be positive. For a maximum, H must be a negative definite matrix, which will be the case if the leading principal minors alternate in sign, starting with a negative first minor.
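Written out for the two variable case, with the leading principal minors of H taken in order, the conditions read as follows (a restatement in LaTeX notation, assuming f_xy = f_yx so that H is symmetric):

```latex
\begin{aligned}
\text{Minimum:}\quad & f_{xx} > 0, \qquad f_{xx} f_{yy} - f_{xy}^{2} > 0 \\
\text{Maximum:}\quad & f_{xx} < 0, \qquad f_{xx} f_{yy} - f_{xy}^{2} > 0
\end{aligned}
```

For example, f(x,y) = x² + xy + y² has f_xx = 2 and f_xx f_yy - f_xy² = 3, so its critical point at the origin is a minimum.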
For the constrained case a critical point is defined in terms of the Lagrangian multiplier method. Suppose the constraint is

p_x x + p_y y = I

with p_x, p_y and I constant, in which case the first order conditions are

f_x = λp_x
f_y = λp_y

where λ, the Lagrangian multiplier, is chosen so as to have the critical values satisfy the constraint.
The quadratic approximation can now be expressed as

df = λp_xdx + λp_ydy + ½(dx, dy)H(dx, dy)^T

Any deviations from the critical point must also satisfy the constraint, so

p_xdx + p_ydy = 0

and therefore

df = λ(p_xdx + p_ydy) + ½(dx, dy)H(dx, dy)^T
= ½(dx, dy)H(dx, dy)^T

but now H does not have to be either positive or negative definite for an extremum (maximum or minimum) because dx and dy are not unrestricted; i.e., only dx and dy such that p_xdx + p_ydy = 0 are allowed. This makes specifying the conditions on H very difficult. The further analysis of the constrained case will be postponed until after the consideration of the unconstrained case for three variables.
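A small example may make the point concrete. Take f(x,y) = xy with the constraint x + y = 2, so that p_x = p_y = 1 and I = 2 (this example and the helper names below are assumptions for illustration, not from the text). The critical point is (1,1) and H is indefinite there, yet along the only allowed direction, a multiple of (p_y, -p_x), the quadratic form has a definite sign.

```python
def quadratic_form(H, d):
    """Return d H d^T for a 2x2 matrix H and a direction d = (dx, dy)."""
    dx, dy = d
    return (H[0][0] * dx * dx + H[0][1] * dx * dy
            + H[1][0] * dy * dx + H[1][1] * dy * dy)


def constrained_direction(px, py):
    """A direction satisfying p_x dx + p_y dy = 0."""
    return (py, -px)


# Assumed example: f(x, y) = x*y, constraint x + y = 2, critical point (1, 1).
H = [[0.0, 1.0],
     [1.0, 0.0]]           # f_xx = 0, f_xy = f_yx = 1, f_yy = 0
px, py = 1.0, 1.0

# H is indefinite: the form takes both signs over unrestricted directions.
print(quadratic_form(H, (1.0, 1.0)))     # positive
print(quadratic_form(H, (1.0, -1.0)))    # negative

# On the constraint only multiples of (p_y, -p_x) are allowed.
d = constrained_direction(px, py)
print(quadratic_form(H, d))              # negative, so df < 0: a constrained maximum
```

In this example the restricted form is negative, so the critical point is a constrained maximum even though H itself is neither positive nor negative definite.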

The Three Variable Case

For a function f(x,y,z) of three variables the quadratic approximation of the deviations is

df = f_xdx + f_ydy + f_zdz + ½(dx, dy, dz)H(dx, dy, dz)^T

where now H is given by

    | f_xx    f_xy    f_xz |
H = | f_yx    f_yy    f_yz |
    | f_zx    f_zy    f_zz |

As in the one and two variable unconstrained cases, the first order terms vanish at a critical point, and the condition for a minimum is the positive definiteness of H, with negative definiteness for a maximum. Those conditions in turn can be stated in terms of the signs of the leading principal minors of H. There is nothing new here for this case.
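The minor test for definiteness can be sketched for a Hessian of any small size. The function names and the example matrix below are assumptions for illustration; the determinant is computed by cofactor expansion, which is adequate only for small matrices.

```python
def determinant(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * determinant(minor)
    return total


def leading_principal_minors(H):
    """Determinants of the upper-left k x k blocks of H, for k = 1..n."""
    return [determinant([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]


def classify_hessian(H):
    """Classify a symmetric Hessian by the signs of its leading principal minors."""
    minors = leading_principal_minors(H)
    if all(m > 0 for m in minors):
        return "positive definite (minimum)"
    # Negative definite: signs alternate -, +, -, ... starting with a negative first minor.
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite (maximum)"
    return "not definite by this test"


# Assumed example: f = x² + y² + z² + xy has this constant Hessian.
H3 = [[2.0, 1.0, 0.0],
      [1.0, 2.0, 0.0],
      [0.0, 0.0, 2.0]]
print(leading_principal_minors(H3))
print(classify_hessian(H3))
```

Negating every entry of the example matrix flips the odd-order minors and leaves the even-order one positive, which is the alternating pattern of a negative definite matrix.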
The significance of this case is that the constrained two variable case can be restated in terms of a three variable case through the use of the Lagrangian multiplier method.
In the Lagrangian multiplier method the optimization problem of minimizing f(x,y) with respect to x and y subject to the constraint p_x x + p_y y = I is transformed into a three variable problem of finding an unconstrained critical point of

L(x,y,λ) = f(x,y) - λ(p_x x + p_y y - I)

with respect to x, y and λ.
The first order condition for λ is

L_λ = -(p_x x + p_y y - I) = 0

which is equivalent to satisfying the constraint.

The quadratic approximation for L(x,y,λ) is then

dL = L_xdx + L_ydy + L_λdλ + ½(dx, dy, dλ)H*(dx, dy, dλ)^T

which on the basis of the definition of L is

dL = (f_x - λp_x)dx + (f_y - λp_y)dy - (p_x x + p_y y - I)dλ + ½(dx, dy, dλ)H*(dx, dy, dλ)^T

The first order conditions f_x = λp_x and f_y = λp_y reduce the first two terms to zero, and the satisfaction of the constraint does likewise for the third term. Thus the value of dL is given by

dL = ½(dx, dy, dλ)H*(dx, dy, dλ)^T

The second order conditions for the constrained two variable case then can be stated in terms of the Hessian H* for the corresponding three variable case. The character of H* is given by first noting that

L_x = f_x - λp_x
L_y = f_y - λp_y
L_λ = -(p_x x + p_y y - I)

And thus the second derivatives are given as:

     | f_xx    f_xy    -p_x |
H* = | f_yx    f_yy    -p_y |
     | -p_x    -p_y     0   |
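
For reference, the entries follow by differentiating each first derivative of L once more, treating p_x, p_y and I as constants (an assumption consistent with the linear constraint used here):

```latex
\begin{aligned}
L_{xx} &= f_{xx}, & L_{xy} &= f_{xy}, & L_{x\lambda} &= -p_x, \\
L_{yx} &= f_{yx}, & L_{yy} &= f_{yy}, & L_{y\lambda} &= -p_y, \\
L_{\lambda x} &= -p_x, & L_{\lambda y} &= -p_y, & L_{\lambda\lambda} &= 0.
\end{aligned}
```

The zero in the corner arises because L is linear in λ.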

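So this is the enigmatic bordered Hessian: the Hessian of the Lagrangian with respect to x, y and λ. Because of the zero in its lower right corner, H* cannot itself be positive or negative definite; the second order conditions for the constrained optimization problem are instead stated in terms of the sign of its determinant. For two variables and one constraint, a negative determinant of H* indicates a constrained minimum and a positive determinant a constrained maximum.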
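
A commonly used form of the test for two variables and one linear constraint looks at the sign of the determinant of H*: positive for a constrained maximum, negative for a constrained minimum. That rule is a standard result assumed here rather than derived in the text, and the function names and the two examples below are likewise assumptions for illustration.

```python
def bordered_hessian(fxx, fxy, fyy, px, py):
    """Build H* for two variables and one linear constraint, laid out as in the text."""
    return [[fxx, fxy, -px],
            [fxy, fyy, -py],
            [-px, -py, 0.0]]


def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def classify_constrained(fxx, fxy, fyy, px, py):
    """Determinant test for two variables and one constraint (assumed standard rule)."""
    d = det3(bordered_hessian(fxx, fxy, fyy, px, py))
    if d > 0:
        return "constrained maximum"
    if d < 0:
        return "constrained minimum"
    return "inconclusive"


# Assumed examples, both with the constraint x + y = 2 (p_x = p_y = 1):
print(classify_constrained(0.0, 1.0, 0.0, 1.0, 1.0))   # f = x*y
print(classify_constrained(2.0, 0.0, 2.0, 1.0, 1.0))   # f = x² + y²
```

Because the border enters the determinant through products of two border entries, using +p_x, +p_y in place of -p_x, -p_y leaves the determinant unchanged.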

