Thayer Watkins
Silicon Valley
& Tornado Alley

Second Order Conditions for Optimization:
Constrained and Unconstrained

The purpose of this reading is the derivation of the second order conditions for optimization so that the reader will not only understand what those second order conditions are but also why those are the proper conditions. For the unconstrained case the conditions are stated in terms of the matrix of second derivatives called the Hessian matrix. the Hessian matrix is intuitively understandable. the conditions for the constrained case can be easily stated in terms of a matrix called the bordered Hessian. Generation after generation of applied mathematics students have accepted the bordered Hessian without a clue as to why it is the relevant entity.

In order to fulfill the goal of providing an intuitive derivation of the second order conditions the one, two and three variable cases will be given first before moving to the general n variable case.

The One Variable Case

Although this case is so simple it seems almost trivial its analysis sets the stage for the more complex cases.

Let f(x) be a differentiable function. the quadratic approximation to f(x) near some point x0 is given by

f(x) = f(x0) + fx(x0)(x-x0) + ½fxx(x0)(x-x0)2
or, equivalently
df = fx(x0)dx + ½fxx(x0)(dx)2

where dx=(x-x0) and df=f(x)-f(x0).

A critical point is a value of x0 such that fx(x0)=0. If fx(x0)=0 then

df = ½fxx(x0)(dx)2

For a minimum df has to be positive and since (dx)2 is always positive it means that fxx(x0) must be positive. On the other hand for a maximum df has to be negative and that requires that fxx(x0) be negative. The Hessian matrix for this case is just the 1×1 matrix [fxx(x0)]. (Hereafter the point at which the second derivatives are evaluated will not be expressed explicitly so the Hessian matrix for this case would be said to be [fxx].

There is no corresponding constrained optimization problems for this one variable case.

The Two Variable Case

The quadratic approximation about the point (x0, y0) for a differentiable function f(x,y) is

df = fxdx + fydy + ½[fxxdxdx + fxydxdy + fyxdydx + fyydydy]
which can be expressed in the form

df = fxdx + fydy + ½(dx, dy)H(dx, dy)T

where (dx, dy)T is the column vector which is the transpose of the row vector (dx, dy) and

  | fxx    fxy |
H = |              |
  | fyx    fyy |


For the unconstrained case a critical point is one such that fx=0 and fy=0 so

df = ½(dx, dy)H(dx, dy)T

For a minimum the second order condition is that H be a positive definite matrix. The conditon for a matrix to be positive definite is that its principal minors all be positive. For a maximum, H must be a negative definite matrix which will be the case if the pincipal minors alternate in sign.

For the constrained case a critical point is defined in terms of the Lagrangian multiplier method. Suppose the constraint is

pxx + pyy = I
in which case
the first order conditions are
fx = λpx
fy = λpy

where λ, the Lagrangian multiplier, is chosen so as to have the critical values satisfy the constraint.

The quadratic approximation can now be expressed as

df = λpxdx + λpydy + ½(dx, dy)H(dx, dy)T

Since any deviations from the critical point must also satisfy the constraint

pxdx + pydy = 0


df = λ(pxdx + pydy) + ½(dx, dy)H(dx, dy)T
= ½(dx, dy)H(dx, dy)T

but now H does not have to be either positive or negative definite for an extreme (maximum or minimum) because dx and dy are not unrestricted; i.e., only dx and dy such that pxdx + pydy = 0 are allowed. This makes specifying the conditions on H very difficult. The further analysis of the constrained case will be postponed until after the consideration of the unconstrained case for three variables.

The Three Variable Case

For a trinary function f(x,y,z) the quadratic approximation of the deviations is

df = fxdx + fydy + fzdz + ½(dx, dy, dz)H(dx, dy, dz)T

where now H is given by

  | fxx    fxy    fxz |
H = | fyx    fyy    fyz |
  | fzx    fzy    fzz |


As in the one and two variable unconstrained case the first order terms vanish and the conditions for a minimum is the positive definiteness of H and similarly negative definiteness for the maximum. Those conditions in turn can be stated in terms of the signs of the principal minors of H. There is nothing new here for this case.

The significance of this case is that the constrained two variable case can be restated in terms of a three variable case through the use of the Lagrangian multiplier method.

In the Lagrangian multiplier method the optimization problem of minimizing f(x,y) with respect to x and y subject to the constraint pxx + pyy = I is transformed into a three variable problem of unconstrained minimizing

L(x,y,λ) = f(x,y) - λ(pxx + pyy - I)

with respect to x, y and λ.

The first order condition for λ is

Lλ = pxx + pyy - I = 0
which is equivalent to satisfying the constraint.

The quadratic approximation for L(x,y,λ) is then

dL = Lxdx + Lydy + Lλ
+ ½(dx, dy, dλ)H*(dx, dy, dλ)T
which on the basis
of the definition of L reduces to
dL = fxdx + fydy + (pxx + pyy - I)dλ + ½(dx, dy, dλ)H*(dx, dy, dλ)T
and for the first order conditions
dL = λpxdx + λpydy + (pxx + pyy - I)dλ
+ ½(dx, dy, dλ)H*(dx, dy, dλ)T

Because of the satisfication of the constraints the first two terms reduce to zero and likewise the third term. Thus the value of dL is given by

dL = + ½(dx, dy, dλ)H*(dx, dy, dλ)T

The second order conditons for the constrained two variable case then can be stated in terms of the Hessian H* for the corrsponding three variable case. The character of H* is given by first noting that

Lx = fx - λpx
Ly = fy - λpy
Lλ = -(pxx + pyy - I)

And thus the second derivatives are given as:

  | fxx    fxy    -px |
H* = | fyx    fyy    -py |
  | -px    -py    0   |


So this is the enigmatic bordered Hessian. The positive or negative definiteness of H* then constitutes the second order conditions for the constrained optimization problem.

HOME PAGE OF Thayer Watkins