Our goal is to track how errors propagate through a numerical algorithm: if the input x carries an error, how large an error will the output y carry? Measuring both in terms of relative error, the condition number is exactly the scaling the error picks up as it passes through the system, i.e., how much the error is amplified, or, thankfully, how much it is dampened. Formally speaking, the condition number is the ratio of the output's relative error to the input's relative error. A classic example of an ill-conditioned operation is addition/subtraction on a floating-point machine, when the operands nearly cancel.
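The cancellation example above can be seen numerically. This is a minimal sketch (numbers chosen for illustration): we perturb one operand of a subtraction by a tiny relative error and watch how much the output's relative error is amplified.

```python
# Subtracting two nearly equal numbers amplifies relative error enormously:
# a classic ill-conditioned operation (catastrophic cancellation).
a = 1.0000001
b = 1.0000000
a_perturbed = a * (1 + 1e-10)    # tiny relative input error ~1e-10

exact = a - b                    # ~1e-7
perturbed = a_perturbed - b

rel_err_in = 1e-10
rel_err_out = abs(perturbed - exact) / abs(exact)
print(rel_err_out / rel_err_in)  # amplification factor around 1e7
```

The amplification factor is roughly |a| / |a - b|, which is exactly what the condition-number formula below predicts for subtraction.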
- 1D function condition number
1D to 1D case. How should we read the form xf’(x)/f(x)? From the error analysis: the factor f’(x) in the numerator comes from the Taylor expansion of f(x+dx); the factor x in the numerator comes from forming the relative error of x (RHS); and the denominator f(x) comes from forming the relative error of y (LHS). Because of this origin, the special cases simply drop the vanishing term(s): if x=0 then cond f(x) = f’(x)/f(x); if y=0 then cond f(x) = xf’(x); if x=y=0 then cond f(x) = f’(x). Notice that in these special cases we can no longer use relative error (it is simply undefined when the quantity is zero) and fall back to absolute error for whichever of the two is zero.
- General function condition number
General mapping case. The construction is similar: x becomes xi, f’(x) becomes the partial derivative of fj with respect to xi evaluated at x, and f(x) becomes fj(x), so it is just the 1D formula in general form. Note that this expression is the entry in the i-th row and j-th column of a big matrix Gamma. We then define the condition number as a matrix norm of this Gamma (which leaves us plenty of options).
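The entrywise Gamma matrix can be sketched numerically. This is an illustration of my own (the example function is hypothetical, and I use the convention rows = outputs, columns = inputs, i.e., Gamma[i, j] = x[j] * dF_i/dx_j / f_i(x), approximating the Jacobian by central differences):

```python
import numpy as np

def gamma_matrix(f, x, h=1e-6):
    """Entrywise condition numbers Gamma[i, j] = x[j] * dF_i/dx_j / f_i(x),
    with the Jacobian approximated by central finite differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    m, n = len(fx), len(x)
    G = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        dfdxj = (f(x + e) - f(x - e)) / (2 * h)  # column j of the Jacobian
        G[:, j] = x[j] * dfdxj / fx
    return G

# hypothetical example: f(x, y) = (x*y, x + y)
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
print(gamma_matrix(f, [2.0, 3.0]))
```

Any matrix norm of this Gamma then serves as a condition number, as the notes say.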
This is mathematically beautiful and captures the full information about error propagation. The price, however, is inconvenience. Computational mathematicians are not devout worshippers of mathematical beauty; they have to compromise for practical engineering work. Hence an alternative, and widely used, option comes to us.
So in the simplified definition of the condition number, x becomes the max norm of x, f’(x) becomes the max norm of the Jacobian matrix, and f(x) becomes the max norm of f(x). In effect we pull each piece out of the entrywise Gamma matrix and bluntly take max norms. This definition may miss some ill-conditioned situations, but that is the tradeoff for simplicity.
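The simplified definition is one line of code. A sketch under my own example (same hypothetical f(x, y) = (x*y, x + y); infinity norm for vectors, induced infinity norm for the Jacobian):

```python
import numpy as np

def cond_simplified(f, jac, x):
    """Simplified condition number ||x|| * ||J(x)|| / ||f(x)|| using max norms
    (infinity norm for vectors, induced infinity norm for the Jacobian)."""
    x = np.asarray(x, dtype=float)
    num = np.linalg.norm(x, np.inf) * np.linalg.norm(jac(x), np.inf)
    return num / np.linalg.norm(f(x), np.inf)

# hypothetical example: f(x, y) = (x*y, x + y)
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
jac = lambda v: np.array([[v[1], v[0]], [1.0, 1.0]])
print(cond_simplified(f, jac, [2.0, 3.0]))  # 3 * 5 / 6 = 2.5
```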
Now let’s turn to the matrix condition number. When solving a linear system, say Ax=b, b is the input and x is the output. Nothing special: this is just a special case of the general mapping situation, since our function f is now the linear transformation A inverse. The Jacobian matrix is exactly A inverse, but what about the remaining parts? Here the most amusing thing happens: the numerator x of the 1D formula corresponds to b, which equals Ax, while the guy on the bottom is now x. Putting a norm on everything (not necessarily the max norm, any vector norm with its induced matrix norm), the ratio norm(Ax)/norm(x) is exactly bounded by the induced matrix norm of A. Coupled with the A inverse we just mentioned, the matrix condition number is defined to be norm A times norm A inverse (so they follow the same philosophy, but the matrix condition number is not literally derived from the function condition number). Directly from submultiplicativity of the matrix norm, this condition number is bounded below by 1 (the norm of the identity, since A times A inverse is I). Equality holds for the identity matrix, but not only for it: in the 2-norm, for example, any scalar multiple of an orthogonal matrix has condition number exactly 1.
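A quick numerical check of the definition (the matrix here is an arbitrary example of mine; NumPy's `np.linalg.cond` computes the same quantity, using the 2-norm when asked):

```python
import numpy as np

# kappa(A) = ||A|| * ||A^{-1}||, here in the 2-norm
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))   # the two agree

# kappa >= 1 always, with equality e.g. for the identity
print(np.linalg.cond(np.eye(3), 2))  # 1.0
```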
Note that this matrix condition number is independent of the input b! No matter what error comes in with the input, the relative error of the output undergoes the same level of amplification, simply because this is a special case: a linear problem with f(0)=0.
The story ends with the example of the Hilbert matrix, whose condition number soars exponentially with the dimension. This means high-dimensional input is a disaster: as mentioned at the beginning, the system amplifies the input error on a crazy scale, making it an ill-conditioned one.
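The growth is easy to observe. A sketch building the Hilbert matrix by hand (entries H[i, j] = 1/(i + j + 1) with 0-based indices) and printing its 2-norm condition number for a few sizes:

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix: H[i, j] = 1 / (i + j + 1), 0-based indices."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))  # roughly exponential growth in n
```

Already around n = 12 the condition number approaches 1/machine-epsilon, so solving a Hilbert system in double precision loses essentially all accuracy.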
- Algorithm condition number
Let f be the exact mathematical process, and let fA be what the algorithm A actually computes. We assume that for each x there exists an xA such that fA(x) = f(xA); the closer xA is to x, the more confidence we have. In fact, this is exactly how we measure the condition number of an algorithm: formally, it is the relative error of xA (with respect to x) normalized by the machine epsilon. If the algorithm is well-conditioned, the condition number is close to 1, i.e., the distance between xA and x is around the machine epsilon times (the scaling of) the magnitude of x.
So, in sum, what is the error in the output? Take the relative error of fA(x) (with respect to f(x)) normalized by the machine epsilon: the value is around cond(f) (1+cond(A)), i.e., there are two sources of amplification: one from the algorithm and one from the function itself.
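The backward-error idea above can be sketched for a toy case of my own choosing: take f(x) = x², let the "algorithm" be a single floating-point multiply, and recover xA by taking the square root of the computed result (the square root adds its own rounding, so this is only a crude estimate of cond(A)).

```python
import math
import sys

eps = sys.float_info.epsilon

# f(x) = x**2; the "algorithm" fA is one floating-point multiply.
x = 1.1
y_alg = x * x                # fA(x), carries one rounding error
x_back = math.sqrt(y_alg)    # xA with f(xA) = fA(x), up to sqrt rounding

# cond(A) = relative error of xA, normalized by machine epsilon
cond_alg = abs(x_back - x) / (abs(x) * eps)
print(cond_alg)   # O(1): a single multiply is about as accurate as it gets
```

A condition number of order 1 here matches the notes: xA sits within about one machine epsilon (relative) of x.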