Our goal is to track how errors propagate through a numerical algorithm: if the input x carries an error, how large an error will the output y carry? Measuring both in terms of relative error, the condition number is exactly the scaling the error picks up as it passes through the system, i.e., how much the error is amplified, or, thankfully, how much it is dampened. Formally speaking, the condition number is the ratio of the output's relative error to the input's relative error. A classic example of an ill-conditioned operation is addition/subtraction on a floating-point machine, when the operands nearly cancel.
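The cancellation example above can be seen numerically. This is a minimal sketch (numbers chosen for illustration): we perturb one operand of a subtraction by a tiny relative error and watch how much the output's relative error is amplified.

```python
# Subtracting two nearly equal numbers amplifies relative error enormously:
# a classic ill-conditioned operation (catastrophic cancellation).
a = 1.0000001
b = 1.0000000
a_perturbed = a * (1 + 1e-10)    # tiny relative input error ~1e-10

exact = a - b                    # ~1e-7
perturbed = a_perturbed - b

rel_err_in = 1e-10
rel_err_out = abs(perturbed - exact) / abs(exact)
print(rel_err_out / rel_err_in)  # amplification factor around 1e7
```

The amplification factor is roughly |a| / |a - b|, which is exactly what the condition-number formula below predicts for subtraction.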
- 1D function condition number
1D to 1D case. How should we read the form xf’(x)/f(x)? From the error analysis: the factor f’(x) in the numerator comes from the Taylor expansion of f(x+dx); the factor x in the numerator comes from forming the relative error of x (RHS); and the denominator f(x) comes from forming the relative error of y (LHS). Because of this origin, the special cases simply drop the vanishing term(s): if x=0 then cond f(x) = f’(x)/f(x); if y=0 then cond f(x) = xf’(x); if x=y=0 then cond f(x) = f’(x). Notice that in these special cases we can no longer use relative error (it is simply undefined when the quantity is zero) and fall back to absolute error for whichever of the two is zero.
- General function condition number
General mapping case. The construction is similar: x becomes xi, f’(x) becomes the partial derivative of fj with respect to xi evaluated at x, and f(x) becomes fj(x), so it is just the 1D formula in general form. Note that this expression is the entry in the i-th row and j-th column of a big matrix Gamma. We then define the condition number as a matrix norm of this Gamma (which leaves us plenty of options).
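The entrywise Gamma matrix can be sketched numerically. This is an illustration of my own (the example function is hypothetical, and I use the convention rows = outputs, columns = inputs, i.e., Gamma[i, j] = x[j] * dF_i/dx_j / f_i(x), approximating the Jacobian by central differences):

```python
import numpy as np

def gamma_matrix(f, x, h=1e-6):
    """Entrywise condition numbers Gamma[i, j] = x[j] * dF_i/dx_j / f_i(x),
    with the Jacobian approximated by central finite differences."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    m, n = len(fx), len(x)
    G = np.empty((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        dfdxj = (f(x + e) - f(x - e)) / (2 * h)  # column j of the Jacobian
        G[:, j] = x[j] * dfdxj / fx
    return G

# hypothetical example: f(x, y) = (x*y, x + y)
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
print(gamma_matrix(f, [2.0, 3.0]))
```

Any matrix norm of this Gamma then serves as a condition number, as the notes say.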
This is mathematically beautiful and captures the full information about error propagation. The price, however, is inconvenience. Computational mathematicians are not devout worshippers of mathematical beauty; they have to compromise for practical engineering work. Hence an alternative, and widely used, option comes to us.
So in the simplified definition of the condition number, x becomes the max norm of x, f’(x) becomes the max norm of the Jacobian matrix, and f(x) becomes the max norm of f(x). In effect we pull each piece out of the entrywise Gamma matrix and bluntly take max norms. This definition may miss some ill-conditioned situations, but that is the tradeoff for simplicity.
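The simplified definition is one line of code. A sketch under my own example (same hypothetical f(x, y) = (x*y, x + y); infinity norm for vectors, induced infinity norm for the Jacobian):

```python
import numpy as np

def cond_simplified(f, jac, x):
    """Simplified condition number ||x|| * ||J(x)|| / ||f(x)|| using max norms
    (infinity norm for vectors, induced infinity norm for the Jacobian)."""
    x = np.asarray(x, dtype=float)
    num = np.linalg.norm(x, np.inf) * np.linalg.norm(jac(x), np.inf)
    return num / np.linalg.norm(f(x), np.inf)

# hypothetical example: f(x, y) = (x*y, x + y)
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
jac = lambda v: np.array([[v[1], v[0]], [1.0, 1.0]])
print(cond_simplified(f, jac, [2.0, 3.0]))  # 3 * 5 / 6 = 2.5
```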
Now let’s turn to the matrix condition number. When solving a linear system, say Ax=b, b is the input and x is the output. Nothing special: this is just a special case of the general mapping situation, since our function f is now the linear transformation A inverse. The Jacobian matrix is exactly A inverse, but what about the remaining parts? Here the most amusing thing happens: the numerator x of the 1D formula corresponds to b, which equals Ax, while the guy on the bottom is now x. Putting a norm on everything (not necessarily the max norm, any vector norm with its induced matrix norm), the ratio norm(Ax)/norm(x) is exactly bounded by the induced matrix norm of A. Coupled with the A inverse we just mentioned, the matrix condition number is defined to be norm A times norm A inverse (so they follow the same philosophy, but the matrix condition number is not literally derived from the function condition number). Directly from submultiplicativity of the matrix norm, this condition number is bounded below by 1 (the norm of the identity, since A times A inverse is I). Equality holds for the identity matrix, but not only for it: in the 2-norm, for example, any scalar multiple of an orthogonal matrix has condition number exactly 1.
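A quick numerical check of the definition (the matrix here is an arbitrary example of mine; NumPy's `np.linalg.cond` computes the same quantity, using the 2-norm when asked):

```python
import numpy as np

# kappa(A) = ||A|| * ||A^{-1}||, here in the 2-norm
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))   # the two agree

# kappa >= 1 always, with equality e.g. for the identity
print(np.linalg.cond(np.eye(3), 2))  # 1.0
```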
Note that this matrix condition number is independent of the input b! No matter what error comes in with the input, the relative error of the output undergoes the same level of amplification, simply because this is a special case: a linear problem with f(0)=0.
The story ends with the example of the Hilbert matrix, whose condition number soars exponentially with the dimension. This means high-dimensional input is a disaster: as mentioned at the beginning, the system amplifies the input error on a crazy scale, making it an ill-conditioned one.
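The growth is easy to observe. A sketch building the Hilbert matrix by hand (entries H[i, j] = 1/(i + j + 1) with 0-based indices) and printing its 2-norm condition number for a few sizes:

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix: H[i, j] = 1 / (i + j + 1), 0-based indices."""
    i, j = np.indices((n, n))
    return 1.0 / (i + j + 1)

for n in (4, 8, 12):
    print(n, np.linalg.cond(hilbert(n)))  # roughly exponential growth in n
```

Already around n = 12 the condition number approaches 1/machine-epsilon, so solving a Hilbert system in double precision loses essentially all accuracy.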
- Algorithm condition number
Let f be the exact mathematical process, and let fA be what the algorithm A actually computes. We assume that for each x there exists an xA such that fA(x) = f(xA); the closer xA is to x, the more confidence we have. In fact, this is exactly how we measure the condition number of an algorithm: formally, it is the relative error of xA (with respect to x) normalized by the machine epsilon. If the algorithm is well-conditioned, the condition number is close to 1, i.e., the distance between xA and x is around the machine epsilon times (the scaling of) the magnitude of x.
So, in sum, what is the error in the output? Take the relative error of fA(x) (with respect to f(x)) normalized by the machine epsilon: the value is around cond(f) (1+cond(A)), i.e., there are two sources of amplification: one from the algorithm and one from the function itself.
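The backward-error idea above can be sketched for a toy case of my own choosing: take f(x) = x², let the "algorithm" be a single floating-point multiply, and recover xA by taking the square root of the computed result (the square root adds its own rounding, so this is only a crude estimate of cond(A)).

```python
import math
import sys

eps = sys.float_info.epsilon

# f(x) = x**2; the "algorithm" fA is one floating-point multiply.
x = 1.1
y_alg = x * x                # fA(x), carries one rounding error
x_back = math.sqrt(y_alg)    # xA with f(xA) = fA(x), up to sqrt rounding

# cond(A) = relative error of xA, normalized by machine epsilon
cond_alg = abs(x_back - x) / (abs(x) * eps)
print(cond_alg)   # O(1): a single multiply is about as accurate as it gets
```

A condition number of order 1 here matches the notes: xA sits within about one machine epsilon (relative) of x.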