LU Implementation

Now, it’s time to implement the algorithm itself. We’ll go step by step. First, we’ll define a function with an appropriate name, which takes two parameters: \(\mathbf{A}\), the input matrix, and \(\epsilon\), a small number that will serve as our zero threshold. Any value smaller than \(\epsilon\) will be treated as zero.

Python

def lu_decomposition(A, epsilon: float = 1e-10):

def lu_decomposition(A, epsilon: float = 1e-10):

Next, we need to check some properties of \(\mathbf{A}\), since not every matrix can be decomposed into \(\mathbf{LU}\). Here’s a list of the requirements \(\mathbf{A}\) must satisfy:

Shape: The array A must be square, i.e., it should have dimensions \(n \times n\).
Dimensionality: A must be a 2-dimensional array.
Non-emptiness: A cannot be empty; it must contain at least one element.
Data Type: The elements of A must be of a type that can be converted to a numeric numpy array.

We will raise the appropriate errors if these conditions are not met. It’s always a good idea to foolproof your code. (The checks are included at the end of the post in the full code.)

Now, let’s dive into the logic. We’ll begin by creating the necessary matrices: \(\mathbf{P}\), \(\mathbf{U}\), and \(\mathbf{L}\). We’ll copy \(\mathbf{A}\) into \(\mathbf{U}\) to preserve the original matrix during elimination, while \(\mathbf{L}\) will initially be set as the identity matrix.

Rather than repeatedly modifying a full \(\mathbf{P}\) matrix—which would be computationally expensive—we’ll use an index array to track the row swaps. These indices will be updated during the process, and once elimination is complete, we’ll construct \(\mathbf{P}\) from this array.

Python

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)

When implementing LU decomposition, choosing the right pivot is crucial. This is because we divide by the pivot when calculating the multipliers. If the pivot is a small number, the resulting multiplier can become very large, which can make the algorithm unstable.

To avoid this, we select the largest number in the current column as the pivot. This ensures we’re dividing by a larger value, preventing the numbers from exploding. We begin at position (\(k, k\)) which serves as our starting pivot. For the current column \(k\), we look for the largest value by examining each entry below the pivot. To make the comparison fair, we take the absolute value of each entry. Then, we find the index (i.e., the row) of the largest number, so we can swap it with the pivot if necessary—if there’s a larger value.

We’ll use the argmax function to find the index of the largest value. However, since we’re only examining the window from the pivot downward (pivot included), the index returned will be relative to that window, not the entire matrix.

To convert it to a global index, we need to add the offset \(k\), which is the number of rows above the pivot. For example, let’s say we’re working with the third pivot at position (2,2). If the largest value is directly below it, the argmax function will return 1, indicating the second row of the window (with the first row being the pivot). However, this index is incorrect—the actual global index should be 2 + 1 = 3. We need to add the offset to account for the rows above the pivot.

Python

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")

Notice the final check? If the resulting number is zero, or so close to zero that we consider it negligible, this means every number in the column is effectively zero, as there’s no larger absolute value. In this case, the matrix is singular, and we raise the appropriate exception.

So far, we’ve only identified the index of the largest number; now it’s time to actually swap the rows. If a row swap is needed (i.e., the pivot index is different from the current row), we swap the corresponding rows in all relevant matrices: \(\mathbf{L}\), \(\mathbf{U}\), and the array representing \(\mathbf{P}\).

When swapping rows in \(\mathbf{U}\) during partial pivoting, we also swap the corresponding entries in \(\mathbf{L}\), but only in the columns that have already been filled. This ensures that \(\mathbf{L}\) stays consistent with the new row order of \(\mathbf{U}\), so that at the end, \(\mathbf{PA} = \mathbf{LU} \) still holds.

Python

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")
        
        if pivot_row != k:
            perm[k], perm[pivot_row] = perm[pivot_row], perm[k]
            U[[k, pivot_row], :] = U[[pivot_row, k], :]
            L[[k, pivot_row], :k] = L[[pivot_row, k], :k]

def lu_decomposition(A, epsilon: float = 1e-10): 

    m, n = A.shape

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")
        
        if pivot_row != k:
            perm[k], perm[pivot_row] = perm[pivot_row], perm[k]
            U[[k, pivot_row], :] = U[[pivot_row, k], :]
            L[[k, pivot_row], :k] = L[[pivot_row, k], :k]

Finally, we eliminate all entries below the pivot. For each row beneath the pivot row, we compute the corresponding multiplier \(l_{ik}\), subtract \(l_{ik}\) times the pivot row from the current row, and store \(l_{ik}\) in the matrix \(\mathbf{L}\).
To improve efficiency and numerical stability, we skip any row where the entry below the pivot is already zero or sufficiently small (i.e., less than \(\epsilon\)). Now we have the complete code (including the checks for the properties of \(\mathbf{A}\)):

Python

def lu_decomposition(A, epsilon: float = 1e-10): 

    try:
        A = np.asarray(A, dtype=float)
    except ValueError as e:
        raise TypeError(f"Input could not be converted to a numeric array. Got error: {e}")
    
    if A.ndim != 2:
        raise ValueError(f"Expected 2D array, got {A.ndim}D")

    if A.size < 1:
        raise ValueError("Matrix A cannot be empty. Please provide a non-empty matrix.")

    m, n = A.shape
    
    if m != n:
        raise ValueError(f"Expected square matrix, got shape {A.shape}")

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")
        
        if pivot_row != k:
            perm[k], perm[pivot_row] = perm[pivot_row], perm[k]
            U[[k, pivot_row], :] = U[[pivot_row, k], :]
            L[[k, pivot_row], :k] = L[[pivot_row, k], :k]
        
        # Eliminate below pivot
        for i in range(k + 1, n):
            factor = U[i, k] / U[k, k]
            if abs(U[i, k]) <= epsilon:
                continue
            L[i, k] = factor
            U[i, :] -= factor * U[k, :]

    P = np.eye(n)[perm]
    return P, L, U

def lu_decomposition(A, epsilon: float = 1e-10): 

    try:
        A = np.asarray(A, dtype=float)
    except ValueError as e:
        raise TypeError(f"Input could not be converted to a numeric array. Got error: {e}")
    
    if A.ndim != 2:
        raise ValueError(f"Expected 2D array, got {A.ndim}D")

    if A.size < 1:
        raise ValueError("Matrix A cannot be empty. Please provide a non-empty matrix.")

    m, n = A.shape
    
    if m != n:
        raise ValueError(f"Expected square matrix, got shape {A.shape}")

    # Initialize U as a copy of A, L as identity, and a vector for row swaps
    U = A.copy()
    L = np.eye(n, dtype=float)
    perm = np.arange(n)
    
    for k in range(n):
        # Partial pivoting: find row with max |U[i, k]| for i >= k
        pivot_row = k + np.argmax(np.abs(U[k:, k]))
        if np.abs(U[pivot_row, k]) < epsilon:
            raise np.linalg.LinAlgError("The matrix is singular and cannot be decomposed via LU.")
        
        if pivot_row != k:
            perm[k], perm[pivot_row] = perm[pivot_row], perm[k]
            U[[k, pivot_row], :] = U[[pivot_row, k], :]
            L[[k, pivot_row], :k] = L[[pivot_row, k], :k]
        
        # Eliminate below pivot
        for i in range(k + 1, n):
            factor = U[i, k] / U[k, k]
            if abs(U[i, k]) <= epsilon:
                continue
            L[i, k] = factor
            U[i, :] -= factor * U[k, :]

    P = np.eye(n)[perm]
    return P, L, U

Do not forget to return the respective matrices. Note that we construct \(\mathbf{P}\) at the very end using the permutation list — essentially by permuting the identity matrix. This way, we create the entire \(\mathbf{P}\) in one go.

Example Run

We’ll use the following matrix A, which requires a row swap to bring the largest pivot into position:

\(
\begin{bmatrix}
1 & 6 \\
2 & 4\\
\end{bmatrix}
\)

The example code corresponds to:

Python

A = np.array([[1,6], [2,4]])
P,L,U = lu_decomposition(A)

print("P:\n", P)
print("L:\n", L)
print("U:\n", U)
print(np.allclose(P @ A, L @ U))

A = np.array([[1,6], [2,4]])
P,L,U = lu_decomposition(A)

print("P:\n", P)
print("L:\n", L)
print("U:\n", U)
print(np.allclose(P @ A, L @ U))

Notice that we also check whether \(\mathbf{PA} = \mathbf{LU}\). The results are what we expected:

Terminal

P:
 [[0. 1.]
 [1. 0.]]
L:
 [[1.  0. ]
 [0.5 1. ]]
U:
 [[2. 4.]
 [0. 4.]]
True

P:
 [[0. 1.]
 [1. 0.]]
L:
 [[1.  0. ]
 [0.5 1. ]]
U:
 [[2. 4.]
 [0. 4.]]
True

What’s Not in the Code

This is by no means the best implementation. For example, we don’t distinguish between symmetric, sparse, or band matrices—cases where certain tricks can make the algorithm much more efficient. (If the matrix is symmetric, you only need to store half of it.) Also, because of Python’s overhead, this code is quite slow compared to implementations that use LAPACK under the hood. Libraries like numpy.linalg or scipy.linalg do use LAPACK, so you should prefer those whenever possible.

KADOMIN