<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Linear systems – iterative methods | Nicholas Hu</title>
    <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/</link>
      <atom:link href="https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/index.xml" rel="self" type="application/rss+xml" />
    <description>Linear systems – iterative methods</description>
    <generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-ca</language><lastBuildDate>Mon, 16 Jun 2025 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://www.math.ucla.edu/~njhu/media/icon_hu_d46824b1c45312fd.png</url>
      <title>Linear systems – iterative methods</title>
      <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/</link>
    </image>
    
    <item>
      <title>Linear stationary iterative methods</title>
      <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/lsiter/</link>
      <pubDate>Mon, 13 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/lsiter/</guid>
      <description>&lt;div class=&#34;btn-links mb-3&#34;&gt;
&lt;a class=&#34;btn btn-outline-primary btn-page-header btn-sm&#34; href=&#34;../lsiter.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;
  PDF
&lt;/a&gt;
&lt;/div&gt;
&lt;!--
No newlines allowed between $$&#39;s below!
--&gt;
&lt;div style=&#34;display: none;&#34;&gt;
$$
%% Sets and functions %%
\newcommand{\set}[1]{\{ #1 \}}
\newcommand{\Set}[1]{\left \{ #1 \right\}}
\renewcommand{\emptyset}{\varnothing}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rn}{\mathbb{R}^n}
\newcommand{\Rm}{\mathbb{R}^m}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
%% Linear algebra %%
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\Abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\inner}[2]{\langle #1, #2 \rangle}
\newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle}
\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\Norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\trans}{{\top}}
\newcommand{\span}{\mathop{\mathrm{span}}}
\newcommand{\im}{\mathop{\mathrm{im}}}
\newcommand{\ker}{\mathop{\mathrm{ker}}}
\newcommand{\rank}{\mathop{\mathrm{rank}}}
%% Colours %%
\definecolor{cblue}{RGB}{31, 119, 180}
\definecolor{corange}{RGB}{255, 127, 14}
\definecolor{cgreen}{RGB}{44, 160, 44}
\definecolor{cred}{RGB}{214, 39, 40}
\definecolor{cpurple}{RGB}{148, 103, 189}
\definecolor{cbrown}{RGB}{140, 86, 75}
\definecolor{cpink}{RGB}{227, 119, 194}
\definecolor{cgrey}{RGB}{127, 127, 127}
\definecolor{cyellow}{RGB}{188, 189, 34}
\definecolor{cteal}{RGB}{23, 190, 207}
$$
&lt;/div&gt;
&lt;!-- BODY --&gt;
&lt;p&gt;Let 

$A \in \R^{n \times n}$ be invertible and 

$b \in \R^n$. An &lt;strong&gt;iterative method&lt;/strong&gt; for solving 

$Ax = b$ generates a sequence of iterates 

$(x^{(k)})_{k=1}^\infty$ approximating the exact solution 

$x^*$, given an initial guess 

$x^{(0)}$. We can express this as 

$x^{(k+1)} := \phi_{k+1}(x^{(0)}, \dots, x^{(k)}; A, b)$ for some functions 

$\phi_k$; if these functions are eventually independent of 

$k$, the method is said to be &lt;strong&gt;stationary&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We will consider stationary iterative methods of the form 

$x^{(k+1)} := G x^{(k)} + f$, called &lt;strong&gt;(first-degree) linear stationary iterative methods&lt;/strong&gt;. Such methods are typically derived from a &lt;strong&gt;splitting&lt;/strong&gt; 

$A = M - N$, where 

$M$ is an invertible matrix that “approximates” 

$A$ but is easier to solve linear systems with. Specifically, since 

$Mx^* = Nx^* + b$, we take 

$G := M^{-1} N = I - M^{-1} A$ and 

$f = M^{-1} b$ so that the exact solution is a fixed point of the iteration. Equivalently, we can view 

$x^{(k+1)} = x^{(k)} + M^{-1} (b - Ax^{(k)})$ as a correction of 

$x^{(k)}$ based on the &lt;strong&gt;residual&lt;/strong&gt; 

$r^{(k)} := b - Ax^{(k)}$. We also note that the &lt;strong&gt;error&lt;/strong&gt; 

$e^{(k)} := x^* - x^{(k)}$ satisfies 

$e^{(k+1)} = Ge^{(k)}$, so the convergence of a splitting method depends on the properties of its &lt;strong&gt;iteration matrix&lt;/strong&gt; 

$G$.&lt;/p&gt;
&lt;h2 id=&#34;splitting-methods&#34;&gt;Splitting methods&lt;/h2&gt;
&lt;p&gt;Let 

$L$, 

$D$, and 

$U$ denote the &lt;em&gt;strictly&lt;/em&gt; lower, diagonal, and &lt;em&gt;strictly&lt;/em&gt; upper triangular parts of 

$A$. The splittings for some basic linear stationary iterative methods are as follows.&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;&lt;strong&gt;Method&lt;/strong&gt;&lt;/th&gt;
          &lt;th&gt;

$M$&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Jacobi&lt;/td&gt;
          &lt;td&gt;

$D$&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;

$\omega$-Jacobi (

$\omega \neq 0$)&lt;/td&gt;
          &lt;td&gt;

$\frac{1}{\omega} D$&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Gauss–Seidel&lt;/td&gt;
          &lt;td&gt;

$L + D$&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;

$\omega$-Gauss–Seidel/successive overrelaxation (SOR) (

$\omega \neq 0$)&lt;/td&gt;
          &lt;td&gt;

$L + \frac{1}{\omega} D$&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Symmetric successive overrelaxation (SSOR) (

$\omega \neq 0, 2$)&lt;/td&gt;
          &lt;td&gt;

$\frac{\omega}{2-\omega}(L + \frac{1}{\omega} D) D^{-1} (U + \frac{1}{\omega} D)$&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Richardson (

$\alpha \neq 0$)&lt;/td&gt;
          &lt;td&gt;

$\frac{1}{\alpha} I$&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The parameter 

$\omega$ is known as the &lt;strong&gt;relaxation&lt;/strong&gt;/&lt;strong&gt;damping&lt;/strong&gt; parameter and arises from taking 

$x^{(k+1)} = (1-\omega) x^{(k)} + \omega \hat{x}^{(k+1)}$, where 

$\hat{x}^{(k+1)}$ denotes the result of applying the corresponding nonparametrized ($\omega = 1$) method to $x^{(k)}$ (for the Gauss–Seidel-type methods, the relaxation is applied componentwise within each sweep, so that each relaxed component is used immediately in computing the subsequent ones). If 

$\omega &lt; 1$, the method is said to be &lt;strong&gt;underrelaxed&lt;/strong&gt;/&lt;strong&gt;underdamped&lt;/strong&gt;; if 

$\omega &gt; 1$, it is said to be &lt;strong&gt;overrelaxed&lt;/strong&gt;/&lt;strong&gt;overdamped&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The SSOR method arises from performing a “forward” 

$\omega$-Gauss–Seidel step with 

$M = L + \frac{1}{\omega} D$ followed by a “backward” 

$\omega$-Gauss–Seidel step with 

$M = U + \frac{1}{\omega} D$.&lt;/p&gt;
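&lt;p&gt;For illustration (this sketch is not part of the derivation; the test matrix, right-hand side, and tolerance are arbitrary), the iteration $x^{(k+1)} = x^{(k)} + M^{-1}(b - Ax^{(k)})$ can be implemented generically in NumPy, with $M$ taken from the table above:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def splitting_solve(A, b, M, x0=None, tol=1e-10, maxit=10_000):
    # Generic linear stationary iteration x_{k+1} = x_k + M^{-1}(b - A x_k).
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(maxit):
        r = b - A @ x                      # residual
        if np.linalg.norm(r) &lt;= tol * np.linalg.norm(b):
            break
        x = x + np.linalg.solve(M, r)      # correction based on the residual
    return x

# Illustrative SPD system (1D Laplacian) and the splittings from the table
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
D = np.diag(np.diag(A))
L = np.tril(A, -1)
omega = 1.5

M_jacobi = D
M_gs = L + D
M_sor = L + D / omega                      # omega-Gauss-Seidel / SOR

x = splitting_solve(A, b, M_sor)
&lt;/code&gt;&lt;/pre&gt;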
&lt;h2 id=&#34;convergence-theorems&#34;&gt;Convergence theorems&lt;/h2&gt;
&lt;p&gt;Clearly, since 

$e^{(k)} = G^k e^{(0)}$, if 

$\norm{G} &lt; 1$ for some operator norm, then the method &lt;strong&gt;converges&lt;/strong&gt; in the sense that 

$x^{(k)} \to x^*$ for all 

$x^{(0)}$.&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; More generally, we see that the method is convergent if and only if 

$G^k \to 0$, which in turn depends on the &lt;strong&gt;spectral radius&lt;/strong&gt; 

$\rho(G)$ of 

$G$, the maximum of the absolute values of its eigenvalues when regarded as a complex matrix.&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Namely, if 

$(\lambda, v)$ is an eigenpair of 

$G$ with 

$\norm{v} = 1$, then 

$\abs{\lambda}^k = \abs{\lambda^k} = \norm{G^k v} \leq \norm{G^k}$ for the induced operator norm, so 

$\rho(G)^k = \rho(G^k) \leq \norm{G^k}$. Thus, if 

$G^k \to 0$, then 

$\rho(G) &lt; 1$. In fact, the converse is also true.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let 

$G \in \C^{n \times n}$. Then 

$G^k \to 0$ if and only if 

$\rho(G) &lt; 1$ (such a matrix is called &lt;strong&gt;convergent&lt;/strong&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; It remains to show that 

$G^k \to 0$ if 

$\rho(G) &lt; 1$. Let 

$UTU^*$ be a Schur factorization of 

$G$ and let 

$D$ and 

$N$ denote the diagonal and strictly upper triangular parts of 

$T$. Since the product of a diagonal matrix and a strictly upper triangular matrix is strictly upper triangular, and the product of $n$ (or more) strictly upper triangular $n \times n$ matrices is zero, for all $k \geq n$, expanding $(D+N)^k$ into words in $D$ and $N$ (of which $\binom{k}{j}$ contain exactly $j$ factors of $N$) and bounding each word by submultiplicativity, we have


$$
\norm{G^k}_2 
= \norm{(D+N)^k}_2
\leq \sum_{j=0}^{n-1} \binom{k}{j} \norm{D}_2^{k-j} \norm{N}_2^j
= \sum_{j=0}^{n-1} \binom{k}{j} \rho(G)^{k-j} \norm{N}_2^j \to 0. \quad \blacksquare
$$
Using this fact, we can also prove a well-known formula for the spectral radius.&lt;sup id=&#34;fnref:3&#34;&gt;&lt;a href=&#34;#fn:3&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Gelfand’s formula&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let 

$G \in \C^{n \times n}$ and 

$\norm{{}\cdot{}}$ be an operator norm. Then 

$\rho(G) = \lim_{k \to \infty} \norm{G^k}^{1/k}$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; We previously saw that 

$\rho(G) \leq \norm{G^k}^{1/k}$ for all 

$k$, so 

$\rho(G) \leq \liminf_{k \to \infty} \norm{G^k}^{1/k}$. On the other hand, if 

$\varepsilon &gt; 0$ is arbitrary and 

$\hat{G} := \frac{G}{\rho(G) + \varepsilon}$, then 

$\rho(\hat{G}) &lt; 1$, so by the preceding result, 

$\norm{\hat{G}^k} \leq 1$ for all sufficiently large 

$k$, which is to say that 

$\norm{G^k} \leq (\rho(G) + \varepsilon)^k$. Hence we also have 

$\limsup_{k \to \infty} \norm{G^k}^{1/k} \leq \rho(G) + \varepsilon$. ∎&lt;/p&gt;
&lt;p&gt;As 

$\norm{e^{(k)}} \leq \norm{G^k} \norm{e^{(0)}} \approx \rho(G)^k \norm{e^{(0)}}$ for large 

$k$, the rate of convergence can often be estimated using the spectral radius of the iteration matrix. More precisely, a direct computation shows that if 

$G$ is diagonalizable and has a unique dominant eigenvalue, then 

$\frac{\norm{e^{(k+1)}}}{\norm{e^{(k)}}} \sim \rho(G)$, provided that 

$e^{(0)}$ has a nonzero component in the corresponding eigenspace.&lt;/p&gt;
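&lt;p&gt;To make this estimate concrete (the matrix below is an arbitrary illustrative choice), the following sketch computes the spectral radii of the Jacobi and Gauss–Seidel iteration matrices and compares the observed Jacobi error reduction factor with the spectral radius of its iteration matrix:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD tridiagonal
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

def spectral_radius(G):
    return np.max(np.abs(np.linalg.eigvals(G)))

G_jacobi = np.linalg.solve(D, D - A)          # I - D^{-1} A
G_gs = np.linalg.solve(L + D, -U)             # -(L + D)^{-1} U
print(spectral_radius(G_jacobi), spectral_radius(G_gs))

# Observed error contraction for the Jacobi method approaches its spectral radius
b = np.ones(n)
x_star = np.linalg.solve(A, b)
x = np.zeros(n)
errs = []
for _ in range(200):
    x = x + np.linalg.solve(D, b - A @ x)
    errs.append(np.linalg.norm(x_star - x))
print(errs[-1] / errs[-2])                    # close to spectral_radius(G_jacobi)
&lt;/code&gt;&lt;/pre&gt;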
&lt;h3 id=&#34;general-matrices&#34;&gt;General matrices&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;If the Jacobi method converges, then the 

$\omega$-Jacobi method converges for 

$\omega \in (0, 1]$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; For the 

$\omega$-Jacobi method, we have 

$G = (1-\omega) I + \omega G_\mathrm{J}$, where 

$G_\mathrm{J}$ denotes the iteration matrix for the Jacobi method. Hence 

$\rho(G) \leq (1-\omega) + \omega \rho(G_\mathrm{J}) &lt; 1$. ∎&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If the 

$\omega$-Gauss–Seidel method converges, then 

$\omega \in (0, 2)$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; For the 

$\omega$-Gauss–Seidel method, we have 

$G = (L + \frac{1}{\omega} D)^{-1} ((\frac{1}{\omega} - 1) D - U)$, so 

$\det(G) = \det(\frac{1}{\omega} D)^{-1} \det((\frac{1}{\omega} - 1) D) = \det((1-\omega) I) = (1-\omega)^n$. On the other hand, the determinant of 

$G$ is the product of its eigenvalues, so 

$\abs{1-\omega}^n \leq \rho(G)^n &lt; 1$. ∎&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If the SSOR method converges, then 

$\omega \in (0, 2)$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; For the SSOR method, we have 

$G = G_b G_f$, where 

$G_f$ and 

$G_b$ denote the iteration matrices for the forward and backward 

$\omega$-Gauss–Seidel methods. Arguing as in the preceding proof, we obtain 

$\abs{1-\omega}^{2n} \leq \rho(G)^n &lt; 1$. ∎&lt;/p&gt;
&lt;h3 id=&#34;symmetric-positive-definite-matrices&#34;&gt;Symmetric positive definite matrices&lt;/h3&gt;
&lt;p&gt;If 

$A$ is symmetric positive definite (SPD), then 

$A$ is invertible and all the splitting methods above are applicable since its diagonal entries must be positive. Recall also that symmetric matrices are partially ordered by the Loewner order 

$\prec$ in which 

$A \prec B$ if and only if 

$B-A$ is SPD, and that an SPD matrix 

$A$ defines an inner product 

$\inner{x}{y}_A := \inner{Ax}{y}_2$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ is SPD and 

$A \prec M + M^\trans$, then 

$\norm{G}_A &lt; 1$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; Let 

$x$ be a vector with 

$\norm{x}_A = 1$ such that 

$\norm{G}_A = \norm{Gx}_A$ (which exists by the extreme value theorem) and let 

$y := M^{-1} Ax$. Then


$$
\begin{align*}
\norm{G}_A^2
&amp;= \norm{x-y}_A^2 \\
&amp;= 1 - \inner{x}{y}_A - \inner{y}{x}_A + \inner{y}{y}_A \\
&amp;= 1 - \inner{(M + M^\trans - A)y}{y}_2 &lt; 1. \quad \blacksquare
\end{align*}
$$&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Convergence for SPD matrices&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let 

$A$ be SPD.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If 

$A \prec \frac{2}{\omega} D$, then the 

$\omega$-Jacobi method converges.&lt;/li&gt;
&lt;li&gt;If 

$\omega \in (0, 2)$, then the 

$\omega$-Gauss–Seidel method converges.&lt;/li&gt;
&lt;li&gt;If 

$\omega \in (0, 2)$, then the SSOR method converges.&lt;/li&gt;
&lt;li&gt;If 

$\alpha \in (0, \frac{2}{\rho(A)})$, then the Richardson method converges.
Moreover, if the eigenvalues of 

$A$ are 

$\lambda_1 \geq \cdots \geq \lambda_n &gt; 0$, then 

$\rho(G)$ is minimized when 

$\alpha = \alpha^* := \frac{2}{\lambda_1 + \lambda_n}$, in which case 

$\norm{e^{(k+1)}}_2 \leq (1 - \frac{2}{\kappa + 1}) \norm{e^{(k)}}_2$, where 

$\kappa$ is the 2-norm condition number of 

$A$.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; The convergence statements follow immediately from the preceding result.
For the Richardson method, 

$\rho(G(\alpha)) = \max_i \, \abs{1 - \alpha \lambda_i} = \max \, \set{1-\alpha\lambda_n, \alpha \lambda_1 - 1}$, so 

$\rho(G(\alpha)) = 1-\alpha\lambda_n \geq \rho(G(\alpha^*))$ when 

$\alpha \leq \alpha^*$ and 

$\rho(G(\alpha)) = \alpha\lambda_1 - 1 \geq \rho(G(\alpha^*))$ when 

$\alpha \geq \alpha^*$. Finally, since 

$G$ and 

$A$ are normal, we have 

$\norm{G(\alpha^*)}_2 = \rho(G(\alpha^*)) = 1 - \frac{2}{\kappa + 1}$. ∎&lt;/p&gt;
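&lt;p&gt;As a numerical check of the optimal Richardson parameter (with an arbitrary illustrative SPD matrix), we can compare $\rho(I - \alpha A)$ at $\alpha = \alpha^*$ with nearby values of $\alpha$:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD
lam = np.linalg.eigvalsh(A)                  # ascending eigenvalues
lam_n, lam_1 = lam[0], lam[-1]
alpha_star = 2.0 / (lam_1 + lam_n)
kappa = lam_1 / lam_n

def rho(alpha):
    # spectral radius of the Richardson iteration matrix I - alpha A
    return np.max(np.abs(1.0 - alpha * lam))

print(rho(alpha_star), 1.0 - 2.0 / (kappa + 1.0))        # equal up to rounding
print(rho(0.9 * alpha_star) &gt;= rho(alpha_star))          # True
print(rho(1.1 * alpha_star) &gt;= rho(alpha_star))          # True
&lt;/code&gt;&lt;/pre&gt;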
&lt;h3 id=&#34;diagonally-dominant-matrices&#34;&gt;Diagonally dominant matrices&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;weakly diagonally dominant (WDD)&lt;/strong&gt; if every row 

$i$ is WDD: 

$\abs{a_{ii}} \geq \sum_{j \neq i} \abs{a_{ij}}$.&lt;/li&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;strictly diagonally dominant (SDD)&lt;/strong&gt; if every row 

$i$ is SDD: 

$\abs{a_{ii}} &gt; \sum_{j \neq i} \abs{a_{ij}}$.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;directed graph&lt;/strong&gt; of 

$A$ is 

$\mathcal{G}_A = (V, E)$ with 

$V = \set{1, \dots, n}$ and 

$(i, j) \in E$ if and only if 

$a_{ij} \neq 0$.
&lt;ul&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;irreducible&lt;/strong&gt; if 

$\mathcal{G}_A$ is strongly connected.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;irreducibly diagonally dominant (IDD)&lt;/strong&gt; if it is irreducible, WDD, &lt;em&gt;and some row is SDD&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;weakly chained diagonally dominant (WCDD)&lt;/strong&gt; if it is WDD and for every row 

$i$, there exists an SDD row 

$j$ with a path from 

$i$ to 

$j$ in 

$\mathcal{G}_A$. (Thus, SDD and IDD matrices are both WCDD.)&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ is WCDD, then 

$A$ is invertible.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; Suppose for the sake of contradiction that there exists an 

$x \in \ker(A)$ with 

$\norm{x}_\infty = 1$, and let 

$i_1$ be such that 

$\abs{x_{i_1}} = 1$. Let 

$(x_{i_1}, \dots, x_{i_k})$ be a path in 

$\mathcal{G}_A$ such that row 

$i_k$ is SDD. Since 

$\sum_j a_{i_1 j} x_j = 0$, we have


$$
\abs{a_{i_1 i_1}} = \abs{-a_{i_1 i_1} x_{i_1}} \leq \sum_{j \neq i_1} \abs{a_{i_1 j}} \abs{x_j} \leq \sum_{j \neq i_1} \abs{a_{i_1 j}},
$$
so row 

$i_1$ is not SDD. However, since it is WDD, equality must hold throughout, which implies that 

$\abs{x_{i_2}} = 1$ because 

$a_{i_1 i_2} \neq 0$. Iterating this argument, we ultimately deduce that row 

$i_k$ is not SDD, which is a contradiction. ∎&lt;/p&gt;
&lt;p&gt;As a result, if 

$A$ is WCDD, then all the splitting methods above are applicable since its diagonal entries must be nonzero (otherwise, it would have a zero row and fail to be invertible).&lt;/p&gt;
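&lt;p&gt;The WCDD condition can also be verified directly from the definition; the sketch below (illustrative only) checks weak diagonal dominance of every row and, by a breadth-first search on the directed graph of $A$, that every row has a path to an SDD row:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
from collections import deque

def is_wcdd(A):
    # Every row must be WDD, and every row must reach an SDD row in G_A.
    A = np.asarray(A, dtype=float)
    off = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
    wdd = np.abs(np.diag(A)) &gt;= off
    sdd = np.abs(np.diag(A)) &gt; off
    if not wdd.all():
        return False
    reaches = sdd.copy()                       # rows known to reach an SDD row
    queue = deque(np.flatnonzero(sdd))
    while queue:
        j = queue.popleft()
        for i in np.flatnonzero(A[:, j]):      # edge (i, j) since a_ij != 0
            if i != j and not reaches[i]:
                reaches[i] = True
                queue.append(i)
    return bool(reaches.all())

# The 1D Laplacian is IDD (hence WCDD) but not SDD: only its first and
# last rows are SDD, and every row reaches them along the tridiagonal graph.
n = 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
print(is_wcdd(A))   # True
&lt;/code&gt;&lt;/pre&gt;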
&lt;p&gt;We also note that this immediately implies the &lt;strong&gt;Levy–Desplanques theorem&lt;/strong&gt;: if 

$A$ is SDD, then 

$A$ is invertible. This, in turn, is equivalent to the &lt;strong&gt;Gershgorin circle theorem&lt;/strong&gt;: if 

$\lambda$ is an eigenvalue of 

$A$, then 

$\abs{\lambda - a_{ii}} \leq \sum_{j \neq i} \abs{a_{ij}} =: r_i$ for &lt;em&gt;some&lt;/em&gt; 

$i$ (in other words, 

$\lambda \in B_{r_i}(a_{ii})$ for &lt;em&gt;some&lt;/em&gt; 

$i$). Similarly, if 

$A$ is IDD, then 

$A$ is invertible; or equivalently, if 

$A$ is irreducible and 

$\lambda$ is an eigenvalue of 

$A$ such that 

$\abs{\lambda - a_{ii}} \geq r_i$ for &lt;em&gt;every&lt;/em&gt; 

$i$, then 

$\abs{\lambda - a_{ii}} = r_i$ for &lt;em&gt;every&lt;/em&gt; 

$i$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Convergence for WCDD matrices&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If 

$A$ is WCDD, then the 

$\omega$-Jacobi and 

$\omega$-Gauss–Seidel methods converge for 

$\omega \in (0, 1]$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; If 

$\abs{\lambda} \geq 1$, then 

$\abs{\frac{\lambda - 1}{\omega} + 1} \geq \abs{\lambda}$, so 

$(\lambda - 1) M + A$ has the same WDD/SDD rows and directed graph as 

$A$. Hence 

$\lambda I - G = M^{-1} ((\lambda - 1) M + A)$ is invertible for such 

$\lambda$, so 

$\rho(G) &lt; 1$. ∎&lt;/p&gt;
&lt;h3 id=&#34;consistently-ordered-matrices&#34;&gt;Consistently ordered matrices&lt;/h3&gt;
&lt;p&gt;In this section, we &lt;em&gt;assume&lt;/em&gt; that the diagonal entries of 

$A$ are nonzero and consider the iteration matrices 

$G_\mathrm{J} := -D^{-1} (L+U)$ and 

$G_\mathrm{GS}(\omega) := (L + \frac{1}{\omega} D)^{-1}(\frac{1-\omega}{\omega} D - U)$ for the Jacobi and 

$\omega$-Gauss–Seidel methods, respectively.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;

$A$ has &lt;strong&gt;property 

$\mathrm{A}_{q, r}$&lt;/strong&gt; (

$q, r \geq 1$) if there exists a partition 

$\set{S_k}_{k=1}^p$ of 

$\set{1, \dots, n}$ with 

$p = q + r$ such that if 

$a_{ij} \neq 0$ for some 

$i \neq j$ and 

$i \in S_k$, then either 

$i \in S_1 \cup \cdots \cup S_q$ and 

$j \in S_{k+r}$, or 

$i \in S_{q+1} \cup \cdots \cup S_p$ and 

$j \in S_{k-q}$.
&lt;ul&gt;
&lt;li&gt;

$A$ has &lt;strong&gt;property 

$\mathrm{A}_p$&lt;/strong&gt; (or is &lt;strong&gt;

$p$-cyclic&lt;/strong&gt;) (

$p \geq 2$) if it has property 

$\mathrm{A}_{1,p-1}$.
(Property 

$\mathrm{A}_2$ is usually called “property 

$\mathrm{A}$” and is equivalent to the directed graph of 

$A$ being bipartite, ignoring loops.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;

$A$ is &lt;strong&gt;

$(q, r)$-consistently ordered&lt;/strong&gt; (

$q, r \geq 1$) if there exists a partition 

$\set{S_k}_{k=1}^p$ of 

$\set{1, \dots, n}$ (not necessarily with 

$p = q + r$) such that if 

$a_{ij} \neq 0$ and 

$i \in S_k$, then 

$i \in S_1 \cup \cdots \cup S_{p-r}$ and 

$j \in S_{k+r}$ if 

$i &lt; j$, or 

$i \in S_{q+1} \cup \cdots \cup S_p$ and 

$j \in S_{k-q}$ if 

$i &gt; j$.
(A 

$(1, p-1)$-consistently ordered matrix is sometimes called “consistently ordered 

$p$-cyclic”.)&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ has property 

$\mathrm{A}_{q, r}$, then it has a &lt;strong&gt;

$(q, r)$-ordering vector&lt;/strong&gt;: a 

$\gamma \in \Z^n$ such that if 

$a_{ij} \neq 0$ for some 

$i \neq j$, then 

$\gamma_j - \gamma_i = r$ or 

$\gamma_j - \gamma_i = -q$; and for each 

$k \in \set{1, \dots, p := q+r}$, there exists an 

$i$ such that 

$\gamma_i \equiv k \pmod{p}$. Conversely, if 

$A$ has such a vector, then 

$A$ has property 

$\mathrm{A}_{q, r}$.&lt;/p&gt;
&lt;p&gt;If 

$A$ is 

$(q, r)$-consistently ordered, then it has a &lt;strong&gt;&lt;em&gt;compatible&lt;/em&gt; 

$(q, r)$-ordering vector&lt;/strong&gt;: a 

$(q, r)$-ordering vector 

$\gamma$ such that if 

$a_{ij} \neq 0$ for some 

$i \neq j$, then 

$\gamma_j - \gamma_i = r$ or 

$\gamma_j - \gamma_i = -q$ according as 

$i &lt; j$ or 

$i &gt; j$. Conversely, if 

$A$ has such a vector, then 

$A$ is 

$(q, r)$-consistently ordered.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; If 

$A$ has property 

$\mathrm{A}_{q, r}$ and 

$\set{S_k}_{k=1}^p$ is as in the definition of this property, take 

$\gamma_i := k$, where 

$k$ is such that 

$i \in S_k$. Conversely, if 

$A$ has a 

$(q, r)$-ordering vector 

$\gamma$, take 

$S_k := \set{i : \gamma_i \equiv k \pmod{p}}$, where 

$p = q + r$. The same constructions apply in the consistently ordered case. ∎&lt;/p&gt;
&lt;p&gt;This proof also shows that we may assume without loss of generality that 

$p = q + r$ in the definition of a 

$(q, r)$-consistently ordered matrix. We also note that 

$A$ has property 

$\mathrm{A}_{q, r}$ if and only if it can be &lt;em&gt;symmetrically permuted&lt;/em&gt; (that is, conjugated by a permutation matrix) to a 

$(q, r)$-consistently ordered matrix, since the former property is invariant under symmetric permutations.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ is 

$(q, r)$-consistently ordered, then 

$\det(\lambda D + \alpha^{-q} L + \alpha^r U)$ is independent of 

$\alpha$ (for all 

$\lambda$).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; Suppose that 

$\sigma$ is a permutation of 

$\set{1, \dots, n}$ with 

$a_{\sigma(i), i} \neq 0$ for all 

$i$. Then


$$
\prod_{i=1}^n (\lambda D + \alpha^{-q} L + \alpha^r U)_{\sigma(i), i} = \lambda^{n-(n_L + n_U)} (\alpha^{-q})^{n_L} (\alpha^r)^{n_U} \prod_{i=1}^n a_{\sigma(i), i},
$$
where 

$n_L := \# \set{i : \sigma(i) &gt; i}$ and 

$n_U := \# \set{i : \sigma(i) &lt; i}$. Now if 

$\gamma$ is a compatible 

$(q, r)$-ordering vector for 

$A$, then 

$\sum_{\sigma(i) &gt; i} \gamma_i - \gamma_{\sigma(i)} = -qn_L$ and 

$\sum_{\sigma(i) &lt; i} \gamma_i - \gamma_{\sigma(i)} = rn_U$, so 

$-qn_L + rn_U = \sum_{\sigma(i) \neq i} \gamma_i - \gamma_{\sigma(i)} = \sum_{\sigma(i) \neq i} \gamma_i - \sum_{j \neq \sigma^{-1}(j)} \gamma_j = 0$. Hence any product of the form above – of which the determinant is a sum – is independent of 

$\alpha$. ∎&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Suppose that 

$A$ is 

$(q, r)$-consistently ordered and let 

$p := q + r$. Then 

$\lambda$ is an eigenvalue of 

$G_\mathrm{GS}(\omega)$ if and only if


$$
(\lambda + \omega - 1)^p = \lambda^{r} \omega^p \mu^p
$$
for some eigenvalue 

$\mu$ of 

$G_\mathrm{J}$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; Since 

$D$ is invertible, 

$\mu$ is an eigenvalue of 

$G_\mathrm{J}$ if and only if 

$D(\mu I - G_\mathrm{J}) = \mu D + L + U$ is singular. Similarly, 

$\lambda$ is an eigenvalue of 

$G_\mathrm{GS}(\omega)$ if and only if 

$(L + \frac{1}{\omega}D)(\lambda I - G_\mathrm{GS}(\omega)) = \frac{\lambda + \omega - 1}{\omega} D + \lambda L + U$ is singular, or equivalently for 

$\lambda$ nonzero, 

$\lambda^{-r/p} \cdot \frac{\lambda + \omega - 1}{\omega} D + (\lambda^{-1/p})^{-q} L + (\lambda^{-1/p})^r U$ is singular. Thus, the result follows from the determinantal invariance property above (note that $0$ is an eigenvalue of $G_\mathrm{GS}(\omega)$ if and only if $\omega = 1$, in which case both sides of the relation vanish for $\lambda = 0$ and every $\mu$). ∎&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;In the special case of a 

$(1, p-1)$-consistently ordered matrix, we obtain the following corollary.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Convergence for consistently ordered matrices&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Suppose that 

$A$ is 

$(1, p-1)$-consistently ordered. Then 

$\rho(G_\mathrm{GS}(1)) = \rho(G_\mathrm{J})^p$. In particular, the Gauss–Seidel method converges if and only if the Jacobi method converges.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Furthermore, in this case, if 

$\mu$ is an eigenvalue of 

$G_\mathrm{J}$, then so is 

$\theta \mu$ for each 

$p$&lt;sup&gt;th&lt;/sup&gt; root of unity 

$\theta$, since 

$\mu D + L + U$ is singular if and only if 

$\theta(\mu D + \theta^{-1} L + \theta^{p-1} U)$ is.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Optimal Gauss–Seidel relaxation parameter for consistently ordered matrices&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Suppose that 

$A$ is 

$(1, p-1)$-consistently ordered, that the Jacobi method converges, and that the eigenvalues of 

$G_\mathrm{J}^p$ are real and nonnegative. Then the 

$\omega$-Gauss–Seidel method converges for all 

$\omega \in (0, \frac{p}{p-1})$, and if 

$\omega_*$ is the unique solution in 

$(0, \frac{p}{p-1})$ of


$$
[\rho(G_\mathrm{J}) \omega_*]^p = \left(\frac{p}{p-1}\right)^p (p-1)(\omega_* - 1),
$$
then for all 

$\omega \neq \omega_*$, we have


$$
\rho(G_\mathrm{GS}(\omega_*)) = (p - 1)(\omega_* - 1) &lt; \rho(G_\mathrm{GS}(\omega)).
$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; We sketch the proof for the case 

$p = 2$; for details and the general case, see &lt;a href=&#34;http://dx.doi.org/10.2140/pjm.1959.9.617&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Varga (1959)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;When 

$A$ is 

$(1, 1)$-consistently ordered, the eigenvalues 

$\lambda$ of 

$G_\mathrm{GS}(\omega)$ are the zeroes of the equations 

$(\lambda + \omega - 1)^2 = \lambda \omega^2 \mu^2$ for 

$\mu$ in the set of eigenvalues of 

$G_\mathrm{J}$. For a fixed 

$\mu \geq 0$, these zeroes are the abscissae of the intersection points of the line 

$y = \frac{\lambda + \omega - 1}{\omega}$ (which has slope 

$\frac{1}{\omega}$ and passes through 

$(1, 1)$) and the curve 

$y = \pm \lambda^{1/2} \mu$ in the 

$\lambda$-

$y$ plane. The larger abscissa is seen to &lt;em&gt;decrease&lt;/em&gt; as 

$\omega$ increases from 

$0$ until the line and the curve are tangent, beyond which the equation has two conjugate complex zeroes of modulus 

$\omega - 1$, which &lt;em&gt;increases&lt;/em&gt; as 

$\omega$ increases. The abscissa of the point of tangency is defined by the equations 

$\frac{\lambda + \omega - 1}{\omega} = \lambda^{1/2} \mu$ and 

$\frac{1}{\omega} = \frac{1}{2} \lambda^{-1/2} \mu$, and is maximal when 

$\mu = \rho(G_\mathrm{J})$. Thus, 

$[\rho(G_\mathrm{J}) \omega_*]^2 = 2^2 \lambda$, where 

$\lambda = \omega_* - 1 = \rho(G_\mathrm{GS}(\omega_*))$. ∎&lt;/p&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;In other words, if 

$\norm{G} &lt; 1$, then 

$G^k \to 0$ strongly (because 

$G^k \to 0$ in norm).&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34;&gt;
&lt;p&gt;In other words, 

$G^k \to 0$ strongly if and only if 

$G^k \to 0$ in norm (because 

$G$ is an operator on a finite-dimensional space).&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:3&#34;&gt;
&lt;p&gt;This formula remains true if 

$G$ is a continuous linear operator on a Banach space 

$X$. Consequently, in this setting, we still have that 

$G^k \to 0$ in norm if and only if 

$\rho(G) &lt; 1$. These are in turn equivalent to the invertibility of 

$I-G$ and the convergence of the fixed-point iteration of 

$x \mapsto Gx + f$ for all 

$f \in X$ (to 

$(I-G)^{-1} f$).&amp;#160;&lt;a href=&#34;#fnref:3&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>Projection methods for linear systems</title>
      <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/projlin/</link>
      <pubDate>Fri, 17 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/projlin/</guid>
      <description>&lt;div class=&#34;btn-links mb-3&#34;&gt;
&lt;a class=&#34;btn btn-outline-primary btn-page-header btn-sm&#34; href=&#34;../projlin.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;
  PDF
&lt;/a&gt;
&lt;/div&gt;
&lt;!--
No newlines allowed between $$&#39;s below!
--&gt;
&lt;div style=&#34;display: none;&#34;&gt;
$$
%% Sets and functions %%
\newcommand{\set}[1]{\{ #1 \}}
\newcommand{\Set}[1]{\left \{ #1 \right\}}
\renewcommand{\emptyset}{\varnothing}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rn}{\mathbb{R}^n}
\newcommand{\Rm}{\mathbb{R}^m}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
%% Linear algebra %%
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\Abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\inner}[2]{\langle #1, #2 \rangle}
\newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle}
\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\Norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\tp}{{\top}}
\newcommand{\trans}{{\top}}
\newcommand{\span}{\mathop{\mathrm{span}}}
\newcommand{\im}{\mathop{\mathrm{im}}}
\newcommand{\ker}{\mathop{\mathrm{ker}}}
\newcommand{\rank}{\mathop{\mathrm{rank}}}
\newcommand{\proj}{\mathrm{proj}}
\newcommand{\K}{\mathcal{K}}
\newcommand{\L}{\mathcal{L}}
%% Colours %%
\definecolor{cblue}{RGB}{31, 119, 180}
\definecolor{corange}{RGB}{255, 127, 14}
\definecolor{cgreen}{RGB}{44, 160, 44}
\definecolor{cred}{RGB}{214, 39, 40}
\definecolor{cpurple}{RGB}{148, 103, 189}
\definecolor{cbrown}{RGB}{140, 86, 75}
\definecolor{cpink}{RGB}{227, 119, 194}
\definecolor{cgrey}{RGB}{127, 127, 127}
\definecolor{cyellow}{RGB}{188, 189, 34}
\definecolor{cteal}{RGB}{23, 190, 207}
$$
&lt;/div&gt;
&lt;!-- BODY --&gt;
&lt;p&gt;Let 

$A \in \R^{n \times n}$ be invertible and 

$b \in \R^n$. A &lt;strong&gt;projection method&lt;/strong&gt; for solving 

$Ax = b$ produces an approximation 

$\tilde{x}$ to the exact solution 

$x^*$ within an 

$m$-dimensional &lt;strong&gt;search subspace&lt;/strong&gt; 

$\K$ translated by an initial guess 

$x^{(0)}$, such that the &lt;strong&gt;residual&lt;/strong&gt; 

$b - A\tilde{x}$ is &lt;em&gt;orthogonal to&lt;/em&gt; an 

$m$-dimensional &lt;strong&gt;constraint subspace&lt;/strong&gt; 

$\L$. In other words, 

$\tilde{x} \in x^{(0)} + \K$ with 

$b - A\tilde{x} \in \L^\perp$. If 

$\L = \K$, the projection method is said to be &lt;strong&gt;orthogonal&lt;/strong&gt; and its orthogonality constraints are known as the &lt;strong&gt;Galerkin conditions&lt;/strong&gt;; otherwise, the method is said to be &lt;strong&gt;oblique&lt;/strong&gt; and its constraints are known as the &lt;strong&gt;Petrov–Galerkin conditions&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Such a method is well-defined if and only if 

$A\K \cap \L^\perp = \set{0}$. Indeed, if 

$A\K \cap \L^\perp = \set{0}$ and 

$V, W \in \R^{n \times m}$ are matrices whose columns are bases of 

$\K$ and 

$\L$, then we must have 

$\tilde{x} = x^{(0)} + Vy$ for some 

$y \in \R^m$ such that 

$W^\tp (r^{(0)} - AVy) = 0$, where 

$r^{(0)} := b - Ax^{(0)}$. Hence


$$
\tilde{x} = x^{(0)} + V(W^\tp AV)^{-1} W^\trans r^{(0)},
$$
where 

$W^\tp AV$ is invertible because 

$\im(AV) \cap \ker(W^\tp) = \set{0}$. In addition, if 

$\tilde{x}' \in x^{(0)} + \K$ with 

$b-A\tilde{x}' \in \L^\perp$, then 

$A(\tilde{x}-\tilde{x}') \in A\K \cap \L^\perp$, so 

$\tilde{x} = \tilde{x}'$. Conversely, if the method is well-defined and 

$Av \in \L^\perp$ for some 

$v \in \K$, then 

$\tilde{x} + v \in x^{(0)} + \K$ and 

$b - A(\tilde{x} + v) \in \L^\perp$, so 

$v = 0$ and hence 

$Av = 0$.&lt;/p&gt;
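&lt;p&gt;The formula above can be evaluated directly; in the sketch below (purely illustrative, with randomly generated data), the columns of $V$ and $W$ are bases of $\K$ and $\L$, and the Petrov–Galerkin conditions are verified numerically:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def project(A, b, x0, V, W):
    # One projection step: x in x0 + range(V) with b - Ax orthogonal to range(W),
    # i.e. x = x0 + V (W^T A V)^{-1} W^T r0.
    r0 = b - A @ x0
    y = np.linalg.solve(W.T @ A @ V, W.T @ r0)
    return x0 + V @ y

rng = np.random.default_rng(0)
n, m = 8, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible (diagonally shifted)
b = rng.standard_normal(n)
x0 = np.zeros(n)
V = rng.standard_normal((n, m))                   # basis of the search subspace
W = rng.standard_normal((n, m))                   # basis of the constraint subspace

x_tilde = project(A, b, x0, V, W)
print(np.abs(W.T @ (b - A @ x_tilde)).max())      # ~ 0 (Petrov-Galerkin conditions)
&lt;/code&gt;&lt;/pre&gt;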
&lt;p&gt;This projection process may be iterated by selecting new subspaces 

$\K$ and 

$\L$ and using 

$\tilde{x}$ as the initial guess for the next iteration, yielding a variety of iterative methods for linear systems, such as the well-known Krylov subspace methods. These iterative methods can sometimes experience a “lucky breakdown” when the projection produces the exact solution:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$r^{(0)} \in \K$ and 

$\K$ is 

$A$-invariant, then 

$A\tilde{x} = b$ (or equivalently, 

$\tilde{x} = x^*$).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; By definition, 

$\tilde{x} - x^{(0)} \in \K$ and 

$b - A\tilde{x} \in \L^\perp$. On the other hand, 

$A\K \subseteq \K$ and 

$\dim(A\K) = \dim(\K)$ since 

$A$ is invertible, so 

$A\K = \K$. Hence 

$b - A\tilde{x} = r^{(0)} - A(\tilde{x} - x^{(0)}) \in A\K \cap \L^\perp = \set{0}$. ∎&lt;/p&gt;
&lt;h2 id=&#34;error-projection-methods&#34;&gt;Error projection methods&lt;/h2&gt;
&lt;p&gt;An &lt;strong&gt;error projection method&lt;/strong&gt; is a projection method where 

$A$ is symmetric positive definite (SPD) and 

$\L = \K$. Such methods are well-defined because if 

$Av \in \K^\perp$ for some 

$v \in \K$, then 

$\norm{v}_A^2 = 0$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ is SPD and 

$\L = \K$, then 

$\tilde{x}$ uniquely minimizes the 

$A$-norm of the &lt;strong&gt;error&lt;/strong&gt; 

$x^* - \tilde{x}$ over 

$x^{(0)} + \K$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; For all 

$x \in x^{(0)} + \K$, we have 

$\norm{x^* - x}_A^2 = \norm{x^* - \tilde{x}}_A^2 + \norm{\tilde{x} - x}_A^2$ because 

$\tilde{x} - x \in \K$ and 

$x^* - \tilde{x} \perp_A \K$ according to the Galerkin conditions. ∎&lt;/p&gt;
&lt;h3 id=&#34;the-gradient-descent-method&#34;&gt;The gradient descent method&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;gradient descent method&lt;/strong&gt; for solving 

$Ax = b$ when 

$A$ is SPD is the iterative method with 

$\K = \L := \span \set{r^{(k)}}$, where 

$x^{(k)}$ denotes the 

$k$&lt;sup&gt;th&lt;/sup&gt; iterate and 

$r^{(k)} := b - Ax^{(k)}$. Thus, 

$x^{(k+1)}$ minimizes the 

$A$-norm of the error over the line 

$x^{(k)} + \span \set{r^{(k)}}$; indeed, if 

$f(x) := \frac{1}{2} \norm{x^* - x}_A^2$, then 

$\nabla f(x^{(k)}) = -r^{(k)}$, so 

$r^{(k)}$ represents the direction of steepest descent of 

$f$. The projection formula above reduces to


$$
x^{(k+1)} = x^{(k)} + \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{Ar^{(k)}}{r^{(k)}}} \, r^{(k)} =: x^{(k)} + \alpha_k r^{(k)}.
$$
We also note that 

$r^{(k+1)} = r^{(k)} - \alpha_k Ar^{(k)}$, so this method can be implemented with only one multiplication by 

$A$ per iteration.&lt;/p&gt;
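&lt;p&gt;A minimal sketch of the resulting algorithm (the SPD test system, tolerance, and iteration cap are arbitrary illustrative choices):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def gradient_descent(A, b, x0, tol=1e-10, maxit=10_000):
    # Steepest descent for SPD A, with one multiplication by A per iteration
    # (reusing A r_k to update both x and r).
    x = x0.copy()
    r = b - A @ x
    for _ in range(maxit):
        if np.linalg.norm(r) &lt;= tol * np.linalg.norm(b):
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
        r = r - alpha * Ar
    return x

n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD
b = np.ones(n)
x = gradient_descent(A, b, np.zeros(n))
print(np.linalg.norm(b - A @ x))
&lt;/code&gt;&lt;/pre&gt;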
&lt;p&gt;To analyze the convergence of the gradient descent method, we consider the error 

$e^{(k)} := x^* - x^{(k)}$. Using the fact that 

$e^{(k+1)} = e^{(k)} - \alpha_k r^{(k)} \perp_A r^{(k)}$, we compute that


$$
\begin{align*}
\norm{e^{(k+1)}}_A^2
&amp;= \inner{e^{(k+1)}}{e^{(k)}}_A \\
&amp;= \norm{e^{(k)}}_A^2 - \alpha_k \inner{r^{(k)}}{e^{(k)}}_A \\
&amp;= \left(1 - \frac{\inner{r^{(k)}}{r^{(k)}}^2}{\inner{r^{(k)}}{r^{(k)}}_A \inner{r^{(k)}}{r^{(k)}}_{A^{-1}}}\right) \norm{e^{(k)}}_A^2.
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;Next, we establish a useful algebraic inequality:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Kantorovich’s inequality&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If 

$\theta_i \geq 0$ and 

$0 &lt; a \leq x_i \leq b$ for 

$1 \leq i \leq n$, then


$$
\left(\sum_{i=1}^n \theta_i x_i\right) \left(\sum_{i=1}^n \frac{\theta_i}{x_i}\right) \leq \frac{(a+b)^2}{4ab} \left(\sum_{i=1}^n \theta_i\right)^2.
$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; By homogeneity, we may assume that 

$\sum_i \theta_i = 1$ and 

$ab = 1$. Since 

$x \mapsto x + \frac{1}{x}$ is convex on 

$[a, b]$, we have 

$x_i + \frac{1}{x_i} \leq a + b$ and hence 

$\sum_i \theta_i x_i + \sum_i \frac{\theta_i}{x_i} \leq \sum_i \theta_i (a+b) = a+b$. The result then follows from the AM–GM inequality. ∎&lt;/p&gt;
&lt;p&gt;Now if the eigenvalues of 

$A$ are 

$\lambda_1 \geq \cdots \geq \lambda_n &gt; 0$, then by Kantorovich’s inequality,


$$
\frac{\inner{r^{(k)}}{r^{(k)}}^2}{\inner{r^{(k)}}{r^{(k)}}_A \inner{r^{(k)}}{r^{(k)}}_{A^{-1}}}
\geq \frac{4\lambda_1 \lambda_n}{(\lambda_1 + \lambda_n)^2} = \frac{4\kappa}{(\kappa + 1)^2},
$$
where 

$\kappa$ is the (2-norm) condition number of 

$A$. Hence&lt;/p&gt;
&lt;blockquote&gt;


$$
\norm{e^{(k)}}_A \leq \left(\frac{\kappa - 1}{\kappa + 1}\right)^k \norm{e^{(0)}}_A.
$$
&lt;/blockquote&gt;
&lt;h2 id=&#34;residual-projection-methods&#34;&gt;Residual projection methods&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;residual projection method&lt;/strong&gt; is a projection method where 

$A$ is invertible and 

$\L = A\K$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If 

$A$ is invertible and 

$\L = A\K$, then 

$\tilde{x}$ uniquely minimizes the norm of the &lt;strong&gt;residual&lt;/strong&gt; 

$b - A\tilde{x}$ over 

$x^{(0)} + \K$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; For all 

$x \in x^{(0)} + \K$, we have 

$\norm{b - Ax}^2 = \norm{b - A\tilde{x}}^2 + \norm{A(\tilde{x} - x)}^2$ because 

$A(\tilde{x} - x) \in A\K$ and 

$b - A\tilde{x} \perp A\K$ according to the Petrov–Galerkin conditions. ∎&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Krylov subspace methods</title>
      <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/krylov/</link>
      <pubDate>Tue, 28 Jan 2025 00:00:00 +0000</pubDate>
      <guid>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/krylov/</guid>
      <description>&lt;div class=&#34;btn-links mb-3&#34;&gt;
&lt;a class=&#34;btn btn-outline-primary btn-page-header btn-sm&#34; href=&#34;../krylov.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;
  PDF
&lt;/a&gt;
&lt;/div&gt;
&lt;!--
No newlines allowed between $$&#39;s below!
--&gt;
&lt;div style=&#34;display: none;&#34;&gt;
$$
%% Sets and functions %%
\newcommand{\set}[1]{\{ #1 \}}
\newcommand{\Set}[1]{\left \{ #1 \right\}}
\renewcommand{\emptyset}{\varnothing}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rn}{\mathbb{R}^n}
\newcommand{\Rm}{\mathbb{R}^m}
\newcommand{\C}{\mathbb{C}}
\newcommand{\F}{\mathbb{F}}
%% Linear algebra %%
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\Abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\inner}[2]{\langle #1, #2 \rangle}
\newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle}
\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\Norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\tp}{{\top}}
\newcommand{\trans}{{\top}}
\newcommand{\span}{\mathop{\mathrm{span}}}
\newcommand{\im}{\mathop{\mathrm{im}}}
\newcommand{\ker}{\mathop{\mathrm{ker}}}
\newcommand{\rank}{\mathop{\mathrm{rank}}}
\newcommand{\proj}{\mathrm{proj}}
\newcommand{\K}{\mathcal{K}}
\newcommand{\L}{\mathcal{L}}
\newcommand{\deg}{\mathop{\mathrm{deg}}}
%% Colours %%
\definecolor{cblue}{RGB}{31, 119, 180}
\definecolor{corange}{RGB}{255, 127, 14}
\definecolor{cgreen}{RGB}{44, 160, 44}
\definecolor{cred}{RGB}{214, 39, 40}
\definecolor{cpurple}{RGB}{148, 103, 189}
\definecolor{cbrown}{RGB}{140, 86, 75}
\definecolor{cpink}{RGB}{227, 119, 194}
\definecolor{cgrey}{RGB}{127, 127, 127}
\definecolor{cyellow}{RGB}{188, 189, 34}
\definecolor{cteal}{RGB}{23, 190, 207}
$$
&lt;/div&gt;
&lt;!-- BODY --&gt;
&lt;p&gt;Let 

$A \in \R^{n \times n}$ be invertible and 

$b \in \R^n$. A &lt;strong&gt;Krylov subspace method&lt;/strong&gt; for solving 

$Ax = b$ is a projection method in which the search subspaces are &lt;strong&gt;Krylov subspaces&lt;/strong&gt; – subspaces of the form 

$\K_k(A, v) := \span \set{A^j v}_{j=0}^{k-1}$ for some 

$k \geq 0$ and 

$v \in \R^n$.&lt;/p&gt;
&lt;h2 id=&#34;krylov-subspaces&#34;&gt;Krylov subspaces&lt;/h2&gt;
&lt;p&gt;In this section, we regard 

$A$ as a linear operator on a nontrivial finite-dimensional vector space 

$V$. Recall that the &lt;strong&gt;minimal polynomial of 

$A$&lt;/strong&gt; is the &lt;em&gt;monic&lt;/em&gt; polynomial 

$\mu_A$ of minimal degree such that 

$\mu_A(A) = 0$ and that the &lt;strong&gt;minimal polynomial of 

$v$ with respect to 

$A$&lt;/strong&gt; (also known as the &lt;strong&gt;

$A$-annihilator of 

$v$&lt;/strong&gt;) is the &lt;em&gt;monic&lt;/em&gt; polynomial 

$\mu_{A, v}$ of minimal degree such that 

$\mu_{A, v}(A)v = 0$. In particular, any polynomial 

$p$ with 

$p(A) = 0$ must be a multiple of 

$\mu_A$, and similarly for 

$\mu_{A, v}$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;

$\deg(\mu_A) \leq \dim(V)$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; If 

$\dim(V) = 1$, this is trivial, so suppose that for some 

$n &gt; 1$ it is true whenever 

$\dim(V) &lt; n$. Let 

$v \in V \setminus \set{0}$ and 

$m := \deg(\mu_{A, v})$. Then 

$\set{A^j v}_{j=0}^{m-1}$ is linearly independent and annihilated by 

$\mu_{A, v}(A)$. Hence 

$W := \im(\mu_{A, v}(A))$ is an 

$A$-invariant subspace of 

$V$ with 

$\deg(\mu_{A\restriction_W}) \leq n-m &lt; n$, and 

$(\mu_{A\restriction_W} \mu_{A, v})(A) = 0$. ∎&lt;/p&gt;
&lt;p&gt;Now suppose that 

$v \in V \setminus \set{0}$ and let 

$m := \deg(\mu_{A, v})$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;

$\set{A^j v}_{j=0}^{\min \set{k, m} - 1}$ is a basis of 

$\K_k(A, v)$. In particular, 

$\dim(\K_k(A, v)) = \min \set{k, m}$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; If 

$x \in \K_k(A, v)$, then 

$x = p(A) v$ for some polynomial 

$p$ with 

$\deg(p) \leq k-1$. Dividing 

$p$ by 

$\mu_{A, v}$, we obtain 

$p = q \mu_{A, v} + r$ for some polynomials 

$q$ and 

$r$ with 

$\deg(r) \leq m-1$, so 

$x = r(A) v$ and hence 

$\set{A^j v}_{j=0}^{\min \set{k, m} - 1}$ spans 

$\K_k(A, v)$. Moreover, if 

$\sum_{j=0}^{\min \set{k, m} - 1} c_j A^j v = 0$, the 

$c_j$ must be zero by the minimality of 

$m$. ∎&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;

$A$-cyclic subspace generated by 

$v$&lt;/strong&gt; is 

$\mathcal{C}(A, v) := \span \set{A^j v}_{j=0}^\infty$ and is the smallest 

$A$-invariant subspace of 

$V$ containing 

$v$. Clearly, 

$\mathcal{C}(A, v) = \K_m(A, v)$, so the cyclic subspace is also the largest Krylov subspace generated by 

$v$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The following are equivalent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;

$k \geq m$&lt;/li&gt;
&lt;li&gt;

$\K_k(A, v)$ is 

$A$-invariant&lt;/li&gt;
&lt;li&gt;

$\K_k(A, v) = \mathcal{C}(A, v)$&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Moreover, if 

$v = r^{(0)} := b - Ax^{(0)}$, these are equivalent to 

$x^* := A^{-1} b \in x^{(0)} + \K_k(A, r^{(0)})$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Proof.&lt;/em&gt; The equivalence of the first three statements follows from the discussion above, and from the general theory of projection methods, we know that 

$x^* \in x^{(0)} + \K_k(A, r^{(0)})$ if 

$\K_k(A, r^{(0)})$ is 

$A$-invariant. On the other hand, if 

$x^* \in x^{(0)} + \K_k(A, r^{(0)})$, then 

$r^{(0)} = Ap(A) r^{(0)}$ for some polynomial 

$p$ with 

$\deg(p) \leq k-1$, so 

$q(t) := 1-tp(t)$ is a polynomial such that 

$q(A) r^{(0)} = 0$. Hence 

$m \leq \deg(q) \leq k$. ∎&lt;/p&gt;
&lt;h2 id=&#34;the-arnoldi-iteration&#34;&gt;The Arnoldi iteration&lt;/h2&gt;
&lt;p&gt;To construct Krylov subspace methods, it is useful to generate well-conditioned bases of such subspaces. The &lt;strong&gt;Arnoldi iteration&lt;/strong&gt; produces &lt;em&gt;orthonormal&lt;/em&gt; bases of successive Krylov subspaces 

$\K_j(A, v)$ (

$v \neq 0$) using (modified) Gram–Schmidt orthogonalization&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;


$$
\begin{align*}
	&amp;q_1 = v / \norm{v} \\ \\
	&amp;\texttt{for 

$j = 1$ to 

$k$:} \\
	&amp;\quad v_j = Aq_j \\
	&amp;\quad \texttt{for 

$i = 1$ to 

$j$:} \\
	&amp;\quad \quad h_{ij} = \inner{v_j}{q_i} \\
	&amp;\quad \quad v_j = v_j - h_{ij} q_i \\
	&amp;\quad h_{j+1,\,j} = \norm{v_j} \\
	&amp;\quad q_{j+1} = v_j / h_{j+1,\,j}
\end{align*}
$$
&lt;p&gt;Indeed, if 

$\set{q_i}_{i=1}^j$ is an orthonormal basis of 

$\K_j(A, v)$ (which it is for 

$j = 1$), then initially 

$v_j \in A\K_j(A, v) \subseteq \K_{j+1}(A, v)$. Subsequently, 

$v_j$ is orthogonalized against 

$q_1, \dots, q_j$ and normalized to form 

$q_{j+1}$ (provided that 

$h_{j+1,\,j} \neq 0$), which implies that 

$\set{q_i}_{i = 1}^{j+1}$ is an orthonormal set of vectors in 

$\K_{j+1}(A, v)$ and hence a basis thereof.&lt;/p&gt;
&lt;p&gt;Thus, if 

$m = \deg(\mu_{A, v})$, the Arnoldi iteration will &lt;em&gt;break down in the 

$m$&lt;sup&gt;th&lt;/sup&gt; iteration&lt;/em&gt; (in the sense that 

$h_{m+1,\,m} = 0$). For if it did not break down by the 

$m$&lt;sup&gt;th&lt;/sup&gt; iteration, we would have 

$\dim(\K_{m+1}(A, v)) = m+1$; and if it breaks down (for the first time) in the 

$j$&lt;sup&gt;th&lt;/sup&gt; iteration, then 

$Aq_j - \sum_{i=1}^j h_{ij} q_i = 0$, which is to say that 

$p(A) v = 0$ for some polynomial 

$p$ with 

$\deg(p) \leq j$, so 

$j \geq m$.&lt;/p&gt;
&lt;p&gt;After completing the Arnoldi iteration, we obtain 

$Aq_j = \sum_{i=1}^{j+1} h_{ij} q_i$ for 

$1 \leq j \leq k$, which we can express in matrix form as


$$
A \underbrace{\begin{bmatrix} q_1 &amp; \cdots &amp; q_k \end{bmatrix}}_{=:\,Q_k}
=
\underbrace{\begin{bmatrix} q_1 &amp; \cdots &amp; q_{k+1} \end{bmatrix}}_{Q_{k+1}}
\underbrace{\begin{bmatrix}
h_{11} &amp; \cdots &amp; h_{1k} \\
h_{21} &amp; \ddots &amp; \vdots \\
&amp; \ddots &amp; h_{kk} \\
&amp; &amp; h_{k+1,\,k}
\end{bmatrix}}_{=:\,\widetilde{H}_k}.
$$
We can also view this as a reduction of 

$A$ to upper Hessenberg form: 

$Q_k^\tp A Q_k = H_k$, where 

$H_k$ denotes the upper 

$k \times k$ submatrix of 

$\widetilde{H}_k$.&lt;/p&gt;
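&lt;p&gt;A direct NumPy transcription of the Arnoldi iteration (illustrative only; the test matrix is random), which also verifies the relation $AQ_k = Q_{k+1}\widetilde{H}_k$ and the orthonormality of the columns of $Q_{k+1}$:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def arnoldi(A, v, k):
    # Modified Gram-Schmidt Arnoldi: Q has orthonormal columns spanning the
    # successive Krylov subspaces, H is (k+1) x k upper Hessenberg, and
    # A Q[:, :k] = Q H.
    n = len(v)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = w @ Q[:, i]
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                 # breakdown: K_{j+1} is A-invariant
            return Q[:, : j + 1], H[: j + 1, : j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(0)
n, k = 30, 10
A = rng.standard_normal((n, n))
v = rng.standard_normal(n)
Q, H = arnoldi(A, v, k)
print(np.linalg.norm(A @ Q[:, :k] - Q @ H))            # ~ 0
print(np.linalg.norm(Q.T @ Q - np.eye(k + 1)))         # ~ 0
&lt;/code&gt;&lt;/pre&gt;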
&lt;h2 id=&#34;gmres&#34;&gt;GMRES&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;generalized minimal residual (GMRES)&lt;/strong&gt; method for solving 

$Ax = b$ is an iterative &lt;em&gt;residual projection method&lt;/em&gt; whose 

$k$&lt;sup&gt;th&lt;/sup&gt; iterate 

$x^{(k)}$ lies in 

$x^{(0)} + \K_k(A, r^{(0)})$, where 

$r^{(k)} := b - Ax^{(k)}$. Thus, if we apply the Arnoldi iteration to 

$r^{(0)}$, we can write 

$x^{(k)} = x^{(0)} + Q_k y^{(k)}$ with 

$Q_k$ as above and 

$y^{(k)} \in \R^k$ minimizing 

$\norm{r^{(k)}} = \norm{r^{(0)} - AQ_k y^{(k)}} = \norm{\beta_0 e_1^{(k)} - \widetilde{H}_k y^{(k)}}$, where 

$\beta_0 := \norm{r^{(0)}}$ and 

$e_1^{(k)} := \begin{bmatrix} 1 &amp; 0 &amp; \cdots &amp; 0 \end{bmatrix}^\tp \in \R^{k+1}$ (since 

$Q_{k+1}$ has orthonormal columns).&lt;/p&gt;
&lt;p&gt;These least squares problems can be solved by incrementally triangularizing the upper Hessenberg matrices 

$\widetilde{H}_k$. More precisely, if 

$\Omega_k$ is an orthogonal matrix such that 

$\Omega_k \widetilde{H}_k =: \begin{bmatrix} R_k \\ 0_{1 \times k} \end{bmatrix} \in \R^{(k+1) \times k}$ is upper triangular, then


$$
\begin{bmatrix}
\Omega_k \vphantom{\widetilde{H}_k} &amp; \vphantom{h_{k+1}} \\
&amp; 1 \vphantom{h_{k+2,\,k+1}}
\end{bmatrix}
\underbrace{\begin{bmatrix}
\widetilde{H}_k &amp; * \\
&amp; h_{k+2,\,k+1}
\end{bmatrix}}_{\widetilde{H}_{k+1}}
=
\begin{bmatrix}
R_k &amp; {}*{} \\
0 &amp; \color{cblue} {}*{} \\
&amp; \color{cblue} h_{k+2,\,k+1}
\end{bmatrix}.
$$
Hence this matrix can in turn be triangularized using a single Givens rotation 

$G_{k+1}$ (and 

$\Omega_1$ itself can be chosen to be a Givens rotation):


$$
\underbrace{G_{k+1}
\begin{bmatrix}
\Omega_k \vphantom{\widetilde{H}_k} &amp; \vphantom{h_{k+1}} \\
&amp; 1 \vphantom{h_{k+2,\,k+1}}
\end{bmatrix}}_{=:\,\Omega_{k+1}}
\underbrace{\begin{bmatrix}
\widetilde{H}_k &amp; * \\
&amp; h_{k+2,\,k+1}
\end{bmatrix}}_{\widetilde{H}_{k+1}}
=
\begin{bmatrix}
R_k &amp; {}*{} \\
0 &amp; \color{corange} {}*{} \\
&amp; \color{corange} 0
\end{bmatrix}
=:
\begin{bmatrix}
\vphantom{\widetilde{H}_k} R_{k+1} \\
0_{1\times(k+1)} 
\end{bmatrix}.
$$
Furthermore, if 

$\Omega_k (\beta_0 e_1^{(k)}) =: \begin{bmatrix} b_k \\ \beta_k \end{bmatrix} \in \R^{k+1}$, then 

$\norm{r^{(k)}} = \Norm{\begin{bmatrix} b_k \\ \beta_k \end{bmatrix} - \begin{bmatrix} R_k \\ 0_{1 \times k} \end{bmatrix} y^{(k)}}$ is minimized when 

$y^{(k)} = R_k^{-1} b_k$ and 

$\norm{r^{(k)}} = \abs{\beta_k}$, and we have


$$
\underbrace{G_{k+1}
\begin{bmatrix}
\Omega_k \vphantom{\widetilde{H}_k} &amp; \vphantom{h_{k+1}} \\
&amp; 1 \vphantom{h_{k+2,\,k+1}}
\end{bmatrix}}_{\Omega_{k+1}}
\underbrace{\begin{bmatrix}
\beta_0 e_1^{(k)} \\
0
\end{bmatrix}}_{\beta_0 e_1^{(k+1)}}
=
G_{k+1}
\begin{bmatrix}
b_k \\
\color{cblue} \beta_k \\
\color{cblue} 0
\end{bmatrix}
=
\begin{bmatrix}
b_k \\
\color{corange} * \\
\color{corange} *
\end{bmatrix}
=:
\begin{bmatrix}
b_{k+1} \\
\beta_{k+1} 
\end{bmatrix}.
$$
We note that GMRES breaks down precisely when the underlying Arnoldi iteration does, which in view of the discussion above is equivalent to the approximate solution being exact.&lt;/p&gt;
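&lt;p&gt;For reference, here is a simplified GMRES sketch (not the incremental Givens implementation described above): it builds the Arnoldi basis and solves each small least squares problem directly, with an arbitrary test system chosen only for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def gmres(A, b, x0, k):
    # Minimize || beta_0 e_1 - H y || over y and return x0 + Q[:, :k] y.
    n = len(b)
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r0 / beta0
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = w @ Q[:, i]
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                 # lucky breakdown: solution is exact
            k = j + 1
            Q, H = Q[:, :k], H[:k, :k]
            break
        Q[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(H.shape[0])
    rhs[0] = beta0
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + Q[:, :k] @ y

rng = np.random.default_rng(0)
n = 40
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = gmres(A, b, np.zeros(n), 25)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # relative residual
&lt;/code&gt;&lt;/pre&gt;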
&lt;h3 id=&#34;convergence&#34;&gt;Convergence&lt;/h3&gt;
&lt;p&gt;By definition, the residuals in GMRES satisfy


$$
\norm{r^{(k)}} = \min_{p_{k-1} \in P_{k-1}} \norm{(I - A p_{k-1}(A)) r^{(0)}} = \min_{p_k \in P_k,\, p_k(0) = 1} \norm{p_k(A) r^{(0)}},
$$
where 

$P_k$ denotes the vector space of polynomials with degree at most 

$k$. This immediately yields the following estimate for diagonalizable matrices.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let 

$\sigma(A)$ denote the spectrum of 

$A$. If 

$A = V \Lambda V^{-1}$ for some diagonal matrix 

$\Lambda$, then


$$
\norm{r^{(k)}} \leq \kappa(V) \cdot \min_{p_k \in P_k,\, p_k(0) = 1} \max_{\lambda \in \sigma(A)} \, \abs{p_k(\lambda)} \cdot \norm{r^{(0)}}.
$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;the-lanczos-iteration&#34;&gt;The Lanczos iteration&lt;/h2&gt;
&lt;p&gt;When 

$A$ is &lt;em&gt;symmetric&lt;/em&gt;, the Arnoldi iteration reduces to what is known as the &lt;strong&gt;Lanczos iteration&lt;/strong&gt;. Since 

$H_k = Q_k^\tp A Q_k$, the upper Hessenberg matrix 

$H_k$ must also be symmetric and therefore &lt;em&gt;symmetric tridiagonal&lt;/em&gt;. For this reason, we denote it by 

$T_k$ and define 

$\alpha_j := t_{jj}$ and 

$\beta_j := t_{j,\,j+1} = t_{j+1,\,j}$. With this notation, the algorithm is as follows.


$$
\begin{align*}
	&amp;\beta_0 = 0,\,q_0 = 0 \\
	&amp;q_1 = v / \norm{v} \\ \\
	&amp;\texttt{for 

$j = 1$ to 

$k$:} \\
	&amp;\quad v_j = Aq_j \\
	&amp;\quad \alpha_j = \inner{v_j}{q_j} \\
	&amp;\quad v_j = v_j - \beta_{j-1} q_{j-1} - \alpha_j q_j \\
	&amp;\quad \beta_j = \norm{v_j} \\
	&amp;\quad q_{j+1} = v_j / \beta_j
\end{align*}
$$&lt;/p&gt;
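&lt;p&gt;A short sketch of the Lanczos iteration (with an arbitrary symmetric test matrix), verifying that $Q_k^\tp A Q_k = T_k$ up to rounding and the gradual loss of orthogonality inherent to the three-term recurrence:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def lanczos(A, v, k):
    # Three-term recurrence for symmetric A: returns Q (n x k) with
    # orthonormal columns and the entries alpha (diagonal) and beta
    # (off-diagonal) of the symmetric tridiagonal T_k = Q^T A Q.
    n = len(v)
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    Q[:, 0] = v / np.linalg.norm(v)
    q_prev, beta_prev = np.zeros(n), 0.0
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = w @ Q[:, j]
        w = w - beta_prev * q_prev - alpha[j] * Q[:, j]
        beta[j] = np.linalg.norm(w)
        q_prev, beta_prev = Q[:, j], beta[j]
        Q[:, j + 1] = w / beta[j]
    return Q[:, :k], alpha, beta[: k - 1]

rng = np.random.default_rng(0)
n, k = 30, 10
B = rng.standard_normal((n, n))
A = B + B.T + 2 * n * np.eye(n)                  # symmetric
v = rng.standard_normal(n)
Q, alpha, beta = lanczos(A, v, k)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(np.linalg.norm(Q.T @ A @ Q - T))           # ~ 0
&lt;/code&gt;&lt;/pre&gt;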
&lt;h2 id=&#34;cg&#34;&gt;CG&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;conjugate gradient (CG)&lt;/strong&gt; method for solving 

$Ax = b$ when 

$A$ is &lt;em&gt;symmetric positive definite&lt;/em&gt; is an iterative &lt;em&gt;error projection method&lt;/em&gt; whose 

$k$&lt;sup&gt;th&lt;/sup&gt; iterate 

$x^{(k)}$ lies in 

$x^{(0)} + \K_k(A, r^{(0)})$, where 

$r^{(k)} := b - Ax^{(k)}$. Although it is possible to derive CG from the Lanczos iteration just as GMRES was derived from the Arnoldi iteration, a simpler and more direct derivation is given in the notes on the conjugate gradient method.&lt;/p&gt;
&lt;h3 id=&#34;convergence-1&#34;&gt;Convergence&lt;/h3&gt;
&lt;p&gt;By definition, the errors in CG satisfy


$$
\norm{e^{(k)}}_A = \min_{p_{k-1} \in P_{k-1}} \norm{(I - A p_{k-1}(A)) e^{(0)}}_A = \min_{p_k \in P_k,\, p_k(0) = 1} \norm{p_k(A) e^{(0)}}_A,
$$
where 

$P_k$ denotes the vector space of polynomials with degree at most 

$k$. Using the fact that 

$A = Q \Lambda Q^{-1}$ for some orthogonal matrix 

$Q$ and some diagonal matrix 

$\Lambda$, and that 

$\norm{Qp_k(\Lambda)Q^{-1}}_A = \norm{p_k(\Lambda)}_2$, we obtain an estimate analogous to the one for GMRES.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Let 

$\sigma(A)$ denote the spectrum of 

$A$. Then


$$
\norm{e^{(k)}}_A \leq \min_{p_k \in P_k,\, p_k(0) = 1} \max_{\lambda \in \sigma(A)} \, \abs{p_k(\lambda)} \cdot \norm{e^{(0)}}_A.
$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now suppose that the eigenvalues of 

$A$ are 

$\lambda_1 \geq \cdots \geq \lambda_n &gt; 0$ with 

$\lambda_1 \neq \lambda_n$. We know from approximation theory that the polynomial 

$p_k \in P_k$ with 

$p_k(0) = 1$ that minimizes 

$\max_{\lambda \in [\lambda_n, \lambda_1]} \abs{p_k(\lambda)}$ is 

$p_k = \frac{T_k \circ \alpha}{(T_k \circ \alpha)(0)}$, where 

$T_k$ is the 

$k$&lt;sup&gt;th&lt;/sup&gt; Chebyshev polynomial and 

$\alpha(t) := -1 + \frac{1 - (-1)}{\lambda_1 - \lambda_n}(t-\lambda_n)$ maps 

$[\lambda_n, \lambda_1]$ affinely to 

$[-1, 1]$. Since 

$\abs{T_k} \leq 1$ on 

$[-1, 1]$ and 

$\alpha(0) = -\frac{\kappa + 1}{\kappa - 1}$, where 

$\kappa := \kappa_2(A)$, upon evaluating 

$T_k$ at 

$\alpha(0)$ we obtain&lt;/p&gt;
&lt;blockquote&gt;


$$
\norm{e^{(k)}}_A 
\leq \frac{2}{\left(\frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^k + \left(\frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^{-k}} \, \norm{e^{(0)}}_A
\leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k} \norm{e^{(0)}}_A.
$$
&lt;/blockquote&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;In the algorithm below, standard Gram–Schmidt orthogonalization would set 

$v_j = v_j - \sum_{i=1}^j h_{ij} q_i$ after computing the 

$h_{ij}$ instead of updating it inside the loop. It is also possible to use Householder reflections.&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
    <item>
      <title>The conjugate gradient method</title>
      <link>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/cg/</link>
      <pubDate>Wed, 26 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://www.math.ucla.edu/~njhu/notes/nla/lin-iter/cg/</guid>
      <description>&lt;div class=&#34;btn-links mb-3&#34;&gt;
&lt;a class=&#34;btn btn-outline-primary btn-page-header btn-sm&#34; href=&#34;../cg.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;
  PDF
&lt;/a&gt;
&lt;/div&gt;
&lt;!--
No newlines allowed between $$&#39;s below!
--&gt;
&lt;div style=&#34;display: none;&#34;&gt;
$$
%% Sets and functions %%
\newcommand{\set}[1]{\{ #1 \}}
\newcommand{\Set}[1]{\left \{ #1 \right\}}
\renewcommand{\emptyset}{\varnothing}
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rn}{\mathbb{R}^n}
\newcommand{\Rm}{\mathbb{R}^m}
\newcommand{\C}{\mathbb{C}}
\newcommand{\Alpha}{\mathrm{A}}
\newcommand{\Beta}{\mathrm{B}}
%% Linear algebra %%
\newcommand{\abs}[1]{\lvert #1 \rvert}
\newcommand{\Abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\inner}[2]{\langle #1, #2 \rangle}
\newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle}
\newcommand{\norm}[1]{\lVert #1 \rVert}
\newcommand{\Norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\trans}{{\top}}
\newcommand{\span}{\mathop{\mathrm{span}}}
\newcommand{\K}{\mathcal{K}}
$$
&lt;/div&gt;
&lt;!-- BODY --&gt;
&lt;p&gt;The conjugate gradient method is an iterative method for solving the linear system 

$Ax = b$, where 

$A \in \mathbb{R}^{n \times n}$ is symmetric positive-definite.&lt;/p&gt;
&lt;p&gt;Let 

$\inner{x}{y}_A := \inner{Ax}{y}$ be the inner product defined by 

$A$ and 

$\norm{x}_A := \sqrt{\inner{x}{x}_A}$ be the induced norm. Given an initial guess 

$x^{(0)}$ for the solution 

$x^*$, the 

$k$&lt;sup&gt;th&lt;/sup&gt; iterate of the method is defined as


$$
x^{(k)} = \mathop{\mathrm{arg\,min}}_{x \in x^{(0)} + \K_k(A, r^{(0)})}\ \norm{x^* - x}_A\,,
$$
where 

$\mathcal{K}_k(A, r^{(0)})$ is the Krylov subspace 

$\span \set{A^j r^{(0)}}_{j = 0}^{k-1}$ and 

$r^{(0)} = b - Ax^{(0)}$. (In other words, the 

$A$-norm of the error is minimized over the 

$k$&lt;sup&gt;th&lt;/sup&gt; affine Krylov subspace generated by the initial residual and translated by the initial guess.)&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Let us abbreviate 

$\K_k(A, r^{(0)})$ as 

$\K_k$ and write 

$r^{(k)} = b - Ax^{(k)}$ for the residual of the 

$k$&lt;sup&gt;th&lt;/sup&gt; iterate. The iterate 

$x^{(k)}$ is therefore the 

$A$-orthogonal projection of 

$x^*$ onto 

$x^{(0)} + \K_k$, defined by the Galerkin conditions 

$x^{(k)} - x^{(0)} \in \K_k$ and 

$x^* - x^{(k)} \perp_A \K_k$; we note that the orthogonality condition is equivalent to 

$r^{(k)} \perp \K_k$.&lt;/p&gt;
&lt;p&gt;Now suppose that 

$\set{p^{(j)}}_{j &lt; k}$ is a basis of 

$\K_k$ and let 

$P_k = \begin{bmatrix} p^{(0)} &amp; \cdots &amp; p^{(k-1)} \end{bmatrix}$. Then 

$x^{(k)} = x^{(0)} + P_k y^{(k)}$, where


$$
y^{(k)} = \mathop{\mathrm{arg\,min}}_{y \in \R^k}\ \norm{x^* - (x^{(0)} + P_k y)}_A\,.
$$
If 

$p^{(k)}$ is such that 

$\set{p^{(j)}}_{j &lt; k+1}$ is a basis of 

$\K_{k+1}$, we can express the next iterate 

$x^{(k+1)}$ in an analogous manner – that is, 

$x^{(k+1)} = x^{(0)} + P_{k+1} y^{(k+1)}$, where 

$P_{k+1} = \begin{bmatrix} P_k &amp; p^{(k)} \end{bmatrix}$. Writing 

$y^{(k+1)} = \begin{bmatrix} \tilde{y}^{(k)} \\ \alpha_k \end{bmatrix}$ for some 

$\tilde{y}^{(k)} \in \R^k$ and 

$\alpha_k \in \R$, we see that


$$
\begin{align*}
x^* - (x^{(0)} + P_{k+1} y^{(k+1)})
&amp;= [x^{(k)} - (x^{(0)} + P_k \tilde{y}^{(k)})] + [(x^* - x^{(k)}) - \alpha_k p^{(k)}] \\
&amp;= P_k(y^{(k)} - \tilde{y}^{(k)}) + [(x^* - x^{(k)}) - \alpha_k p^{(k)}]\,.
\end{align*}
$$
Thus, if we select 

$p^{(k)}$ to be 

$A$-orthogonal to 

$p^{(j)}$ for all 

$j &lt; k$, then by the Pythagorean theorem,


$$
\begin{align*}
\norm{x^* - (x^{(0)} + P_{k+1} y^{(k+1)})}_A^2
&amp;= \norm{P_k(y^{(k)} - \tilde{y}^{(k)})}_A^2 + \norm{(x^* - x^{(k)}) - \alpha_k p^{(k)}}_A^2\,,
\end{align*}
$$
so the solution to the least squares problem for 

$y^{(k+1)}$ is given recursively by 

$\tilde{y}^{(k)} = y^{(k)}$ and 

$\alpha_k p^{(k)} = \mathrm{proj}^A_{p^{(k)}}(x^* - x^{(k)})$. It follows that


$$
\begin{align*}
x^{(k+1)}
% &amp;= x^{(0)} + P_{k+1} y^{(k+1)} \\
&amp;= x^{(0)} + P_k y^{(k)} + \alpha_k p^{(k)} \\
&amp;= x^{(k)} + \alpha_k p^{(k)}, \label{X}\tag{X}
\end{align*}
$$
where


$$
\begin{align*}
\alpha_k
&amp;= \frac{\inner{x^* - x^{(k)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\
&amp;= \frac{\inner{r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha}\tag{$\Alpha$}
\end{align*}
$$
Since 

$r^{(k+1)} = b - Ax^{(k+1)}$, equation (

$\ref{X}$) also implies that


$$
\begin{equation}
r^{(k+1)} = r^{(k)} - \alpha_k Ap^{(k)}. \label{R}\tag{R}
\end{equation}
$$
To generate 

$A$-orthogonal vectors 

$p^{(j)}$ such that 

$\set{p^{(j)}}_{j &lt; k}$ is a basis of 

$\K_k$ for each 

$k$, we notice that 

$r^{(k+1)} \perp_A \K_k = \span \set{p^{(j)}}_{j &lt; k}$ because 

$r^{(k+1)} \perp \K_{k+1}$ and 

$A\K_k \subseteq \K_{k+1}$. As a result, when 

$r^{(k+1)}$ is 

$A$-orthogonalized against 

$p^{(k)}$, the resulting vector will automatically be 

$A$-orthogonal to 

$p^{(j)}$ for all 

$j &lt; k+1$, suggesting that we define


$$
\begin{align*}
p^{(k+1)}
&amp;= r^{(k+1)} - \mathrm{proj}^A_{p^{(k)}} r^{(k+1)} \\
&amp;= r^{(k+1)} + \beta_k p^{(k)}, \label{P}\tag{P}
\end{align*}
$$
where 

$p^{(0)} = r^{(0)}$ and


$$
\begin{equation}
\beta_k = -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Beta}\tag{$\Beta$}
\end{equation}
$$&lt;/p&gt;
&lt;p&gt;Referring back to the residual equation (

$\ref{R}$), we can show by induction that the 

$p^{(j)}$ thus defined will also constitute bases of successive Krylov subspaces. More precisely, suppose that the solution has not been found by the beginning of the 

$k$&lt;sup&gt;th&lt;/sup&gt; iteration, in the sense that 

$r^{(j)} \neq 0$ for all 

$j &lt; k$. We claim then that 

$r^{(k-1)} \in \K_k$ and that 

$\set{p^{(j)}}_{j &lt; k}$ is an 

$A$-orthogonal basis of 

$\K_k$.&lt;/p&gt;
&lt;p&gt;Indeed, if 

$r^{(0)} \neq 0$, then 

$r^{(0)} \in \K_1 = \span \set{r^{(0)}}$ and 

$\set{p^{(0)}} = \set{r^{(0)}}$ is an 

$A$-orthogonal basis of 

$\K_1$. Now suppose that the claim holds up to the 

$k$&lt;sup&gt;th&lt;/sup&gt; iteration and that its hypothesis is satisfied at the beginning of the 

$(k+1)$&lt;sup&gt;th&lt;/sup&gt; iteration. Then


$$
r^{(k)} = r^{(k-1)} - \alpha_{k-1} Ap^{(k-1)} \in \K_k + A\K_k \subseteq \K_{k+1}\,,
$$
so


$$
p^{(k)} = r^{(k)} + \beta_{k-1} p^{(k-1)} \in \K_{k+1} + \K_k \subseteq \K_{k+1}\,.
$$
In addition, 

$p^{(k)} \neq 0$ because 

$r^{(k)} \perp \K_k$ and 

$r^{(k)} \neq 0$. Hence, by construction, 

$\set{p^{(j)}}_{j &lt; k+1}$ is an 

$A$-orthogonal set of nonzero vectors in 

$\K_{k+1}$ and is moreover a basis thereof, since 

$\dim(\K_{k+1}) \leq k + 1$.&lt;/p&gt;
&lt;p&gt;An immediate consequence is that 

$\set{r^{(j)}}_{j &lt; k}$ will be an orthogonal basis of 

$\K_k$ for all such iterations: if (say) 

$i &lt; j &lt; k$, then 

$r^{(i)} \in \K_{i+1} \subseteq \K_j$, and we know that 

$r^{(j)} \perp \K_j$. Furthermore, the iteration will break down exactly when 

$r^{(k)} \in \K_k$, or equivalently, 

$r^{(k)} = 0$, meaning that the solution was attained in the 

$k$&lt;sup&gt;th&lt;/sup&gt; iteration.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;We can also derive alternative formulas for the scalars 

$\alpha_k$ and 

$\beta_k$ that reduce the number of inner products in each iteration. First, using the fact that 

$r^{(k)} \perp \K_k = \span \set{p^{(j)}}_{j &lt; k}$, we obtain


$$
\begin{align*}
\alpha_k &amp;= \frac{\inner{r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\
&amp;= \frac{\inner{r^{(k)}}{r^{(k)} + \beta_{k-1} p^{(k-1)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\
&amp;= \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha2}\tag{$\Alpha$}
\end{align*}
$$
Hence


$$
\begin{align*}
\beta_k &amp;= -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\
&amp;= -\alpha_k\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{r^{(k)}}{r^{(k)}}} \\
&amp;= \frac{\inner{r^{(k+1)}}{r^{(k+1)} - r^{(k)}}}{\inner{r^{(k)}}{r^{(k)}}} \\
&amp;= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}}\,. \label{Beta2}\tag{$\Beta$}
\end{align*}
$$&lt;/p&gt;
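&lt;p&gt;(With these formulas, each iteration requires only a single matrix–vector product 

$Ap^{(k)}$ and two inner products, namely 

$\inner{Ap^{(k)}}{p^{(k)}}$ and 

$\inner{r^{(k+1)}}{r^{(k+1)}}$, since 

$\inner{r^{(k)}}{r^{(k)}}$ can be carried over from the previous iteration; the original formulas for 

$\alpha_k$ and 

$\beta_k$ would require a third inner product.)&lt;/p&gt;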
&lt;hr&gt;
&lt;p&gt;In summary,&lt;/p&gt;
&lt;blockquote&gt;


$$
\begin{align*}
	&amp;r^{(0)} &amp;&amp;= b - Ax^{(0)} \\
	&amp;p^{(0)} &amp;&amp;= r^{(0)} \\
	\\
  &amp;\alpha_k &amp;&amp;= \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} &amp;&amp; \ref{Alpha2} \\
	&amp;x^{(k+1)} &amp;&amp;= x^{(k)} + \alpha_k p^{(k)} &amp;&amp; \ref{X} \\
	&amp;r^{(k+1)} &amp;&amp;= r^{(k)} - \alpha_k Ap^{(k)} &amp;&amp; \ref{R} \\
	&amp;\beta_k &amp;&amp;= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}} &amp;&amp; \ref{Beta2} \\
	&amp;p^{(k+1)} &amp;&amp;= r^{(k+1)} + \beta_k p^{(k)} &amp;&amp; \ref{P}
\end{align*}
$$
&lt;/blockquote&gt;
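&lt;p&gt;For concreteness, the following is a minimal NumPy sketch of the iteration summarized above. The function name, the residual-based stopping tolerance, and the default initial guess 

$x^{(0)} = 0$ are illustrative choices rather than part of these notes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, maxiter=None):
    # Solve Ax = b for symmetric positive-definite A via the conjugate
    # gradient recurrences summarized above.
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    maxiter = n if maxiter is None else maxiter

    r = b - A @ x      # initial residual r^(0)
    p = r.copy()       # initial search direction p^(0) = r^(0)
    rs = r @ r         # squared norm of the current residual

    for _ in range(maxiter):
        if np.sqrt(rs) &lt;= tol:   # residual (numerically) zero: solution found
            break
        Ap = A @ p                  # the single matrix-vector product per iteration
        alpha = rs / (p @ Ap)       # step length alpha_k
        x = x + alpha * p           # update the iterate
        r = r - alpha * Ap          # update the residual
        rs_new = r @ r
        beta = rs_new / rs          # beta_k
        p = r + beta * p            # next (A-orthogonal) search direction
        rs = rs_new
    return x
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In exact arithmetic the loop terminates after at most 

$n$ iterations; in floating-point arithmetic, the tolerance determines when the residual is treated as zero.&lt;/p&gt;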
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;The choice of this minimization problem can be partially motivated as follows. In view of the fact that 

$x^* = x^{(0)} + A^{-1} r^{(0)}$ and that 

$A^{-1}$ is a polynomial in 

$A$ of degree at most 

$n-1$, in the 

$k$&lt;sup&gt;th&lt;/sup&gt; iteration of the method, we seek an approximation to the solution of the form 

$x^{(0)} + p_{k-1}(A) r^{(0)}$, where 

$p_{k-1}$ is a polynomial of degree at most 

$k-1$. This guarantees that the 

$A$-norm of the error decreases monotonically and that the solution is found in at most 

$n$ iterations (in exact arithmetic). Although the choice of the objective function is not canonical, it turns out that this choice leads to a particularly tractable method.&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
