# Numerical methods

We need surprisingly few numerical methods to be able to compute the CMB power-spectrum. A good ODE solver and a way of interpolating functions of one and two variables is basically all we need and we will briefly talk about the theory behind these below. The only special (non-standard) function we need is the spherical Bessel function. This is provided in the standard library of some programming languages like modern C++ and Python, but if its not provided in the language you use we have some notes on how you can compute it yourself if you cannot find a good library for this.

## Solving coupled ODEs

Systems of ordinary differential equations is something that comes up a lot in physics, and it will be crucial to be able to solve such equations in this course. The equations of most interest to us is initial value problems on the form $$\dot{\vec{y}} = \vec{f}(t,\vec{y})$$ with some initial value $\vec{y}(t_{\rm ini}) = \vec{y}_{\rm ini}$ where $\vec{y} = (y_1,y_2,\ldots,y_n)$. A simpler example for $n=2$ is the system $z' = w$ and $w' = -z$ which can be written on this form by taking $y_1 = z$ and $y_2 = w$ such that $\vec{y} = (z,w)$ and for which $\vec{f} = (z,-w) = (y_2,-y_1)$.

Note that any $k$th order differential equations in one variable, i.e. $a_ny^{(k)} + \ldots + a_1 y' + a_0 y = b$, can be reduced to this form. For example if we want to solve the second order ODE $y''(x) + \omega^2 y(x) = 0$ then we can define $y_1 = y$ and $y_2 = y_1'$ the system reduces to the two coupled equations $y_1' = y_2$ and $y_2' = -w^2 y_1$. On vector form with $\vec{y} = \pmatrix{y_1\\y_2}$ this can be written $\vec{y}'(x) = \vec{f}(x,\vec{y})$ with $\vec{f}(x,\vec{y}) = \pmatrix{y_2\\-w^2y_1}$. Most of the equations we will encounter are first order to begin with, but its very useful to know of this way of rewriting such equations as a system of first order equation to allow you to use standard ODE methods, like the ones mentioned below, to solve them.

There are many methods for solving such equations (Eulers method, implicit Euler, Runge-Kutta methods, ...). For the biggest systems we will solve in this course we will need to use an explicit method as implicit methods are just too slow. For these Eulers method is the simplest to implement $$\vec{y}(t_{n+1}) = \vec{y}(t_{n}) + \vec{f}(t_n,\vec{y}_n) \Delta t$$ where $\Delta t = t_{n+1} - t_n$ is the time-step. The next simplest is Runge-Kutta 2 (RK2) which consists of making the steps $$\vec{k}_1 = \vec{f}(t_n,\vec{y}_n)$$ $$\vec{k}_2 = \vec{f}(t_n + \frac{\Delta t}{2},\vec{y}_n + \frac{\vec{k}_1\Delta t}{2})$$ $$\vec{y}(t_{n+1}) = \vec{y}(t_{n}) + \vec{k}_2 \Delta t$$ and then we have perhaps the most commonly used solver in the world Runge-Kutta 4 (RK4) $$\vec{k}_1 = \vec{f}(t_n,\vec{y}_n)$$ $$\vec{k}_2 = \vec{f}(t_n + \frac{\Delta t}{2},\vec{y}_n + \frac{\vec{k}_1\Delta t}{2})$$ $$\vec{k}_3 = \vec{f}(t_n + \frac{\Delta t}{2},\vec{y}_n + \frac{\vec{k}_2\Delta t}{2})$$ $$\vec{k}_4 = \vec{f}(t_n + \Delta t,\vec{y}_n + \vec{k}_3\Delta t)$$ $$\vec{y}(t_{n+1}) = \vec{y}(t_{n}) + \frac{\Delta t}{6}(\vec{k}_1+2\vec{k}_2+2\vec{k}_3+\vec{k}_4)$$ and there are many many more. The higher the order of the method we use the more accurate the solution (typically) is. However as you see the futher up in order we go the more evaluations of $\vec{f}$ (the right hand side of the ODE system) we have to make which can make it more expensive to compute so one have to make some compromises between speed and accuracy and RK4 is usually a good such compromise. When we get to integrate the cosmologial perturbations then we get systems of $\sim 20$ equations and we need to compute a lot of exponentials, logarithms, spline lookups etc. so it can get quite expensive and for this RK2 works quite well and is quite a bit faster. Anyway you can experiment a bit with this if you want.

Having choosen the method we need to select the time-step. This is often best done adaptively, i.e. we pick the time-step after each step such that the error in the solution will be below some user-defined tolerance allowing us to pick larger time-steps when little is happening and small time-steps when the solution changes fast. For example for the GSL ODE solver we use you can specify a tolerance for the absolute and relative error. The smaller the values the better, but its also slower. The best values are typically best found by experimenting. For more info take a look at the GSL documentation.

The methods above are just the simplest ones. For cases where the equation system is stiff then we often end up with having to pick a very small time-step to guarantee the accuracy of the solution and it becomes super slow. For these cases there are much better methods however they often require us to provide $\frac{\partial\vec{f}}{\partial t}$ and the Jacobian $J_{ij} = \frac{\partial\vec{f}_i}{\partial y_j}$ as additional inputs. If you have $20$ equations this requires you to set a $20\times 20$ matrix so can get messy and is much more work to implement, but can significantly speed up the integration. The state of the art CMB codes today often have specialized integrators to make it as efficient as possible (plus they also rely on different approximation schemes for the equations themselves to help speed it up). Anyway I just wanted to mention this and its not something I would reccomend spending time on unless you really want to try it out.

Note that many of the integrals we encounter in this course like can be rewritten as ODEs (e.g. $f(x) = \int_0^x g(x')dx'$ becomes $f'(x) = g(x)$ with $f(0) = 0$) and solved with our ODE solver. In fact we can also solve improper integrals this way, e.g. for $I = \int_0^\infty g(x)dx$ we can define $f(x) = \int_0^x g(x)dx$ and integrate $f'(x) = g(x)$ with $f(0) = 0$ from $x=0$ till some large value of $x$ (ensuring that it has converged) and take this as the value of the improper integral. Note that there are better (faster / more accurate) methods for evaluating integrals than this, but this is a quick and dirty way of doing it that often gives good results. (Though for some of the oscillatory integrals we will encounter in the end of this course its better to just to the integration directly).

## Spline interpolation

For functions where you only have the result on a discrete set of points (like the result of an ODE solver) then we need a method to interpolate the function to any other value. The way we are going to deal with this is to make a so-called cubic spline. This is the spline equivalent of what RK4 is for ODEs - a good compromise between accuracy and speed. It has the property that it gives exact results for any polynomial of degree $3$ or lower and since most of the functions we will spline and nice smooth functions this will give us good results.

Given data $(x_i, y_i)_{i=1}^n$ we want to approximate this with a qubic polynomial on each interval $[x_i,x_{i+1})$, i.e. we want to find a set of coefficients $(a_i,b_i,c_i,d_i)_{i=1^{n-1}}$ such that $$S_i(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$$ has the property that $S_i(x_i) = y_i$ and the neighboring polynomials $S_i$ and $S_{i+1}$ have a matching function value and first two derivatives at $x=x_{i+1}$. This ensures that the function we get is two times differentiable. This is always possible to do (see link above for a concrete algorithm). We have a free choice of boundary conditions (values for $y'$ at the endpoints) that must be specified in addition to this. The "standard" choice for doing this is to select the so-called natural spline where the boundary conditions are chosen such that we have $y'' = 0$ at the end-points. The advantage of this is that we don't have to supply the boundary conditions (which we might not know) and the disadvantage is that it will be inaccurate close to the boundaries if the function we spline do not satisfy $y'' = 0$. This is often not a big issue in practice, but useful to be aware of this. If you have issues close to the boundaries then one can just extend the range you spline over or use more points when making the spline (only a real issue for $x$ values close to the first or last bin). Another advantage of making a cubic spline is that once we have made it, we can use it to look up not just the function value but its first two derivatives at any point. This will be very useful for us as we often will have to compute derivatives of the functions we will spline.

Having made a spline, how does it work to do a lookup? After making it what we effectively have is arrays with the coefficients $a_i,b_i,c_i,d_i$ above and we can then compute the function value by evaluating $S_i(x)$ as given above. This only costs a few artihmetic operations so its very fast. But to do so we need to first find the interval $[x_i,x_{i+1})$ that contains the $x$ we want to compute and this is usualy what takes time. This is often done using binary search on the $x_i$-array. Also to make lookups faster what one typically do is to cache (save) the last interval one did a lookup on and if the next $x$-value happens to be in the same interval one can avoid the binary search. It is therefore often faster to evaluate a spline in a ordered fashion rather than doing random lookups.

For 2D splines, i.e. splines of functions $f(x,y)$ of two variables, we can use the natural generalization of the algorithm above called the bicubic spline (basically a qubic spline in each variable). Once we have this then this also allows us to evaluate cross derivatives $\frac{\partial ^2 f}{\partial x\partial y}$ in addition to the usual derivatives $\frac{\partial ^2 f}{\partial x^2}$, $\frac{\partial ^2 f}{\partial y^2}$, $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$.

## Spherical Bessel Functions  Figure: The spherical bessel functions $j_\ell(x)$ for $\ell = 5$ (above) and $\ell = 50$ (below).

The regular (at $x=0$) solution to the ODE $$x^{2}{\frac {d^{2}y}{dx^{2}}}+2x{\frac {dy}{dx}}+\left(x^{2}-\ell(\ell+1)\right)y=0$$ is called the spherical Bessel function $j_\ell(x)$. Close to $x=0$ it satisfies the asymptotic form $$j_\ell \sim \frac{2^\ell \ell!}{(2 \ell + 1)!}x^\ell$$ See the figure above for how it behaves. For large $\ell$ we have $j_\ell(x) \approx 0$ for $x \lesssim 0.8\ell$, then it rises to a peak at around $x \sim \ell$ followed by damped oscillations with period $\sim 2\pi$.

The number $0.8$ above was just a rough estimate. In practice, for $\ell \geq 10$, one can show numerially that $j_\ell$ is roughly $10^{-6}$ of its maximum value around $x \sim (1.0 - 2.6/\sqrt{\ell}) \ell$.

The ODE above can be used to solve for it, however as $x=0$ is a singular point we must start the integration at some finite $x$ and use the asymptotical formula to set the initial condition. This finite value must be large enough to avoid numerical issues (the initial value cannot be too small compared to machine precision).

Another way of solving it numerically is to use recursion formulas. We have $$j_{\ell+1}(x) + j_{\ell-1}(x) = \frac{2\ell+1}{x}j_{\ell}(x)$$ Introducing $y_\ell(x) = \frac{j_{\ell+1}(x)}{j_{\ell}(x)}$ then we can write $$y_{\ell-1} = \frac{x}{(2\ell+1) - xy_{\ell}}$$ which gives us a recursion relation that is very useful. We cannot use forward recursion as this is not numerically stable, but backward recursion works very well. The asymptotic formula we have above tells us that for large enough $\ell$ we have $$y_\ell \sim \frac{2(\ell+1) x}{(2\ell+2)(2\ell+3)} \sim \frac{x}{2\ell} \ll 1$$ So for a fixed $x$ and large enough ${\ell_{\rm max}}$ we have $y_{\ell_{\rm max}} \approx 0$ and we can use backward recursion with this as the initial seed to compute $y_{\ell_{\rm max}-1},y_{\ell_{\rm max}-2},\ldots,y_2,y_1,y_0$. Having found these we can compute $j_\ell(x)$ by multiplying together the terms we have computed $$j_\ell = j_0 \cdot (y_0 \cdot y_1 \cdots y_{\ell-1})$$ where $j_0 = \frac{\sin(x)}{x}$. This algorithm is numerically stable which is a big plus! The other advantage is that it computes all of $j_0(x),j_1(x),\ldots,j_{\ell_{\rm max}}(x)$ at the same time. The disadvantage is that you might need a large $\ell_{\rm max}$ to ensure $y_{\ell_{\rm max}} \approx 0$ (we need atleast $\ell \gt x$) so it can get expensive, espesially if you only want to use this to compute one particular value $j_\ell(x)$. However the good thing about this function is that we only really need to compute it once. For example you can compute it for all $x$ and $\ell$ of interest and then store the data in a binary file. Then every time you want to work with them you can read this data from file and make splines. If you do this then it doesn't matter how long time it takes to compute it that one time. That said if you have a good library to use its not that expensive to compute it from scratch every time.

Some languages have this function built-in. For example in C++ (17 or larger) you can compute this by calling std::sph_bessel(ell, x). This implementation breaks down for $x \gtrsim 14000$. The GSL library can also be used for this by calling gsl_sf_bessel_jl(ell, x). For $x \gtrsim 9000$ this implementation also breaks down. It might also have some issues for small arguments and large $\ell$ (for which the function is extremely close to zero - but this is easy to fix as we just need to put the value to $0$ here). In Python there is an implementation of this in scipy.special.spherical_jn. In C you can probably "steal" the implementation in CLASS and in Fortran you can probably "steal" the implementation in CAMB (though we also have an implementation of this in the Fortan template of the numerical project). For all of these implementations remember that $j_\ell(0)$ is undefined so its a good idea to test for this and return $1$ if $\ell=0$ and $0$ otherwise

### Ultra Spherical Bessel Functions

If we want to study cosmological models with non-zero curvature then things are a bit more complicated. Instead of getting spherical bessel function in the line of sight integrals we get a more general class of functions called Ultra / Hyper Spherical Bessel Functions. The CMB literature has several papers on how to compute these efficiently (, , ), but I won't go into detail about it here as this is beyond the scope of this course.

## External codes: Recombination solvers

There are several codes availiable online that computes the recombination history and does this in much greater detail than we do. First we have Recfast++. This is the same code that is (most commonly) used within Einstein-Boltzmann solvers like CAMB and CLASS. Then there is also CosmoRec and HyRec. If you want a more accurate recombination analysis (that takes many excited states into account which again has a $\%$-level effect on $X_e,T_b$) then you can use these codes and feed that as input to your code (or better: call these routins from within your code. This is very easy to do with Recfast++. I'm not that familiar with the others to know how easy that would be). These codes also provides a good way of checking how accurate our simplified treatment really is.

If we want to compare the result of $C_\ell$ with another Einstein-Boltzmann solver (and you want to check accuracy at the $\%$-level) then its a good idea to make sure that the recombination history is exactly the same in both codes to more clearly see what the differences (coming from how you solve the equations, accuracy settings, approximations you use etc.) are.

## External codes: Einstein-Boltzmann solvers

There are several high-quality codes to compute the CMB and matter power-spectra, just as we are doing in this course, availiable online. These are the codes that you want to use for actual science. For us these codes are a good way of checking that the results you get are correct. The two main ones are CAMB (Fortran) and CLASS (C).

NASA has also made a very useful online tool that you can use to compute the CMB and matter power-spectra using CAMB directy in your browser (just set the parameters on a website and press play), see CAMB Web Interface.

## Generating a gaussian random field

A gaussian random field has the nice property that it's (statistically) fully described by the two-point correlelation function (all higher moments are functions of the second moment). It's convenient to study this in Fourier space since there each mode will be independent of each other and here the field is fully specified by the power-spectrum (the fourier transform of the two-point correlation function). In Fourier space we have $$\left<\delta({\bf k}_1) \delta^*({\bf k}_2)\right> = (2\pi)^3P(k)\delta({\bf k}_1 - {\bf k}_2)$$ where brackets denote an assemble mean (the mean over many "realisations" of our universe. By the ergotic theorem this is equal to the volume average in the limit where we have a large number of modes.). In a simulation where we discretize the density field on a grid (which leads to a discrete set of wavevectors) we simply have $\left<|\delta({\bf k})|^2\right> = P(k)$ where the mean is over all wavevectors we have that have satisfy $|{\bf k}| = k$. We can write $\delta({\bf k}) = \delta_{\rm Re} + i \delta_{\rm Im} = |\delta({\bf k})| e^{2\pi i \theta}$. The real and imaginary part will be gaussian random numbers with mean $0$ and variance $P(k)$. Alternatively the norm can be shown to be Raylight distributed and the phase will be a uniform random number in $[0,1]$. One way to generate the norm is to generate a uniform number (here we are using the result that if $X$ is a random variable with CDF $F(X)$ then $X = F^{-1}(U)$ where $U$ is a uniform random number in $[0,1]$. For a Raylight distribution we have $f(x) \propto xe^{-x^2}$ so $F(x) = 1 - e^{-x^2}$ and $F^{-1}(x) = \sqrt{-\log(1-x)}$ and if $x$ is uniform in $[0,1]$ then so is $1-x$.) $A$ in $(0,1)$ and then put $|\delta({\bf k})| = \sqrt{-P(k)\log(A)}$. Thus we can generate the field numerically by the following recipe

• Set up a uniform grid with $N$ points in each dimension.
• In Fourier space this will represent $\delta({\bf k}_j)$ for wavenumbers (for each component) from $-2\pi j /N$ to $2\pi j /N$ where $j=0,1,2,\ldots,N/2$.
• Loop through the grid and for each point generate two uniform (in $[0,1)$) random numbers $A_j$ and $\theta_j$ and set $\delta({\bf k}_j) = \sqrt{-P(k)A_j}e^{2\pi i \theta_j}$. But be careful with symmetries: since $\delta(x)$ is real so from the definition of a Fourier transform we have $\delta({\bf k}) = \delta^*(-{\bf k})$ so we must ensure that this property is satisfied.

You can find this implemented in my Cosmology C++ library (GitHub).

See Lectures on non-gaussianity by Eiichiro Komatsu and Lectures on gaussian random fields by M. Dijkstra for some more detailed discussion on the theory of gaussian random fields.

## Generating a CMB map

From the theoretical $C_\ell$ power-spectrum we can generate a CMB map. Generating a map is a fairly straight forward thing to do, but the hard part of this is to pixelate the sphere and project that down to a 2D map that we can plot. Luckily there is a great library for this namely the HEALPIX library (for this you also need the CFITSIO library). There is also a Python wrapper for this library called Healpy. You can supply that library with the $C_\ell$'s or a realization of the $a_{\ell m}$'s (which you can generate from the $C_\ell$'s) and have it provide you the map.

Healpix has routines for generating maps directly from a power-spectrum or from $a_{\ell m}$'s. If you want to generate your own realization of $T(\hat{n})$ then note that the $a_{\ell m} = x + iy$ with $x,y$ being random numbers drawn from a gaussian distribution with variance $C_\ell/\sqrt{2}$. A simple way to generate a realization is to draw (for each value of $\ell,m$) two random numbers $A,\theta$ from a uniform distribution on $[0,1)$ and set $$a_{\ell m} = \sqrt{-\log(A)C_\ell}e^{2\pi i \theta}$$ Note that $T = T^*$ is real so $$a_{\ell, -m} = (-1)^m a_{\ell m}^*$$ and we only need to generate them for $m\ge 0$. The other way around: if you want to estimate the angular power-spectrum from a set of $a_{\ell m}$'s by computing $$\hat{C}_\ell = \frac{1}{2\ell + 1}\sum_{m=-\ell}^\ell |a_{\ell m}|^2$$ This should give you back the input power-spectrum for large $\ell$ (and will suffer from cosmic variance for small $\ell$).

If you want to plot the CMB map (high resolution) as measured by Planck then you can download a FITS file with the data (2 GB), the Planck color palette here and then run the Python script that can be found here. This will give you a 170MB image.

## List of various cosmology codes

A list of some useful cosmology codes for doing various things:

N-body and hydro initial conditions:

• MUSIC by Oliver Hahn. Initial condition up to third order in Lagrangian perturbation theory. Multiscale initial conditions. With baryons etc etc etc. Written in C++.
• 2LPTic by Roman Scoccimarro. Second order in LPT.

High resolution N-body simulations:

• RAMSES by Romain Teyssier. Highly parallel adaptive mesh refinement N-body / hydro code with radiation transfer and many other features. Written in Fortran 90.
• GADGET by Volker Springel. Highly parallel tree / particle mesh N-body code with smoothed particle hydrodynamics and many other features. Written in C.

Fixed resolution N-body simulations:

• PICOLA by Cullan Howlett. Fast approximative N-body in LCDM. Written in C.
• MG-PICOLA. Fast approximative N-body extended to modified gravity etc. theories and adds some useful functionality like power-spectra (real and multipoles) and halos on the fly to the L-PICOLA code. Written in C.
• GEVOLUTION by Julian Adamek. Includes all metric perturbations up to second order + neutrinos. Written in C++.
• CONCEPT by Jeppe Dakin and Thomas Tram. N-body with massive neutrinos. Written in (C)Python.

Halo finders:

• Rockstar by Peter Behroozi. Phase space halofinder. Has become the one everybody uses in cosmology. Written in C.
• AHF by Alexander Knebe. Grid based halo finder. Good documentation and has usergroup for question. Written in C.
• MatchMaker by David Alonso. Friends of Friends halo finder. Written in C.

Void finders:

• ZOBOV by Mark Neyrinck. Tesselation based void finder (using qhull). Written in C.
• REVOLVER by Sesh Nadathur. Tesselation based void finder. Includes RSD reconstruction. Based on ZOBOV. Written in Python.
• VIDE by Guilhem Lavaux and P.M. Sutter. Tesselation based void finder. Based on ZOBOV. Written in Python.

Einstein-Boltzmann solvers:

• CAMB by Anthony Lewis and Anthony Challinor. Good and fast. Many extensions availiable. Connects to CosmoMC by the same author for doing parameter inference. Written in Fortran 90, but includes Python interface.
• CLASS by Julien Lesgourgues. Written in C. The fastest solver to date. Tons of functionality and exensions. Well thoughtout design and structure of the code.
• Hi-CLASS by Miguel Zumalacarregui and Emilio Bellini. Extension of the CLASS code to a huge class of cosmological models (dark energy and modified gravity). Built on CLASS. Written in C.
• EFTCAMB by B. Hu, M. Raveri, N. Frusciante and A. Silvestri. Extension of the public Einstein-Boltzmann solver CAMB, which implements the Effective Field Theory approach to cosmic acceleration. Similar to Hi-CLASS, but for CAMB. Written in Fortran.

Recombination:

• RECFAST by Seager, Scott, Sasselov. Written in C, but Fortran and C++ versions availiable.
• HYREC by Yacine Ali-Haïmoud and Chris Hirat. A code for primordial hydrogen and helium recombination including radiative transfer. Written in C.
• COSMOREC by Jens Chluba.

MCMC and likelihood inference:

• CosmoMC by Anthony Lewis. Parallelized MCMC sampler (general and cosmology). For cosmology it includes likelihoods for CMB, galaxy surveys, supernovea etc. etc. Very easy to get started with (but a bit harder to modify). Written in Fortan 2008.
• Monte Python by Benjamin Audren. The CosmoMC of CLASS. Written in Python.
• Victor by Seshadri Nadathur. Modelling and likelihood analysis of void-galaxy cross-correlation data. Written in Python.
• emcee by Dan Foreman-Mackey. Ensemble sampling toolkit for affine-invariant MCMC. Written in Python.

Other things:

• CUTE by David Alonso. Compute correlation functions from galaxy catalogs and simulation boxes. Written in C.
• FFTLog by A. Slosar. C and C++ implementations of the FFTLog algorithm.
• FML by me. A C++ library for working with particles and grids and solving PDEs in parallel with up to thousands of tasks. The library uses MPI, OpenMP (or both) for the parallelization, though everything also works without a MPI compiler. We take care of most of the communication between tasks under the cover so the user don't have to deal too much with that. It is made mainly for analyzing the large scale structure of the Universe, but it is written in general so it can be useful outside of cosmology. It is not meant to be a replacement for common cosmology codes that does similar things (like initial condition generation, simple N-body simuations, halo and void finders, reconstruction etc.) although these things are very easy to do with the library. The goal of the library is rather provide users with the basic tools needed to working in parallel so that one don't have to start by making the tools before doing the science. The philosophy of the library is to implement general algorithms without making (too many) assumptions about how they are to be used and let the user take all these decitions thus making it easier to do new things without having to modify the whole code. The classes we provide are therefore templated on dimension (well knowing that nobody is going to use it for anything else that $N=2$ or $N=3$) and the types and the algorithms are templated on classes that determine how to process and compile up the data it computes. To be able to do Fourier transforms we settle for a simple slab-based domain decomposition for the parallelization. The library contains algorithms for assigning particles to grids, linking particles together in groups, general watershed, tesselations, fourier transforms, computing correlation function in real and fourier space (powerspectrum, bispectrum, trispectrum, ...), creating particle distributions, Lagrangian perturbation theory, a general multigrid solver for solving non-linear PDEs and many more things. We also provide wrappers of methods in libraries like GSL, CGAL and LUA to more easily being able to perform various tasks.