# 4.3. Running MCMC¶

This is the vanilla Monte Carlo Markov Chain with the Metropolis algorithm, as introduced in the previous section The Metropolis-Hastings sampler.

We will now proceed to run MCMC for the \(\Lambda\text{CDM}\) model.
The `epic.py`

script accepts four commands: `run`

, `analyze`

, `plot`

and `monitor`

.
The commands must be issued from your terminal at the `EPIC`

directory.

## Sampling the posterior distribution¶

Standard MCMC is the default option for sampling.
We start a MCMC simulation with the `run`

command:

```
$ python epic.py run LCDM.ini 12 10000 --sim-full-name MyFirstRun --check-interval 6h
```

where the first argument is the `.ini`

file, the second is the number of
chains to be run in parallel and the third is the number of steps to be run in
each MCMC iteration.
This number of steps means that the chains will be written to the disk (the new
states are appended to the chains files) after each `steps`

states in all
chains.
A large number prevents frequent writing operations, which could impact overall
performance unnecessarily.
Each chain will create an independent process with Python’s
`multiprocessing`

, so you should not choose a
number higher than the number of CPUs `multiprocessing.cpu_count()`

of your
machine.
The number of chains should not be less than 2.

If correlations in the initial proposal covariance matrix are needed or
desired, the user can overwrite the diagonal proposal covariance matrix with
the option `--proposal-covariance`

followed by the name of the file
containing the desired covariance matrix in a table format.

The program creates a directory for this new simulation. Inside this directory,
another one named `chains`

receives the files that will store the states of
the chains, named `chain_0.txt`

, `chain_1.txt`

, etc.
Other relevant files are equally named but stored in the folder
`current_states`

. These store only the last state of each chain, to allow
fast resuming from where the chain stopped, without need to loading the entire
chains.
The chains start at points randomly sampled from the priors, unless
`--no-multi-start`

is given, in which case all chains start at the same
point, with coordinates given by the default values of each parameter.
A good starting point may help to obtain convergence faster, besides
eliminating the need for burn-in [1].
The name of the folder is the date-time of creation (in UTC), unless the option
`--sim-full-name MyFirstRun`

is used, then `MyFirstRun`

will be the
folder name.
A custom label or tag can also be prepended with the `--sim-tag my_label`

option, for example, the folder `my_label-171110-173000`

.
It will be stored within another folder with the same name of the `.ini`

file, thus in this case `LCDM/MyFirstRun`

.
The full path of the simulation will be displayed in the first line of the output.

An existing simulation can be resumed from where it last saved information to
disk with the same command, just giving the path of the simulation instead of
the `.ini`

file name, and omitting the number of chains, which has already
been defined in the first run, for example:

```
$ python epic.py run <FULL-OR-RELATIVE-PATH-TO>/LCDM/MyFirstRun/ 5000
```

Another important parameter is the tolerance accepted for convergence assessment.
By default, the program will stop when the convergence check finds, according
to the Gelman-Rubin method, convergence with \(\epsilon < 0.01\).
To change this value, you can use `--tolerance 0.002`

or just `-t 2e-3`

,
for example.
In the MCMC mode, the code will periodically check for convergence according
to the Gelman-Rubin method (by default it is done every two hours but can be
specified differently as `--check-interval 12h`

or `-c 45min`

in the
arguments, for example.
This does not need to (and should not) be a small time interval, but the option
to specify this time in minutes or even in seconds (`30sec`

) is implemented
and available for testing purposes.

The relevant information will be displayed, in our example case looking similar to the following:

```
Simulation at <FULL-PATH-TO>/LCDM/MyFirstRun
Mode MCMC.
The following datasets will be used:
6dF+SDSS_MGS
HST_local_H0
JLA_simplified
Planck2015_distances_LCDM
SDSS_BOSS_CMASS
SDSS_BOSS_LOWZ
SDSS_BOSS_Lyalpha-Forests
SDSS_BOSS_QuasarLyman
SDSS_BOSS_consensus
cosmic_chronometers
Initiating MCMC...
```

and the MCMC will start.

The MCMC run stops if convergence is achieved with a tolerance smaller than
the default or given `tolerance`

.

The following is the output of our example after the MCMC has started.
The 10000 steps take a bit more than thirty minutes in my workstation running 12
chains in parallel.
The number of chains will not make much impact on this unless we use too many
steps by iteration and work close to the machine’s memory limit.
After approximately six hours, convergence is checked. Since it is larger than
our required `tolerance`

, the code continues with new iterations for another six
hours before checking convergence again and so on. When convergence smaller than
`tolerance`

is achieved the code makes the relevant plots and quits.

```
Initiating MCMC...
i 1, 10000 steps, 12 ch; 32m45s, Sat Mar 24 00:29:13 2018. Next: ~5h27m.
i 2, 20000 steps, 12 ch; 32m53s, Sat Mar 24 01:02:07 2018. Next: ~4h54m.
i 3, 30000 steps, 12 ch; 32m52s, Sat Mar 24 01:34:59 2018. Next: ~4h21m.
i 4, 40000 steps, 12 ch; 32m54s, Sat Mar 24 02:07:54 2018. Next: ~3h48m.
i 5, 50000 steps, 12 ch; 32m51s, Sat Mar 24 02:40:46 2018. Next: ~3h15m.
i 6, 60000 steps, 12 ch; 32m53s, Sat Mar 24 03:13:39 2018. Next: ~2h42m.
i 7, 70000 steps, 12 ch; 33m3s, Sat Mar 24 03:46:43 2018. Next: ~2h9m.
i 8, 80000 steps, 12 ch; 32m52s, Sat Mar 24 04:19:35 2018. Next: ~1h36m.
i 9, 90000 steps, 12 ch; 32m55s, Sat Mar 24 04:52:30 2018. Next: ~1h3m.
i 10, 100000 steps, 12 ch; 32m52s, Sat Mar 24 05:25:23 2018. Next: ~31m3s.
i 11, 110000 steps, 12 ch; 32m46s, Sat Mar 24 05:58:10 2018. Checking now...
Loading chains... [##########] 12/12
Monitoring convergence... [##########] 100%
R-1 tendency: 7.993e-01, 8.185e-01, 7.745e-01
i 12, 120000 steps, 12 ch; 32m49s, Sat Mar 24 06:31:21 2018. Next: ~5h27m.
i 13, 130000 steps, 12 ch; 32m53s, Sat Mar 24 07:04:15 2018. Next: ~4h54m.
i 14, 140000 steps, 12 ch; 32m49s, Sat Mar 24 07:37:04 2018. Next: ~4h21m.
...
```

After the first loop, the user can inspect the acceptance ratio
of the chains, updated after every loop in the section `acceptance rates`

of
the `simulation_info.ini`

file.
Chains presenting bad performance based on this acceptance rate will be discarded.
The values considered as good performance are any rate in the interval from
\(0.1\) to \(0.5\).
This can be changed with `--acceptance-limits 0.2 0.5`

, for example.
If you want to completely avoid chain removal use `--acceptance-limits 0 1`

.

## Adaptation (beta)¶

When starting a new run, use the option `--adapt FREE [ADAPT [STEPS]]`

, where
`FREE`

is an integer representing the number of loops of free random-walk
before adaptation, `ADAPT`

is the actual number of loops during which
adaptation will take place, and `STEPS`

is the number of steps for both,
pre-adapting and adapting phases.
Note that the last two are optional.
If `STEPS`

is omitted, the number of steps of the regular loops will be used
(from the mandatory `steps`

argument);
if only one argument is given, it is assigned to both `FREE`

and `ADAPT`

.
At least one loop of free random-walk is necessary, so the starting states can
serve as input for the adaptation phase.
The total number of states in the two phases will be registered in the
`simulation_info.ini`

file (`adaptive burn-in`

) as the minimal necessary
burn-in size if one wants to keep the chains Markovian.
The adaptation is an iterative process that updates the covariance matrix
\(\mathbf{S}_t\) of the proposal function for the current \(t\)-th
state based on the chains history and goes as follows.
Let \(\mathbf{\Sigma}_t\) be a matrix derived from \(\mathbf{S}_t\), defined by
\(\mathbf{\Sigma}_t \equiv \mathbf{S}_t \, m/2.38^2\), where \(m\) is
the number of free parameters.
After the initial free random-walk phase, we calculate, for a given chain, the chain mean

where \(\mathbf{x}_i = \left(x_{1,i}, \ldots, x_{m,i}\right)\) is the \(i\)-th state vector of the chain. The covariance matrix for the sampling of the next state \(t+1\) will then be given by

where \(\theta_{t+1} = \theta_{t} + \gamma_t \left( \eta_t - 0.234 \right)\), \(\gamma_t = t^{-\alpha}\), \(\theta_t\) is a parameter that we set to zero initially, \(\alpha\) is a number between \(0.5\) and \(1\), here set to \(0.6\), \(\eta_t\) is the current acceptance rate, targeted at \(0.234\), and

In this program, the covariance matrix is updated based on the first chain and applied to all chains. This method is mostly based on the adaptation applied to the parallel tempering algorithm by Łącki & Miasojedow (2016) [2]. The idea is to obtain a covariance matrix such that the asymptotic acceptance rate is \(0.234\), considered to be a good value. Note, however, that this feature is in beta and may require some trial and error with the choice of both free random-walk and adaptation phases lengths. It might take too long for the proposal covariance matrix to converge and the simulation is not guaranteed to keep good acceptance rates with the final matrix obtained after the adaptation phase.

## Analyzing the chains¶

Once convergence is achieved and MCMC is finished, the code reads the chains
and extract the information for the parameter inference.
But this can be done to assess the results while MCMC is still going on.
The distributions are compiled for a nice plot with the command `analyze`

:

```
$ python epic.py analyze <FULL-OR-RELATIVE-PATH-TO>/LCDM/MyFirstRun/
```

Histograms are generated for the marginalized distributions using 20 bins or
any other number given with `--bins`

or `-b`

.
At any time one can run this command, optionally with `--convergence`

or
`-c`

to check the state of convergence.
By default, this will calculate \(\hat R^p - 1\) for twenty
different sizes (which you can change with `--GR-steps`

) considering the size
of the chains to provide an idea of the evolution of the convergence.

Convergence is assessed based on the Gelman-Rubin criterium.
I will not enter into details of the method here, but I refer the reader to the
original papers [4] [5] for more information.
The variation of the original method for multivariate distribution is implemented.
When using `MCMC`

, all the chains are checked for convergence and the final
resulting distribution which is analyzed and plotted is the concatenation of
all the chains, since they are essentially all the same once they have
converged.

### Additional options¶

If for some reason you want to view the results for an intermediate point of
the simulation, you can tell the script to `--stop-at 18000`

, everything will
be analyzed until that point.

If you want to check the random walk of the chains you can plot the sequences
with `--plot-sequences`

. This will make a grid plot containing all the chains
and all parameters. Keep in mind that this can take some time and generate a
big output file if the chains are very long. You can contour this problem by
thinning the distributions by some factor `--thin 10`

, for example.
This also applies for the calculation of the correlation of the parameters in
each chain, enabled with the `--correlation-function`

option.

We generally represent distributions by their histograms but sometimes we may
prefer to exhibit smooth curves. Although it is possible to choose a higher
number of histogram bins, this may not be sufficient and may required much more
data.
Much better (although possibly slow when there are too many states) are the
kernel density estimates (KDE) tuned for Gaussian-like distributions [6].
To obtain smoothed shapes use `--kde`

. This will compute KDE curves
for the marginalized parameter distributions and also the two-parameter joint
posterior probability distributions.

## Making the triangle plots¶

The previous command will also produce the plots automatically (unless
supressed with `--dont-plot`

), but you can always redraw everything when you
want, maybe you would like to tweak some colors? Loading the chains again is
not necessary since the analysis already saves the information for the plots
anyway.
This is done with:

```
$ python epic.py plot <FULL-OR-RELATIVE-PATH-TO>/LCDM/MyFirstRun/
```

The code will find the longest chain analysis results and plot them.
In this plot, you can, for example, choose the main color with `--color m`

for magenta, or with any other Python color name, and omit the best-fit point mark
with `--no-best-fit-marks`

.
The default color is `C0`

(a shade of blue, from the default Matplotlib
palette that ranges from `C0`

to `C9`

).

Plot ranges and eventually necessary factors of power of 10 are automatically
handled by the code, unless you use `--no-auto-range`

and `--no-auto-factors`

or choose your own ranges or powers of 10 by specifying a dictionary with
parameter label as keys and a pair of floats in a list or an integer power of
10 for the entries `custom range`

and `custom factors`

under the section
`analysis`

in the `.ini`

file.

If you have used the option `--kde`

previously in the analysis you need to
specify it here too to make the corresponding plots, otherwise the histograms
will be drawn.

The \(1\sigma\) and \(2\sigma\) confidence levels are shown and are
written to the file `hist_table.tex`

(and `kde_table.tex`

if it is the
case) inside the folder with the size of the chain preceeded by the letter
`n`

.
These \(\LaTeX\)-ready files can be compiled to make a nice table in a PDF
file or included in your paper as you want.
To view and save information for more levels [3], use `--levels 1 2 3 4 5`

.

### Additional options¶

You can further tweak your plots and tables. `--use-tex`

will make the program
render the plot using \(\LaTeX\), fitting nicely to the rest of your paper.
You can plot Gaussian fits together with the histograms (the Gaussian curve and
the two first sigma levels), using `--show-gaussian-fits`

, but probably only for
the sake of comparison since this is not usually done in publications.
`--fmt`

can be used to set the number of figures to report the results
(default is `5`

).
There is `--font-size`

to choose the size of the fonts, `--show-hist`

to
plot the histograms together with the smooth curves when `--kde`

is used,
`--no-best-fit-marks`

to omit the indication of the
coordinates of the best-fit point, `--png`

to save images in png format
besides the pdf files.

If you have nuisance parameters, you can opt to not show them with
`--exclude nuisance`

.
This option can actually take the label of any parameter you choose not to
include in the plot.

Below we see the triangle plot of the histograms, with the default settings,
in comparison with a perfected version using the smoothed distributions, the
Python color `C9`

, the \(\LaTeX\) renderer, including the \(3\sigma\)
confidence level and excluding the nuisance parameter \(M\).

### Combining two or more simulations in one plot¶

You just need to run `epic.py plot`

with two or more paths in the
arguments.
I illustrate this with two simulation for the same simplified \(\Lambda\text{CDM}\) model, with cold dark matter and \(\Lambda\) only, one with \(H(z)\) and \(H_0\) data, the other with the same data plus the simplified supernovae dataset from JLA.
It is then interesting to plot both realizations together so we can see the effect that including a dataset has on the results:

```
$ python epic.py plot \
<FULL-OR-RELATIVE-PATH-TO>/HLCDM+SNeIa/H_and_SN/ \
<FULL-OR-RELATIVE-PATH-TO>/HLCDM/H_only/ \
--kde --use-tex --group-name comparison --no-best-fit-marks
```

You can combine as many results as you like.
When this is done, the list of parameters will be determined from the first
simulation given in the command by its path.
Automatic ranges for the plots are determined from the constraints of the first
simulation as well.
Since different simulations might give considerably different results
that could go outside of the ranges of one of them, consider using `--no-auto-range`

or specifying custom intervals in the `.ini`

file if needed.
The `--group-name`

option specifies the prefix for the name of the pdf file
that will be generated.
If you omit this option you will be prompted to type it.
All other settings are optional.

A legend will be included in the top right corner using the labels defined in
the `.ini`

files, under the `analysis`

section.
The legend title uses the model name of the first simulation in the arguments.
This is intended for showing, at the same time, results from different datasets
with the same model.

### Visualizing chain sequences and convergence¶

If you include the option `--sequences`

with the `analyze`

command you will
get a plot like this

When monitoring convergence, the values of \(\hat{R}^p - 1\) at twenty (or
`--GR-steps`

) different lengths for the multivariate analysis and separate
one-dimensional analyses for each parameter are plotted in the files
`monitor_convergence_<N>.pdf`

and `monitor_each_parameter_<N>.pdf`

, where
`N`

is the total utilized length of the chains.
The absolute values of the between-chains and within-chains variances,
\(\hat{V}\) and \(W\) are also shown.
For our \(\Lambda\text{CDM}\) example, we got

To generate these plots, one needs first run the script with the command `analyze`

and the `--convergence`

option; then, run:

```
$ python epic.py monitor <FULL-OR-RELATIVE-PATH-TO>/LCDM/MyFirstRun
```

The first argument can be multiple paths to simulations, in which case the
first plot above will have panes for each simulation, one below each other.
There are still the options `--use-tex`

and `--png`

, as with the `plot`

command.
Finally, the correlation of the chains can also be inspected if we use the
option `--correlation-function`

.
This includes all the cross- and auto-correlations.

Footnotes

[1] | When checking convergence with Gelman-Rubin method, however, burn-in is still applied. |

[2] | Łącki M. K. & Miasojedow B. “State-dependent swap strategies and automatic reduction of number of temperatures in adaptive parallel tempering algorithm”. Statistics and Computing 26, 951–964 (2016). |

[3] | Choice limited up to the fifth level. Notice that high sigma levels might be difficult to find due to the finite sample sizes when the sample is not sufficiently large. |

[4] | Gelman A & Rubin D. B. “Inference from Iterative Simulation Using Multiple Sequences”. Statistical Science 7 (1992) 457-472. |

[5] | Brooks S. P. & Gelman A. “General Methods for Monitoring Convergence of Iterative Simulations”. Journal of Computational and Graphical Statistics 7 (1998) 434. |

[6] | Kristan M., Leonardis A. and Skocaj D. “Multivariate online kernel density estimation with Gaussian kernels”. Pattern Recognit 44 (2011) 2630-2642. |