# Tutorial

This tutorial provides a full walk-through on how to apply EPI to an
example problem. We only assume that you have already installed `eulerpi`. The
tutorial is divided into four sections:

1.  [Introduction](#introduction)
2.  [Define your data](#define-your-data)
3.  [Define your model](#define-your-model)
4.  [Inference](#inference)

Let's start!

## Introduction

EPI is an algorithm to infer a parameter distribution $Q$ satisfying
$Y = s(Q)$ given a (discrete) data probability distribution $y_i \sim Y$
and a model implementing the mapping $s: Q \to Y$. The (forward) model
describes the mapping from the parameter points $q_i$ to the data points
$y_i$.

In the following we will look at temperature data over the globe and a
model for the dependence of the temperature $y_i$ on the latitude $q_i$.

The goal is to derive the parameter distribution $\Phi_Q$ from the data
distribution $\Phi_Y$. This is the inverse of what our (forward) model
provides. To solve the inverse problem, EPI uses the multi-dimensional
transformation formula:

$$\Phi_Q(q) = \Phi_Y(s(q)) \cdot \sqrt{\det\left(\frac{ds}{dq}(q)^\top \frac{ds}{dq}(q)\right)}$$
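
For a model with one parameter and one data dimension, the transformation formula reduces to $\Phi_Q(q) = \Phi_Y(s(q)) \, |s'(q)|$. The following sketch evaluates this in plain Python, anticipating the temperature model defined later in this tutorial. The normal data density `phi_Y` (mean 0, standard deviation 10) is an assumption made purely for illustration; it is not the tutorial's data.

```python
import math

LOW_T, HIGH_T = -30.0, 30.0


def s(q):
    """Forward model: latitude q (radians) -> temperature."""
    return LOW_T + (HIGH_T - LOW_T) * math.cos(q)


def ds_dq(q):
    """Analytic derivative of s with respect to q."""
    return -(HIGH_T - LOW_T) * math.sin(q)


def phi_Y(y, mean=0.0, std=10.0):
    """Hypothetical data density: a normal distribution (an assumption)."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def phi_Q(q):
    """Parameter density from the one-dimensional transformation formula."""
    return phi_Y(s(q)) * abs(ds_dq(q))


# Integrate phi_Q over the parameter range [0, pi/2] with the midpoint rule.
n = 20_000
width = (math.pi / 2) / n
mass_Q = sum(phi_Q((i + 0.5) * width) for i in range(n)) * width

# Probability mass of the assumed data density on [-30, 30].
mass_Y = math.erf(30.0 / (10.0 * math.sqrt(2.0)))

print(mass_Q, mass_Y)
```

Because $s$ maps $[0, \pi/2]$ onto $[-30, 30]$, the two printed masses agree up to the numerical integration error.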

In the real world, problems with a known continuous data distribution are
rare. Instead, we often rely on discrete measurements. EPI starts
with discrete data points as input and derives a continuous distribution
using Kernel Density Estimation (KDE) techniques. From this data
distribution, the EPI algorithm derives the parameter distribution. To
close the cycle between the data and parameters, we can again sample
from this parameter distribution and apply the forward model to the
samples, which yields a discrete set of simulated data points.
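
The KDE step can be sketched as follows. This is a minimal Gaussian KDE with a hand-picked bandwidth, written with numpy. The measurement values, the bandwidth and the helper name `gaussian_kde_1d` are hypothetical, and the estimator used inside `eulerpi` may differ.

```python
import numpy as np


def gaussian_kde_1d(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE built from 1D data."""
    data = np.asarray(data, dtype=float)

    def density(y):
        y = np.atleast_1d(np.asarray(y, dtype=float))
        # One Gaussian bump per data point, averaged over all data points.
        z = (y[:, None] - data[None, :]) / bandwidth
        kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
        return kernels.mean(axis=1)

    return density


# Hypothetical discrete temperature measurements.
measurements = np.array([-12.0, -3.5, 4.0, 11.0, 18.5, 25.0])
phi_Y = gaussian_kde_1d(measurements, bandwidth=5.0)

# Evaluate the continuous density on a grid and check its total mass.
grid = np.linspace(-60.0, 60.0, 2001)
values = phi_Y(grid)
total_mass = float(np.sum(values) * (grid[1] - grid[0]))
print(total_mass)
```

The resulting `phi_Y` is a continuous function that can be evaluated at any temperature, which is what the transformation formula needs.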

With this picture in mind, we can start to implement the temperature
problem in `eulerpi`.

## Define your data
Your data needs to be stored in a `.csv` file in the following format:

``` text
datapoint_dim1, datapoint_dim2, datapoint_dim3, ..., datapoint_dimN
datapoint_dim1, datapoint_dim2, datapoint_dim3, ..., datapoint_dimN
datapoint_dim1, datapoint_dim2, datapoint_dim3, ..., datapoint_dimN
...
datapoint_dim1, datapoint_dim2, datapoint_dim3, ..., datapoint_dimN
```

Each line defines an $N$-dimensional datapoint. The
`.csv` file will be loaded into an
$\mathbb{R}^{M \times N}$ numpy array in EPI, where $M$ is the number of datapoints. Alternatively, you can provide an $\mathbb{R}^{M \times N}$ numpy array directly.
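
To see what such an array looks like, the following sketch reads a small comma-separated text with `numpy.loadtxt`. The values are made up, and whether `eulerpi` uses this exact call internally is an assumption; the point is only the resulting $M \times N$ shape.

```python
import io

import numpy as np

# Hypothetical file content: M = 3 datapoints with N = 2 dimensions each.
csv_text = "1.0,2.0\n3.0,4.0\n5.0,6.0\n"

# io.StringIO stands in for a file on disk; a path string works the same way.
# ndmin=2 keeps the result two-dimensional.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", ndmin=2)

print(data.shape)  # (3, 2)
```

An array built this way could also be passed directly instead of a file path.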

In the following we will use the example data `TemperatureData.csv`. It has 455 datapoints with one dimension each, matching `data_dim = 1` in the model defined below.
Nonuniform data is not supported in EPI.
Please download the file from [Download Temperature Data](https://systems-theory-in-systems-biology.github.io/EPI/_downloads/090dff47c31e511d0522cc9cc0cdb502/TemperatureData.csv) and make sure that it is located in the same directory as this notebook.

## Define your model

Next you need to define your model. The most basic way is to derive from
the `eulerpi.core.models.BaseModel` class.

```python
import importlib
import jax.numpy as jnp
import numpy as np
from eulerpi.core.models import BaseModel
```

A model inheriting from `BaseModel` must implement the methods
- `forward`
- `jacobian`

Additionally, the attributes 
- `param_dim`
- `data_dim`
- `PARAM_LIMITS`
- `CENTRAL_PARAM`

must be defined by the model.

`CENTRAL_PARAM` and `PARAM_LIMITS` provide the sampling algorithm with sensible starting values and boundary values. The Jacobian of the temperature model is derived analytically and implemented explicitly. Note that the model class has to be defined in its own file; in this case, copy the following code into a file named `temperature.py`.

```python
from typing import Optional

import jax.numpy as jnp
import numpy as np

from eulerpi.core.models import BaseModel


class Temperature(BaseModel):
    """Temperature as a function of the latitude, given in radians."""

    param_dim = 1  # one parameter: the latitude
    data_dim = 1  # one observable: the temperature

    # Boundary values and central starting value for the latitude
    PARAM_LIMITS = np.array([[0, np.pi / 2]])
    CENTRAL_PARAM = np.array([np.pi / 4.0])

    def __init__(
        self,
        central_param: np.ndarray = CENTRAL_PARAM,
        param_limits: np.ndarray = PARAM_LIMITS,
        name: Optional[str] = None,
        **kwargs,
    ) -> None:
        super().__init__(central_param, param_limits, name=name, **kwargs)

    def forward(self, param):
        # high_T at latitude 0, low_T at latitude pi/2
        low_T = -30.0
        high_T = 30.0
        res = jnp.array(
            [low_T + (high_T - low_T) * jnp.cos(jnp.abs(param[0]))]
        )
        return res

    def jacobian(self, param):
        # Analytic derivative of forward: d/dq [-30 + 60 cos(|q|)] = -60 sin(q)
        return jnp.array([-60.0 * jnp.sin(param[0])])
```
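
A quick way to gain confidence in an analytically derived Jacobian is to compare it with a finite-difference approximation. The sketch below does this in plain Python for standalone copies of the temperature formulas. The helper `central_difference`, the step size and the test points are illustrative choices, not part of `eulerpi`. Differentiating $-30 + 60 \cos(q)$ gives $-60 \sin(q)$; in the one-dimensional transformation formula only the absolute value of this derivative enters.

```python
import math


def forward(q):
    """Temperature at latitude q (radians), same formula as the model."""
    return -30.0 + 60.0 * math.cos(abs(q))


def jacobian(q):
    """Analytic derivative of forward with respect to q."""
    return -60.0 * math.sin(q)


def central_difference(f, q, h=1e-6):
    """Second-order finite-difference approximation of df/dq."""
    return (f(q + h) - f(q - h)) / (2.0 * h)


# Compare both derivatives at a few latitudes inside (0, pi/2).
test_points = [0.1, math.pi / 4, 1.0, 1.5]
max_error = max(
    abs(jacobian(q) - central_difference(forward, q)) for q in test_points
)
print(max_error)
```

A small `max_error` indicates that the analytic expression and the forward map are consistent.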

## Inference

We can now use EPI to infer the parameter distribution from the data. By default, the `inference` function uses Markov chain Monte Carlo sampling (this can be changed using the `inference_type` argument). `inference` returns a tuple containing samples from the parameter Markov chain $q_i$, the corresponding data points $y_i = s(q_i)$, the estimated densities $\Phi_Q (q_i)$ scaled by a constant $c$, and a `ResultManager` object that can be used to load and manipulate the results of EPI.

```python
from eulerpi.core.inference import inference
from temperature import Temperature

# create a temperature model object
model = Temperature()
# run EPI
overall_params, sim_results, density_evals, result_manager = inference(
    model=model,
    data="TemperatureData.csv",
)
```
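
It is enough to know the density only up to a constant $c$ because Markov chain Monte Carlo methods compare density ratios, in which $c$ cancels. The sketch below illustrates this with a plain random-walk Metropolis sampler built on the standard library. It is not the sampler that `eulerpi` uses, and the target density, step size and all names are hypothetical.

```python
import math
import random


def unnormalized_density(q):
    """Hypothetical target density, known only up to a constant factor."""
    if not 0.0 <= q <= math.pi / 2:
        return 0.0  # outside the parameter limits
    return 5.0 * math.sin(q)  # the factor 5.0 plays the role of c


def metropolis(density, start, n_steps, step_size, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional density."""
    rng = random.Random(seed)
    current, current_density = start, density(start)
    chain = [current]
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_density = density(proposal)
        # Accept with probability min(1, ratio); c cancels in the ratio.
        if rng.random() * current_density < proposal_density:
            current, current_density = proposal, proposal_density
        chain.append(current)
    return chain


chain = metropolis(
    unnormalized_density, start=math.pi / 4, n_steps=20_000, step_size=0.5
)
sample_mean = sum(chain) / len(chain)
print(sample_mean)
```

For this target the exact mean is $\int_0^{\pi/2} q \sin(q) \, dq = 1$, so the printed sample mean should be close to 1 regardless of the factor in front of the density.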

Depending on the complexity of your model, the sampling can take a long time. For this reason, not only the final results but also intermediate sampling results are saved. You can find them in the folder `Applications/Temperature/`.