Custom Models

This module contains all the custom stochastic models. It lets the user pick a pre-implemented model and apply it directly to their own use case. The first class, StochasticModel, is an abstract class on which all the other classes are based.

Each custom model comes with a class that is a subclass of StochasticModel and a high-level function that converts a regular object of type tf.keras.Model or tf.keras.Sequential into the corresponding model.

The available models are:

  • StochasticModel (abstract base class)

  • MVEM

  • DeepEnsemble

  • SWAG

  • MultiSWAG

  • Orthonormal Certificates

Stochastic Model

StochasticModel is a subclass of tf.keras.Model that was defined in order to add stochastic metrics. It’s an abstract class that can’t be instantiated. All other custom models are subclasses of StochasticModel.

Tip

If you create a custom stochastic model, you may want it to be a subclass of StochasticModel.

This class makes it possible to integrate stochastic metrics into the training and testing process. For example, the code below shows how to use stochastic metrics in a custom model:

>>> model.compile(stochastic_metrics='picp')
>>> model.fit(x_train, y_train, epochs=6)
Epoch 1/6
1/1 [==============================] - 4s 4s/step - loss: 1.5937 - picp: 1.0000
Epoch 2/6
1/1 [==============================] - 0s 9ms/step - loss: 1.1870 - picp: 1.0000
Epoch 3/6
1/1 [==============================] - 0s 10ms/step - loss: 1.1034 - picp: 1.0000
Epoch 4/6
1/1 [==============================] - 0s 6ms/step - loss: 1.0789 - picp: 0.9500
Epoch 5/6
1/1 [==============================] - 0s 5ms/step - loss: 1.0209 - picp: 0.8500
Epoch 6/6
1/1 [==============================] - 0s 6ms/step - loss: 0.9594 - picp: 0.9000

The class StochasticModel is described below.

class purestochastic.model.base_uncertainty_models.StochasticModel(*args, **kwargs)[source]

StochasticModel adds stochastic training and inference features.

StochasticModel is a subclass of keras.Model used to construct stochastic models. A stochastic model often outputs the parameters of a parametric distribution or the quantiles of a generic distribution; for example, it may output the mean and the variance of a Gaussian distribution.

With the standard keras.Model class, all the metrics need to take the same input values; however, deterministic and stochastic metrics don’t take the same input values. StochasticModel therefore adds the possibility of having deterministic as well as stochastic metrics. Stochastic metrics need to be specified when model.compile is called, with the stochastic_metrics or stochastic_weigthed_metrics arguments.
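
For instance, deterministic and stochastic metrics can be mixed in a single compile call. A minimal sketch ('picp' is the stochastic metric used elsewhere on this page, 'mae' an ordinary deterministic one):

model.compile(optimizer="adam", loss="mse",
              metrics=["mae"],              # deterministic metric
              stochastic_metrics=["picp"])  # stochastic metric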

The class is abstract and can’t be instantiated. Subclasses need to override their own compute_metrics(self, x, y, prediction, sample_weight) method, which will call the parent method with the appropriate y_pred and stochastic_predictions arguments.
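
As an illustration, a minimal hypothetical subclass might override compute_metrics as follows, assuming the model output stacks the mean and the variance on its last axis:

class MyStochasticModel(StochasticModel):

    def compute_metrics(self, x, y, predictions, sample_weight):
        # The mean prediction feeds the deterministic metrics ...
        y_pred = predictions[..., 0]
        # ... while the full output feeds the stochastic metrics.
        return super().compute_metrics(x, y, y_pred, predictions, sample_weight)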

compile(stochastic_metrics=None, stochastic_weigthed_metrics=None):

Compile the model and add stochastic metrics.

compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight):

Compute the values of the deterministic and stochastic metrics.

reset_metrics():

Reset the state of the deterministic and stochastic metrics.

Warning

All the stochastic metrics need to take the same input values, so they have to be consistent with one another.

compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight)[source]

Compute the metrics.

This method calls the parent compute_metrics method to compute the deterministic metrics and then computes the stochastic metrics manually.

The method takes one additional parameter, stochastic_predictions, which is specified by the methods of the subclasses. It has to be the same for all the stochastic metrics.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Mean prediction for y.

  • stochastic_predictions (tf.Tensor) – Stochastic predictions for y.

  • sample_weight – Sample weights for weighting the metrics.

Returns

metric_results – Value of each metric.

Return type

dict

MVEM

Todo

Add a class for MVEM which deals with compute_metrics

DeepEnsemble

The DeepEnsemble model is an ensemble of deep learning models. The idea is to train the same model multiple times with different random seeds in order to obtain diverse predictions, which are then combined. For more details, see the papers Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles or Deep Ensembles: A Loss Landscape Perspective.

../_images/DeepEnsemble.png

Cartoon illustration of the performance of the DeepEnsemble model from the article Deep Ensembles: A Loss Landscape Perspective.

Tip

The DeepEnsemble model seems to work better if the kernel initializer is not set to ‘glorot_uniform’ (the default value) but to ‘random_normal’ with a high stddev.
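
For instance, the base model’s hidden layer might be configured as follows (a sketch; the stddev value is only an illustration):

from tensorflow.keras.initializers import RandomNormal

x = Dense(100, activation="relu",
          kernel_initializer=RandomNormal(stddev=0.5))(inputs)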

Here are some examples using the DeepEnsembleModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
DeepEnsemble = toDeepEnsemble(model, nb_models)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = DeepEnsembleModel(inputs=inputs, outputs=outputs)
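
Once built, the ensemble is compiled, trained and queried like any other model. A hedged usage sketch (with a mean-variance output, "mse" would be replaced by a suitable negative log-likelihood loss):

DeepEnsemble.compile(optimizer="adam", loss="mse", stochastic_metrics="picp")
DeepEnsemble.fit(x_train, y_train, epochs=100)
predictions = DeepEnsemble.predict(x_test)  # combined mean and uncertainty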

The DeepEnsembleModel class and the method toDeepEnsemble are described below.

class purestochastic.model.deep_ensemble.DeepEnsembleModel(*args, **kwargs)[source]

Implementation of the DeepEnsemble model.

The Deep Ensemble 1 is an ensemble of deep learning models trained independently and combined at prediction time in order to estimate uncertainty.

The model can be constructed manually, or the method toDeepEnsemble can be used to convert a simple keras.Model object into a DeepEnsembleModel object. This class doesn’t need a specific loss function and can use all of the TensorFlow loss functions as well as custom loss functions.

compute_loss(x=None, y=None, y_pred=None, sample_weight=None):

Compute the loss independently for each model.

_combine_predictions(predictions, stacked):

Combine the predictions made by the models.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(x):

Compute the predictions of the model using the _combine_predictions method.

References

1

Balaji Lakshminarayanan, Alexander Pritzel and Charles Blundell. “Simple and scalable predictive uncertainty estimation using deep ensembles”. In: Advances in Neural Information Processing Systems (2017), pp. 6403-6414. arXiv: 1612.01474.

_combine_predictions(predictions, stacked)[source]

Combine the predictions of all the models in order to quantify the uncertainty.

This method combines the predictions of all the models in order to quantify uncertainty. The computation of the mean prediction and of the uncertainty differs according to the structure of the network. For the moment, there are 2 possibilities (B = number of models):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{\mu}_i\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{\mu}_i - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B} \sum_{i=1}^{B} (\sigma^2_i)\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{y}_i\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{y}_i - \hat{y})^2\)

Other output structures may be added in the future.

Parameters
  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

Returns

  • Predictions that have been combined. If stacked is True, the output is a single tensor.

  • Otherwise, the output is a list of tensors.
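
As a rough NumPy illustration of the “no specific structure” rule above (the tensor layout is hypothetical):

import numpy as np

preds = np.random.randn(32, 1, 5)                     # (batch, output_dim, B)
mean = preds.mean(axis=-1)                            # ensemble mean
var = ((preds - mean[..., None]) ** 2).mean(axis=-1)  # variance across models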

compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]

Custom compute_loss function.

This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

The total loss.

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

predict(x, **kwargs)[source]

Combine predictions made by all the models.

This method simply calls the parent’s method and then combines the predictions in order to quantify uncertainty.

Parameters
  • x (tf.Tensor) – Input data.

  • kwargs (optional) – Other arguments of the parent’s predict method.

Returns

Predictions made by the Deep Ensemble model.

Return type

np.ndarray

purestochastic.model.deep_ensemble.toDeepEnsemble(net, nb_models)[source]

Convert a regular model into a deep ensemble model.

This method is intended as a high-level interface to construct a Deep Ensemble model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • nb_models (int) – the number of models

Returns

a Deep Ensemble Model

Return type

DeepEnsembleModel

Todo

Add support for other architectures

SWAG

The SWAG model is a Bayesian model, more precisely a Bayesian model averaging approach. The model fits the posterior distribution of the model’s parameters during training by exploiting specific properties of the optimization process. For more details, see the paper A Simple Baseline for Bayesian Uncertainty in Deep Learning.

Tip

It’s important to set the learning rate of the second optimization process quite high so that the parameters are diverse enough. It’s not a problem if the loss is not stable, that is, if it alternately increases and decreases.

Warning

This method uses two optimization processes. The first one is the pretraining, with the arguments given in the compile method. The second one is the SWAG training, with the arguments given in the fit method.

Here are some examples using the SWAGModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
SWAG = toSWAG(model)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = SWAGModel(inputs=inputs, outputs=outputs)
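
Once constructed, the two optimization processes described in the warning above can be driven as follows (a sketch: compile controls the pretraining, fit the SWAG phase, and epochs is assumed to be forwarded through fit’s **kwargs):

SWAG.compile(optimizer="adam", loss="mse")
history = SWAG.fit(x_train, y_train, epochs=100,
                   start_averaging=50,   # length of the pretraining
                   learning_rate=0.05,   # kept high for diverse parameters
                   update_frequency=1, K=20)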

The SWAGModel class, the method toSWAG and the callback SWAGCallback are described below. The main logic of the SWAG algorithm is in the SWAGCallback class.

class purestochastic.model.swag.SWAGModel(*args, **kwargs)[source]

Implementation of the SWAG Model.

SWAG 2 (Stochastic Weight Averaging Gaussian) is a model that performs Bayesian training and inference in order to quantify uncertainty. For more details, see SWAGCallback.

The model can be constructed manually, or the method toSWAG can be used to convert a simple keras.Model object into a SWAGModel object.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):

Trains the model with the SWAG algorithm.

_sample_prediction(data, S, verbose=0):

Sample different prediction according to the posterior distribution of the parameters.

_combine_predictions(predictions, stacked):

Combine the sampled predictions.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(data, S=5, verbose=0):

Computes the predictions of the model with the SWAG algorithm.

evaluate(x=None, y=None, S=5, sample_weight=None):

Evaluate the model with the SWAG algorithm.

References

2(1,2)

Wesley J. Maddox et al. “A simple baseline for Bayesian uncertainty in deep learning”. In: Advances in Neural Information Processing Systems 32 (2019), pp. 1-25. arXiv: 1902.02476.

_combine_predictions(predictions, stacked)[source]

Bayesian Model Averaging of the S predictions.

This method follows the _sample_prediction method. It takes as input the batch of S predictions sampled by the _sample_prediction method, and averages them in order to compute the mean prediction and the associated uncertainty. The computation differs according to the structure of the network. For the moment, there are 2 possibilities (S = number of samples):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{\mu}_i\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{\mu}_i - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{S} \sum_{i=1}^{S} (\sigma^2_i)\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{y}_i\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{y}_i - \hat{y})^2\)

Other output structures may be added in the future.

Parameters
  • predictions (tf.Tensor) – Batch of the S predictions computed by _sample_prediction.

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

_sample_prediction(data, S, verbose=0)[source]

Sample predictions according to the posterior distribution of the parameters.

In the SWAG algorithm, the posterior distribution of the parameters is approximated by a Gaussian distribution. The mean and the covariance are specified in the report associated with the code and in the article 2. The mean is stored in the variable SWA_weights. The diagonal and the K-rank approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.

The method repeatedly samples weights and computes the associated predictions.

Parameters
  • data (tf.Tensor) – Input data (equivalent to x).

  • S (int) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Returns

preds – The batch of S predictions.

Return type

tf.Tensor

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Warning

Unless the model predicts aleatoric uncertainty, it can’t compute stochastic metrics before the end of training.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

evaluate(x=None, y=None, S=5, sample_weight=None)[source]

Custom evaluate method.

It returns the loss value & metrics values for the model in test mode.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

Dict containing the values of the metrics and loss of the model.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]

Train the model with the SWAG algorithm.

The model is trained in two parts :

  • Before start_averaging epochs, the model is trained normally. This is defined as the pretraining of the model and uses the optimizer and learning rate specified in the compile function.

  • After start_averaging epochs, the model is trained with the SWAG callback. In other words, at the end of certain epochs (depending on the parameters), the parameters of the model are saved. At the end of training, the callback computes the parameters of the approximate posterior Gaussian distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer.

See also

purestochastic.model.swag.SWAGCallback

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • start_averaging (int) – The number of epochs to pretrain the model.

  • learning_rate (float) – The learning rate of the SWAG algorithm (second part).

  • update_frequency (int) – The number of epochs between each save of parameters of the SWAG algorithm.

  • K (int) – The number of samples used to compute the covariance matrix.

Return type

History of the SWAG training.

predict(data, S=5, verbose=0)[source]

Sample predictions and combine them.

This method defines the inference step of the SWAG algorithm. First, it samples predictions of the model with the _sample_prediction method. Then, the predictions are combined with the method _combine_predictions.

Parameters
  • data (numpy.ndarray) – The input data.

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Return type

The predictions of the model.

purestochastic.model.swag.toSWAG(net)[source]

Convert a regular model into a SWAG model.

This method is intended as a high-level interface to construct a SWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

Returns

a SWAG Model

Return type

SWAGModel

class purestochastic.model.swag.SWAGCallback(learning_rate, update_frequency, K)[source]

Approximation of the posterior distribution of the parameters as a Gaussian distribution.

Callback used in the classes SWAGModel and MultiSWAGModel. It approximates the posterior distribution of the parameters by a Gaussian distribution. The parameters of the Gaussian distribution are computed as follows:

  • The mean of the Gaussian is the mean of the parameters (first moment) found during the training process. Mathematically, it is defined as:

\[\theta_{SWA} = \frac{1}{T} \sum_{t=1}^T \theta_t\]
  • The covariance matrix is constructed by taking half of a diagonal approximation and half of a low-rank approximation of the covariance matrix. The diagonal approximation is computed at the end of the training by using the first and second order moments of the parameters:

\[\Sigma_{Diag} = diag(\bar{\theta}^2-\theta_{SWA}^2)\]

The low-rank approximation is constructed from the differences between the last K values of the parameters and their mean value:

\[\Sigma_{low-rank} = \frac{1}{K-1}\hat{D}\hat{D}^T \text{ where each column of } \hat{D} \text{ is } D_t=(\theta_t - \bar{\theta}_t)\]

To sample from this Gaussian distribution, the SWAGModel and MultiSWAGModel use the following equation:

\[\theta_j = \theta_{SWA} + \frac{1}{\sqrt{2}}\Sigma_{diag}^{\frac{1}{2}}n_1 + \frac{1}{\sqrt{2(K-1)}}\hat{D}n_2, ~~ n_1, n_2 \sim \mathcal{N}(0,I)\]

It is then sufficient to store the matrix D, the first order moments of the parameters as well as the diagonal approximation of the covariance at the end of the training.
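
A small NumPy sketch of this sampling rule, assuming flattened parameter vectors and the stored quantities described above:

import numpy as np

def sample_weights(theta_swa, sigma_diag_sqrt, D, K):
    n1 = np.random.randn(*theta_swa.shape)
    n2 = np.random.randn(D.shape[1])
    return (theta_swa
            + sigma_diag_sqrt * n1 / np.sqrt(2)
            + D @ n2 / np.sqrt(2 * (K - 1)))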

Parameters
  • learning_rate (float) – The learning rate of the optimizer.

  • update_frequency (int) – The number of epochs between two updates of the first and second moments of the parameters.

  • K (int) – The number of samples used to compute the second order moments.

on_epoch_end(epoch, logs=None)[source]

Updates first and second order moments as well as deviation matrix.

Every update_frequency epochs, the first and second order moments as well as the deviation matrix are updated:

\[\bar{\theta} = \frac{n \bar{\theta} + \theta_{epochs}}{n+1}\]
\[\bar{\theta}^2 = \frac{n \bar{\theta}^2 + \theta_{epochs}^2}{n+1}\]
\[\text{APPEND_COL}(\hat{D}, \theta_{epochs}-\bar{\theta})\]

If the matrix D has more than K columns, the oldest column is removed.
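
In pseudo-Python (a NumPy sketch, with n the number of snapshots already averaged; not the actual implementation):

import numpy as np

theta_bar = (n * theta_bar + theta_epoch) / (n + 1)
theta2_bar = (n * theta2_bar + theta_epoch ** 2) / (n + 1)
D = np.column_stack([D, theta_epoch - theta_bar])[:, -K:]  # keep at most K columns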

Parameters

epoch (int) – The index of the current epoch.

on_train_end(logs=None)[source]

Compute and store the variables needed to sample the posterior distribution.

The mean of the Gaussian distribution is saved in the attribute SWA_weights of the model. The deviation matrix used in the covariance matrix is saved in the attribute deviation_matrix of the model. Finally, the square root of the diagonal matrix used in the covariance matrix is computed and saved in the attribute SWA_cov of the model.

Parameters

logs (optional) – See tf.keras.callbacks.Callback

MultiSWAG

The MultiSWAG model was developed in order to combine the advantages of the SWAG model and the DeepEnsemble model. It is therefore an ensemble of Bayesian deep learning models. For more details, see Bayesian deep learning and a probabilistic perspective of generalization.

../_images/MultiSWAG.png

Cartoon illustration of the comparison between the SWAG, DeepEnsemble and MultiSWAG models, from the paper Bayesian deep learning and a probabilistic perspective of generalization.

Here are some examples using the MultiSWAGModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
MultiSWAG = toMultiSWAG(model, nb_models)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = MultiSWAGModel(inputs=inputs, outputs=outputs)

The MultiSWAGModel class and the method toMultiSWAG are described below.

class purestochastic.model.swag.MultiSWAGModel(*args, **kwargs)[source]

Implementation of the MultiSWAG Model.

The MultiSWAG 3 (Multi Stochastic Weight Averaging Gaussian) is an ensemble of SWAG models. It’s a mix between the DeepEnsemble and SWAG models. For more details, see SWAGCallback, SWAGModel and DeepEnsembleModel.

The model can be constructed manually, or the method toMultiSWAG can be used to convert a simple keras.Model object into a MultiSWAGModel object. This class doesn’t need a specific loss function and can use all of the TensorFlow loss functions as well as custom loss functions.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):

Trains the model with the MultiSWAG algorithm.

_sample_prediction(data, S, verbose=0):

Sample different prediction according to the posterior distribution of the parameters.

_combine_predictions(predictions, stacked):

Combine the sampled predictions made by all models.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

evaluate(x=None, y=None, S=5, sample_weight=None):

Evaluate the model with the MultiSWAG algorithm.

predict(data, S=5, verbose=0):

Computes the predictions of the model with the MultiSWAG algorithm.

References

3

Andrew Gordon Wilson and Pavel Izmailov. “Bayesian deep learning and a probabilistic perspective of generalization”. In: Advances in Neural Information Processing Systems (2020). arXiv: 2002.08791.

_combine_predictions(predictions, sampled, stacked)[source]

Bayesian Model Averaging of the S predictions of the B models.

It’s a little bit different from the function in SWAGModel. There are 2 cases:

  • If sampled is False, the parameters of the posterior distribution have not been computed yet, so it’s impossible to sample predictions. Therefore, the function just combines the predictions made by all the models, as in the DeepEnsembleModel.

  • If sampled is True, the parameters have been computed, so this method follows the _sample_prediction method. It takes as input the batch of S predictions for each model sampled by the _sample_prediction method, and averages the predictions over the samples and the models in order to compute the mean prediction and the associated uncertainty.

The computation of the mean prediction and of the uncertainty differs according to the structure of the network. For the moment, there are 2 possibilities (B = number of models, S = number of samples):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{\mu}_{i,j}\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{\mu}_{i,j} - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\sigma^2_{i,j})\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{y}_{i,j}\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{y}_{i,j} - \hat{y})^2\)

Other output structures may be added in the future.
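
As a rough NumPy illustration of the “no specific structure” rule with both axes (the tensor layout is hypothetical):

import numpy as np

preds = np.random.randn(32, 1, 4, 5)      # (batch, output_dim, B, S)
mean = preds.mean(axis=(-2, -1))          # average over models and samples
var = ((preds - mean[..., None, None]) ** 2).mean(axis=(-2, -1))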

Parameters
  • predictions (tf.Tensor) – Batch of the S predictions for each model computed by _sample_prediction.

  • sampled (boolean) – Boolean to indicate whether the input has been sampled.

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

_sample_prediction(data, S, verbose=0)[source]

Sample predictions according to the posterior distribution of the parameters.

It’s the same function as in SWAGModel. In the MultiSWAG algorithm, the posterior distribution of the parameters is approximated by a Gaussian distribution. The mean and the covariance are specified in the report associated with the code and in the article. The mean is stored in the variable SWA_weights. The diagonal and the K-rank approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.

The method repeatedly samples weights and computes the associated predictions for each model independently.

Parameters
  • data (tf.Tensor) – Input data (equivalent to x).

  • S (int) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Returns

preds – The batch of S predictions.

Return type

tf.Tensor

compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]

Custom compute_loss function.

This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently. It’s the same function as in DeepEnsembleModel.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

The total loss.

compute_metrics(x, y, prediction, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

evaluate(x=None, y=None, S=5, sample_weight=None)[source]

Custom evaluate method.

It returns the loss value & metrics values for the model in test mode.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

Dict containing the values of the metrics and loss of the model.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]

Train the model with the MultiSWAG algorithm.

It’s the same function as in SWAGModel but with multiple models trained independently. The models are trained in two parts:

  • Before start_averaging epochs, the models are trained normally. This is defined as the pretraining of the models and uses the optimizer and learning rate specified in the compile function.

  • After start_averaging epochs, the models are trained with the SWAG callback. In other words, at the end of certain epochs (depending on the parameters), the parameters of the models are saved. At the end of training, the callback computes the parameters of the approximate posterior Gaussian distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer. For more details, see SWAGCallback.

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • start_averaging (int) – The number of epochs to pretrain the model.

  • learning_rate (float) – The learning rate of the MultiSWAG algorithm (second part).

  • update_frequency (int) – The number of epochs between each save of parameters of the MultiSWAG algorithm.

  • K (int) – The number of samples used to compute the covariance matrix.

Return type

History of the MultiSWAG training.

predict(data, S=5, verbose=0)[source]

Sample predictions and combine them.

It’s the same function as in SWAGModel. This method defines the inference step of the MultiSWAG algorithm. First, it samples predictions of each model with the _sample_prediction method. Then, all the predictions are combined with the _combine_predictions method.

Parameters
  • data (np.ndarray) – The input data.

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Return type

The predictions of the model.

purestochastic.model.swag.toMultiSWAG(net, nb_models)[source]

Convert a regular model into a MultiSWAG model.

This method is intended as a high-level interface to construct a MultiSWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • nb_models (int) – the number of models

Returns

a MultiSWAG Model

Return type

MultiSWAGModel

Orthonormal Certificates

The Orthonormal Certificates model was developed in order to quantify epistemic uncertainty with single-model estimates. For more details, see Single-model uncertainties for deep learning.

Here are some examples using the OrthonormalCertificatesModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
model = Model(inputs=inputs, outputs=outputs)
model_oc = toOrthonormalCertificates(model, K=100, nb_layers_head=1)

  • With low-level layers (not recommended):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
outputs2 = Dense(K, kernel_regularizer=Orthonormality())(x)
model = OrthonormalCertificatesModel(inputs=inputs, outputs=[outputs, outputs2])
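
A hedged usage sketch of the two-stage training and of prediction (epochs is assumed to be forwarded through fit’s **kwargs, and the output unpacking of predict is an assumption):

model_oc.compile(optimizer="adam", loss="mse")
model_oc.fit(x_train, y_train, epochs=100, epochs_oc=30, learning_rate_oc=1e-3)
y_pred, epistemic_score = model_oc.predict(x_test)  # assumed output structure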

The OrthonormalCertificatesModel class and the method toOrthonormalCertificates are described below.

class purestochastic.model.orthonormal_certificates.OrthonormalCertificatesModel(*args, **kwargs)[source]

Implementation of the Orthonormal Certificates model.

The model was proposed in 4. To estimate epistemic uncertainty, the authors propose Orthonormal Certificates (OCs), a collection of diverse non-constant functions that map all training samples to zero.

The model can be constructed manually (not recommended), or the method toOrthonormalCertificates can be used to convert a simple keras.Model object into an OrthonormalCertificatesModel object.

fit(X, y, epochs_oc=0, learning_rate_oc=0.001):

Fit the initial and OC model.

fit_oc(X, y, learning_rate_oc=0.001):

Fit the OC model.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(x):

Computes the predictions of the initial model and an epistemic score.

find_loss():

Returns the loss specified in compile.

References

4(1,2,3)

Natasa Tagasovska and David Lopez-Paz. “Single-model uncertainties for deep learning”. In: Advances in Neural Information Processing Systems (2019), pp. 1-12. arXiv: 1811.00908.

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Warning

For OrthonormalCertificatesModel, the choice was made to remove stochastic metrics because the certificates are not meaningful as stochastic predictions.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

find_loss()[source]

Returns the loss specified in the compile function.

Returns

The name of the loss.

Return type

str

fit(X, y, epochs_oc=0, learning_rate_oc=0.001, **kwargs)[source]

Train the initial model and the orthonormal certificates.

The model is trained in two parts:

  • During epochs epochs, the model is trained normally. This is defined as the training of the initial model and uses the optimizer and learning rate specified in the compile function. The certificates are frozen.

  • During epochs_oc epochs, all the layers are frozen except the certificates. The training is parametrized by learning_rate_oc and by the sum of the loss function specified in the compile function and the Orthonormality loss.

Note

By default, the parameter epochs_oc is set to 0, and the orthonormal certificates are not trained.

See also

purestochastic.common.regularizer.Orthonormality

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • epochs_oc (int, default: 0) – Number of epochs for the training of certificates.

  • learning_rate_oc (float, default: 0.001) – Learning rate for the training of certificates.

Return type

History of the two trainings.

fit_oc(X, y, learning_rate_oc=0.001, **kwargs)[source]

Train the orthonormal certificates.

All the layers are frozen except the orthonormal certificates. The model is trained with the optimizer specified in the compile function, with the learning rate learning_rate_oc. The loss is the sum of the two following parts:

  • The loss function with predicted value set to the output of the orthonormal certificates and target value set to 0.

  • The Orthonormality regularizer added to the kernel so that the certificates are orthonormal. For more details, see purestochastic.common.regularizer.Orthonormality.

The details of the method are given in 4.
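
Conceptually, the certificate loss looks like the following sketch (not the library’s code; the second output head and the MSE base loss are assumptions):

import tensorflow as tf

certificates = model_oc(x_train)[1]          # hypothetical: certificates head
zero_target = tf.zeros_like(certificates)
oc_loss = tf.reduce_mean(tf.keras.losses.mse(zero_target, certificates))
# The Orthonormality kernel regularizer adds its penalty through the
# model's regularization losses (model_oc.losses).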

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • learning_rate_oc (float, default: 0.001) – Learning rate for the training of certificates.

Return type

History of the training.

predict(x, **kwargs)[source]

Compute predictions.

This method simply calls the parent’s method to compute the predictions of the initial model and the orthonormal certificates. The norm of the certificates is computed in order to provide a score for the epistemic uncertainty, as defined in the article 4.

Parameters
  • x (tf.Tensor) – Input data.

  • kwargs (optional) – Other arguments of the parent’s predict method.

Returns

Predictions made by the initial model together with the epistemic uncertainty score.

Return type

np.ndarray

purestochastic.model.orthonormal_certificates.toOrthonormalCertificates(net, K, nb_layers_head, multiple_miso=True, lambda_coeff=1)[source]

Convert a regular model into a Orthonormal Certificates model.

This method is intended as a high-level interface to construct an Orthonormal Certificates model from a regular model.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • K (int) – the number of certificates

  • nb_layers_head (int) – the number of layers used for the certificates head

Returns

an Orthonormal Certificates model

Return type

OrthonormalCertificatesModel