Custom Models

This module contains all the custom stochastic models. It lets the user pick a pre-implemented model and apply it directly to their own use case. The first class, StochasticModel, is an abstract class on which all the other classes are based.

Each custom model comes with a class that is a subclass of StochasticModel and a high-level function that converts a regular object of type tf.keras.Model or tf.keras.Sequential into the corresponding model.

The available models are:

  • StochasticModel (abstract base class)

  • MVEM

  • DeepEnsemble

  • SWAG

  • MultiSWAG

  • Orthonormal Certificates

Stochastic Model

StochasticModel is a subclass of tf.keras.Model that was defined in order to add stochastic metrics. It’s an abstract class that can’t be instantiated. All other custom models are subclasses of StochasticModel.

Tip

If you create a custom stochastic model, you may want it to be a subclass of StochasticModel.

This class makes it possible to integrate stochastic metrics into the training and testing process. For example, the code below shows how to use stochastic metrics in a custom model:

>>> model.compile(stochastic_metrics='picp')
>>> model.fit(x_train, y_train, epochs=6)
Epoch 1/6
1/1 [==============================] - 4s 4s/step - loss: 1.5937 - picp: 1.0000
Epoch 2/6
1/1 [==============================] - 0s 9ms/step - loss: 1.1870 - picp: 1.0000
Epoch 3/6
1/1 [==============================] - 0s 10ms/step - loss: 1.1034 - picp: 1.0000
Epoch 4/6
1/1 [==============================] - 0s 6ms/step - loss: 1.0789 - picp: 0.9500
Epoch 5/6
1/1 [==============================] - 0s 5ms/step - loss: 1.0209 - picp: 0.8500
Epoch 6/6
1/1 [==============================] - 0s 6ms/step - loss: 0.9594 - picp: 0.9000

The class StochasticModel is described below.

class purestochastic.model.base_uncertainty_models.StochasticModel(*args, **kwargs)[source]

StochasticModel adds stochastic training and inference features.

StochasticModel is a subclass of keras.Model used to construct stochastic models. A stochastic model often outputs the parameters of a parametric distribution or the quantiles of a generic distribution; for example, it may output the mean and the variance of a Gaussian distribution.

With the standard keras.Model class, all the metrics need to take the same input values; however, deterministic and stochastic metrics don’t take the same input values. StochasticModel therefore adds the possibility of having deterministic as well as stochastic metrics. Stochastic metrics need to be specified when model.compile is called, with the stochastic_metrics or stochastic_weigthed_metrics arguments.
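
For instance, deterministic and stochastic metrics can be mixed in a single compile call. A minimal sketch ('picp' is the stochastic metric used elsewhere on this page, 'mae' an ordinary deterministic one):

model.compile(optimizer="adam", loss="mse",
              metrics=["mae"],              # deterministic metric
              stochastic_metrics=["picp"])  # stochastic metric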

The class is abstract and can’t be instantiated. Subclasses need to override their own compute_metrics(self, x, y, prediction, sample_weight) method, which will call the parent method with the appropriate y_pred and stochastic_predictions arguments.
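
As an illustration, a minimal hypothetical subclass might override compute_metrics as follows, assuming the model output stacks the mean and the variance on its last axis:

class MyStochasticModel(StochasticModel):

    def compute_metrics(self, x, y, predictions, sample_weight):
        # The mean prediction feeds the deterministic metrics ...
        y_pred = predictions[..., 0]
        # ... while the full output feeds the stochastic metrics.
        return super().compute_metrics(x, y, y_pred, predictions, sample_weight)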

compile(stochastic_metrics=None, stochastic_weigthed_metrics=None):

Compile the model and add stochastic metrics.

compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight):

Compute the values of the deterministic and stochastic metrics.

reset_metrics():

Reset the state of the deterministic and stochastic metrics.

Warning

All the stochastic metrics need to take the same input values, so they have to be consistent with one another.

compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight)[source]

Compute the metrics.

This method calls the parent compute_metrics method to compute the deterministic metrics and then computes the stochastic metrics manually.

The method takes one additional parameter, stochastic_predictions, which is specified by the methods of the subclasses. It has to be the same for all the stochastic metrics.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Mean prediction for y.

  • stochastic_predictions (tf.Tensor) – Stochastic predictions for y.

  • sample_weight – Sample weights for weighting the metrics.

Returns

metric_results – Value of each metric.

Return type

dict

MVEM

Todo

Add a class for MVEM which deals with compute_metrics

DeepEnsemble

The DeepEnsemble model is an ensemble of deep learning models. The idea is to train the same model multiple times with different random seeds in order to obtain diverse predictions, which are then combined. For more details, see the papers Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles or Deep Ensembles: A Loss Landscape Perspective.

../_images/DeepEnsemble.png

Cartoon illustration of the performance of the DeepEnsemble model from the article Deep Ensembles: A Loss Landscape Perspective.

Tip

The DeepEnsemble model seems to work better if the kernel initializer is not set to ‘glorot_uniform’ (the default value) but to ‘random_normal’ with a high stddev.
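
For instance, the base model’s hidden layer might be configured as follows (a sketch; the stddev value is only an illustration):

from tensorflow.keras.initializers import RandomNormal

x = Dense(100, activation="relu",
          kernel_initializer=RandomNormal(stddev=0.5))(inputs)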

Here are some examples using the DeepEnsembleModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
DeepEnsemble = toDeepEnsemble(model, nb_models)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = DeepEnsembleModel(inputs=inputs, outputs=outputs)
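
Once built, the ensemble is compiled, trained and queried like any other model. A hedged usage sketch (with a mean-variance output, "mse" would be replaced by a suitable negative log-likelihood loss):

DeepEnsemble.compile(optimizer="adam", loss="mse", stochastic_metrics="picp")
DeepEnsemble.fit(x_train, y_train, epochs=100)
predictions = DeepEnsemble.predict(x_test)  # combined mean and uncertainty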

The DeepEnsembleModel class and the method toDeepEnsemble are described below.

class purestochastic.model.deep_ensemble.DeepEnsembleModel(*args, **kwargs)[source]

Implementation of the DeepEnsemble model.

The Deep Ensemble 1 is an ensemble of deep learning models trained independently and combined at prediction time in order to estimate uncertainty.

The model can be constructed manually, or the method toDeepEnsemble can be used to convert a simple keras.Model object into a DeepEnsembleModel object. This class doesn’t need a specific loss function and can use all of the TensorFlow loss functions as well as custom loss functions.

compute_loss(x=None, y=None, y_pred=None, sample_weight=None):

Compute the loss independently for each model.

_combine_predictions(predictions, stacked):

Combine the predictions made by the models.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(x):

Compute the predictions of the model using the _combine_predictions method.

References

1

Balaji Lakshminarayanan, Alexander Pritzel and Charles Blundell. “Simple and scalable predictive uncertainty estimation using deep ensembles”. In: Advances in Neural Information Processing Systems (2017), pp. 6403-6414. arXiv: 1612.01474.

_combine_predictions(predictions, stacked)[source]

Combine the predictions of all the models in order to quantify the uncertainty.

This method combines the predictions of all the models in order to quantify uncertainty. The computation of the mean prediction and of the uncertainty differs according to the structure of the network. For the moment, there are 2 possibilities (B = number of models):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{\mu}_i\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{\mu}_i - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B} \sum_{i=1}^{B} (\sigma^2_i)\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{y}_i\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{y}_i - \hat{y})^2\)

Other output structures may be added in the future.

Parameters
  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

Returns

  • Predictions that have been combined. If stacked is True, the output is a single tensor.

  • Otherwise, the output is a list of tensors.
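
As a rough NumPy illustration of the “no specific structure” rule above (the tensor layout is hypothetical):

import numpy as np

preds = np.random.randn(32, 1, 5)                     # (batch, output_dim, B)
mean = preds.mean(axis=-1)                            # ensemble mean
var = ((preds - mean[..., None]) ** 2).mean(axis=-1)  # variance across models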

compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]

Custom compute_loss function.

This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

The total loss.

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

predict(x, **kwargs)[source]

Combine predictions made by all the models.

This method simply calls the parent’s method and then combines the predictions in order to quantify uncertainty.

Parameters
  • x (tf.Tensor) – Input data.

  • kwargs (optional) – Other arguments of the parent’s predict method.

Returns

Predictions made by the Deep Ensemble model.

Return type

np.ndarray

purestochastic.model.deep_ensemble.toDeepEnsemble(net, nb_models)[source]

Convert a regular model into a deep ensemble model.

This method is intended as a high-level interface to construct a Deep Ensemble model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • nb_models (int) – the number of models

Returns

a Deep Ensemble Model

Return type

DeepEnsembleModel

Todo

Add support for other architectures

SWAG

The SWAG model is a Bayesian model, more precisely a Bayesian model averaging approach. The model fits the posterior distribution of the model’s parameters during training by exploiting specific properties of the optimization process. For more details, see the paper A Simple Baseline for Bayesian Uncertainty in Deep Learning.

Tip

It’s important to set the learning rate of the second optimization process quite high so that the parameters are diverse enough. It’s not a problem if the loss is not stable, that is, if it alternately increases and decreases.

Warning

This method uses two optimization processes. The first one is the pretraining, with the arguments given in the compile method. The second one is the SWAG training, with the arguments given in the fit method.

Here are some examples using the SWAGModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
SWAG = toSWAG(model)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = SWAGModel(inputs=inputs, outputs=outputs)
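
Once constructed, the two optimization processes described in the warning above can be driven as follows (a sketch: compile controls the pretraining, fit the SWAG phase, and epochs is assumed to be forwarded through fit’s **kwargs):

SWAG.compile(optimizer="adam", loss="mse")
history = SWAG.fit(x_train, y_train, epochs=100,
                   start_averaging=50,   # length of the pretraining
                   learning_rate=0.05,   # kept high for diverse parameters
                   update_frequency=1, K=20)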

The SWAGModel class, the method toSWAG and the callback SWAGCallback are described below. The main logic of the SWAG algorithm is in the SWAGCallback class.

class purestochastic.model.swag.SWAGModel(*args, **kwargs)[source]

Implementation of the SWAG Model.

SWAG 2 (Stochastic Weight Averaging Gaussian) is a model that performs Bayesian training and inference in order to quantify uncertainty. For more details, see SWAGCallback.

The model can be constructed manually, or the method toSWAG can be used to convert a simple keras.Model object into a SWAGModel object.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):

Trains the model with the SWAG algorithm.

_sample_prediction(data, S, verbose=0):

Sample different prediction according to the posterior distribution of the parameters.

_combine_predictions(predictions, stacked):

Combine the sampled predictions.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(data, S=5, verbose=0):

Computes the predictions of the model with the SWAG algorithm.

evaluate(x=None, y=None, S=5, sample_weight=None):

Evaluate the model with the SWAG algorithm.

References

2(1,2)

Wesley J. Maddox et al. “A simple baseline for Bayesian uncertainty in deep learning”. In: Advances in Neural Information Processing Systems 32 (2019), pp. 1-25. arXiv: 1902.02476.

_combine_predictions(predictions, stacked)[source]

Bayesian Model Averaging of the S predictions.

This method follows the _sample_prediction method. It takes as input the batch of S predictions sampled by the _sample_prediction method, and averages them in order to compute the mean prediction and the associated uncertainty. The computation differs according to the structure of the network. For the moment, there are 2 possibilities (S = number of samples):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{\mu}_i\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{\mu}_i - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{S} \sum_{i=1}^{S} (\sigma^2_i)\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{y}_i\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{y}_i - \hat{y})^2\)

Other output structures may be added in the future.

Parameters
  • predictions (tf.Tensor) – Batch of the S predictions computed by _sample_prediction.

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

_sample_prediction(data, S, verbose=0)[source]

Sample predictions according to the posterior distribution of the parameters.

In the SWAG algorithm, the posterior distribution of the parameters is approximated by a Gaussian distribution. The mean and the covariance are specified in the report associated with the code and in the article 2. The mean is stored in the variable SWA_weights. The diagonal and the K-rank approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.

The method repeatedly samples weights and computes the associated predictions.

Parameters
  • data (tf.Tensor) – Input data (equivalent to x).

  • S (int) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Returns

preds – The batch of S predictions.

Return type

tf.Tensor

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Warning

Unless the model predicts aleatoric uncertainty, it can’t compute stochastic metrics before the end of training.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

evaluate(x=None, y=None, S=5, sample_weight=None)[source]

Custom evaluate method.

It returns the loss value & metrics values for the model in test mode.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

Dict containing the values of the metrics and loss of the model.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]

Train the model with the SWAG algorithm.

The model is trained in two parts :

  • Before start_averaging epochs, the model is trained normally. This is defined as the pretraining of the model and uses the optimizer and learning rate specified in the compile function.

  • After start_averaging epochs, the model is trained with the SWAG callback. In other words, at the end of certain epochs (depending on the parameters), the parameters of the model are saved. At the end of training, the callback computes the parameters of the approximate posterior Gaussian distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer.

See also

purestochastic.model.swag.SWAGCallback

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • start_averaging (int) – The number of epochs to pretrain the model.

  • learning_rate (float) – The learning rate of the SWAG algorithm (second part).

  • update_frequency (int) – The number of epochs between each save of parameters of the SWAG algorithm.

  • K (int) – The number of samples used to compute the covariance matrix.

Return type

History of the SWAG training.

predict(data, S=5, verbose=0)[source]

Sample predictions and combine them.

This method defines the inference step of the SWAG algorithm. First, it samples predictions of the model with the _sample_prediction method. Then, the predictions are combined with the method _combine_predictions.

Parameters
  • data (numpy.ndarray) – The input data.

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Return type

The predictions of the model.

purestochastic.model.swag.toSWAG(net)[source]

Convert a regular model into a SWAG model.

This method is intended as a high-level interface to construct a SWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

Returns

a SWAG Model

Return type

SWAGModel

class purestochastic.model.swag.SWAGCallback(learning_rate, update_frequency, K)[source]

Approximation of the posterior distribution of the parameters as a Gaussian distribution.

Callback used in the classes SWAGModel and MultiSWAGModel. It approximates the posterior distribution of the parameters by a Gaussian distribution. The parameters of the Gaussian distribution are computed as follows:

  • The mean of the Gaussian is the mean of the parameters (first moment) found during the training process. Mathematically, it is defined as:

\[\theta_{SWA} = \frac{1}{T} \sum_{t=1}^T \theta_t\]
  • The covariance matrix is constructed by taking half of a diagonal approximation and half of a low-rank approximation of the covariance matrix. The diagonal approximation is computed at the end of the training by using the first and second order moments of the parameters:

\[\Sigma_{Diag} = diag(\bar{\theta}^2-\theta_{SWA}^2)\]

The low-rank approximation is constructed from the differences between the last K values of the parameters and their mean value:

\[\Sigma_{low-rank} = \frac{1}{K-1}\hat{D}\hat{D}^T \text{ where each column of } \hat{D} \text{ is } D_t=(\theta_t - \bar{\theta}_t)\]

To sample from this Gaussian distribution, the SWAGModel and MultiSWAGModel use the following equation:

\[\theta_j = \theta_{SWA} + \frac{1}{\sqrt{2}}\Sigma_{diag}^{\frac{1}{2}}n_1 + \frac{1}{\sqrt{2(K-1)}}\hat{D}n_2, ~~ n_1, n_2 \sim \mathcal{N}(0,I)\]

It is then sufficient to store the matrix D, the first order moments of the parameters as well as the diagonal approximation of the covariance at the end of the training.
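
A small NumPy sketch of this sampling rule, assuming flattened parameter vectors and the stored quantities described above:

import numpy as np

def sample_weights(theta_swa, sigma_diag_sqrt, D, K):
    n1 = np.random.randn(*theta_swa.shape)
    n2 = np.random.randn(D.shape[1])
    return (theta_swa
            + sigma_diag_sqrt * n1 / np.sqrt(2)
            + D @ n2 / np.sqrt(2 * (K - 1)))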

Parameters
  • learning_rate (float) – The learning rate of the optimizer.

  • update_frequency (int) – The number of epochs between two updates of the first and second moments of the parameters.

  • K (int) – The number of samples used to compute the second order moments.

on_epoch_end(epoch, logs=None)[source]

Updates first and second order moments as well as deviation matrix.

Every update_frequency epochs, the first and second order moments as well as the deviation matrix are updated:

\[\bar{\theta} = \frac{n \bar{\theta} + \theta_{epochs}}{n+1}\]
\[\bar{\theta}^2 = \frac{n \bar{\theta}^2 + \theta_{epochs}^2}{n+1}\]
\[\text{APPEND_COL}(\hat{D}, \theta_{epochs}-\bar{\theta})\]

If the matrix D has more than K columns, the oldest column is removed.
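
In pseudo-Python (a NumPy sketch, with n the number of snapshots already averaged; not the actual implementation):

import numpy as np

theta_bar = (n * theta_bar + theta_epoch) / (n + 1)
theta2_bar = (n * theta2_bar + theta_epoch ** 2) / (n + 1)
D = np.column_stack([D, theta_epoch - theta_bar])[:, -K:]  # keep at most K columns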

Parameters

epoch (int) – The index of the current epoch.

on_train_end(logs=None)[source]

Compute and store the variables needed to sample the posterior distribution.

The mean of the Gaussian distribution is saved in the attribute SWA_weights of the model. The deviation matrix used in the covariance matrix is saved in the attribute deviation_matrix of the model. Finally, the square root of the diagonal matrix used in the covariance matrix is computed and saved in the attribute SWA_cov of the model.

Parameters

logs (optional) – See tf.keras.callbacks.Callback

MultiSWAG

The MultiSWAG model was developed in order to combine the advantages of the SWAG model and the DeepEnsemble model. It is therefore an ensemble of Bayesian deep learning models. For more details, see Bayesian deep learning and a probabilistic perspective of generalization.

../_images/MultiSWAG.png

Cartoon illustration of the comparison between the SWAG, DeepEnsemble and MultiSWAG models, from the paper Bayesian deep learning and a probabilistic perspective of generalization.

Here are some examples using the MultiSWAGModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
MultiSWAG = toMultiSWAG(model, nb_models)

  • With low-level layers:

inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = MultiSWAGModel(inputs=inputs, outputs=outputs)

The MultiSWAGModel class and the method toMultiSWAG are described below.

class purestochastic.model.swag.MultiSWAGModel(*args, **kwargs)[source]

Implementation of the MultiSWAG Model.

The MultiSWAG 3 (Multi Stochastic Weight Averaging Gaussian) is an ensemble of SWAG models. It’s a mix between the DeepEnsemble and SWAG models. For more details, see SWAGCallback, SWAGModel and DeepEnsembleModel.

The model can be constructed manually, or the method toMultiSWAG can be used to convert a simple keras.Model object into a MultiSWAGModel object. This class doesn’t need a specific loss function and can use all of the TensorFlow loss functions as well as custom loss functions.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):

Trains the model with the MultiSWAG algorithm.

_sample_prediction(data, S, verbose=0):

Sample different prediction according to the posterior distribution of the parameters.

_combine_predictions(predictions, stacked):

Combine the sampled predictions made by all models.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

evaluate(x=None, y=None, S=5, sample_weight=None):

Evaluate the model with the MultiSWAG algorithm.

predict(data, S=5, verbose=0):

Computes the predictions of the model with the MultiSWAG algorithm.

References

3

Andrew Gordon Wilson and Pavel Izmailov. “Bayesian deep learning and a probabilistic perspective of generalization”. In: Advances in Neural Information Processing Systems (2020). arXiv: 2002.08791.

_combine_predictions(predictions, sampled, stacked)[source]

Bayesian Model Averaging of the S predictions of the B models.

It’s a little bit different from the function in SWAGModel. There are 2 cases:

  • If sampled is False, the parameters of the posterior distribution have not been computed yet, so it’s impossible to sample predictions. Therefore, the function just combines the predictions made by all the models, as in the DeepEnsembleModel.

  • If sampled is True, the parameters have been computed, so this method follows the _sample_prediction method. It takes as input the batch of S predictions for each model sampled by the _sample_prediction method, and averages the predictions over the samples and the models in order to compute the mean prediction and the associated uncertainty.

The computation of the mean prediction and of the uncertainty differs according to the structure of the network. For the moment, there are 2 possibilities (B = number of models, S = number of samples):

  • Mean Variance Activation (see MeanVarianceActivation):

    • Mean : \(\hat{\mu} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{\mu}_{i,j}\)

    • Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{\mu}_{i,j} - \hat{\mu})^2\)

    • Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\sigma^2_{i,j})\)

  • No specific structure:

    • Mean : \(\hat{y} = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{y}_{i,j}\)

    • Variance : \(\hat{\sigma}^2 = \dfrac{1}{B*S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{y}_{i,j} - \hat{y})^2\)

Other output structures may be added in the future.
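
As a rough NumPy illustration of the “no specific structure” rule with both axes (the tensor layout is hypothetical):

import numpy as np

preds = np.random.randn(32, 1, 4, 5)      # (batch, output_dim, B, S)
mean = preds.mean(axis=(-2, -1))          # average over models and samples
var = ((preds - mean[..., None, None]) ** 2).mean(axis=(-2, -1))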

Parameters
  • predictions (tf.Tensor) – Batch of the S predictions for each model computed by _sample_prediction.

  • sampled (boolean) – Boolean to indicate whether the input has been sampled.

  • stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.

_sample_prediction(data, S, verbose=0)[source]

Sample predictions according to the posterior distribution of the parameters.

It’s the same function as in SWAGModel. In the MultiSWAG algorithm, the posterior distribution of the parameters is approximated by a Gaussian distribution. The mean and the covariance are specified in the report associated with the code and in the article. The mean is stored in the variable SWA_weights. The diagonal and the K-rank approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.

The method repeatedly samples weights and computes the associated predictions for each model independently.

Parameters
  • data (tf.Tensor) – Input data (equivalent to x).

  • S (int) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Returns

preds – The batch of S predictions.

Return type

tf.Tensor

compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]

Custom compute_loss function.

This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently. It’s the same function as in DeepEnsembleModel.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • y_pred (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

The total loss.

compute_metrics(x, y, prediction, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

evaluate(x=None, y=None, S=5, sample_weight=None)[source]

Custom evaluate method.

It returns the loss value & metrics values for the model in test mode.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

Dict containing the values of the metrics and loss of the model.

fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]

Train the model with the MultiSWAG algorithm.

It’s the same function as in SWAGModel but with multiple models trained independently. The models are trained in two parts:

  • Before start_averaging epochs, the models are trained normally. This is defined as the pretraining of the models and uses the optimizer and learning rate specified in the compile function.

  • After start_averaging epochs, the models are trained with the SWAG callback. In other words, at the end of certain epochs (depending on the parameters), the parameters of the models are saved. At the end of training, the callback computes the parameters of the approximate posterior Gaussian distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer. For more details, see SWAGCallback.

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • start_averaging (int) – The number of epochs to pretrain the model.

  • learning_rate (float) – The learning rate of the MultiSWAG algorithm (second part).

  • update_frequency (int) – The number of epochs between each save of parameters of the MultiSWAG algorithm.

  • K (int) – The number of samples used to compute the covariance matrix.

Return type

History of the MultiSWAG training.

predict(data, S=5, verbose=0)[source]

Sample predictions and combine them.

It’s the same function as in SWAGModel. This method defines the inference step of the MultiSWAG algorithm. First, it samples predictions of each model with the _sample_prediction method. Then, all the predictions are combined with the _combine_predictions method.

Parameters
  • data (np.ndarray) – The input data.

  • S (int, default:5) – The number of samples used in the Monte Carlo method.

  • verbose (int, default:0) – The verbosity level.

Return type

The predictions of the model.

purestochastic.model.swag.toMultiSWAG(net, nb_models)[source]

Convert a regular model into a MultiSWAG model.

This method is intended as a high-level interface to construct a MultiSWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation; other architectures are just concatenated models.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • nb_models (int) – the number of models

Returns

a MultiSWAG Model

Return type

MultiSWAGModel

Orthonormal Certificates

The Orthonormal Certificates model was developed in order to quantify epistemic uncertainty with single-model estimates. For more details, see Single-model uncertainties for deep learning.

Here are some examples using the OrthonormalCertificatesModel:

  • With high-level API (recommended usage):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
model = Model(inputs=inputs, outputs=outputs)
model_oc = toOrthonormalCertificates(model, K=100, nb_layers_head=1)

  • With low-level layers (not recommended):

inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
outputs2 = Dense(K, kernel_regularizer=Orthonormality())(x)
model = OrthonormalCertificatesModel(inputs=inputs, outputs=[outputs, outputs2])
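
A hedged usage sketch of the two-stage training and of prediction (epochs is assumed to be forwarded through fit’s **kwargs, and the output unpacking of predict is an assumption):

model_oc.compile(optimizer="adam", loss="mse")
model_oc.fit(x_train, y_train, epochs=100, epochs_oc=30, learning_rate_oc=1e-3)
y_pred, epistemic_score = model_oc.predict(x_test)  # assumed output structure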

The OrthonormalCertificatesModel class and the method toOrthonormalCertificates are described below.

class purestochastic.model.orthonormal_certificates.OrthonormalCertificatesModel(*args, **kwargs)[source]

Implementation of the Orthonormal Certificates model.

The model was proposed in 4. To estimate epistemic uncertainty, the authors propose Orthonormal Certificates (OCs), a collection of diverse non-constant functions that map all training samples to zero.

The model can be constructed manually (not recommended), or the method toOrthonormalCertificates can be used to convert a simple keras.Model object into an OrthonormalCertificatesModel object.

fit(X, y, epochs_oc=0, learning_rate_oc=0.001):

Fit the initial and OC model.

fit_oc(X, y, learning_rate_oc=0.001):

Fit the OC model.

compute_metrics(x, y, predictions, sample_weight):

Specify the mean and stochastic part of the predictions to compute the metrics.

predict(x):

Computes the predictions of the initial model and an epistemic score.

find_loss():

Returns the loss specified in compile.

References

4(1,2,3)

Natasa Tagasovska and David Lopez-Paz. “Single-model uncertainties for deep learning”. In: Advances in Neural Information Processing Systems (2019), pp. 1-12. arXiv: 1811.00908.

compute_metrics(x, y, predictions, sample_weight)[source]

Custom compute_metrics method.

As stated in the parent compute_metrics method, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.

Warning

For OrthonormalCertificatesModel, the choice was made to remove stochastic metrics because the certificates are not meaningful as stochastic predictions.

Parameters
  • x (tf.Tensor) – Input data.

  • y (tf.Tensor) – Target data.

  • predictions (tf.Tensor) – Predictions returned by the model (output of model(x))

  • sample_weight (optional) – Sample weights for weighting the loss function.

Return type

See parent method.

find_loss()[source]

Returns the loss specified in the compile function.

Returns

The name of the loss.

Return type

str

fit(X, y, epochs_oc=0, learning_rate_oc=0.001, **kwargs)[source]

Train the initial model and the orthonormal certificates.

The model is trained in two parts:

  • During epochs epochs, the model is trained normally. This is defined as the training of the initial model and uses the optimizer and learning rate specified in the compile function. The certificates are frozen.

  • During epochs_oc epochs, all the layers are frozen except the certificates. The training is parametrized by learning_rate_oc and by the sum of the loss function specified in the compile function and the Orthonormality loss.

Note

By default, the parameter epochs_oc is set to 0, and the orthonormal certificates are not trained.

See also

purestochastic.common.regularizer.Orthonormality

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • epochs_oc (int, default: 0) – Number of epochs for the training of certificates.

  • learning_rate_oc (float, default: 0.001) – Learning rate for the training of certificates.

Return type

History of the two trainings.

fit_oc(X, y, learning_rate_oc=0.001, **kwargs)[source]

Train the orthonormal certificates.

All the layers are frozen except the orthonormal certificates. The model is trained with the optimizer specified in the compile function, with the learning rate learning_rate_oc. The loss is the sum of the two following parts:

  • The loss function with predicted value set to the output of the orthonormal certificates and target value set to 0.

  • The Orthonormality regularizer added to the kernel so that the certificates are orthonormal. For more details, see purestochastic.common.regularizer.Orthonormality.

The details of the method are given in 4.
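
Conceptually, the certificate loss looks like the following sketch (not the library’s code; the second output head and the MSE base loss are assumptions):

import tensorflow as tf

certificates = model_oc(x_train)[1]          # hypothetical: certificates head
zero_target = tf.zeros_like(certificates)
oc_loss = tf.reduce_mean(tf.keras.losses.mse(zero_target, certificates))
# The Orthonormality kernel regularizer adds its penalty through the
# model's regularization losses (model_oc.losses).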

Parameters
  • X (np.ndarray) – The input data.

  • y (np.ndarray) – The target data.

  • learning_rate_oc (float, default: 0.001) – Learning rate for the training of certificates.

Return type

History of the training.

predict(x, **kwargs)[source]

Compute predictions.

This method simply calls the parent’s method to compute the predictions of the initial model and the orthonormal certificates. The norm of the certificates is computed in order to provide a score for the epistemic uncertainty, as defined in the article 4.

Parameters
  • x (tf.Tensor) – Input data.

  • kwargs (optional) – Other arguments of the parent’s predict method.

Returns

Predictions made by the initial model together with the epistemic uncertainty score.

Return type

np.ndarray

purestochastic.model.orthonormal_certificates.toOrthonormalCertificates(net, K, nb_layers_head, multiple_miso=True, lambda_coeff=1)[source]

Convert a regular model into a Orthonormal Certificates model.

This method is intended as a high-level interface to construct an Orthonormal Certificates model from a regular model.

Parameters
  • net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model

  • K (int) – the number of certificates

  • nb_layers_head (int) – the number of layers used for the certificates head

Returns

an Orthonormal Certificates model

Return type

OrthonormalCertificatesModel