Custom Models¶
This module contains all the custom stochastic models. It lets the user choose a pre-implemented model
and apply it directly to their own use case. The first class, StochasticModel, is an abstract class on which
all the other classes are based.
Each custom model has a dedicated class that is a subclass of StochasticModel and a high-level function that
converts a regular object of type tf.keras.Model or tf.keras.Sequential into the corresponding model.
The list of available models is:
Stochastic Model¶
StochasticModel is a subclass of tf.keras.Model that was defined in order to add stochastic metrics. It’s an abstract class that can’t be instantiated. All the other custom models are subclasses of StochasticModel.
Tip
If you create a custom stochastic model, you may want it to be a subclass of
StochasticModel
.
This class is useful to integrate stochastic metrics into the training and testing process. For example, the piece of code below shows how to use stochastic metrics in a custom model:
>>> model.compile(stochastic_metrics='picp')
>>> model.fit(x_train, y_train, epochs=6)
Epoch 1/6
1/1 [==============================] - 4s 4s/step - loss: 1.5937 - picp: 1.0000
Epoch 2/6
1/1 [==============================] - 0s 9ms/step - loss: 1.1870 - picp: 1.0000
Epoch 3/6
1/1 [==============================] - 0s 10ms/step - loss: 1.1034 - picp: 1.0000
Epoch 4/6
1/1 [==============================] - 0s 6ms/step - loss: 1.0789 - picp: 0.9500
Epoch 5/6
1/1 [==============================] - 0s 5ms/step - loss: 1.0209 - picp: 0.8500
Epoch 6/6
1/1 [==============================] - 0s 6ms/step - loss: 0.9594 - picp: 0.9000
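For intuition about the picp value reported in the log above: PICP usually denotes the prediction interval coverage probability, that is, the fraction of targets that fall inside the predicted interval. The sketch below is a NumPy illustration and not the library's implementation; the function name picp, the Gaussian interval and the factor 1.96 (a 95% interval) are assumptions made for this example.

```python
import numpy as np

def picp(y_true, mean, variance, z=1.96):
    """Fraction of targets inside mean +/- z * std (hypothetical helper)."""
    std = np.sqrt(variance)
    inside = (y_true >= mean - z * std) & (y_true <= mean + z * std)
    return float(np.mean(inside))

y_true = np.array([0.0, 1.0, 2.0, 10.0])
mean = np.array([0.1, 0.9, 2.2, 3.0])
variance = np.array([1.0, 1.0, 1.0, 1.0])

# The last target lies far outside its interval, so 3 targets out of 4 are covered.
coverage = picp(y_true, mean, variance)
```

A value close to the nominal level of the interval (here 0.95) indicates well-calibrated intervals; a value of 1.0, as in the first epochs of the log, typically means that the intervals are too wide.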
A description of the class StochasticModel is given below.
- class purestochastic.model.base_uncertainty_models.StochasticModel(*args, **kwargs)[source]¶
StochasticModel adds stochastic training and inference features.
StochasticModel is a subclass of keras.Model that makes it possible to build stochastic models. A stochastic model often outputs the parameters of a parametric distribution or the quantiles of a generic distribution. For example, it outputs the mean and the variance of a Gaussian distribution.
However, with the standard class keras.Model, all the metrics need to take the same input values, whereas deterministic and stochastic metrics don’t take the same input values. Therefore, StochasticModel adds the possibility to have deterministic as well as stochastic metrics. Stochastic metrics need to be specified when model.compile is called, with the stochastic_metrics or stochastic_weigthed_metrics arguments.
The class is abstract and can’t be instantiated. Subclasses need to override the compute_metrics(self, x, y, prediction, sample_weight) method so that it calls the parent method with the appropriate y_pred and stochastic_predictions arguments.
- compile(stochastic_metrics=None, stochastic_weigthed_metrics=None):
Compile the model and add stochastic metrics.
- compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight):
Compute the values of the deterministic and stochastic metrics.
- reset_metrics():
Reset the state of deterministic and stochastic metrics
Warning
All the stochastic metrics need to take the same input values. They have to be consistent together.
- compute_metrics(x, y, y_pred, stochastic_predictions, sample_weight)[source]¶
Compute the metrics.
The method calls the parent method compute_metrics to compute the deterministic metrics and then computes the stochastic metrics manually.
The method takes one additional parameter, stochastic_predictions, that is specified by the methods of the subclasses. It has to be the same for all the stochastic metrics.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
y_pred (tf.Tensor) – Mean prediction for y.
stochastic_predictions (tf.Tensor) – Stochastic predictions for y.
sample_weight – Sample weights for weighting the metrics.
- Returns
metric_results – Value of each metric.
- Return type
dict
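The split between deterministic and stochastic metrics described above can be sketched without TensorFlow. The classes below are toy stand-ins, not the library classes: the names SketchStochasticModel and SketchMeanVarianceModel, the dictionaries of metric functions and the layout of predictions (mean in the last-axis index 0, variance in index 1) are assumptions made for illustration.

```python
import numpy as np

class SketchStochasticModel:
    """Toy stand-in for the pattern described above (not the library class)."""

    def __init__(self, metrics, stochastic_metrics):
        self.metrics = metrics                        # name -> f(y, y_pred)
        self.stochastic_metrics = stochastic_metrics  # name -> f(y, stochastic_predictions)

    def compute_metrics(self, x, y, y_pred, stochastic_predictions, sample_weight=None):
        # Deterministic metrics only see the mean prediction.
        results = {name: fn(y, y_pred) for name, fn in self.metrics.items()}
        # Stochastic metrics all receive the same stochastic predictions.
        results.update({name: fn(y, stochastic_predictions)
                        for name, fn in self.stochastic_metrics.items()})
        return results


class SketchMeanVarianceModel(SketchStochasticModel):
    def compute_metrics(self, x, y, predictions, sample_weight=None):
        # Assumed layout: predictions[..., 0] is the mean, predictions[..., 1] the variance.
        y_pred = predictions[..., 0]
        return super().compute_metrics(x, y, y_pred, predictions, sample_weight)


def mae(y, y_pred):
    return float(np.mean(np.abs(y - y_pred)))

def mean_predicted_variance(y, stochastic_predictions):
    return float(np.mean(stochastic_predictions[..., 1]))

model = SketchMeanVarianceModel({"mae": mae}, {"mean_variance": mean_predicted_variance})
y = np.array([1.0, 2.0])
predictions = np.array([[1.5, 0.2], [2.5, 0.4]])
results = model.compute_metrics(None, y, predictions)
```

The subclass only decides which part of its output is the mean prediction and which part is passed to the stochastic metrics; the bookkeeping stays in the parent class.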
MVEM¶
Todo
Add a class for MVEM which deals with compute_metrics
DeepEnsemble¶
The DeepEnsemble model is an ensemble of deep learning models. The idea is to train the same model multiple times with different random seeds in order to obtain diverse predictions, and then to combine these predictions, for instance by averaging them. For more details, see the papers Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles or Deep Ensembles: A Loss Landscape Perspective.
Tip
The DeepEnsemble model seems to work better if the kernel initializer is not set to ‘glorot_uniform’ (the default value) but to ‘random_normal’ with a high standard deviation stddev.
Here are some examples using the DeepEnsembleModel
:
With high-level API (recommended usage) :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
DeepEnsemble = toDeepEnsemble(model, nb_models)
With low-level layers :
inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = DeepEnsembleModel(inputs=inputs, outputs=outputs)
The DeepEnsembleModel class and the toDeepEnsemble method are described below.
- class purestochastic.model.deep_ensemble.DeepEnsembleModel(*args, **kwargs)[source]¶
Implementation of the DeepEnsemble model.
The Deep Ensemble [1] is an ensemble of deep learning models trained independently and combined for prediction in order to estimate uncertainty.
The model can be constructed manually, or the method toDeepEnsemble can be used to convert a simple keras.Model object into a DeepEnsembleModel object. This class doesn’t need a specific loss function: it can use all of the TensorFlow loss functions as well as custom loss functions.
- compute_loss(x=None, y=None, y_pred=None, sample_weight=None):
Compute the loss independently for each model.
- _combine_predictions(predictions, stacked):
Combine the predictions made by the models.
- compute_metrics(x, y, predictions, sample_weight):
Specify the mean and stochastic part of the predictions to compute the metrics.
- predict(x):
Compute the predictions of the model thanks to the _combine_predictions method.
References
- 1
Balaji Lakshminarayanan, Alexander Pritzel and Charles Blundell. “Simple and scalable predictive uncertainty estimation using deep ensembles”. In: Advances in Neural Information Processing Systems (2017), pp. 6403-6414. ISSN: 10495258. arXiv: 1612.01474.
- _combine_predictions(predictions, stacked)[source]¶
Combine the predictions of all the models in order to quantify the uncertainty.
This method combines the predictions of all the models in order to quantify uncertainty. The computation of the uncertainty and of the mean prediction depends on the structure of the network. For the moment, there are 2 possibilities (B = number of models):
Mean Variance Activation (see MeanVarianceActivation):
Mean : \(\hat{\mu} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{\mu}_i\)
Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{\mu}_i - \hat{\mu})^2\)
Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B} \sum_{i=1}^{B} \sigma^2_i\)
No specific structure:
Mean : \(\hat{y} = \dfrac{1}{B} \sum_{i=1}^{B} \hat{y}_i\)
Variance : \(\hat{\sigma}^2 = \dfrac{1}{B} \sum_{i=1}^{B} (\hat{y}_i - \hat{y})^2\)
In the future, it will be possible to add other possibilities.
- Parameters
predictions (tf.Tensor) – Predictions returned by the model (output of model(x)).
stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.
- Returns
Predictions that have been combined. If stacked is True, the output is a single tensor. Otherwise, the output is a list of tensors.
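The combination rules of _combine_predictions for the Mean Variance Activation case can be sketched in NumPy as follows. This is an illustration under assumptions, not the library's code: the function name combine_mean_variance and the array layout (one row per ensemble member) are hypothetical.

```python
import numpy as np

def combine_mean_variance(means, variances):
    """Combine B member predictions (hypothetical helper).

    means, variances: arrays of shape (B, n_inputs), one row per ensemble member.
    """
    mu = means.mean(axis=0)                        # ensemble mean
    epistemic = ((means - mu) ** 2).mean(axis=0)   # spread of the member means
    aleatoric = variances.mean(axis=0)             # average predicted variance
    return mu, epistemic, aleatoric

means = np.array([[1.0, 2.0],
                  [3.0, 2.0]])       # B = 2 members, 2 inputs
variances = np.array([[0.5, 0.1],
                      [0.7, 0.3]])
mu, epistemic, aleatoric = combine_mean_variance(means, variances)
```

The members disagree on the first input only, so the epistemic variance is non-zero there and zero on the second input, while the aleatoric variance is simply the average of the variances predicted by the members.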
- compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]¶
Custom compute_loss function.
This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
y_pred (tf.Tensor) – Predictions returned by the model (output of model(x)).
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
The total loss.
- compute_metrics(x, y, predictions, sample_weight)[source]¶
Custom compute_metrics method.
As stated in the parent method compute_metrics, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
predictions (tf.Tensor) – Predictions returned by the model (output of model(x)).
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
See parent method.
- predict(x, **kwargs)[source]¶
Combine predictions made by all the models.
This method simply calls the parent’s method and then combines the predictions in order to quantify uncertainty.
- Parameters
x (tf.Tensor) – Input data.
kwargs (optional) – Other arguments of the parent’s predict method.
- Returns
Predictions made by the Deep Ensemble model.
- Return type
np.ndarray
- purestochastic.model.deep_ensemble.toDeepEnsemble(net, nb_models)[source]¶
Convert a regular model into a deep ensemble model.
This method is intended as a high-level interface to construct a Deep Ensemble model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation. Other architectures are simply built as concatenated models.
- Parameters
net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model
nb_models (int) – the number of models
- Returns
a Deep Ensemble Model
- Return type
DeepEnsembleModel
Todo
Add support for other architectures
SWAG¶
The SWAG model is a Bayesian model, and more precisely a form of Bayesian model averaging with variational inference. The model fits the posterior distribution of the model parameters during training by exploiting specific properties of the optimization process. For more details, see the paper A Simple Baseline for Bayesian Uncertainty in Deep Learning.
Tip
It’s important to set the learning rate of the second optimization process quite high so that the parameters are diverse enough. It’s not a problem if the loss is not stable, that is, if the loss increases and decreases iteratively.
Warning
This method uses two optimization processes. The first one is the pretraining, with the arguments given in the compile method. The second one is the training of the model, with the arguments given in the fit method.
Here are some examples using the SWAGModel
:
With high-level API (recommended usage) :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
SWAG = toSWAG(model)
With low-level layers :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = SWAGModel(inputs=inputs, outputs=outputs)
The SWAGModel class, the toSWAG method and the SWAGCallback class are described below.
The main logic of the SWAG algorithm is in the SWAGCallback class.
- class purestochastic.model.swag.SWAGModel(*args, **kwargs)[source]¶
Implementation of the SWAG Model.
SWAG [2] (Stochastic Weight Averaging Gaussian) is a model that performs Bayesian training and inference in order to quantify uncertainty. For more details, see SWAGCallback.
The model can be constructed manually, or the method toSWAG can be used to convert a simple keras.Model object into a SWAGModel object.
- fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):
Trains the model with the SWAG algorithm.
- _sample_prediction(data, S, verbose=0):
Sample different prediction according to the posterior distribution of the parameters.
- _combine_predictions(predictions, stacked):
Combine the sampled predictions.
- compute_metrics(x, y, predictions, sample_weight):
Specify the mean and stochastic part of the predictions to compute the metrics.
- predict(data, S=5, verbose=0):
Computes the predictions of the model with the SWAG algorithm.
- evaluate(x=None, y=None, S=5, sample_weight=None):
Evaluate the model with the SWAG algorithm.
References
- 2
Wesley J. Maddox et al. “A simple baseline for Bayesian uncertainty in deep learning”. In: Advances in Neural Information Processing Systems 32 (2019), pp. 1-25. ISSN: 10495258. arXiv: 1902.02476.
- _combine_predictions(predictions, stacked)[source]¶
Bayesian Model Averaging of the S predictions.
This method follows the _sample_prediction method. It takes as input the batch of S predictions sampled by the _sample_prediction method. Then, it averages the predictions in order to compute the mean and the uncertainty associated with the prediction. The computation of the uncertainty and of the mean prediction depends on the structure of the network. For the moment, there are 2 possibilities (S = number of samples):
Mean Variance Activation (see MeanVarianceActivation):
Mean : \(\hat{\mu} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{\mu}_i\)
Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{\mu}_i - \hat{\mu})^2\)
Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{S} \sum_{i=1}^{S} \sigma^2_i\)
No specific structure:
Mean : \(\hat{y} = \dfrac{1}{S} \sum_{i=1}^{S} \hat{y}_i\)
Variance : \(\hat{\sigma}^2 = \dfrac{1}{S} \sum_{i=1}^{S} (\hat{y}_i - \hat{y})^2\)
In the future, it will be possible to add other possibilities.
- Parameters
predictions (tf.Tensor) – Batch of the S predictions computed by _sample_prediction.
stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.
- _sample_prediction(data, S, verbose=0)[source]¶
Sample predictions according to the posterior distribution of the parameters.
In the SWAG algorithm, the posterior distribution of the parameters is approximated as a Gaussian distribution. The mean and the covariance are specified in the report associated with the code or in the article [2]. The mean is stored in the variable SWA_weights. The diagonal and the rank-K approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.
The method samples the weights and computes the associated prediction multiple times.
- Parameters
data (tf.Tensor) – Input data (equivalent to x).
S (int) – The number of samples used in the Monte Carlo method.
verbose (int, default:0) – The verbosity level.
- Returns
preds – The batch of S predictions.
- Return type
tf.Tensor
- compute_metrics(x, y, predictions, sample_weight)[source]¶
Custom compute_metrics method.
As stated in the parent method compute_metrics, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.
Warning
Unless the model predicts aleatoric uncertainty, the model can’t compute stochastic metrics before the end of the training.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
predictions (tf.Tensor) – Predictions returned by the model (output of model(x))
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
See parent method.
- evaluate(x=None, y=None, S=5, sample_weight=None)[source]¶
Custom evaluate method.
It returns the loss value and the metric values for the model in test mode.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
S (int, default:5) – The number of samples used in the Monte Carlo method.
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
Dict containing the values of the metrics and loss of the model.
- fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]¶
Train the model with the SWAG algorithm.
The model is trained in two parts:
Before start_averaging epochs, the model is trained normally. This is the pretraining of the model; it uses the optimizer and learning rate specified in the compile function.
After start_averaging epochs, the model is trained with the SWAG callback. In other words, at the end of specific epochs (according to the parameters), the parameters of the model are saved. At the end of the training, the callback computes the parameters of the approximated Gaussian posterior distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer.
See also
purestochastic.model.swag.SWAGCallback
- Parameters
X (np.ndarray) – The input data.
y (np.ndarray) – The target data.
start_averaging (int) – The number of epochs to pretrain the model.
learning_rate (float) – The learning rate of the SWAG algorithm (second part).
update_frequency (int) – The number of epochs between each save of parameters of the SWAG algorithm.
K (int) – The number of samples used to compute the covariance matrix.
- Return type
History of the SWAG’s training.
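The two-part schedule of fit can be illustrated by listing the epochs at which the parameters would be saved. This is only a sketch: the helper name swag_snapshot_epochs, the total_epochs argument and the 0-based indexing (first snapshot exactly at epoch start_averaging) are assumptions, and the library may count epochs differently.

```python
def swag_snapshot_epochs(total_epochs, start_averaging=10, update_frequency=1):
    """Return the 0-based epochs at which parameters are saved (assumed indexing)."""
    return [epoch for epoch in range(start_averaging, total_epochs)
            if (epoch - start_averaging) % update_frequency == 0]

# Epochs 0 to 9 are pretraining; snapshots are then taken every 3 epochs.
snapshots = swag_snapshot_epochs(total_epochs=20, start_averaging=10, update_frequency=3)
```

With these values, four snapshots are taken (epochs 10, 13, 16 and 19). Since the low-rank part of the covariance uses the last K snapshots, the second phase should be long enough to collect at least K of them.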
- predict(data, S=5, verbose=0)[source]¶
Sample predictions and combine them.
This method defines the inference step of the SWAG algorithm. First, it samples predictions of the model with the _sample_prediction method. Then, the predictions are combined with the _combine_predictions method.
- Parameters
data (numpy.ndarray) – The input data.
S (int, default:5) – The number of samples used in the Monte Carlo method.
verbose (int, default:0) – The verbosity level.
- Return type
The predictions of the model.
- purestochastic.model.swag.toSWAG(net)[source]¶
Convert a regular model into a SWAG model.
This method is intended as a high-level interface to construct a SWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation. Other architectures are simply built as concatenated models.
- Parameters
net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model
- Returns
a SWAG Model
- Return type
SWAGModel
- class purestochastic.model.swag.SWAGCallback(learning_rate, update_frequency, K)[source]¶
Approximation of the posterior distribution of parameters as a gaussian distribution.
Callback used in the classes SWAGModel and MultiSWAGModel. It approximates the posterior distribution of the parameters as a Gaussian distribution. The parameters of the Gaussian distribution are computed as follows:
The mean of the Gaussian is the mean of the parameters (first moment) found during the training process. Mathematically, it is defined as:
\[\theta_{SWA} = \frac{1}{T} \sum_{t=1}^T \theta_t\]
The covariance matrix is constructed by taking half of a diagonal approximation and half of a low-rank approximation of the covariance matrix. The diagonal approximation is computed at the end of the training by using the first and second order moments of the parameters:
\[\Sigma_{diag} = \mathrm{diag}(\bar{\theta}^2-\theta_{SWA}^2)\]
The low-rank approximation is constructed by using the difference between the last K values of the parameters and the mean value of the parameters:
\[\Sigma_{\text{low-rank}} = \frac{1}{K-1} \hat{D}\hat{D}^T \text{, where each column of } \hat{D} \text{ is } D_t=(\theta_t - \bar{\theta}_t)\]
To sample from this Gaussian distribution, SWAGModel and MultiSWAGModel use the following equation:
\[\theta_j = \theta_{SWA} + \frac{1}{\sqrt{2}} \Sigma_{diag}^{\frac{1}{2}} n_1 + \frac{1}{\sqrt{2(K-1)}}\hat{D}n_2, ~~ n_1, n_2 \sim \mathcal{N}(0,I)\]
It is then sufficient to store the matrix \(\hat{D}\), the first order moments of the parameters and the diagonal approximation of the covariance at the end of the training.
- Parameters
learning_rate (float) – The learning rate of the optimizer.
update_frequency (int) – The number of epochs between two updates of the first and second moments of the parameters.
K (int) – The number of samples used to compute the second order moments.
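The sampling equation of SWAGCallback can be sketched in NumPy as follows. This is an illustration rather than the library's code: the function name sample_swag_weights, the use of a single flattened parameter vector and the argument names are assumptions. The argument swa_std is assumed to hold the square root of the diagonal covariance, and K must be at least 2 for the low-rank scaling to be defined.

```python
import numpy as np

def sample_swag_weights(swa_weights, swa_std, deviation_matrix, rng):
    """Draw one parameter vector from the Gaussian approximation (hypothetical helper).

    swa_weights:      (d,) mean of the parameters (theta_SWA)
    swa_std:          (d,) square root of the diagonal covariance
    deviation_matrix: (d, K) matrix D-hat, one column per saved deviation
    """
    d, K = deviation_matrix.shape
    n1 = rng.standard_normal(d)
    n2 = rng.standard_normal(K)
    diagonal_part = swa_std * n1 / np.sqrt(2.0)
    low_rank_part = deviation_matrix @ n2 / np.sqrt(2.0 * (K - 1))
    return swa_weights + diagonal_part + low_rank_part

rng = np.random.default_rng(0)
swa_weights = np.zeros(4)
swa_std = np.ones(4)
deviation_matrix = rng.standard_normal((4, 3))   # d = 4 parameters, K = 3 columns
theta = sample_swag_weights(swa_weights, swa_std, deviation_matrix, rng)
```

Only vectors of size d and a d x K matrix are needed, which is why the full d x d covariance matrix never has to be built.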
- on_epoch_end(epoch, logs=None)[source]¶
Updates the first and second order moments as well as the deviation matrix.
Every update_frequency epochs, the first and second order moments as well as the deviation matrix are updated:
\[\bar{\theta} = \frac{n \bar{\theta} + \theta_{epochs}}{n+1}\]
\[\bar{\theta}^2 = \frac{n \bar{\theta}^2 + \theta_{epochs}^2}{n+1}\]
\[\text{APPEND_COL}(\hat{D}, \theta_{epochs}-\bar{\theta})\]
If the matrix \(\hat{D}\) has more than K columns, the oldest column is removed.
- Parameters
epoch (int) – The index of the current epoch.
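The update rules of on_epoch_end can be sketched with a small NumPy helper. The class name SwagMoments and its attributes are hypothetical, the parameters are assumed to be flattened into one vector, and the deviation is computed here with the already updated mean, which is one possible reading of the formulas above.

```python
import numpy as np

class SwagMoments:
    """Running moments and deviation matrix of the parameters (hypothetical helper)."""

    def __init__(self, dim, K):
        self.n = 0
        self.mean = np.zeros(dim)       # first order moment
        self.sq_mean = np.zeros(dim)    # second order moment
        self.K = K
        self.deviations = []            # columns of D-hat, oldest first

    def update(self, theta):
        self.mean = (self.n * self.mean + theta) / (self.n + 1)
        self.sq_mean = (self.n * self.sq_mean + theta ** 2) / (self.n + 1)
        self.n += 1
        self.deviations.append(theta - self.mean)
        if len(self.deviations) > self.K:
            self.deviations.pop(0)      # drop the oldest column

    def diagonal_std(self):
        # Clip at zero: rounding errors can make the variance slightly negative.
        return np.sqrt(np.clip(self.sq_mean - self.mean ** 2, 0.0, None))

moments = SwagMoments(dim=2, K=2)
for theta in (np.array([1.0, 0.0]), np.array([3.0, 0.0]), np.array([5.0, 0.0])):
    moments.update(theta)
deviation_matrix = np.stack(moments.deviations, axis=1)   # shape (dim, K)
```

After three updates with K = 2, only the two most recent deviations remain. The diagonal_std method corresponds to the square root of the diagonal approximation that is computed once at the end of the training.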
- on_train_end(logs=None)[source]¶
Compute and store the variables needed to sample the posterior distribution.
The mean of the Gaussian distribution is saved in the attribute SWA_weights of the model. The deviation matrix used in the covariance matrix is saved in the attribute deviation_matrix of the model. Finally, the square root of the diagonal matrix used in the covariance matrix is computed and saved in the attribute SWA_cov of the model.
- Parameters
logs (optional) – See tf.keras.callbacks.Callback
MultiSWAG¶
The MultiSWAG model was developed in order to combine the advantages of the SWAG model and of the DeepEnsemble model. It is therefore an ensemble of Bayesian deep learning models. For more details, see Bayesian deep learning and a probabilistic perspective of generalization.
Here are some examples using the MultiSWAGModel
:
With high-level API (recommended usage) :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense2Dto3D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = Model(inputs=inputs, outputs=outputs)
MultiSWAG = toMultiSWAG(model, nb_models)
With low-level layers :
inputs = Input(shape=(input_dim,))
x = Dense2Dto3D(nb_models, 100, activation="relu")(inputs)
outputs = Dense3Dto4D(output_dim, 2, activation=MeanVarianceActivation)(x)
model = MultiSWAGModel(inputs=inputs, outputs=outputs)
The MultiSWAGModel class and the toMultiSWAG method are described below.
- class purestochastic.model.swag.MultiSWAGModel(*args, **kwargs)[source]¶
Implementation of the MultiSWAG Model.
MultiSWAG [3] (Multi Stochastic Weight Averaging Gaussian) is an ensemble of SWAG models. It’s a mix between a DeepEnsemble and a SWAG model. For more details, see SWAGCallback, SWAGModel and DeepEnsembleModel.
The model can be constructed manually, or the method toMultiSWAG can be used to convert a simple keras.Model object into a MultiSWAGModel object. This class doesn’t need a specific loss function: it can use all of the TensorFlow loss functions as well as custom loss functions.
- fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10):
Trains the model with the MultiSWAG algorithm.
- _sample_prediction(data, S, verbose=0):
Sample different prediction according to the posterior distribution of the parameters.
- _combine_predictions(predictions, stacked):
Combine the sampled predictions made by all models.
- compute_metrics(x, y, predictions, sample_weight):
Specify the mean and stochastic part of the predictions to compute the metrics.
- evaluate(x=None, y=None, S=5, sample_weight=None):
Evaluate the model with the MultiSWAG algorithm.
- predict(data, S=5, verbose=0):
Computes the predictions of the model with the MultiSWAG algorithm.
References
- 3
Andrew Gordon Wilson and Pavel Izmailov. “Bayesian deep learning and a probabilistic perspective of generalization”. In: Advances in Neural Information Processing Systems (2020). ISSN: 10495258. arXiv: 2002.08791.
- _combine_predictions(predictions, sampled, stacked)[source]¶
Bayesian Model Averaging of the S predictions of the B models.
It’s slightly different from the function in SWAGModel. There are 2 cases:
If sampled is False, the parameters of the posterior distribution have not been computed yet, so it’s impossible to sample predictions. Therefore, the function just combines the predictions made by all the models, as in DeepEnsembleModel.
If sampled is True, the parameters have been computed, so this method follows the _sample_prediction method. It takes as input the batch of S predictions for each model sampled by the _sample_prediction method. Then, it averages the predictions over the samples and the models in order to compute the mean and the uncertainty associated with the prediction.
The computation of the uncertainty and of the mean prediction depends on the structure of the network. For the moment, there are 2 possibilities (B = number of models, S = number of samples):
Mean Variance Activation (see MeanVarianceActivation):
Mean : \(\hat{\mu} = \dfrac{1}{B \cdot S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{\mu}_{i,j}\)
Epistemic Variance : \(\hat{\sigma}^2_{epi} = \dfrac{1}{B \cdot S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{\mu}_{i,j} - \hat{\mu})^2\)
Aleatoric Variance : \(\hat{\sigma}^2_{alea} = \dfrac{1}{B \cdot S} \sum_{i=1}^{B} \sum_{j=1}^{S} \sigma^2_{i,j}\)
No specific structure:
Mean : \(\hat{y} = \dfrac{1}{B \cdot S} \sum_{i=1}^{B} \sum_{j=1}^{S} \hat{y}_{i,j}\)
Variance : \(\hat{\sigma}^2 = \dfrac{1}{B \cdot S} \sum_{i=1}^{B} \sum_{j=1}^{S} (\hat{y}_{i,j} - \hat{y})^2\)
In the future, it will be possible to add other possibilities.
- Parameters
predictions (tf.Tensor) – Batch of the S predictions for each model computed by _sample_prediction.
sampled (boolean) – Boolean to indicate whether the input has been sampled.
stacked (boolean) – Boolean to indicate whether the output should be stacked in a single tensor or not.
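For the sampled case, the double average over the B models and the S samples can be sketched in NumPy by reducing over two axes at once. The function name combine_multiswag and the array layout (B, S, n_inputs) are assumptions made for this illustration, not the layout used by the library.

```python
import numpy as np

def combine_multiswag(means, variances):
    """Average over models and samples (hypothetical helper).

    means, variances: arrays of shape (B, S, n_inputs).
    """
    mu = means.mean(axis=(0, 1))
    epistemic = ((means - mu) ** 2).mean(axis=(0, 1))
    aleatoric = variances.mean(axis=(0, 1))
    return mu, epistemic, aleatoric

# B = 2 models, S = 2 samples per model, 1 input.
means = np.array([[[0.0], [2.0]],
                  [[4.0], [6.0]]])
variances = np.full((2, 2, 1), 0.5)
mu, epistemic, aleatoric = combine_multiswag(means, variances)
```

In this sketch, the epistemic variance mixes the spread between models with the spread between samples of the same model, since both axes are averaged together.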
- _sample_prediction(data, S, verbose=0)[source]¶
Sample predictions according to the posterior distribution of the parameters.
It’s the same function as in SWAGModel. In the MultiSWAG algorithm, the posterior distribution of the parameters is approximated as a Gaussian distribution. The mean and the covariance are specified in the report associated with the code or in the article. The mean is stored in the variable SWA_weights. The diagonal and the rank-K approximations of the covariance matrix are stored respectively in SWA_cov and deviation_matrix.
The method samples the weights and computes the associated prediction multiple times for each model independently.
- Parameters
data (tf.Tensor) – Input data (equivalent to x).
S (int) – The number of samples used in the Monte Carlo method.
verbose (int, default:0) – The verbosity level.
- Returns
preds – The batch of S predictions.
- Return type
tf.Tensor
- compute_loss(x=None, y=None, y_pred=None, sample_weight=None)[source]¶
Custom compute_loss function.
This method overrides the compute_loss function so that the class doesn’t need a specific loss function. It computes the loss for each model independently. It’s the same function as in DeepEnsembleModel.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
y_pred (tf.Tensor) – Predictions returned by the model (output of model(x)).
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
The total loss.
- compute_metrics(x, y, prediction, sample_weight)[source]¶
Custom compute_metrics method.
As stated in the parent method compute_metrics, this method calls the parent method with the appropriate y_pred and stochastic_predictions arguments.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
predictions (tf.Tensor) – Predictions returned by the model (output of model(x))
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
See parent method.
- evaluate(x=None, y=None, S=5, sample_weight=None)[source]¶
Custom evaluate method.
It returns the loss value and the metric values for the model in test mode.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
S (int, default:5) – The number of samples used in the Monte Carlo method.
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
Dict containing the values of the metrics and loss of the model.
- fit(X, y, start_averaging=10, learning_rate=0.001, update_frequency=1, K=10, **kwargs)[source]¶
Train the model with the MultiSWAG algorithm.
It’s the same function as in SWAGModel, but with multiple models trained independently. The models are trained in two parts:
Before start_averaging epochs, the models are trained normally. This is the pretraining of the models; it uses the optimizer and learning rate specified in the compile function.
After start_averaging epochs, the models are trained with the SWAG callback. In other words, at the end of specific epochs (according to the parameters), the parameters of the models are saved. At the end of the training, the callback computes the parameters of the approximated Gaussian posterior distribution. These parameters are then used in _sample_prediction in order to sample different predictions. At present, the optimizer is necessarily the SGD optimizer. For more details, see SWAGCallback.
- Parameters
X (np.ndarray) – The input data.
y (np.ndarray) – The target data.
start_averaging (int) – The number of epochs to pretrain the model.
learning_rate (float) – The learning rate of the MultiSWAG algorithm (second part).
update_frequency (int) – The number of epochs between each save of parameters of the MultiSWAG algorithm.
K (int) – The number of samples used to compute the covariance matrix.
- Return type
History of the MultiSWAG’s training.
- predict(data, S=5, verbose=0)[source]¶
Sample predictions and combine them.
It’s the same function as in SWAGModel. This method defines the inference step of the MultiSWAG algorithm. First, it samples predictions of each model with the _sample_prediction method. Then, all the predictions are combined with the _combine_predictions method.
- Parameters
data (np.ndarray) – The input data.
S (int, default:5) – The number of samples used in the Monte Carlo method.
verbose (int, default:0) – The verbosity level.
- Return type
The predictions of the model.
- purestochastic.model.swag.toMultiSWAG(net, nb_models)[source]¶
Convert a regular model into a MultiSWAG model.
This method is intended as a high-level interface to construct a MultiSWAG model from a regular model. At present, only densely-connected networks are compatible with a fully parallelizable implementation. Other architectures are simply built as concatenated models.
- Parameters
net (tf.keras.Sequential or tf.keras.Model) – a tensorflow model
nb_models (int) – the number of models
- Returns
a MultiSWAG Model
- Return type
MultiSWAGModel
Orthonormal Certificates¶
The Orthonormal Certificates model was developed in order to quantify epistemic uncertainty with single-model estimates. For more details, see Single-model uncertainties for deep learning.
Here are some examples using the OrthonormalCertificatesModel
:
With high-level API (recommended usage) :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
model = Model(inputs=inputs, outputs=outputs)
OrthonormalCertificates = toOrthonormalCertificates(model, K=100, nb_layers_head=1)
With low-level layers (not recommended) :
inputs = Input(shape=(input_dim,))
x = Dense(100, activation="relu")(inputs)
outputs = Dense(output_dim)(x)
outputs2 = Dense(K, kernel_regularizer=Orthonormality())(x)
model = OrthonormalCertificatesModel(inputs=inputs, outputs=[outputs, outputs2])
The OrthonormalCertificatesModel class and the toOrthonormalCertificates method are described below.
- class purestochastic.model.orthonormal_certificates.OrthonormalCertificatesModel(*args, **kwargs)[source]¶
Implementation of the Orthonormal Certificates model.
The model was proposed in [4]. To estimate epistemic uncertainty, the authors propose Orthonormal Certificates (OCs), a collection of diverse non-constant functions that map all training samples to zero.
The model can be constructed manually (not recommended), or the method toOrthonormalCertificates can be used to convert a simple keras.Model object into an OrthonormalCertificatesModel object.
- fit(X, y, epochs_oc=0, learning_rate_oc=0.001):
Fit the initial and OC model.
- fit_oc(X, y, learning_rate_oc=0.001):
Fit the OC model.
- compute_metrics(x, y, predictions, sample_weight):
Specify the mean and stochastic part of the predictions to compute the metrics.
- predict(data, S=5, verbose=0):
Computes the predictions of the initial model and an epistemic score.
- find_loss():
Returns the loss specified in
compile
.
References
[4] Tagasovska, Natasa and Lopez-Paz, David. "Single-model uncertainties for deep learning". In: Advances in Neural Information Processing Systems (2019), p. 1-12. ISSN: 10495258. arXiv: 1811.00908.
- compute_metrics(x, y, predictions, sample_weight)[source]¶
Custom compute_metrics method.
As stated in the parent method compute_metrics, this method calls the parent function with the appropriate y_pred and stochastic_predictions arguments.
Warning
For OrthonormalCertificatesModel, stochastic metrics are deliberately removed because they are not meaningful for the certificates.
- Parameters
x (tf.Tensor) – Input data.
y (tf.Tensor) – Target data.
predictions (tf.Tensor) – Predictions returned by the model (output of model(x))
sample_weight (optional) – Sample weights for weighting the loss function.
- Return type
See parent method.
- find_loss()[source]¶
Returns the loss specified in the compile function.
- Returns
The name of the loss.
- Return type
str
- fit(X, y, epochs_oc=0, learning_rate_oc=0.001, **kwargs)[source]¶
Train the initial model and the orthonormal certificates.
The model is trained in two phases:
- First, for the number of epochs given by epochs, the model is trained normally. This is the training of the initial model; it uses the optimizer and learning rate specified in the compile function. The certificates are frozen.
- Then, for the number of epochs given by epochs_oc, all the layers are frozen except the certificates. The training is parametrized by learning_rate_oc, and the loss is the sum of the loss function specified in the compile function and the Orthonormality loss.
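The two-phase schedule can be illustrated with a small NumPy sketch. It is not the library's implementation: the initial model is assumed to be a linear regressor, the certificates are assumed to act directly on the inputs (in the real model they are attached to a hidden layer), the compile loss is assumed to be the mean squared error, and the names params, train, mse_grad and oc_grad are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])

# Two parameter groups: the initial model and the certificates (K = 2).
params = {"initial": np.zeros(4), "certificates": 0.1 * rng.normal(size=(4, 2))}

def mse_grad(w):
    # Gradient of mean((X w - y)^2) with respect to w.
    return 2.0 * X.T @ (X @ w - y) / len(X)

def oc_grad(C, lambda_coeff=1.0):
    # Gradient of (1/n) * ||X C||_F^2 + lambda_coeff * ||C^T C - I||_F^2.
    gram_gap = C.T @ C - np.eye(C.shape[1])
    return 2.0 * X.T @ (X @ C) / len(X) + lambda_coeff * 4.0 * C @ gram_gap

def train(params, trainable, grad_fn, epochs, lr):
    # Only the group named `trainable` is updated; the other one stays frozen.
    for _ in range(epochs):
        params[trainable] = params[trainable] - lr * grad_fn(params[trainable])

# Phase 1: train the initial model, certificates frozen.
certs_init = params["certificates"].copy()
train(params, "initial", mse_grad, epochs=200, lr=0.05)
certs_after_phase1 = params["certificates"].copy()
weights_after_phase1 = params["initial"].copy()

# Phase 2: train the certificates only, with their own learning rate.
train(params, "certificates", oc_grad, epochs=200, lr=0.01)
```

After phase 1 the certificates are unchanged, and after phase 2 the weights of the initial model are unchanged, which is the freezing behaviour described above.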
Note
By default, the parameter epochs_oc is set to 0, and the orthonormal certificates are not trained.
See also
purestochastic.common.regularizer.Orthonormality
- Parameters
X (np.ndarray) – The input data.
y (np.ndarray) – The target data.
epochs_oc (int (default : 0)) – Number of epochs for the training of certificates.
learning_rate_oc (float (default : 0.001)) – Learning rate for the training of certificates.
- Return type
History of the two trainings.
- fit_oc(X, y, learning_rate_oc=0.001, **kwargs)[source]¶
Train the orthonormal certificates.
All the layers are frozen except the orthonormal certificates. The model is trained with the optimizer specified in the compile function, using the learning rate learning_rate_oc. The loss is the sum of the following two parts:
- The loss function, with the predicted value set to the output of the orthonormal certificates and the target value set to 0.
- The orthonormality regularizer added to the kernel so that the certificates are orthonormal. For more details, see purestochastic.common.regularizer.Orthonormality.
The method is detailed in [4].
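As a rough illustration of the two loss terms, the NumPy sketch below evaluates them for a linear certificate layer. Several points are assumptions: the compile loss is taken to be the mean squared error, the regularizer is taken to be the squared Frobenius norm of C^T C - I (the exact form used by Orthonormality is not given here), lambda_coeff is assumed to weight that penalty, and the function name oc_loss is hypothetical.

```python
import numpy as np

def oc_loss(features, C, lambda_coeff=1.0):
    """Assumed OC objective: MSE of the certificates against a zero target,
    plus a penalty pushing the kernel columns towards orthonormality."""
    certificates = features @ C                  # shape (n_samples, K)
    data_term = np.mean(certificates ** 2)       # MSE with target value 0
    gram_gap = C.T @ C - np.eye(C.shape[1])      # zero iff columns are orthonormal
    penalty = np.sum(gram_gap ** 2)              # squared Frobenius norm
    return data_term + lambda_coeff * penalty

rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8))                 # hypothetical hidden-layer outputs
C_orth, _ = np.linalg.qr(rng.normal(size=(8, 3)))   # kernel with orthonormal columns
C_scaled = 2.0 * C_orth                             # same directions, no longer orthonormal

loss_orth = oc_loss(features, C_orth)
loss_scaled = oc_loss(features, C_scaled)
```

With an orthonormal kernel the penalty vanishes and only the data term remains, whereas the rescaled kernel is penalized by both terms.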
- Parameters
X (np.ndarray) – The input data.
y (np.ndarray) – The target data.
learning_rate_oc (float (default : 0.001)) – Learning rate for the training of certificates.
- Return type
History of the training.
- predict(x, **kwargs)[source]¶
Compute predictions.
This method calls the parent's method to compute the predictions of the initial model and the orthonormal certificates. The norm of the orthonormal certificates is then computed to obtain a score for the epistemic uncertainty, as defined in [4].
- Parameters
x (tf.Tensor) – Input data.
kwargs (optional) – Other arguments of the parent's predict method.
- Returns
Predictions made by the Orthonormal Certificates model.
- Return type
np.ndarray
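The scoring step can be sketched in NumPy as follows. This is only an illustration under assumptions: the Euclidean norm is assumed (the exact norm used by the library is not specified here), the certificate values are made up, and the function name epistemic_score is hypothetical.

```python
import numpy as np

def epistemic_score(certificates):
    """Norm of the K certificate outputs of each sample (assumed Euclidean)."""
    return np.linalg.norm(certificates, axis=-1)

# Hypothetical certificate outputs for three inputs, with K = 2 certificates.
certificates = np.array([[0.0, 0.0],    # mapped to zero, like a training sample
                         [0.3, 0.4],
                         [3.0, 4.0]])   # far from zero: high epistemic score
scores = epistemic_score(certificates)
```

Here the scores are 0, 0.5 and 5: the further the certificates are from zero, the less the input resembles the training data.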
- purestochastic.model.orthonormal_certificates.toOrthonormalCertificates(net, K, nb_layers_head, multiple_miso=True, lambda_coeff=1)[source]¶
Convert a regular model into an Orthonormal Certificates model.
This function is intended as a high-level interface for constructing an Orthonormal Certificates model from a regular model.
- Parameters
net (tf.keras.Sequential or tf.keras.Model) – a TensorFlow model
K (int) – the number of orthonormal certificates
- Returns
an Orthonormal Certificates Model
- Return type
class OrthonormalCertificatesModel