ganblr.models package

The GANBLR models.

class ganblr.models.GANBLR

Bases: object

The GANBLR Model.

evaluate(x, y, model='lr') → float

Perform a TSTR (Training on Synthetic data, Testing on Real data) evaluation.

Parameters
  • x (array_like) – Features of the real test dataset.

  • y (array_like) – Labels of the real test dataset.

  • model (str or object) – The model used for the evaluation. Should be one of ['lr', 'mlp', 'rf'], or a model object that has sklearn-style fit and predict methods.

Returns

accuracy_score – Accuracy of the evaluation model, trained on synthetic data and tested on (x, y).

Return type

float
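
TSTR trains a classifier only on synthetic data and scores it only on real data. The sketch below spells out that flow in plain Python. The names tstr_accuracy and MajorityClassifier are hypothetical, not part of ganblr, and the majority-class baseline merely stands in for 'lr', 'mlp', or 'rf'.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class MajorityClassifier:
    """Hypothetical stand-in model with sklearn-style fit/predict methods."""

    def fit(self, x: Sequence[Sequence[int]], y: Sequence[Hashable]) -> "MajorityClassifier":
        # Remember the most frequent label seen during training.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x: Sequence[Sequence[int]]) -> List[Hashable]:
        # Predict the stored majority label for every row.
        return [self.label_ for _ in x]


def tstr_accuracy(model, synth_x, synth_y, real_x, real_y) -> float:
    """Train on Synthetic data, Test on Real data; return the accuracy."""
    model.fit(synth_x, synth_y)          # training sees synthetic rows only
    predictions = model.predict(real_x)  # testing uses real rows only
    correct = sum(p == t for p, t in zip(predictions, real_y))
    return correct / len(real_y)


if __name__ == "__main__":
    synth_x = [[0, 1], [1, 1], [0, 0]]
    synth_y = [1, 1, 0]
    real_x = [[0, 1], [1, 0], [1, 1], [0, 0]]
    real_y = [1, 0, 1, 1]
    print(tstr_accuracy(MajorityClassifier(), synth_x, synth_y, real_x, real_y))  # 0.75
```

The real rows never reach fit, so the score reflects how well patterns learned from the synthetic data carry over to the real data.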

fit(x, y, k=0, batch_size=32, epochs=10, warmup_epochs=1, verbose=1)

Fit the model to the given data.

Parameters
  • x (array_like of shape (n_samples, n_features)) – Dataset to fit the model. The data should be discrete.

  • y (array_like of shape (n_samples,)) – Label of the dataset.

  • k (int, default=0) – Parameter k of the GANBLR model. Must be non-negative (the default is 0); values no greater than 2 are suggested.

  • batch_size (int, default=32) – Size of the batch to feed the model at each step.

  • epochs (int, default=10) – Number of epochs to use during training.

  • warmup_epochs (int, default=1) – Number of epochs to use in the warmup phase.

  • verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
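
fit expects discrete data. If raw columns hold category strings, one way to prepare them is to replace each distinct value with an integer code, column by column. The helper ordinal_encode below is a hypothetical sketch of that preprocessing step, not a ganblr function; whether fit accepts string categories directly is not stated here, so treat this as an optional assumption.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def ordinal_encode(
    rows: Sequence[Sequence[Hashable]],
) -> Tuple[List[List[int]], List[Dict[Hashable, int]]]:
    """Map each column's distinct values to integer codes 0..n-1.

    Returns the encoded rows and one value->code mapping per column,
    so the codes can be translated back later.
    """
    n_cols = len(rows[0])
    mappings: List[Dict[Hashable, int]] = []
    for col in range(n_cols):
        # Sort distinct values (by their string form) so the coding is deterministic.
        values = sorted({row[col] for row in rows}, key=str)
        mappings.append({v: code for code, v in enumerate(values)})
    encoded = [[mappings[c][row[c]] for c in range(n_cols)] for row in rows]
    return encoded, mappings


raw = [["red", "small"], ["blue", "large"], ["red", "large"]]
codes, maps = ordinal_encode(raw)
print(codes)  # [[1, 1], [0, 0], [1, 0]]
```

Keeping the returned mappings lets you translate coded values back into the original categories later.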

Returns

self – Fitted model.

Return type

object
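
The advice to keep k small can be motivated by counting parameters. Assuming k means here what it means in a k-dependence Bayesian network (each feature conditions on the class and on at most k other features), one feature's conditional probability table grows with the product of its parents' cardinalities, and therefore exponentially in k. The function cpt_size is a hypothetical illustration of that count, not part of ganblr.

```python
from math import prod
from typing import Sequence


def cpt_size(feature_card: int, class_card: int, parent_cards: Sequence[int]) -> int:
    """Number of free parameters in one feature's conditional probability table.

    Each combination of (class value, parent values) needs a distribution
    over the feature's values, which has feature_card - 1 free parameters.
    """
    return (feature_card - 1) * class_card * prod(parent_cards)


# A 10-valued feature, a binary class, and k parents with 10 values each.
for k in range(4):
    print(k, cpt_size(10, 2, [10] * k))
# 0 18
# 1 180
# 2 1800
# 3 18000
```

Each extra parent with 10 values multiplies the table by 10, which is why small values of k keep the model tractable on modest datasets.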

sample(size=None, verbose=1) → ndarray

Generate synthetic data.

Parameters
  • size (int or None) – Number of samples to generate. Set to None to generate as many samples as the training set contains.

  • verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.

Returns

synthetic_samples – Generated synthetic data.

Return type

np.ndarray
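
If you integer-encoded the categories yourself before calling fit, you will likely want to map the synthetic values back to readable categories. The sketch assumes that the sampled values are the same integer codes you passed to fit and that you kept one value-to-code dictionary per output column; decode_rows is a hypothetical helper, not a ganblr function.

```python
from typing import Dict, Hashable, List, Sequence


def decode_rows(
    rows: Sequence[Sequence[int]],
    mappings: Sequence[Dict[Hashable, int]],
) -> List[List[Hashable]]:
    """Translate integer codes back to original values, column by column."""
    # Invert each value->code dictionary into code->value.
    inverses = [{code: value for value, code in m.items()} for m in mappings]
    # int() also accepts NumPy integer or float scalars taken from an ndarray.
    return [[inverses[c][int(code)] for c, code in enumerate(row)] for row in rows]


mappings = [{"blue": 0, "red": 1}, {"large": 0, "small": 1}]
synthetic = [[1, 0], [0, 1]]  # e.g. rows taken from a sampled array
print(decode_rows(synthetic, mappings))  # [['red', 'large'], ['blue', 'small']]
```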

class ganblr.models.GANBLRPP(numerical_columns, random_state=None)

Bases: object

The GANBLR++ model.

Parameters
  • numerical_columns (list of int) – Indexes of the numerical columns. For example, if the features at column indexes 3, 5, and 10 are numerical, this parameter should be [3, 5, 10].

  • random_state (int, RandomState instance or None) – Controls the random seed given to the method chosen to initialize the parameters of the BayesianGaussianMixture used by GANBLRPP.
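
Because numerical_columns lists column indexes, every column not listed is treated as discrete. The hypothetical helper split_columns below (not a ganblr function) makes that split explicit and rejects out-of-range indexes. It assumes zero-based positions, as in Python indexing, so [3, 5, 10] would refer to x[:, 3], x[:, 5], and x[:, 10].

```python
from typing import List, Sequence, Tuple


def split_columns(
    n_features: int, numerical_columns: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (numerical, discrete) column indexes, validating the input."""
    numerical = sorted(set(numerical_columns))
    for idx in numerical:
        # Assumes zero-based positions, as in Python indexing.
        if not 0 <= idx < n_features:
            raise ValueError(f"column index {idx} out of range for {n_features} features")
    discrete = [i for i in range(n_features) if i not in numerical]
    return numerical, discrete


print(split_columns(12, [3, 5, 10]))
# ([3, 5, 10], [0, 1, 2, 4, 6, 7, 8, 9, 11])
```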

evaluate(x, y, model='lr')

Perform a TSTR (Training on Synthetic data, Testing on Real data) evaluation.

Parameters
  • x (array_like) – Features of the real test dataset.

  • y (array_like) – Labels of the real test dataset.

  • model (str or object) – The model used for the evaluation. Should be one of ['lr', 'mlp', 'rf'], or a model object that has sklearn-style fit and predict methods.

Returns

accuracy_score – Accuracy of the evaluation model, trained on synthetic data and tested on (x, y).

Return type

float
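
Besides 'lr', 'mlp', and 'rf', the model argument may be an object with sklearn-style fit and predict methods. The class below is a hypothetical minimal example of that duck-typed contract: a 1-nearest-neighbour classifier using Hamming distance, which suits discrete features. It is a sketch under the assumption that fit(x, y) and predict(x) are all the evaluation needs; it is not a model shipped with ganblr.

```python
from typing import Hashable, List, Sequence


class HammingNearestNeighbor:
    """Hypothetical 1-nearest-neighbour classifier with sklearn-style fit/predict."""

    def fit(self, x: Sequence[Sequence[int]], y: Sequence[Hashable]) -> "HammingNearestNeighbor":
        # Memorise the training rows and labels.
        self.x_ = [list(row) for row in x]
        self.y_ = list(y)
        return self

    def predict(self, x: Sequence[Sequence[int]]) -> List[Hashable]:
        predictions = []
        for row in x:
            # Hamming distance: number of positions where the values differ.
            distances = [sum(a != b for a, b in zip(row, ref)) for ref in self.x_]
            nearest = distances.index(min(distances))  # first minimum wins ties
            predictions.append(self.y_[nearest])
        return predictions


clf = HammingNearestNeighbor().fit([[0, 0, 1], [1, 1, 0]], ["a", "b"])
print(clf.predict([[0, 1, 1], [1, 1, 1]]))  # ['a', 'b']
```

Returning self from fit follows the sklearn convention, so calls can be chained as in the usage line above.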

fit(x, y, k=0, batch_size=32, epochs=10, warmup_epochs=1, verbose=1)

Fit the model to the given data.

Parameters
  • x (array_like of shape (n_samples, n_features)) – Dataset to fit the model. Columns listed in numerical_columns may hold numerical values; all other columns should be discrete.

  • y (array_like of shape (n_samples,)) – Label of the dataset.

  • k (int, default=0) – Parameter k of the GANBLR model. Must be non-negative (the default is 0); values no greater than 2 are suggested.

  • batch_size (int, default=32) – Size of the batch to feed the model at each step.

  • epochs (int, default=10) – Number of epochs to use during training.

  • warmup_epochs (int, default=1) – Number of epochs to use in the warmup phase.

  • verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
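
The constructor's random_state refers to a BayesianGaussianMixture, which suggests numerical columns are modelled with a Gaussian mixture. As a rough illustration, not GANBLR++'s actual code, the sketch below assigns each numeric value to the component with the highest weighted density, turning a continuous column into discrete mode indexes. The weights, means, and standard deviations are made-up numbers standing in for a fitted mixture, and assign_modes is a hypothetical helper.

```python
import math
from typing import List, Sequence


def assign_modes(
    values: Sequence[float],
    weights: Sequence[float],
    means: Sequence[float],
    stds: Sequence[float],
) -> List[int]:
    """Map each value to the index of its most likely Gaussian component."""

    def log_weighted_density(x: float, k: int) -> float:
        # log(w_k) + log N(x | mu_k, sigma_k^2)
        z = (x - means[k]) / stds[k]
        return math.log(weights[k]) - math.log(stds[k] * math.sqrt(2 * math.pi)) - 0.5 * z * z

    n_components = len(weights)
    return [
        max(range(n_components), key=lambda k: log_weighted_density(x, k))
        for x in values
    ]


# Hypothetical two-mode column, e.g. ages clustered around 25 and 60.
print(assign_modes([22.0, 30.0, 58.0, 65.0], weights=[0.5, 0.5], means=[25.0, 60.0], stds=[5.0, 5.0]))
# [0, 0, 1, 1]
```

Working in log space avoids underflow when a value lies far from every component mean.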

Returns

self – Fitted model.

Return type

object

sample(size=None, verbose=1)

Generate synthetic data.

Parameters
  • size (int or None) – Number of samples to generate. Set to None to generate as many samples as the training set contains.

  • verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.

Returns

synthetic_samples – Generated synthetic data.

Return type

np.ndarray
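
When synthetic rows are generated, a discrete mode index for a numerical column has to become a number again. One plausible way, shown here as an assumption rather than a description of GANBLR++'s implementation, is to draw from the Gaussian component the index names. Passing a seeded random.Random mirrors how a fixed random_state makes results reproducible. The helper sample_from_modes and the mixture parameters are hypothetical.

```python
import random
from typing import List, Sequence


def sample_from_modes(
    modes: Sequence[int],
    means: Sequence[float],
    stds: Sequence[float],
    rng: random.Random,
) -> List[float]:
    """Draw one numeric value per mode index from that mode's Gaussian."""
    return [rng.gauss(means[m], stds[m]) for m in modes]


modes = [0, 1, 1, 0]  # e.g. mode indexes from a synthetic numerical column
first = sample_from_modes(modes, means=[25.0, 60.0], stds=[5.0, 5.0], rng=random.Random(0))
second = sample_from_modes(modes, means=[25.0, 60.0], stds=[5.0, 5.0], rng=random.Random(0))
print(first == second)  # True: the same seed reproduces the same draws
```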