ganblr.models package
The GANBLR models.
- class ganblr.models.GANBLR
Bases: object
The GANBLR model.
- evaluate(x, y, model='lr') → float
Perform a TSTR (Training on Synthetic data, Testing on Real data) evaluation.
- Parameters
x (array_like) – Features of the test dataset.
y (array_like) – Labels of the test dataset.
model (str or object) – The model used for evaluation. Should be one of ['lr', 'mlp', 'rf'], or a model class that has sklearn-style fit and predict methods.
- Returns
accuracy_score – Accuracy score of the TSTR evaluation.
- Return type
float
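The TSTR idea described above can be sketched without the library: train a downstream model on synthetic rows only, then measure accuracy on real rows. The sketch below is not the package's implementation. `MajorityClassifier`, `tstr_accuracy`, and the toy data are hypothetical names chosen for illustration; the classifier only shows what an object with sklearn-style fit and predict methods looks like.

```python
from collections import Counter


class MajorityClassifier:
    """Toy stand-in for a downstream model with sklearn-style fit/predict."""

    def fit(self, x, y):
        # Remember the most frequent training label.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        # Predict the remembered label for every row.
        return [self.label_ for _ in x]


def tstr_accuracy(model, x_synth, y_synth, x_real, y_real):
    """Train on synthetic data, then score accuracy on real data."""
    model.fit(x_synth, y_synth)
    predictions = model.predict(x_real)
    correct = sum(p == t for p, t in zip(predictions, y_real))
    return correct / len(y_real)


# Hypothetical toy data: synthetic rows for training, real rows for testing.
x_synth, y_synth = [[0, 1], [1, 1], [1, 0]], [1, 1, 0]
x_real, y_real = [[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 1, 1]

score = tstr_accuracy(MajorityClassifier(), x_synth, y_synth, x_real, y_real)
print(score)
```

A score close to what the same model reaches when trained on real data suggests the synthetic data preserves the structure the classifier relies on.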
- fit(x, y, k=0, batch_size=32, epochs=10, warmup_epochs=1, verbose=1)
Fit the model to the given data.
- Parameters
x (array_like of shape (n_samples, n_features)) – Dataset to fit the model. The data should be discrete.
y (array_like of shape (n_samples,)) – Label of the dataset.
k (int, default=0) – Parameter k of the GANBLR model. Must be greater than or equal to 0. A value of no more than 2 is suggested.
batch_size (int, default=32) – Size of the batch to feed the model at each step.
epochs (int, default=10) – Number of epochs to use during training.
warmup_epochs (int, default=1) – Number of epochs to use in the warmup phase.
verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
- Returns
self – Fitted model.
- Return type
object
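Because fit expects discrete data, continuous features need to be binned beforehand. The sketch below shows one common option, equal-width binning, in plain Python. It is an illustration only and not part of ganblr; `equal_width_bins`, the bin count, and the sample values are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to an integer bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18.0, 25.5, 33.0, 47.2, 60.0]  # hypothetical continuous feature
age_bins = equal_width_bins(ages, n_bins=3)
print(age_bins)
```

Equal-width bins are easy to reason about but sensitive to outliers; quantile-based bins are a frequent alternative when a feature is heavily skewed.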
- sample(size=None, verbose=1) → ndarray
Generate synthetic data.
- Parameters
size (int or None) – Size of the data to be generated. Set to None to make the size equal to the size of the training set.
verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
- Returns
synthetic_samples – Generated synthetic data.
- Return type
np.ndarray
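The three methods are typically called in the order fit, sample, evaluate. The helper below is a minimal sketch of that sequence using only the signatures documented in this section. `fit_sample_evaluate` is a hypothetical name, and the chosen argument values (k=2, verbose=0) are illustrative rather than recommended settings.

```python
def fit_sample_evaluate(model, x_train, y_train, x_test, y_test):
    """Run fit -> sample -> evaluate using the documented signatures."""
    # k=2 follows the "no more than 2" suggestion; verbose=0 silences logs.
    model.fit(x_train, y_train, k=2, batch_size=32, epochs=10,
              warmup_epochs=1, verbose=0)
    # size=None asks for as many rows as the training set had.
    synthetic = model.sample(size=None, verbose=0)
    # 'lr' is one of the documented built-in evaluation models.
    accuracy = model.evaluate(x_test, y_test, model='lr')
    return synthetic, accuracy
```

With the package installed, the model argument would be an instance of ganblr.models.GANBLR; constructing it without arguments is an assumption based on the class signature shown here.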
- class ganblr.models.GANBLRPP(numerical_columns, random_state=None)
Bases: object
The GANBLR++ model.
- Parameters
numerical_columns (list of int) – Indexes of the numerical columns. For example, if the 3rd, 5th, and 10th features of the data are numerical, this parameter should be [3, 5, 10].
random_state (int, RandomState instance or None) – Controls the random seed given to the method chosen to initialize the parameters of BayesianGaussianMixture used by GANBLRPP.
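When features are known by name, the index list for numerical_columns can be derived rather than typed by hand. The sketch below assumes the indexes are zero-based column positions, which the example above does not settle; check this against the library before relying on it. `numerical_column_indexes` and the column names are hypothetical.

```python
def numerical_column_indexes(column_names, numerical_names):
    """Return the positions of the numerical columns, in column order."""
    wanted = set(numerical_names)
    missing = wanted - set(column_names)
    if missing:
        raise ValueError(f"unknown columns: {sorted(missing)}")
    # enumerate() counts from 0, so the result is zero-based (an assumption here).
    return [i for i, name in enumerate(column_names) if name in wanted]


columns = ["sex", "age", "job", "income"]  # hypothetical feature names
numerical_columns = numerical_column_indexes(columns, ["age", "income"])
print(numerical_columns)
```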
- evaluate(x, y, model='lr')
Perform a TSTR (Training on Synthetic data, Testing on Real data) evaluation.
- Parameters
x (array_like) – Features of the test dataset.
y (array_like) – Labels of the test dataset.
model (str or object) – The model used for evaluation. Should be one of ['lr', 'mlp', 'rf'], or a model class that has sklearn-style fit and predict methods.
- Returns
accuracy_score – Accuracy score of the TSTR evaluation.
- Return type
float
- fit(x, y, k=0, batch_size=32, epochs=10, warmup_epochs=1, verbose=1)
Fit the model to the given data.
- Parameters
x (array_like of shape (n_samples, n_features)) – Dataset to fit the model. The data should be discrete.
y (array_like of shape (n_samples,)) – Label of the dataset.
k (int, default=0) – Parameter k of the GANBLR model. Must be greater than or equal to 0. A value of no more than 2 is suggested.
batch_size (int, default=32) – Size of the batch to feed the model at each step.
epochs (int, default=10) – Number of epochs to use during training.
warmup_epochs (int, default=1) – Number of epochs to use in the warmup phase.
verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
- Returns
self – Fitted model.
- Return type
object
- sample(size=None, verbose=1)
Generate synthetic data.
- Parameters
size (int or None) – Size of the data to be generated. Set to None to make the size equal to the size of the training set.
verbose (int, default=1) – Whether to output the log. Use 1 for log output and 0 for complete silence.
- Returns
synthetic_samples – Generated synthetic data.
- Return type
np.ndarray