autotst package
Submodules
autotst.autotst_types module
- autotst.autotst_types.Dataset
Numpy array of any shape and type, used for datasets
- autotst.autotst_types.Labels
One-dimensional array of ints, used for labels (with values 0 and 1)
- autotst.autotst_types.ListFloats
One-dimensional array of floats, used for weights and predictions
- autotst.autotst_types.Samples
Numpy array of any shape and type, used for the distribution’s samples
autotst.functions module
- autotst.functions.fit_witness(data_train: NDArray[Any, Ellipsis, Any], label_train: NDArray[Any, Ellipsis, Any], model: Model, **kwargs) None
Calls the fit function of the model on the provided dataset, weighting the items to account for the unequal representation of the two labels.
- Parameters
data_train – training data
label_train – one-dimensional array with labels 1 and 0 indicating data coming from one sample or the other
model – the model on which the fit function is applied
kwargs – keyword arguments passed to the model’s fit function
- autotst.functions.get_default_model() Model
Returns an instance of the AutoGluonTabularPredictor, with default parameters
- autotst.functions.get_weights(label_train: NDArray[1, UInt[int, unsignedinteger]]) NDArray[1, Float]
Given a one-dimensional array of labels with values 1 and 0, returns an array of weights that assigns higher values to the indexes of the less represented label.
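The weighting described above can be sketched with inverse class frequencies. This is a minimal illustration, not the package's implementation: the exact formula used by get_weights is not stated here, and the name inverse_frequency_weights is hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels: np.ndarray) -> np.ndarray:
    """Hypothetical helper: weight each item inversely to its label's frequency."""
    labels = np.asarray(labels)
    n_ones = int(np.sum(labels == 1))
    n_zeros = int(np.sum(labels == 0))
    if n_ones == 0 or n_zeros == 0:
        raise ValueError("both labels 0 and 1 must be present")
    n = labels.size
    # Assumption: each label receives the same total weight (n / 2),
    # so items of the rarer label get a larger individual weight.
    weight_one = n / (2.0 * n_ones)
    weight_zero = n / (2.0 * n_zeros)
    return np.where(labels == 1, weight_one, weight_zero).astype(float)


labels = np.array([1, 1, 1, 0])
weights = inverse_frequency_weights(labels)
print(weights)
```

With three items labeled 1 and one item labeled 0, the single 0-labeled item receives the largest weight, and both labels end up with the same total weight.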
- autotst.functions.interpret(data_test: NDArray[Any, Ellipsis, Any], predictions: NDArray[1, Float], k: int = 1) Tuple[NDArray[Any, Ellipsis, Any], NDArray[Any, Ellipsis, Any]]
Returns the k most typical examples from the two distributions.
- Parameters
data_test – dataset whose first items correspond to the first distribution and whose last items correspond to the second distribution
predictions – label predictions corresponding to the dataset
k – number of items to extract from the dataset, for each distribution
- Returns
the k most typical examples from the two distributions
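One way to read "most typical" is to rank the test items by their witness prediction. The sketch below assumes that high predictions point to the first distribution (label 1) and low predictions to the second (label 0); the function name most_typical is hypothetical and the package may select examples differently.

```python
import numpy as np


def most_typical(data_test, predictions, k=1):
    """Hypothetical helper: pick the k items with the highest and lowest predictions."""
    data_test = np.asarray(data_test)
    predictions = np.asarray(predictions, dtype=float)
    order = np.argsort(predictions)  # indexes sorted by ascending prediction
    # Assumption: the highest predictions are the most typical of the first distribution,
    # the lowest predictions the most typical of the second.
    typical_first = data_test[order[::-1][:k]]
    typical_second = data_test[order[:k]]
    return typical_first, typical_second


data = np.array([[0.0], [1.0], [2.0], [3.0]])
preds = np.array([0.1, 0.9, 0.4, 0.2])
typical_p, typical_q = most_typical(data, preds, k=1)
print(typical_p, typical_q)
```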
- autotst.functions.p_value(sample_p: NDArray[Any, Ellipsis, Any], sample_q: NDArray[Any, Ellipsis, Any], model: Optional[Model] = None, split_ratio: float = 0.5, permutations: int = 10000, **fit_kwargs) float
Splits the datasets into a training set and a test set, fits the model on the training set, and uses the test set to compute the p value.
- Parameters
sample_p – samples drawn from a first distribution
sample_q – samples drawn from a second distribution
model – instance of the model used for fitting and prediction. If None (the default), an AutoGluonTabularPredictor is used
split_ratio – ratio used to split the data into training and test sets
permutations – number of permutations used to estimate the p value
fit_kwargs – keyword arguments passed to the model’s fit function
- Returns
p value
- autotst.functions.p_value_evaluate(model: Model, data_test: NDArray[Any, Ellipsis, Any], labels_test: NDArray[1, UInt[int, unsignedinteger]], permutations: int = 10000) Tuple[NDArray[Any, Ellipsis, Any], float]
Applies the model to generate predictions, and uses these predictions to evaluate the p value.
- Parameters
model – the model used for prediction, assumed to have been fitted
data_test – dataset
labels_test – one-dimensional array with labels 1 and 0 indicating data coming from one sample or the other
permutations – number of permutations when estimating the p value
- Returns
the predictions and the p value
- autotst.functions.permutations_p_value(predictions: NDArray[1, Float], labels: NDArray[1, UInt[int, unsignedinteger]], permutations: int = 10000) float
Compute p value of the witness mean discrepancy test statistic via permutations
- Parameters
predictions – one-dimensional array with the witness predictions of the test data
labels – one-dimensional array with labels 1 and 0 indicating data coming from P or Q
permutations (int) – Number of permutations
- Returns
p value
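A permutation test of this kind can be sketched as follows. The statistic is assumed to be the difference between the mean prediction on items labeled 1 and the mean prediction on items labeled 0; the add-one correction, the seed parameter, and the name permutation_p_value_sketch are choices made for this illustration and are not taken from the package.

```python
import numpy as np


def permutation_p_value_sketch(predictions, labels, permutations=1000, seed=0):
    """Hypothetical helper: permutation p value for a mean-discrepancy statistic."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    def statistic(current_labels):
        # Assumed statistic: mean witness value on label 1 minus mean on label 0.
        return predictions[current_labels == 1].mean() - predictions[current_labels == 0].mean()

    observed = statistic(labels)
    exceed = 0
    for _ in range(permutations):
        # Shuffling the labels simulates the null hypothesis P = Q.
        if statistic(rng.permutation(labels)) >= observed:
            exceed += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (exceed + 1) / (permutations + 1)


preds = np.array([0.9, 0.8, 0.95, 0.85, 0.1, 0.2, 0.15, 0.05])
labs = np.array([1, 1, 1, 1, 0, 0, 0, 0])
p = permutation_p_value_sketch(preds, labs, permutations=1000)
print(p)
```

When the predictions separate the two labels well, as in this toy input, few shuffled labelings reach the observed statistic and the p value is small.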
autotst.model module
- class autotst.model.AutoGluonImagePredictor(**kwargs)
Bases: Model
Wrapper model for the Image Classifier of the AutoGluon package. The objective is classification, and the witness function uses the predicted probabilities.
- __init__(**kwargs) None
- fit(data_train: NDArray[Any, Ellipsis, Any], label_train: NDArray[Any, Ellipsis, Any], weights: NDArray[1, Float], presets: str = 'best_quality', time_limit: int = 60, **kwargs) None
Wrapper around the fit routine.
- Parameters
data_train – training data, provided as a list of image paths
label_train – training labels
weights – weights for the loss; ignored by this model
presets – AutoGluon preset
time_limit – time limit for training (seconds)
kwargs – other arguments to be passed to AutoGluon’s fit routine
- predict(data_test: NDArray[Any, Ellipsis, Any]) NDArray[1, Float]
- class autotst.model.AutoGluonTabularPredictor(**kwargs)
Bases: Model
Wrapper model for the Tabular Predictor of the AutoGluon package
- __init__(**kwargs) None
- fit(data_train: NDArray[Any, Ellipsis, Any], label_train: NDArray[Any, Ellipsis, Any], weights: NDArray[1, Float], presets: str = 'best_quality', time_limit: int = 60, verbosity: int = 0, **kwargs) None
Wrapper around the fit routine.
- Parameters
data_train – training data
label_train – training labels
weights – weights for the loss
presets – AutoGluon preset
time_limit – time limit for training (seconds)
verbosity – controls the output of AutoGluon
kwargs – other arguments to be passed to AutoGluon’s fit routine
- predict(data_test: NDArray[Any, Ellipsis, Any]) NDArray[1, Float]
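The two wrappers above share a fit(data_train, label_train, weights, **kwargs) and predict(data_test) interface. As a rough illustration of that interface, the sketch below implements it with a weighted least-squares linear witness in plain NumPy. The class LinearWitness is hypothetical; it does not subclass Model, so whether an object like this can be passed to the package's functions is an assumption that would need checking against the Model base class.

```python
import numpy as np


class LinearWitness:
    """Hypothetical model exposing the same fit/predict interface as the wrappers."""

    def __init__(self):
        self.coef_ = None

    @staticmethod
    def _design(data):
        # Flatten each item to a feature vector and prepend an intercept column.
        features = np.asarray(data, dtype=float)
        features = features.reshape(len(features), -1)
        return np.hstack([np.ones((len(features), 1)), features])

    def fit(self, data_train, label_train, weights, **kwargs):
        design = self._design(data_train)
        targets = np.asarray(label_train, dtype=float)
        # Weighted least squares: scale rows by the square root of the weights.
        root_w = np.sqrt(np.asarray(weights, dtype=float))
        self.coef_, *_ = np.linalg.lstsq(design * root_w[:, None], targets * root_w, rcond=None)

    def predict(self, data_test):
        if self.coef_ is None:
            raise RuntimeError("fit must be called before predict")
        return self._design(data_test) @ self.coef_


train_x = np.array([[0.0], [0.1], [0.9], [1.0]])
train_y = np.array([0, 0, 1, 1])
model = LinearWitness()
model.fit(train_x, train_y, weights=np.ones(4))
witness = model.predict(train_x)
print(witness)
```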
autotst.splitted_sets module
- class autotst.splitted_sets.SplittedSets(training_set: NDArray[Any, Ellipsis, Any], test_set: NDArray[Any, Ellipsis, Any], training_labels: NDArray[1, UInt[int, unsignedinteger]], test_labels: NDArray[1, UInt[int, unsignedinteger]])
Bases: object
Class encapsulating the datasets and labels, divided into training and testing sets.
- __init__(training_set: NDArray[Any, Ellipsis, Any], test_set: NDArray[Any, Ellipsis, Any], training_labels: NDArray[1, UInt[int, unsignedinteger]], test_labels: NDArray[1, UInt[int, unsignedinteger]])
- classmethod from_samples(sample_p: NDArray[Any, Ellipsis, Any], sample_q: NDArray[Any, Ellipsis, Any], split_ratio: float = 0.5) object
Creates a labeled dataset that concatenates the samples drawn from the distributions P and Q, and splits it into a training set and a testing set. Labels are binary, with value 1 for samples drawn from P and 0 for samples drawn from Q.
- static split(X: NDArray[Any, Ellipsis, Any], Y: NDArray[Any, Ellipsis, Any], split_ratio: float) Tuple[NDArray[Any, Ellipsis, Any], NDArray[Any, Ellipsis, Any], NDArray[1, UInt[int, unsignedinteger]], NDArray[1, UInt[int, unsignedinteger]]]
Creates a labeled dataset that concatenates the samples X and Y, and splits it into a training set and a testing set. Labels are binary, with value 1 for samples drawn from P and 0 for samples drawn from Q. The returned tuple contains, in order: the training set, the testing set, the labels for the training set, and the labels for the testing set.
- test_split() Tuple[int, int]
Similar to training_split, but for the testing set.
- training_split() Tuple[int, int]
Returns the numbers p and q of items in the training set that were drawn from the distributions P and Q, respectively. The first p items of the training set correspond to P, and the following q items correspond to Q.
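The labeling and splitting described for this class can be sketched in NumPy as below. The sketch assumes that split_ratio is the fraction of each sample used for training, that counts are truncated to integers, and that no shuffling takes place; none of these details are confirmed here, and split_and_label is a hypothetical name.

```python
import numpy as np


def split_and_label(sample_p, sample_q, split_ratio=0.5):
    """Hypothetical helper: label P as 1 and Q as 0, then split into train and test."""
    sample_p = np.asarray(sample_p)
    sample_q = np.asarray(sample_q)
    # Assumption: split_ratio is the share of each sample that goes to training.
    n_p_train = int(len(sample_p) * split_ratio)
    n_q_train = int(len(sample_q) * split_ratio)

    # Items from P come first, followed by items from Q, in both sets.
    training_set = np.concatenate([sample_p[:n_p_train], sample_q[:n_q_train]])
    test_set = np.concatenate([sample_p[n_p_train:], sample_q[n_q_train:]])
    training_labels = np.concatenate(
        [np.ones(n_p_train, dtype=np.uint8), np.zeros(n_q_train, dtype=np.uint8)]
    )
    test_labels = np.concatenate(
        [
            np.ones(len(sample_p) - n_p_train, dtype=np.uint8),
            np.zeros(len(sample_q) - n_q_train, dtype=np.uint8),
        ]
    )
    return training_set, test_set, training_labels, test_labels


sample_p = np.arange(10).reshape(5, 2)
sample_q = np.arange(12).reshape(6, 2) + 100
train_x, test_x, train_y, test_y = split_and_label(sample_p, sample_q, split_ratio=0.5)
print(train_y, test_y)
```

In this layout the counts reported by training_split and test_split would correspond to the number of leading 1 labels and trailing 0 labels in each label array.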
autotst.test module
- class autotst.test.AutoTST(sample_p: NDArray[Any, Ellipsis, Any], sample_q: NDArray[Any, Ellipsis, Any], split_ratio: float = 0.5, model: Type[Model] = AutoGluonTabularPredictor, **model_kwargs)
Bases: object
AutoML Two-Sample Test
Documentation with example of the class goes here
Constructor
- Parameters
sample_p – Sample drawn from P
sample_q – Sample drawn from Q
split_ratio – Ratio that defines how much data is used for training the witness
model – Model used to learn the witness function
**model_kwargs –
Keyword arguments to initialize the model
- Returns
None
- __init__(sample_p: NDArray[Any, Ellipsis, Any], sample_q: NDArray[Any, Ellipsis, Any], split_ratio: float = 0.5, model: Type[Model] = AutoGluonTabularPredictor, **model_kwargs) None
- fit_witness(**kwargs) None
Fit witness
- Parameters
kwargs – Keyword arguments to be passed to fit method of model
- Returns
None
- interpret(k=1)
Return the k most typical examples from P and Q.
- Returns
Tuple: (k most typical examples from P, k most typical examples from Q)
- p_value(permutations: int = 1000, **fit_kwargs)
Run the complete pipeline and return p value with default settings.
- Returns
p-value
- p_value_evaluate(permutations: int = 10000) float
Evaluate p value.
- Parameters
permutations – number of permutations when estimating the p-value
- Returns
p value
- split_data() SplittedSets
Split and label the data using the instance’s splitting ratio. The splits are stored as attributes and also returned.
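The methods of AutoTST can be chained step by step, as in the sketch below. It uses only the class and method names documented above; the order of the calls (split, fit, evaluate, interpret) is an assumption based on the method descriptions, and run_autotst is a hypothetical wrapper. The import is done inside the function because autotst and AutoGluon must be installed for the call to work.

```python
def run_autotst(sample_p, sample_q, permutations=10000, k=1):
    """Hypothetical wrapper: run the two-sample test step by step."""
    # Imported lazily: requires the autotst package and its AutoGluon dependency.
    from autotst.test import AutoTST

    # Default model (AutoGluonTabularPredictor) and half of the data for training.
    tst = AutoTST(sample_p, sample_q, split_ratio=0.5)
    tst.split_data()      # label and split the data
    tst.fit_witness()     # train the witness on the training split
    p_value = tst.p_value_evaluate(permutations=permutations)
    typical_p, typical_q = tst.interpret(k=k)
    return p_value, typical_p, typical_q
```

For the default settings, calling p_value() on the instance is documented as running the complete pipeline in one step.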