data_management package

Submodules

data_management.data_helpers module

class data_management.data_helpers.DataSubSampler(data_dir, destination_dir, fraction, seed=None)

Bases: object

DataSubSampler class for creating a smaller dataset by randomly sampling a fraction of files from the original dataset.

Attributes:

data_dir (str): Directory containing the original dataset.

destination_dir (str): Directory where the sampled dataset will be saved.

fraction (float): Fraction of files to sample from the original dataset.

seed (int): Seed for random number generator.

create_miniature_dataset()

Recreates every folder and subfolder of data_dir under destination_dir, but copies only a randomly sampled fraction of the files.
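The behaviour described above can be illustrated with a minimal standard-library sketch. It is not the package's implementation: the function name `sample_tree` is hypothetical, and sampling per folder with the count rounded down is an assumption, since the documentation does not say how the fraction is applied.

```python
import os
import random
import shutil


def sample_tree(data_dir, destination_dir, fraction, seed=None):
    """Hypothetical stand-in: mirror the folder tree, copy a random fraction of files."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    copied = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # visit subfolders in a deterministic order
        rel = os.path.relpath(root, data_dir)
        target = os.path.normpath(os.path.join(destination_dir, rel))
        os.makedirs(target, exist_ok=True)  # folders are recreated even if empty
        files = sorted(files)
        k = int(len(files) * fraction)  # assumption: round down, per folder
        for name in rng.sample(files, k):
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))
            copied.append(os.path.join(target, name))
    return copied
```

With the real class, the equivalent call would presumably be `DataSubSampler(data_dir, destination_dir, fraction, seed).create_miniature_dataset()`, based on the signature listed above.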

class data_management.data_helpers.DatasetSplitter(data_dir, destination_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, test_ratio_2=None, seed=None)

Bases: object

DatasetSplitter class for splitting a dataset into train, validation, and test sets.

Attributes:

data_dir (str): Directory containing the dataset.

destination_dir (str): Directory where the train, validation, and test sets will be saved.

train_ratio (float): Ratio of train set.

val_ratio (float): Ratio of validation set.

test_ratio (float): Ratio of test set.

test_ratio_2 (float): Ratio of an additional test set, if needed.

seed (int): Seed for random number generator.

run()

Main method: splits the dataset into train, validation, and test sets and copies the files to destination_dir.
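To make the role of the ratios concrete, here is a minimal sketch of a seeded ratio-based split. The function name `split_names` is hypothetical, and shuffling once and then cutting by rounded-down counts is an assumption about how such a split might work, not the documented behaviour of `run()`. File copying and the optional `test_ratio_2` set are left out.

```python
import math
import random


def split_names(names, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=None):
    """Hypothetical stand-in: assign file names to train/val/test by ratio."""
    if not math.isclose(train_ratio + val_ratio + test_ratio, 1.0):
        raise ValueError("ratios must sum to 1")
    shuffled = sorted(names)  # sort first so the seed alone fixes the order
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)  # assumption: round down
    n_val = int(len(shuffled) * val_ratio)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the test set takes the remainder
    }
```

Giving the remainder to the last split guarantees that every file lands in exactly one set, even when the counts do not divide evenly.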

data_management.data_helpers.update_nested_dict(d, u)

Updates a nested dictionary with values from another dictionary.

Args:

d (dict): The dictionary to update.

u (dict): The dictionary to use for updating.

Returns:

dict: The updated dictionary.
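A common way to write such a helper is a recursive merge, sketched below. The name `update_nested` is hypothetical, and both the in-place mutation of `d` and the rule that non-dict values are overwritten are assumptions. The documentation only states that the updated dictionary is returned.

```python
from collections.abc import Mapping


def update_nested(d, u):
    """Hypothetical stand-in: recursively merge u into d and return d."""
    for key, value in u.items():
        if isinstance(value, Mapping) and isinstance(d.get(key), Mapping):
            # Both sides hold a mapping: merge instead of replacing wholesale
            d[key] = update_nested(d[key], value)
        else:
            # Scalars, lists, and new keys simply overwrite
            d[key] = value
    return d


base = {"model": {"lr": 0.1, "layers": 2}, "name": "a"}
merged = update_nested(base, {"model": {"lr": 0.01}, "epochs": 5})
# merged == {"model": {"lr": 0.01, "layers": 2}, "name": "a", "epochs": 5}
```

Unlike a plain `dict.update`, this keeps `"layers"` inside `"model"` rather than replacing the whole inner dictionary.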

Module contents