data_management package
Submodules
data_management.data_helpers module
- class data_management.data_helpers.DataSubSampler(data_dir, destination_dir, fraction, seed=None)
Bases: object
DataSubSampler class for creating a smaller dataset by randomly sampling a fraction of files from the original dataset.
- Attributes:
data_dir (str): Directory containing the original dataset.
destination_dir (str): Directory where the sampled dataset will be saved.
fraction (float): Fraction of files to sample from the original dataset.
seed (int, optional): Seed for the random number generator.
- create_miniature_dataset()
Recreates every folder and subfolder of data_dir under destination_dir, but copies only a randomly sampled fraction of the files.
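The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the package's source: the function name sample_directory_tree is hypothetical, and it assumes that sampling happens independently in each folder, that the per-folder count is rounded down, and that the seed drives a dedicated random.Random instance.

```python
import os
import random
import shutil


def sample_directory_tree(data_dir, destination_dir, fraction, seed=None):
    """Hypothetical sketch: mirror data_dir under destination_dir and copy
    a random fraction of the files found in each folder."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded RNG so repeated runs pick the same files
    copied = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # fixed traversal order keeps the RNG sequence reproducible
        rel = os.path.relpath(root, data_dir)
        target_root = os.path.join(destination_dir, rel)
        os.makedirs(target_root, exist_ok=True)  # recreate every folder, even empty ones
        files = sorted(files)  # deterministic order before sampling
        k = int(len(files) * fraction)  # assumption: round down per folder
        for name in rng.sample(files, k):
            shutil.copy2(os.path.join(root, name), os.path.join(target_root, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

Sorting both the directory list and the file list before sampling is what makes a fixed seed produce the same subset on every run; os.walk alone does not guarantee a stable order.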
- class data_management.data_helpers.DatasetSplitter(data_dir, destination_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, test_ratio_2=None, seed=None)
Bases: object
DatasetSplitter class for splitting a dataset into train, validation, and test sets.
- Attributes:
data_dir (str): Directory containing the dataset.
destination_dir (str): Directory where the train, validation, and test sets will be saved.
train_ratio (float): Fraction of the dataset assigned to the training set.
val_ratio (float): Fraction of the dataset assigned to the validation set.
test_ratio (float): Fraction of the dataset assigned to the test set.
test_ratio_2 (float, optional): Fraction of the dataset assigned to an additional test set, if needed.
seed (int, optional): Seed for the random number generator.
- run()
Main entry point: splits the dataset and copies the files of each split into destination_dir.
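The core of a ratio-based split can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function name split_file_list and the split names ("train", "val", "test", "test_2") are hypothetical, the ratios are assumed to sum to 1, each split size is rounded down, and the last split absorbs any rounding remainder. File copying is left out.

```python
import random


def split_file_list(files, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15,
                    test_ratio_2=None, seed=None):
    """Hypothetical sketch: shuffle file paths and cut them into partitions."""
    ratios = {"train": train_ratio, "val": val_ratio, "test": test_ratio}
    if test_ratio_2 is not None:
        ratios["test_2"] = test_ratio_2  # optional extra test set
    total = sum(ratios.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"ratios must sum to 1, got {total}")

    shuffled = sorted(files)  # fixed starting order so the seed fully determines the split
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    names = list(ratios)
    splits, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = n  # assumption: the last split absorbs the rounding remainder
        else:
            end = start + int(n * ratios[name])
        splits[name] = shuffled[start:end]
        start = end
    return splits
```

For example, with 100 files and the default ratios, this sketch yields 70 training, 15 validation, and 15 test files, and the same seed always reproduces the same assignment regardless of the input order.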
- data_management.data_helpers.update_nested_dict(d, u)
Updates a nested dictionary with values from another dictionary.
- Args:
d (dict): The dictionary to update.
u (dict): The dictionary to use for updating.
- Returns:
dict: The updated dictionary.
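A common way to implement such a recursive update is sketched below. This is a minimal illustration, not the package's actual source: it assumes that d is modified in place and returned, that sub-dictionaries present on both sides are merged key by key, and that any other value in u overwrites the one in d.

```python
from collections.abc import Mapping


def update_nested_dict(d, u):
    """Sketch: recursively merge u into d instead of replacing sub-dicts wholesale."""
    for key, value in u.items():
        if isinstance(value, Mapping) and isinstance(d.get(key), dict):
            # both sides hold a mapping: merge them key by key
            d[key] = update_nested_dict(d[key], value)
        else:
            # leaf value or new key: the value from u wins
            d[key] = value
    return d


config = {"model": {"lr": 0.1, "layers": 3}, "seed": 0}
update_nested_dict(config, {"model": {"lr": 0.01}, "epochs": 5})
# config is now {"model": {"lr": 0.01, "layers": 3}, "seed": 0, "epochs": 5}
```

A plain dict.update would instead replace config["model"] entirely and drop "layers"; the recursion is what preserves keys that u does not mention.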