modelforge.dataset.utils

Utility functions for dataset handling.

Functions

calculate_mean_and_variance(torch_dataset[, ...])

Calculates the mean and variance of the dataset.

calculate_size_of_splits(total_size, split_frac)

Calculate the size of each split based on the total size and the split ratios.
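As a rough illustration of turning split ratios into integer sizes, here is a minimal sketch. It is not the modelforge implementation: the helper name `sketch_split_sizes` and the choice to give any rounding remainder to the first split are assumptions made for this example.

```python
import math


def sketch_split_sizes(total_size, split_frac):
    """Hypothetical helper: convert fractions into integer split sizes."""
    # Assumption: the fractions are expected to sum to 1.
    if not math.isclose(sum(split_frac), 1.0):
        raise ValueError("split fractions must sum to 1")
    # Round each split down to a whole number of samples.
    sizes = [int(total_size * frac) for frac in split_frac]
    # Assumption: leftover samples from rounding go to the first split.
    sizes[0] += total_size - sum(sizes)
    return sizes


sizes = sketch_split_sizes(7, [0.5, 0.5])
```

Whatever rounding rule is used, the sizes should always add up to `total_size` so that no sample is dropped.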

normalize_energies(dataset, stats)

Normalizes the energies in the dataset.

random_record_split(dataset, lengths[, ...])

Randomly split a TorchDataset into non-overlapping new datasets of given lengths, keeping all conformers in a record in the same split.
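The idea of a record-level split can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `sketch_record_split` is a hypothetical name, the input is assumed to be a list giving the record identifier of each conformer, and splits are sized by fraction of records rather than by explicit lengths.

```python
import random
from collections import defaultdict


def sketch_record_split(record_ids, fractions, seed=0):
    """Hypothetical helper: split conformer indices without dividing a record."""
    # Group conformer indices by the record they belong to.
    by_record = defaultdict(list)
    for idx, rec in enumerate(record_ids):
        by_record[rec].append(idx)

    # Shuffle the records (not the conformers) reproducibly.
    records = sorted(by_record)
    random.Random(seed).shuffle(records)

    splits = []
    start = 0
    for i, frac in enumerate(fractions):
        # The last split takes every remaining record.
        if i == len(fractions) - 1:
            end = len(records)
        else:
            end = start + int(len(records) * frac)
        chunk = records[start:end]
        # Expand each selected record back into its conformer indices.
        splits.append([idx for rec in chunk for idx in by_record[rec]])
        start = end
    return splits


record_ids = ["a", "a", "b", "c", "c", "c", "d"]
splits = sketch_record_split(record_ids, [0.5, 0.5])
```

Because whole records are assigned, the number of conformers per split can deviate from the requested proportions when records differ in size.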

two_stage_random_split(dataset_size, ...)

Perform a two-stage random split of a dataset.
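One plausible reading of a two-stage split, suggested by the separate `seed` and `test_seed` arguments on the strategy classes, is to fix the test set first and then divide the remainder into training and validation sets. The sketch below assumes that reading; `sketch_two_stage_split` and its parameter layout are hypothetical and may differ from the actual function.

```python
import random


def sketch_two_stage_split(dataset_size, split_frac, seed=0, test_seed=0):
    """Hypothetical helper: split off the test set first, then train/val."""
    # Assumption: split_frac is ordered (train, val, test).
    _train_frac, val_frac, test_frac = split_frac

    # Stage 1: choose the test indices with their own seed.
    indices = list(range(dataset_size))
    random.Random(test_seed).shuffle(indices)
    n_test = int(dataset_size * test_frac)
    test_idx = indices[:n_test]
    remaining = indices[n_test:]

    # Stage 2: shuffle what is left with the main seed and carve out validation.
    random.Random(seed).shuffle(remaining)
    n_val = int(dataset_size * val_frac)
    val_idx = remaining[:n_val]
    train_idx = remaining[n_val:]
    return train_idx, val_idx, test_idx


train_a, val_a, test_a = sketch_two_stage_split(100, (0.8, 0.1, 0.1), seed=1, test_seed=42)
train_b, val_b, test_b = sketch_two_stage_split(100, (0.8, 0.1, 0.1), seed=2, test_seed=42)
```

With this design, changing `seed` reshuffles training and validation data while the test set stays identical, which keeps test results comparable across runs.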

Classes

FirstComeFirstServeSplittingStrategy([split])

Strategy to split a dataset based on index (idx) order.

RandomRecordSplittingStrategy([seed, split, ...])

Strategy to split a dataset randomly, keeping all configurations in a record in the same split.

RandomSplittingStrategy([seed, split, test_seed])

Strategy to split a dataset randomly.

SplittingStrategy(split[, seed, test_seed])

Base class for dataset splitting strategies.