torchtable.operator module

Module contents

class torchtable.operator.Operator

Bases: object

Base class for all operators. Operators can be chained together by piping their outputs to new operators or hooking operators to other operators. Any number of operators can be chained to become a pipeline, which is itself just another operator. Subclasses should implement the apply method that defines the operation performed by the operator.

Example

>>> class TimesThree(Operator):
...     def apply(self, x):
...         return x * 3
>>> op = TimesThree()
>>> op(4) # 4 * 3 = 12
12
>>> class Square(Operator):
...     def apply(self, x):
...         return x ** 2
>>> op = TimesThree() > Square()
>>> op(2) # (2 * 3) ** 2 = 36
36
apply(x: Any, train=True) → Any

Takes output of previous stage in the pipeline and produces output. Override in subclasses.

Parameters:
  • train – If true, this operator will “train” on the input. In other words, the internal parameters of this operator may change to fit the given input.
hook(op: torchtable.operator.core.Operator) → torchtable.operator.core.Operator

Connect an operator to the beginning of this pipeline. Returns self.

pipe(op: torchtable.operator.core.Operator) → torchtable.operator.core.Operator

Connect an operator after this operator. Returns the connected operator.
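
The difference between pipe and hook is mostly which object is returned and which end of the chain the new operator lands on. The following is a minimal sketch of those semantics using a hypothetical stand-in class, MiniOperator. It is not torchtable's code; the `before` attribute and the `__call__` logic are assumptions made for illustration.

```python
from typing import Any, Optional


class MiniOperator:
    """Hypothetical stand-in for Operator; not torchtable's actual code."""

    def __init__(self) -> None:
        # Assumed bookkeeping: a reference to the upstream operator, if any.
        self.before: Optional["MiniOperator"] = None

    def apply(self, x: Any, train: bool = True) -> Any:
        raise NotImplementedError

    def hook(self, op: "MiniOperator") -> "MiniOperator":
        # Attach `op` at the very start of this pipeline, then return self.
        if self.before is None:
            self.before = op
        else:
            self.before.hook(op)
        return self

    def pipe(self, op: "MiniOperator") -> "MiniOperator":
        # Attach `op` after this operator and return `op`.
        return op.hook(self)

    def __call__(self, x: Any, train: bool = True) -> Any:
        # Run everything upstream first, then this operator.
        if self.before is not None:
            x = self.before(x, train=train)
        return self.apply(x, train=train)


class TimesThree(MiniOperator):
    def apply(self, x, train=True):
        return x * 3


class Square(MiniOperator):
    def apply(self, x, train=True):
        return x ** 2


piped = TimesThree().pipe(Square())   # runs TimesThree, then Square
hooked = Square().hook(TimesThree())  # same order, built from the other end
```

Under these assumptions both `piped(2)` and `hooked(2)` evaluate `(2 * 3) ** 2`. In each case the returned object is the Square instance, which matches the documented return values of pipe ("the connected operator") and hook ("self").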

class torchtable.operator.LambdaOperator(func: Callable[T, T])

Bases: torchtable.operator.core.Operator

Generic operator for stateless operation.

Parameters: func – Function to apply to input.
apply(x: T, train=True) → Any

Takes output of previous stage in the pipeline and produces output. Override in subclasses.

Parameters:
  • train – If true, this operator will “train” on the input. In other words, the internal parameters of this operator may change to fit the given input.
class torchtable.operator.TransformerOperator(transformer)

Bases: torchtable.operator.core.Operator

Wrapper for any stateful transformer with fit and transform methods.

Parameters: transformer – Any object with a fit and transform method.

Example

>>> op = TransformerOperator(sklearn.preprocessing.StandardScaler())
apply(x: Any, train=True)

Takes output of previous stage in the pipeline and produces output. Override in subclasses.

Parameters:
  • train – If true, this operator will “train” on the input. In other words, the internal parameters of this operator may change to fit the given input.
build(x: Any) → None
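
To make the fit/transform requirement concrete, here is a small sketch of an object that satisfies it, together with one plausible reading of the train flag. MeanCenter and apply_transformer are hypothetical names. The rule "fit only when train is true, then always transform" is an assumption, not something the documentation above confirms.

```python
from statistics import mean
from typing import List


class MeanCenter:
    """Hypothetical transformer exposing the fit/transform protocol."""

    def fit(self, x: List[float]) -> "MeanCenter":
        # Learn the state (the mean) from the data.
        self.mean_ = mean(x)
        return self

    def transform(self, x: List[float]) -> List[float]:
        # Use the learned state without changing it.
        return [v - self.mean_ for v in x]


def apply_transformer(transformer, x, train=True):
    # Assumed behaviour: update the internal state only when training.
    if train:
        transformer.fit(x)
    return transformer.transform(x)


centerer = MeanCenter()
train_out = apply_transformer(centerer, [1.0, 2.0, 3.0], train=True)
test_out = apply_transformer(centerer, [4.0], train=False)
```

Here `train_out` is `[-1.0, 0.0, 1.0]` and `test_out` is `[2.0]`, because the second call reuses the mean learned from the first.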
class torchtable.operator.Normalize(method: Optional[str])

Bases: torchtable.operator.core.TransformerOperator

Normalizes a numeric field.

Parameters:
  • method – Method of normalization. Choose from the following:
      - None: No normalization will be applied (same as a no-op).
      - 'Gaussian': Subtracts the mean and divides by the standard deviation.
      - 'RankGaussian': Assigns elements to a Gaussian distribution based on their rank.
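
The two non-trivial methods can be illustrated with the standard library alone. This is a sketch of the general techniques, not torchtable's implementation: the function names are hypothetical, the population standard deviation is an assumed choice, and the rank-to-quantile mapping `(rank + 0.5) / n` is one common convention among several.

```python
from statistics import NormalDist, mean, pstdev
from typing import List


def gaussian(values: List[float]) -> List[float]:
    # Subtract the mean and divide by the (population) standard deviation.
    # Constant input (zero deviation) is not handled in this sketch.
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rank_gaussian(values: List[float]) -> List[float]:
    # Replace each value by the standard normal quantile of its rank.
    # Ties are broken by position here; a real implementation may differ.
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return out


z = gaussian([1.0, 2.0, 3.0])
r = rank_gaussian([10.0, 0.5, 3.0])
```

Note that `rank_gaussian` depends only on the ordering of the inputs, so the outlier 10.0 ends up no further from the centre than 0.5 does.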
class torchtable.operator.FillMissing(method: Union[Callable, str])

Bases: torchtable.operator.core.TransformerOperator

Fills missing values according to method.

Parameters:
  • method – Method of filling missing values. Options:
      - None: Do not fill missing values.
      - 'median': Fill with the median.
      - 'mean': Fill with the mean.
      - 'mode': Fill with the mode. Effective for categorical fields.
      - any callable: The output of the callable will be used to fill the missing values.
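
The options above can be sketched on plain Python lists as follows. This is an illustration only: `fill_missing` is a hypothetical helper, missing values are represented as None rather than NaN, and passing the observed values to the callable is an assumption, since the documentation does not say what the callable receives.

```python
from statistics import mean, median, mode
from typing import Any, Callable, List, Optional, Union

_STATS = {"median": median, "mean": mean, "mode": mode}


def fill_missing(values: List[Any],
                 method: Optional[Union[Callable, str]]) -> List[Any]:
    if method is None:
        # No filling requested: return the data unchanged.
        return list(values)
    observed = [v for v in values if v is not None]
    if callable(method):
        # Assumption: the callable sees the observed (non-missing) values.
        fill = method(observed)
    else:
        fill = _STATS[method](observed)
    return [fill if v is None else v for v in values]


by_mean = fill_missing([1.0, None, 3.0], "mean")
by_mode = fill_missing(["a", None, "a", "b"], "mode")
by_func = fill_missing([5, None], lambda observed: 0)
```

The mode variant is the only one of the three statistics that works on non-numeric data, which is why it suits categorical fields.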
class torchtable.operator.Vocab(min_freq=0, max_features=None, handle_unk: Optional[bool] = False, nan_as_unk=False)

Bases: object

Mapping from category to integer id

fit(x: pandas.core.series.Series) → torchtable.operator.core.Vocab

Construct the mapping

transform(x: pandas.core.series.Series) → pandas.core.series.Series
class torchtable.operator.Categorize(min_freq: int = 0, max_features: Optional[int] = None, handle_unk: Optional[bool] = None)

Bases: torchtable.operator.core.TransformerOperator

Converts categorical data into integer ids

Parameters:
  • min_freq – Minimum frequency required for a category to receive a unique id. Any categories with a lower frequency will be treated as unknown categories.
  • max_features – Maximum number of unique categories to store. If the data contains more categories than this, the categories with the highest frequencies will be chosen. If None, there will be no limit on the number of categories.
  • handle_unk – Whether to allocate a unique id to unknown categories. If you expect to see categories that you did not encounter in your training data, you should set this to True. If None, handle_unk will be set to True if min_freq > 0 or max_features is not None, otherwise it will be False.
vocab_size
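
The interaction between min_freq, max_features, and the default for handle_unk can be sketched with a frequency counter. The helpers below are hypothetical and are not torchtable's Vocab. Reserving id 0 for unknown categories is an assumption, and the sketch raises a plain ValueError where the library may raise UnknownCategoryError.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def build_vocab(values: List[Hashable], min_freq: int = 0,
                max_features: Optional[int] = None,
                handle_unk: Optional[bool] = None
                ) -> Tuple[Dict[Hashable, int], bool]:
    # Documented default: reserve an unknown id whenever filtering is active.
    if handle_unk is None:
        handle_unk = min_freq > 0 or max_features is not None
    counts = Counter(values)
    # Keep the most frequent categories, then drop the ones that are too rare.
    kept = [c for c, n in counts.most_common(max_features) if n >= min_freq]
    offset = 1 if handle_unk else 0  # assumption: id 0 means "unknown"
    return {c: i + offset for i, c in enumerate(kept)}, handle_unk


def encode(values: List[Hashable], mapping: Dict[Hashable, int],
           handle_unk: bool) -> List[int]:
    ids = []
    for v in values:
        if v in mapping:
            ids.append(mapping[v])
        elif handle_unk:
            ids.append(0)
        else:
            raise ValueError(f"unknown category: {v!r}")
    return ids


data = ["a", "a", "a", "b", "b", "c"]
mapping, unk = build_vocab(data, min_freq=2)
ids = encode(["a", "c", "z"], mapping, unk)
```

With `min_freq=2`, the category "c" falls below the threshold, so both "c" and the unseen "z" map to the unknown id and `ids` is `[1, 0, 0]`.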
class torchtable.operator.ToTensor(dtype: torch.dtype)

Bases: torchtable.operator.core.Operator

Convert input to a torch.tensor

Parameters: dtype – The dtype of the output tensor
apply(x: Union[pandas.core.series.Series, numpy.core.multiarray.array], device: Optional[torch.device] = None, train=True) → torch.tensor

Takes output of previous stage in the pipeline and produces output. Override in subclasses.

Parameters:
  • train – If true, this operator will “train” on the input. In other words, the internal parameters of this operator may change to fit the given input.
exception torchtable.operator.UnknownCategoryError

Bases: ValueError