torchtable.field module

Module contents

class torchtable.field.Field(pipeline: torchtable.operator.core.Operator, name: Optional[str] = None, is_target: bool = False, continuous: bool = True, categorical: bool = False, cardinality: Optional[int] = None, batch_pipeline: Optional[torchtable.operator.core.Operator] = None, dtype: Optional[torch.dtype] = None, metadata: dict = {})

Bases: object

A single field in the output mini batch. A Field acts as a container for all relevant information regarding a single output in the mini batch. Primarily, it stores a pipeline to apply to a column/set of columns in the input. It also stores a pipeline for converting the input batch to an appropriate type for the downstream model (generally a torch.Tensor). This class can be instantiated directly with a custom pipeline, but is generally used as a base class for the more specialized fields below.

Example

>>> fld = Field(LambdaOperator(lambda x: x + 1) > LambdaOperator(lambda x: x ** 2))
>>> fld.transform(1)
... 9
Parameters:
  • pipeline – An operator representing the set of operations mapping the input column to the output. This transformation will be applied during the construction of the dataset. If the pipeline is resource intensive and applying it all at once is unrealistic, consider deferring some of the processing to batch_pipeline.
  • is_target – Whether the field is an input or target field. Affects default batching behavior.
  • continuous – Whether the output is continuous.
  • categorical – Whether the output is categorical/discrete.
  • batch_pipeline – The transformation to apply to this field during batching. By default, this will simply be an operation to transform the input to a tensor to feed to the model. This can be set to any Operator that the user wishes so that arbitrary transformations (e.g. padding, noising) can be applied during data loading.
  • dtype – The output tensor dtype. Only relevant when batch_pipeline is None (using the default pipeline).
  • metadata – Additional data about the field to store. Use cases include adding data about model parameters (e.g. size of embeddings for this field).
cardinality

Relevant for categorical data. For custom fields, the cardinality must be passed explicitly.
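
A minimal sketch of a custom categorical Field that defers part of its processing to batch_pipeline and passes its cardinality explicitly (this assumes LambdaOperator can be imported from torchtable.operator; the import path is not shown on this page):

>>> import torch
>>> from torchtable.field import Field
>>> from torchtable.operator import LambdaOperator  # assumed import path
>>> fld = Field(LambdaOperator(lambda x: x % 7),  # cheap step, applied once when the dataset is built
...             batch_pipeline=LambdaOperator(lambda x: torch.tensor(list(x), dtype=torch.long)),  # applied per batch
...             categorical=True, continuous=False,
...             cardinality=7)  # must be passed explicitly for custom categorical fields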

index(example: Union[pandas.core.series.Series, numpy.core.multiarray.array], idx) → Union[pandas.core.series.Series, numpy.core.multiarray.array]

Wrapper for indexing. The field must provide the ability to index via a list for batching later on.

transform(x: pandas.core.series.Series, train=True) → Union[pandas.core.series.Series, numpy.core.multiarray.array]

Method to process the input column during construction of the dataset.

Kwargs:
  • train – If True, this transformation may change some internal parameters of the pipeline. For instance, if there is a normalization step in the pipeline, the mean and std will be computed on the current input. Otherwise, the pipeline will use statistics computed in the past.
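
A minimal sketch of the train flag in practice, using NumericField (documented below); this assumes the default Gaussian normalization fits its statistics only when train is True:

>>> import pandas as pd
>>> from torchtable.field import NumericField
>>> fld = NumericField(normalization="Gaussian")
>>> train_out = fld.transform(pd.Series([1., 2., 3.]), train=True)   # fits mean/std on this column
>>> valid_out = fld.transform(pd.Series([4., 5.]), train=False)      # reuses the statistics fitted above
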
transform_batch(x: Union[pandas.core.series.Series, numpy.core.multiarray.array], device: Optional[torch.device] = None, train: bool = True) → torch.Tensor

Method to process batch input during loading of the dataset.
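
A short sketch of the default batching behavior (assuming the default batch_pipeline simply converts the input to a tensor, as described above; the LambdaOperator import path is an assumption):

>>> import pandas as pd
>>> import torch
>>> from torchtable.field import Field
>>> from torchtable.operator import LambdaOperator  # assumed import path
>>> fld = Field(LambdaOperator(lambda x: x), dtype=torch.float32)
>>> batch = fld.transform_batch(pd.Series([0.5, 1.5]))  # a torch.Tensor produced by the default batch pipeline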

class torchtable.field.IdentityField(name=None, is_target=False, continuous=True, categorical=False, metadata={})

Bases: torchtable.field.core.Field

A field that does not modify the input.
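
A short sketch of its use (the input is passed through unchanged):

>>> import pandas as pd
>>> from torchtable.field import IdentityField
>>> fld = IdentityField(name="raw")
>>> fld.transform(pd.Series([1, 2, 3]), train=True)  # the input is returned unchanged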

class torchtable.field.NumericField(name=None, fill_missing='median', normalization='Gaussian', is_target=False, metadata={})

Bases: torchtable.field.core.Field

A field corresponding to a continuous, numerical output (e.g. price, distance, etc.)

Parameters:
  • fill_missing – The method of filling missing values. See the FillMissing operator for details.
  • normalization – The method of normalization. See the Normalize operator for details.
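
A minimal sketch of a NumericField handling missing values and normalization (the exact output values depend on the FillMissing and Normalize operators):

>>> import numpy as np
>>> import pandas as pd
>>> from torchtable.field import NumericField
>>> price = NumericField(name="price", fill_missing="median", normalization="Gaussian")
>>> out = price.transform(pd.Series([1.0, np.nan, 3.0]), train=True)  # NaN filled with the median, then standardized
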
class torchtable.field.CategoricalField(name=None, min_freq=0, max_features=None, handle_unk=None, is_target=False, metadata: dict = {})

Bases: torchtable.field.core.Field

A field corresponding to a categorical, discrete output (e.g. id, group, gender)

Parameters: See the Categorize operator for more details.
cardinality

The number of unique outputs.

transform(x: pandas.core.series.Series, train=True) → Union[pandas.core.series.Series, numpy.core.multiarray.array]

Method to process the input column during construction of the dataset.

Kwargs:
  • train – If True, this transformation may change some internal parameters of the pipeline. For instance, if there is a normalization step in the pipeline, the mean and std will be computed on the current input. Otherwise, the pipeline will use statistics computed in the past.
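
A minimal sketch of a CategoricalField mapping raw categories to integer codes (the exact codes assigned are an implementation detail of the Categorize operator):

>>> import pandas as pd
>>> from torchtable.field import CategoricalField
>>> group = CategoricalField(name="group")
>>> codes = group.transform(pd.Series(["a", "b", "b", "a"]), train=True)  # categories mapped to integer codes
>>> group.cardinality  # the number of unique outputs
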
class torchtable.field.DatetimeFeatureField(func: Callable[pandas.core.series.Series, pandas.core.series.Series], fill_missing: Optional[str] = None, name=None, is_target=False, continuous=False, metadata: dict = {})

Bases: torchtable.field.core.Field

A generic field for constructing features from datetime columns.

Parameters:
  • func – Feature construction function
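
A minimal sketch of a custom datetime feature; func receives the datetime column as a pandas Series and returns the engineered feature (the year-extraction lambda here is a hypothetical example):

>>> import pandas as pd
>>> from torchtable.field import DatetimeFeatureField
>>> year = DatetimeFeatureField(lambda s: s.dt.year, name="year")
>>> year.transform(pd.to_datetime(pd.Series(["2019-01-01", "2020-06-15"])), train=True)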

class torchtable.field.DayofWeekField(**kwargs)

Bases: torchtable.field.datetime.DatetimeFeatureField

class torchtable.field.DayField(**kwargs)

Bases: torchtable.field.datetime.DatetimeFeatureField

class torchtable.field.MonthStartField(**kwargs)

Bases: torchtable.field.datetime.DatetimeFeatureField

class torchtable.field.MonthEndField(**kwargs)

Bases: torchtable.field.datetime.DatetimeFeatureField

class torchtable.field.HourField(**kwargs)

Bases: torchtable.field.datetime.DatetimeFeatureField

torchtable.field.date_fields(**kwargs) → List[torchtable.field.datetime.DatetimeFeatureField]

The default set of fields for feature engineering using a field with date information

torchtable.field.datetime_fields(**kwargs) → List[torchtable.field.datetime.DatetimeFeatureField]

The default set of fields for feature engineering using a field with date and time information
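
A sketch of how these helpers might be used; each returns ready-made DatetimeFeatureField instances (the exact set of fields is determined by the library, and keyword arguments are presumably forwarded to the individual fields):

>>> from torchtable.field import datetime_fields, DatetimeFeatureField
>>> flds = datetime_fields()
>>> all(isinstance(f, DatetimeFeatureField) for f in flds)
... True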

class torchtable.field.FieldCollection(*args, flatten: bool = False, namespace: Optional[str] = None)

Bases: list

A list of fields with some auxiliary methods.

Parameters:
  • flatten – If set to True, each field in this collection will be mapped to one key in the batch/dataset. Otherwise, each field in this collection will be mapped to an entry in a list for the same key in the batch/dataset.
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

name
set_namespace(nm: str) → None

Set names of inner fields as well

transform(*args, **kwargs) → list

Applies each field's transform and returns the results as a list
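
A minimal sketch of a FieldCollection applying each member field's transform to the same input column (exact outputs depend on the individual fields; the keyword forwarding shown here is an assumption based on the *args, **kwargs signature):

>>> import pandas as pd
>>> from torchtable.field import FieldCollection, IdentityField, NumericField
>>> fc = FieldCollection(IdentityField(name="raw"), NumericField(name="scaled"))
>>> raw_out, scaled_out = fc.transform(pd.Series([1., 2., 3.]), train=True)  # one entry per field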