tensor.opt – Tensor Optimizations¶
Tensor optimizations addressing the ops in basic.py.
-
class theano.tensor.opt.Assert(msg='Theano Assert failed!')[source]¶
Implements assertion in a computational graph.
Returns the first parameter if the condition is true; otherwise, raises an AssertionError.
Notes
This Op is a debugging feature. It can be removed from the graph by optimizations, and it can hide some possible optimizations from the optimizer. Specifically, it is removed if it can be determined that the condition will always be true. Also, the output of the Op must be used in the function that computes the graph, but it does not have to be returned.
Examples
>>> import theano
>>> T = theano.tensor
>>> x = T.vector('x')
>>> assert_op = T.opt.Assert()
>>> func = theano.function([x], assert_op(x, x.size < 2))
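For illustration, a hedged continuation of the example above (the exact dtype handling depends on the configured floatX): an input satisfying the condition is passed through unchanged, while a violating input raises the configured AssertionError at run time.
>>> import numpy as np
>>> out = func(np.zeros(1, dtype=x.dtype))   # x.size < 2 holds; x is returned
>>> func(np.zeros(3, dtype=x.dtype))         # condition fails at run time
Traceback (most recent call last):
    ...
AssertionError: Theano Assert failed!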
-
c_code(node, name, inames, onames, props)[source]¶
Return the C implementation of an Op.
Returns C code that does the computation associated to this Op, given names for the inputs and outputs.
Parameters: - node (Apply instance) – The node for which we are compiling the current c_code. The same Op may be used in more than one node.
- name (str) – A name that is automatically assigned and guaranteed to be unique.
- inputs (list of strings) – There is a string for each input of the function, and the string is the name of a C variable pointing to that input. The type of the variable depends on the declared type of the input. There is a corresponding python variable that can be accessed by prepending “py_” to the name in the list.
- outputs (list of strings) – Each string is the name of a C variable where the Op should store its output. The type depends on the declared type of the output. There is a corresponding python variable that can be accessed by prepending “py_” to the name in the list. In some cases the outputs will be preallocated and the value of the variable may be pre-filled. The value for an unallocated output is type-dependent.
- sub (dict of strings) – Extra symbols defined in CLinker sub symbols (such as ‘fail’). WRITEME
-
c_code_cache_version()[source]¶
Return a tuple of integers indicating the version of this Op.
An empty tuple indicates an ‘unversioned’ Op that will not be cached between processes.
The cache mechanism may erase cached modules that have been superseded by newer versions. See ModuleCache for details.
See also
c_code_cache_version_apply
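As a minimal illustration (a hypothetical Op, not part of theano.tensor.opt), a typical implementation just returns a hand-maintained version tuple:

from theano.gof import Op


class MyOp(Op):
    """Hypothetical Op used only to illustrate C-code cache versioning."""

    __props__ = ()

    def c_code_cache_version(self):
        # Bump this tuple whenever the C code produced by c_code() changes,
        # so that previously cached modules are recompiled.
        return (1,)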
-
grad(input, output_gradients)[source]¶
Construct a graph for the gradient with respect to each input variable.
Each returned Variable represents the gradient with respect to that input computed based on the symbolic gradients with respect to each output. If the output is not differentiable with respect to an input, then this method should return an instance of type NullType for that input.
Parameters: - inputs (list of Variable) – The input variables.
- output_grads (list of Variable) – The gradients of the output variables.
Returns: grads – The gradients with respect to each Variable in inputs.
Return type: list of Variable
-
make_node(value, *conds)[source]¶
Construct an Apply node that represents the application of this operation to the given inputs.
This must be implemented by sub-classes.
Returns: node – The constructed Apply node. Return type: Apply
-
perform(node, inputs, out_)[source]¶
Calculate the function on the inputs and put the variables in the output storage.
Parameters: - node (Apply) – The symbolic Apply node that represents this computation.
- inputs (Sequence) – Immutable sequence of non-symbolic/numeric inputs. These are the values of each Variable in node.inputs.
- output_storage (list of list) – List of mutable single-element lists (do not change the length of these lists). Each sub-list corresponds to the value of one of the Variables in node.outputs. The primary purpose of this method is to set the values of these sub-lists.
- params (tuple) – A tuple containing the values of each entry in __props__.
Notes
The output_storage list might contain data. If an element of output_storage is not None, it has to be of the right type, for instance, for a TensorVariable, it has to be a NumPy ndarray with the right number of dimensions and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that such pre-set values were produced by a previous call to this Op.perform; they could’ve been allocated by another Op’s perform method. An Op is free to reuse output_storage as it sees fit, or to discard it and allocate new memory.
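A minimal sketch of how perform() typically writes into output_storage, using a hypothetical element-doubling Op (an illustration, not part of theano.tensor.opt):

import numpy as np
import theano
from theano.gof import Apply, Op


class DoubleOp(Op):
    """Hypothetical Op: returns 2 * x, used only to illustrate perform()."""

    __props__ = ()

    def make_node(self, x):
        x = theano.tensor.as_tensor_variable(x)
        return Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage, params=None):
        (x,) = inputs
        (out,) = output_storage
        # Write the result into the single-element storage list; any pre-set
        # value in out[0] may be reused or discarded as described above.
        out[0] = np.asarray(2 * x, dtype=node.outputs[0].dtype)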
-
-
class theano.tensor.opt.Canonizer(main, inverse, reciprocal, calculate, use_reciprocal=True)[source]¶
Simplification tool. This is a local optimizer; it is best used with a TopoOptimizer in in_to_out order.
Usage: Canonizer(main, inverse, reciprocal, calculate)
Parameters: - main – A suitable Op class that is commutative, associative and takes one to an arbitrary number of inputs, e.g. add or mul
- inverse – An Op class such that inverse(main(x, y), y) == x e.g. sub or true_div
- reciprocal – A function such that main(x, reciprocal(y)) == inverse(x, y) e.g. neg or inv
- calculate – Function that takes a list of numpy.ndarray instances for the numerator, another list for the denominator, and calculates inverse(main(*num), main(*denum)). It takes a keyword argument, aslist. If True, the value should be returned as a list of one element, unless the value is such that value = main(). In that case, the return value should be an empty list.
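For concreteness, a minimal sketch of a calculate callback for a multiplication Canonizer, honoring the aslist contract described above (an illustration, not Theano's own implementation):

from functools import reduce

import numpy as np


def mul_calculate_sketch(num, denum, aslist=False, out_type=None):
    # Computes inverse(main(*num), main(*denum)) with main=mul, inverse=true_div,
    # where num and denum are lists of numeric constants (scalars or ndarrays).
    if not num and not denum:
        # value == main(): the neutral element; return [] when aslist is True
        return [] if aslist else np.int8(1)
    one = np.int8(1)
    value = reduce(np.multiply, num, one) / reduce(np.multiply, denum, one)
    return [value] if aslist else value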
Examples
>>> import numpy as np
>>> import theano.tensor as tt
>>> from theano.tensor.opt import Canonizer
>>> add_canonizer = Canonizer(tt.add, tt.sub, tt.neg,
...                           lambda n, d: sum(n) - sum(d))
>>> mul_canonizer = Canonizer(tt.mul, tt.true_div, tt.inv,
...                           lambda n, d: np.prod(n) / np.prod(d))
Examples of optimizations mul_canonizer can perform:
x / x -> 1
(x * y) / x -> y
x / y / x -> 1 / y
x / y / z -> x / (y * z)
x / (y / z) -> (x * z) / y
(a / b) * (b / c) * (c / d) -> a / d
(2.0 * x) / (4.0 * y) -> (0.5 * x) / y
2 * x / 2 -> x
x * y * z -> Elemwise(mul){x,y,z} (only one pass over the memory; not Elemwise(mul){x,Elemwise(mul){y,z}})
-
static get_constant(v)[source]¶
Returns: A numeric constant if v is a Constant or already a numeric constant; None if v is a plain Variable.
Return type: object
-
get_num_denum(input)[source]¶
This extracts two lists, num and denum, such that the input is self.inverse(self.main(*num), self.main(*denum)). It returns the two lists as a (num, denum) pair.
For example, for main, inverse and reciprocal = *, / and inv(),
input -> returned value (num, denum)
x*y -> ([x, y], [])
inv(x) -> ([], [x])
inv(x) * inv(y) -> ([], [x, y])
x*y/z -> ([x, y], [z])
log(x) / y * (z + x) / y -> ([log(x), z + x], [y, y])
(((a / b) * c) / d) -> ([a, c], [b, d])
a / (b / c) -> ([a, c], [b])
log(x) -> ([log(x)], [])
x**y -> ([x**y], [])
x * y * z -> ([x, y, z], [])
-
merge_num_denum(num, denum)[source]¶
Utility function which takes two lists, num and denum, and returns something which is equivalent to inverse(main(*num), main(*denum)), but depends on the length of num and the length of denum (in order to minimize the number of operations).
Let n = len(num) and d = len(denum):
n=0, d=0: neutral element (given by self.calculate([], [])); for example, this would be 0 if main is addition and 1 if main is multiplication
n=1, d=0: num[0]
n=0, d=1: reciprocal(denum[0])
n=1, d=1: inverse(num[0], denum[0])
n=0, d>1: reciprocal(main(*denum))
n>1, d=0: main(*num)
n=1, d>1: inverse(num[0], main(*denum))
n>1, d=1: inverse(main(*num), denum[0])
n>1, d>1: inverse(main(*num), main(*denum))
Given the values of n and d to which they are associated, all of the above are equivalent to inverse(main(*num), main(*denum)).
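A minimal sketch of that case analysis (illustrative only, not Theano's actual code), assuming self.main, self.inverse, self.reciprocal and self.calculate behave as described above:

def merge_num_denum_sketch(self, num, denum):
    if not num and not denum:
        # n = 0, d = 0: neutral element, e.g. 0 for add, 1 for mul
        return self.calculate([], [])
    if not denum:
        # d = 0: just the (merged) numerator
        return num[0] if len(num) == 1 else self.main(*num)
    denum_part = denum[0] if len(denum) == 1 else self.main(*denum)
    if not num:
        # n = 0: reciprocal of the (merged) denominator
        return self.reciprocal(denum_part)
    num_part = num[0] if len(num) == 1 else self.main(*num)
    return self.inverse(num_part, denum_part)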
-
simplify(num, denum, out_type)[source]¶
Shorthand for:
self.simplify_constants(*self.simplify_factors(num, denum))
-
simplify_constants(orig_num, orig_denum, out_type=None)[source]¶
Find all constants and put them together into a single constant.
Finds all constants in orig_num and orig_denum (using get_constant) and puts them together into a single constant. The constant is inserted as the first element of the numerator. If the constant is the neutral element, it is removed from the numerator.
Examples
Let main be multiplication:
[2, 3, x], [] -> [6, x], []
[x, y, 2], [4, z] -> [0.5, x, y], [z]
[x, 2, y], [z, 2] -> [x, y], [z]
-
simplify_factors(num, denum)[source]¶
For any Variable r which is in both num and denum, removes it from both lists. Modifies the lists in place. Returns the modified lists. For example:
[x], [x] -> [], []
[x, y], [x] -> [y], []
[a, b], [c, d] -> [a, b], [c, d]
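The underlying idea can be sketched as follows (an illustration, not Theano's actual code):

def simplify_factors_sketch(num, denum):
    # Cancel every Variable that appears in both the numerator and the
    # denominator lists; both lists are modified in place.
    for v in list(num):
        if v in denum:
            num.remove(v)
            denum.remove(v)
    return num, denum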
-
tracks()[source]¶
Return the list of op classes that this opt applies to.
Return None to apply to all nodes.
-
transform(fgraph, node)[source]¶
Transform a subgraph whose output is node.
Subclasses should implement this function so that it returns one of the following:
- False to indicate that no optimization can be applied to this node; or
- a list of variables to use in place of node’s outputs in the greater graph; or
- a dict mapping old variables to the new variables that should replace them.
Parameters: node (an Apply instance) –
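A minimal sketch of a custom transform(), following the signature documented above; the rewrite rule (log(exp(x)) -> x), the class name, and the tracks() return value are illustrative assumptions, not part of theano.tensor.opt:

import theano.tensor as tt
from theano.gof.opt import LocalOptimizer


class LocalLogExpSketch(LocalOptimizer):
    """Hypothetical local optimizer that rewrites log(exp(x)) into x."""

    def tracks(self):
        # Only visit nodes whose op is tt.log.
        return [tt.log]

    def transform(self, fgraph, node):
        inp = node.inputs[0]
        if inp.owner is not None and inp.owner.op == tt.exp:
            # Replace the node's single output with the inner x.
            return [inp.owner.inputs[0]]
        # No rewrite applies to this node.
        return False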
-
class theano.tensor.opt.FusionOptimizer(local_optimizer)[source]¶
Graph optimizer that simply runs local fusion operations.
TODO: This is basically an EquilibriumOptimizer; we should just use that.
-
class theano.tensor.opt.InplaceElemwiseOptimizer(OP)[source]¶
This optimizer is parametrized so that it works for both the Elemwise and GpuElemwise `Op`s.
-
add_requirements(fgraph)[source]¶
Add features to the fgraph that are required to apply the optimization. For example:
fgraph.attach_feature(History())
fgraph.attach_feature(MyFeature())
etc.
-
apply(fgraph)[source]¶
Usage: InplaceElemwiseOptimizer(op).optimize(fgraph)
Attempts to replace all Broadcast Ops with versions of them that operate inplace. It operates greedily: for each Broadcast Op that is encountered, for each output, it tries each input to see if it can operate inplace on that input. If so, it makes the change and moves on to the next output or Broadcast Op.
Examples
x + y + z -> x += y += z
(x + y) * (x * y) -> (x += y) *= (x * y) or (x + y) *= (x *= y)
-
-
class theano.tensor.opt.MakeVector(dtype='int64')[source]¶
Concatenate a number of scalars together into a vector.
This is a simple version of stack() that introduces far less cruft into the graph. Should work with 0 inputs. The constant_folding optimization will remove it.
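A small usage sketch (illustrative; output formatting may vary):

import theano
import theano.tensor as tt
from theano.tensor.opt import MakeVector

a = tt.lscalar('a')
b = tt.lscalar('b')
v = MakeVector(dtype='int64')(a, b)   # symbolic int64 vector [a, b]
f = theano.function([a, b], v)
print(f(1, 2))                        # expected: [1 2]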
-
R_op(inputs, eval_points)[source]¶
Construct a graph for the R-operator.
This method is primarily used by tensor.Rop
Suppose the op outputs
[ f_1(inputs), …, f_n(inputs) ]
Parameters: - inputs (a Variable or list of Variables) –
- eval_points – A Variable or list of Variables with the same length as inputs. Each element of eval_points specifies the value of the corresponding input at the point where the R op is to be evaluated.
Returns: rval – A list of n elements, where rval[i] is Rop(f=f_i(inputs), wrt=inputs, eval_points=eval_points).
Return type: list of n elements
-
c_code(node, name, inp, out_, props)[source]¶
Return the C implementation of an Op.
Returns C code that does the computation associated to this Op, given names for the inputs and outputs.
Parameters: - node (Apply instance) – The node for which we are compiling the current c_code. The same Op may be used in more than one node.
- name (str) – A name that is automatically assigned and guaranteed to be unique.
- inputs (list of strings) – There is a string for each input of the function, and the string is the name of a C variable pointing to that input. The type of the variable depends on the declared type of the input. There is a corresponding python variable that can be accessed by prepending “py_” to the name in the list.
- outputs (list of strings) – Each string is the name of a C variable where the Op should store its output. The type depends on the declared type of the output. There is a corresponding python variable that can be accessed by prepending “py_” to the name in the list. In some cases the outputs will be preallocated and the value of the variable may be pre-filled. The value for an unallocated output is type-dependent.
- sub (dict of strings) – Extra symbols defined in CLinker sub symbols (such as ‘fail’). WRITEME
-
c_code_cache_version()[source]¶
Return a tuple of integers indicating the version of this Op.
An empty tuple indicates an ‘unversioned’ Op that will not be cached between processes.
The cache mechanism may erase cached modules that have been superseded by newer versions. See ModuleCache for details.
See also
c_code_cache_version_apply
-
grad(inputs, output_gradients)[source]¶
Construct a graph for the gradient with respect to each input variable.
Each returned Variable represents the gradient with respect to that input computed based on the symbolic gradients with respect to each output. If the output is not differentiable with respect to an input, then this method should return an instance of type NullType for that input.
Parameters: - inputs (list of Variable) – The input variables.
- output_grads (list of Variable) – The gradients of the output variables.
Returns: grads – The gradients with respect to each Variable in inputs.
Return type: list of Variable
-
make_node(*inputs)[source]¶
Construct an Apply node that represents the application of this operation to the given inputs.
This must be implemented by sub-classes.
Returns: node – The constructed Apply node. Return type: Apply
-
perform(node, inputs, out_)[source]¶
Calculate the function on the inputs and put the variables in the output storage.
Parameters: - node (Apply) – The symbolic Apply node that represents this computation.
- inputs (Sequence) – Immutable sequence of non-symbolic/numeric inputs. These are the values of each Variable in node.inputs.
- output_storage (list of list) – List of mutable single-element lists (do not change the length of these lists). Each sub-list corresponds to the value of one of the Variables in node.outputs. The primary purpose of this method is to set the values of these sub-lists.
- params (tuple) – A tuple containing the values of each entry in __props__.
Notes
The output_storage list might contain data. If an element of output_storage is not None, it has to be of the right type, for instance, for a TensorVariable, it has to be a NumPy ndarray with the right number of dimensions and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that such pre-set values were produced by a previous call to this Op.perform; they could’ve been allocated by another Op’s perform method. An Op is free to reuse output_storage as it sees fit, or to discard it and allocate new memory.
-
-
class theano.tensor.opt.ShapeFeature[source]¶
Graph optimizer for removing all calls to shape().
This optimizer replaces all Shapes and Subtensors of Shapes with Shape_i and MakeVector Ops.
This optimizer has several goals:
- to ‘lift’ Shapes to as close to the inputs as possible.
- to infer the shape of every node in the graph in terms of the input shapes.
- to remove all fills (T.second, T.fill) from the graph
Lifting shapes as close to the inputs as possible is important for canonicalization because it is very bad form to have to compute something just to know how big it will be. Firstly, it is a waste of time to compute such outputs. But it is important to get rid of these outputs as early as possible in the compilation process because the extra computations make it appear as if many internal graph nodes have multiple clients. Many optimizations refuse to work on nodes with multiple clients.
Lifting is done by using an <Op>.infer_shape function if one is present, or else using a conservative default. An Op that supports shape-lifting should define an infer_shape(self, fgraph, node, input_shapes) function. The argument input_shapes is a tuple of tuples: there is an interior tuple for each input to the node, and each such tuple has as many elements as that input has dimensions. The element in position i of tuple j represents the i’th shape component of the j’th input. The function should return a tuple of tuples, one output tuple for each element of node.outputs. Again, the i’th element of the j’th output tuple represents output[j].shape[i] of the function. If an output is not a TensorType, then None should be returned instead of a tuple for that output.
For example the infer_shape for a matrix-matrix product would accept input_shapes=((x0,x1), (y0,y1)) and return ((x0, y1),).
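A sketch of such an infer_shape method for a hypothetical matrix-matrix product Op, following the signature and return convention described above (an illustration, not Theano's own code):

def infer_shape(self, fgraph, node, input_shapes):
    # input_shapes == ((x0, x1), (y0, y1)) for two matrix inputs
    (x0, x1), (y0, y1) = input_shapes
    # One shape tuple per output; the product of an (x0, x1) matrix
    # and a (y0, y1) matrix has shape (x0, y1).
    return ((x0, y1),)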
Inferring the shape of internal nodes in the graph is important for doing size-driven optimizations. If we know how big various intermediate results will be, we can estimate the cost of many Ops accurately, and generate c-code that is specific [e.g. unrolled] to particular sizes.
In cases where you cannot figure out the shape, raise a ShapeError.
Notes
Right now there is only the ConvOp that could really take advantage of this shape inference, but it is worth it even just for the ConvOp. All that’s necessary to do shape inference is 1) to mark shared inputs as having a particular shape, either via a .tag or some similar hacking; and 2) to add an optional In() argument to promise that inputs will have a certain shape (or even to have certain shapes in certain dimensions). We can’t automatically infer the shape of shared variables because, by default, their shape can change during execution. (NOT IMPLEMENTED YET, BUT IS IN TRAC)
Using Shape information in Optimizations
To use this shape information in OPTIMIZATIONS, use the shape_of dictionary. For example:
try:
    shape_of = fgraph.shape_feature.shape_of
except AttributeError:
    # This can happen when the mode doesn't include the ShapeFeature.
    return

shape_of_output_zero = shape_of[node.output[0]]
The shape_of_output_zero symbol will contain a tuple whose elements are either integers or symbolic integers.
TODO: check to see if the symbols are necessarily non-constant… or are integer literals sometimes Theano constants? That would be confusing.
-
default_infer_shape(fgraph, node, i_shapes)[source]¶
Return a list of shape tuples or None for the outputs of node.
This function is used for Ops that don’t implement infer_shape. Ops that do implement infer_shape should use the i_shapes parameter, but this default implementation ignores it.
-
get_shape(var, idx)[source]¶
Optimizations can call this to get the current shape_i.
It is better to call this than to use shape_of[var][idx] directly, as this method updates shape_of if needed.
TODO: Up to now, we don’t update it in all cases. Update in all cases.
-
on_attach(fgraph)[source]¶
Called by FunctionGraph.attach_feature, the method that attaches the feature to the FunctionGraph. Since this is called after the FunctionGraph is initially populated, this is where you should run checks on the initial contents of the FunctionGraph.
The on_attach method may raise the AlreadyThere exception to cancel the attach operation if it detects that another Feature instance implementing the same functionality is already attached to the FunctionGraph.
The feature has great freedom in what it can do with the fgraph: it may, for example, add methods to it dynamically.
-
on_change_input(fgraph, node, i, r, new_r, reason)[source]¶
Called whenever node.inputs[i] is changed from r to new_r. By the time the callback is invoked, the change has already taken place.
If you raise an exception in this function, the state of the graph might be broken for all intents and purposes.
-
on_detach(fgraph)[source]¶
Called by FunctionGraph.remove_feature. Should remove any dynamically-added functionality that it installed into the fgraph.
-
on_import(fgraph, node, reason)[source]¶
Called whenever a node is imported into fgraph, which is just before the node is actually connected to the graph.
Note: this is not called when the graph is created. If you want to detect the first nodes to be imported into the graph, do so by implementing on_attach.
-
same_shape(x, y, dim_x=None, dim_y=None)[source]¶
Return True if we are able to assert that x and y have the same shape.
dim_x and dim_y are optional. If given, they are dimension indices, and only that single dimension of x and y is compared.
-
set_shape(r, s, override=False)[source]¶
Assign the shape s to previously un-shaped variable r.
Parameters: - r (a variable) –
- s (None or a tuple of symbolic integers) –
- override – If False, it means r is a new object in the fgraph; if True, it means r is already in the fgraph and we want to override its shape.
-
class theano.tensor.opt.ShapeOptimizer[source]¶
Optimizer that serves to add ShapeFeature as an fgraph feature.
-
class theano.tensor.opt.UnShapeOptimizer[source]¶
Optimizer that removes ShapeFeature as an fgraph feature.
-
theano.tensor.opt.apply_rebroadcast_opt(rval)[source]¶
Apply, as many times as required, the optimizations local_useless_rebroadcast and local_rebroadcast_lift.
Parameters: rval (a Variable) –
Return type: A Variable (the same one if no optimization can be applied)
-
theano.tensor.opt.attempt_distribution(factor, num, denum, out_type)[source]¶
Try to insert each num and each denum in the factor.
Returns: If there are changes, new_num and new_denum contain all the numerators and denominators that could not be distributed into the factor.
Return type: changes?, new_factor, new_num, new_denum
-
theano.tensor.opt.broadcast_like(value, template, fgraph, dtype=None)[source]¶
Return a Variable with the same shape and dtype as the template, filled by broadcasting value through it. value will be cast as necessary.
-
theano.tensor.opt.check_for_x_over_absX(numerators, denominators)[source]¶
Convert x/abs(x) into sign(x).
-
theano.tensor.opt.encompasses_broadcastable(b1, b2)[source]¶
Parameters: - b1 – The broadcastable attribute of a tensor type.
- b2 – The broadcastable attribute of a tensor type.
Returns: True if the broadcastable patterns b1 and b2 are such that b2 is broadcasted to b1’s shape and not the opposite.
Return type: bool
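For illustration (a hedged example of the intended direction of the check): a pattern that is broadcastable along its first dimension, such as a 1-row matrix, can be broadcast to a fully non-broadcastable matrix pattern, but not the other way around.

from theano.tensor.opt import encompasses_broadcastable

print(encompasses_broadcastable((False, False), (True, False)))  # expected: True
print(encompasses_broadcastable((True, False), (False, False)))  # expected: False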
-
theano.tensor.opt.get_clients(fgraph, node)[source]¶
Used by the erf/erfc optimizations to track the less frequent op.
-
theano.tensor.opt.get_clients2(fgraph, node)[source]¶
Used by the erf/erfc optimizations to track the less frequent op.
-
theano.tensor.opt.is_an_upcast(type1, type2)[source]¶
Given two data types (as strings), check if converting to type2 from type1 constitutes an upcast. Differs from theano.scalar.upcast.
-
theano.tensor.opt.is_inverse_pair(node_op, prev_op, inv_pair)[source]¶
Given two consecutive operations, check if they are the provided pair of inverse functions.
-
theano.tensor.opt.local_add_mul_fusion(fgraph, node)[source]¶
Fuse consecutive add or mul nodes into a single such node with more inputs.
It is better to fuse add/mul this way than inside a Composite node, because it keeps the inner graph of the Composite smaller. This allows more computation to be put in a Composite before hitting the maximum recursion limit when pickling a Composite.
-
theano.tensor.opt.local_elemwise_fusion(fgraph, node)[source]¶
Fuse Elemwise `Op`s in a node.
As part of specialization, we fuse two consecutive elemwise `Op`s of the same shape.
For mixed dtypes, we let the Composite Op do the cast, which lets the C compiler handle the casting.
The number of dimensions is validated at call time by Theano itself.
-
theano.tensor.opt.local_elemwise_fusion_op(op_class, max_input_fct=<function <lambda>>, maker=None)[source]¶
Create a recursive function that fuses Elemwise `Op`s.
The basic idea is that we loop through an Elemwise node’s inputs, find other Elemwise nodes, determine the scalar input types for all of the Elemwise `Op`s, construct a new scalar `Op` using the scalar input types and each Elemwise’s scalar `Op`, and use the composite scalar `Op` in a new “fused” Elemwise.
It’s parameterized in order to work for Elemwise and GpuElemwise `Op`s.
Parameters: - op_class (type) – GpuElemwise or Elemwise class (the one that we want to fuse)
- max_input_fct (callable) –
A function that returns the maximum number of inputs that this Elemwise can take (useful for GpuElemwise). The GPU kernel currently has a limit of 256 bytes for the size of all parameters passed to it. As we currently pass a lot of information only by parameter, we must limit how many `Op`s we fuse together to avoid exceeding that 256-byte limit.
On the CPU we limit the number of input variables to 32, since that is the maximum that NumPy supports.
- maker (callable) – A function with the signature (node, *args) that constructs an op_class instance (e.g. op_class(*args)).
-
theano.tensor.opt.merge_two_slices(fgraph, slice1, len1, slice2, len2)[source]¶
This function merges two slices into a single slice. The code works on the assumption that:
- slice1 is actually a slice and not an index, while slice2 can be just an index.
- the two slices have been applied consecutively on the same tensor
The output slice is not in canonical form, but actually just a slice that can be applied to a tensor to produce the same output as applying the two consecutive slices.
len1 is the length of the tensor before applying the first slice, while len2 is the length after applying the first slice.
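For example (a worked illustration): if len1 = 12 and slice1 is 2:10, then len2 = 8; applying slice2 = 1:5 to that intermediate result selects the same elements as the single slice 3:7 applied to the original tensor, which is the kind of merged slice this function returns.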