Understanding Tensors in AI: A Journey Through Physics and Mathematics
When exploring the world of Artificial Intelligence, one frequently encounters the term tensor. In practice, a tensor is often treated as just a multi-dimensional array — a generalization of scalars, vectors, and matrices. However, this modern usage is a far cry from its deeper mathematical and physical origins. To understand what a tensor really is, it’s helpful to trace its meaning through physics and mathematics before returning to AI.
1. Tensors in Physics: The Elegant Language of Nature
In physics, tensors offer a powerful and compact way to express the laws of nature. Tensor calculus was developed in the late nineteenth century, notably by Ricci and Levi-Civita, and rose to prominence with general relativity, where expressing physical laws in a coordinate-independent way is essential.
A common (but misleading) pedagogical shortcut is to define a tensor as an array of numbers whose components transform in a certain way under a change of basis, distinguishing between covariant and contravariant components. While technically correct, this definition often hides the elegance and generality of tensors.
A more conceptual and satisfying definition starts from the viewpoint of multilinear algebra. A tensor is a multilinear map: for example, a function T: V × V → ℝ or T: V × V* → ℝ that is linear in each argument. Here, V is a vector space and V* its dual.
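As a concrete illustration, here is a minimal NumPy sketch of such a bilinear map T: V × V → ℝ; the matrix g and the test vectors are arbitrary values chosen only for the example:

```python
import numpy as np

# Components of a bilinear form on V = R^2 (values chosen arbitrarily).
g = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def T(u, v):
    """A multilinear map T: V x V -> R, linear in each argument."""
    return u @ g @ v

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
w = np.array([0.0, 4.0])

# Linearity in the first argument (the analogous check works for the second):
assert np.isclose(T(2 * u + w, v), 2 * T(u, v) + T(w, v))
print(T(u, v))  # -1.0
```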
Under a change of basis, both vectors and dual vectors (linear forms) transform, and this is what gives rise to covariant and contravariant components. Suppose we have a vector A = a·e₁ + b·e₂ and we change the basis (e₁, e₂) to (e′₁, e′₂), where, for instance, e′₁ = ½(e₁ + e₂). The coordinates of A in the new basis change with the inverse of the basis transformation, exactly compensating for it so that A itself stays the same. Because the coordinates transform "against" the basis, we call this contravariant behavior; a quantity that transforms in the same way as the basis is called covariant. A true tensor, however, is independent of the choice of basis; it is only when we express it in coordinates that the covariant/contravariant distinction appears. The tensor itself transcends it.
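To make this compensating behavior concrete, here is a small NumPy sketch. It assumes one possible completion of the example basis (taking e′₂ = e₂ is my choice, not given in the text) and shows that the coordinates transform with the inverse of the basis-change matrix:

```python
import numpy as np

# Old basis: the standard e1, e2. New basis (one assumed completion of the
# example above): e1' = (e1 + e2)/2,  e2' = e2.
B = np.array([[0.5, 0.0],
              [0.5, 1.0]])      # columns = new basis vectors in old coordinates

a_old = np.array([3.0, 1.0])    # A = 3*e1 + 1*e2

# The coordinates transform with the *inverse* of the basis-change matrix:
a_new = np.linalg.solve(B, a_old)

# The vector itself is unchanged: rebuilding A from the new coordinates
# in the new basis recovers the old coordinates.
assert np.allclose(B @ a_new, a_old)
print(a_new)   # [ 6. -2.]  -- the contravariant components of A in the new basis
```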
2. Tensors in Mathematics: Abstraction and Generalization
Mathematically, tensors are generalized as elements of tensor products of vector spaces and their duals. For example, a tensor of type (r, s) can be defined as a multilinear map: T: V* × ... × V* (r copies) × V × ... × V (s copies) → ℝ
This means the tensor takes r covectors (elements of the dual space V*) and s vectors and returns a scalar. It is linear in each input.
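As a small sketch of the simplest mixed case, a matrix of components defines a type (1, 1) tensor that takes one covector and one vector; the matrix A and the test values below are arbitrary:

```python
import numpy as np

# Components of a type (1, 1) tensor on R^2 (arbitrary values).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

def T(omega, v):
    """Takes one covector (row of components) and one vector, returns a scalar."""
    return omega @ A @ v

omega = np.array([1.0, -1.0])   # covector components
u = np.array([1.0, 2.0])
v = np.array([0.0, 5.0])

# Linear in the vector slot (and, by the same kind of check, in the covector slot):
assert np.isclose(T(omega, 2 * u + v), 2 * T(omega, u) + T(omega, v))
print(T(omega, u))  # -2.0
```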
And this construction can be generalized even further. A tensor can be defined over multiple vector spaces and their duals, and its image is not necessarily ℝ — it can lie in any vector space W. For instance, one can define a multilinear map T : V₁ × V₂* × V₃ → W, where V₁, V₂, V₃ and W are all vector spaces.
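Before the abstraction becomes completely formal, a familiar vector-valued multilinear map is the cross product, a bilinear map ℝ³ × ℝ³ → ℝ³; the quick NumPy check below is only meant to illustrate that the image need not be ℝ:

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
w = np.array([0.0, 0.0, 1.0])

# The cross product returns a vector, not a scalar, yet it is still
# linear in each of its two arguments:
assert np.allclose(np.cross(2 * u + w, v), 2 * np.cross(u, v) + np.cross(w, v))
print(np.cross(u, v))   # [0. 0. 1.]
```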
At this level of abstraction, we often lose the concrete geometric or physical intuition behind tensors. While the definition becomes more powerful and general, it also becomes more formal and abstract.
3. Tensors in AI: A Practical Tool with Simplified Semantics
In Artificial Intelligence, especially in frameworks like TensorFlow or PyTorch, the word tensor is used in a much more pragmatic way. A tensor is essentially a multi-dimensional array (e.g., a scalar is a 0D tensor, a vector is a 1D tensor, a matrix is 2D, and so on).
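For instance, a minimal PyTorch sketch of these dimensionalities (the values are arbitrary):

```python
import torch

scalar = torch.tensor(3.14)                 # 0-D tensor
vector = torch.tensor([1.0, 2.0, 3.0])      # 1-D tensor
matrix = torch.ones(2, 3)                   # 2-D tensor
batch  = torch.zeros(4, 2, 3)               # 3-D tensor (e.g. a batch of matrices)

print(scalar.ndim, vector.ndim, matrix.ndim, batch.ndim)   # 0 1 2 3
print(batch.shape)                                         # torch.Size([4, 2, 3])
```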
Although these structures are inspired by the mathematical notion of tensors, in AI, we almost never consider basis transformations. The focus is on efficient numerical operations on large arrays of data: broadcasting, reshaping, matrix multiplication, etc. In this context, the “tensor” is more a data structure than a coordinate-independent object.
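A small PyTorch sketch of these operations (shapes and values chosen only for illustration):

```python
import torch

x = torch.arange(6.0).reshape(2, 3)     # reshaping: (6,) -> (2, 3)
b = torch.tensor([10.0, 20.0, 30.0])    # shape (3,)

y = x + b                               # broadcasting: b is expanded to shape (2, 3)
z = x @ x.T                             # matrix multiplication: (2, 3) @ (3, 2) -> (2, 2)

print(y.shape, z.shape)                 # torch.Size([2, 3]) torch.Size([2, 2])
```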
Nonetheless, the use of the term still honors the original idea: tensors represent structured, multi-dimensional data that is manipulated mainly through linear operations.
Conclusion: From Theory to Practice
The concept of a tensor has evolved from an elegant tool to express the laws of the universe, to a powerful abstract object in mathematics, to a practical data structure in AI. While the AI usage does not involve basis changes or coordinate transformations, it still preserves the core idea: representing structured, multi-dimensional data and applying linear operations to it.
In deep learning, tensors are used to store and process various types of data:
- Images, as 3D tensors (height × width × channels),
- Text embeddings, as sequences of word vectors in high-dimensional spaces,
- Time series, where each time step can be a multi-feature vector,
- Neural network weights, organized as tensors of varying dimensions depending on layer types,
- Convolution operations, which are essentially tensor contractions across dimensions (a short sketch follows this list).
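For example, here is a minimal PyTorch sketch of an image batch passing through a convolution layer. Note that PyTorch stores images channels-first, i.e. (batch, channels, height, width), rather than (height × width × channels); the sizes are arbitrary:

```python
import torch
import torch.nn as nn

# A batch of 8 RGB images in PyTorch's (batch, channels, height, width) layout.
images = torch.randn(8, 3, 32, 32)

# A convolution layer: its weights are themselves a 4-D tensor
# of shape (out_channels, in_channels, kernel_h, kernel_w).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(conv.weight.shape)        # torch.Size([16, 3, 3, 3])

features = conv(images)
print(features.shape)           # torch.Size([8, 16, 32, 32])
```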
These applications rely heavily on the same linearity and multi-dimensionality that define tensors in physics and mathematics. Understanding the original meaning helps us better appreciate why tensors are such a natural and powerful abstraction for expressing complex computations in machine learning.