This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/musescore1983 on 2024-11-01 05:29:54+00:00.
This question was originally posted on MathOverflow, but I thought it might interest the community here as well:
Question:
I am exploring a neural network architecture inspired by physical interactions, where each neuron has associated “mass” and “position” vectors. The weight matrix between neurons is computed using a force-like inverse-square law interaction, reminiscent of the Coulomb interaction between charged particles. For two neurons with “mass” vectors μ_i and μ_j located at positions x_i and x_j, the weight w_ij is defined as:
w_ij = (μ_i · μ_j) / ||x_i - x_j||^2
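For concreteness, here is a minimal numpy sketch of this construction (the function name and the small eps added to the squared distance on the diagonal are my own choices, since the diagonal is left unconstrained anyway):

```python
import numpy as np

def coulomb_like_weights(mu, x, eps=1e-8):
    """w_ij = <mu_i, mu_j> / ||x_i - x_j||^2 for all pairs of neurons.

    mu : (n, d_mu) array of "mass" vectors
    x  : (n, d_x)  array of position vectors
    eps: small constant so the (unconstrained) diagonal does not divide by zero
    """
    dots = mu @ mu.T                          # <mu_i, mu_j> for all i, j
    diff = x[:, None, :] - x[None, :, :]      # pairwise position differences
    sq_dist = np.sum(diff**2, axis=-1)        # ||x_i - x_j||^2
    return dots / (sq_dist + eps)

# example: 5 neurons, 2-dimensional masses, 3-dimensional positions
rng = np.random.default_rng(0)
W = coulomb_like_weights(rng.normal(size=(5, 2)), rng.normal(size=(5, 3)))
print(W.shape, np.allclose(W, W.T))           # (5, 5) True
```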
This formulation is structurally similar to the Coulomb matrix used in quantum chemistry to represent atomic interactions in molecules, where the entries are defined as:
C_ij = (Z_i * Z_j) / ||x_i - x_j||   if i ≠ j
C_ii = 0.5 * Z_i^2.4                 if i = j
where Z_i and Z_j are atomic charges.
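Again just for illustration, a small numpy sketch of this standard Coulomb matrix (the function name is mine):

```python
import numpy as np

def coulomb_matrix(Z, x):
    """C_ij = Z_i * Z_j / ||x_i - x_j|| off the diagonal, 0.5 * Z_i^2.4 on it."""
    Z = np.asarray(Z, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):        # diagonal distances are zero
        C = np.outer(Z, Z) / dist
    np.fill_diagonal(C, 0.5 * Z**2.4)
    return C
```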
Given this context, I am interested in the following theoretical question:
Under what circumstances can a general symmetric matrix W be represented in the form of a Coulomb-like matrix? That is, when does there exist a set of vectors { μ_i, x_i } such that:
w_ij = (μ_i · μ_j) / ||x_i - x_j||^2 for all i, j
Motivation:
Exploring the possibility of representing a symmetric weight matrix W as a Coulomb-like matrix could potentially confirm that neural networks using this “force-based” weight concept can learn any function representable by a traditional network using symmetric matrices. Since multilayer perceptrons with symmetric weight matrices are known to be universal function approximators, establishing a comparable representational capability in neural networks with force-based interactions could open new avenues for designing computationally efficient and scalable neural architectures.
The inquiry into whether any symmetric matrix can be represented as a Coulomb matrix underpins the theoretical validity of using such architectures in broader machine learning applications. Such a representation would not only underscore the universality of force-based neural networks but also provide a foundational argument for their use in scenarios where traditional neural architectures might be computationally prohibitive.
Any insights into conditions, dimensionality constraints, or special cases where such a representation is feasible would be greatly appreciated!
The idea to use vectors instead of real numbers for “mass” comes from physics:
Suppose that two bodies 1 and 2, each with mass m_i and electric charge q_i, exert on each other two forces: the Coulomb force and Newton’s gravitational force:
F_C = (q_1 * q_2) / |x_1 - x_2|^2
F_N = (m_1 * m_2) / |x_1 - x_2|^2
But by the principle of superposition the forces add (here with the constants k_e and G absorbed into the charges and masses), thus:
F_12 = F_C + F_N = (m_1 * m_2 + q_1 * q_2) / |x_1 - x_2|^2
which can be written as:
F_12 = F_C + F_N = <μ_1, μ_2> / |x_1 - x_2|^2
where μ_i = (m_i, q_i) is a two-dimensional vector with components mass and charge.
I am not imposing restrictions on the diagonal of the symmetric matrix. The dimensions of the mass and position vectors can be different.
Edit: After experiments with custom neural networks, I have modified the original idea (keeping the intention the same) and am now asking whether every symmetric matrix with 0 on the diagonal can be written as:
w_ij = <μ_i, μ_j> * ||x_i - x_j||^2
I hope this simplifies the analysis, since we then do not divide by 0.
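A minimal sketch of this modified form, assuming the numerator is the same inner product <μ_i, μ_j> as before; note that the diagonal is now exactly zero by construction:

```python
import numpy as np

def multiplicative_weights(mu, x):
    """w_ij = <mu_i, mu_j> * ||x_i - x_j||^2; the diagonal is exactly 0
    because ||x_i - x_i|| = 0, so nothing is ever divided by zero."""
    dots = mu @ mu.T
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return dots * sq_dist
```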
Second edit: A somewhat cheating solution to the problem above would be to use the spectral theorem, whereby a real symmetric matrix W = (w_ij) can be decomposed as:
w_ij = sum_{k=1}^n (q_{ik} * λ_k * q_{jk})
where the λ_k are the eigenvalues of W and the columns q_k = (q_{1k}, ..., q_{nk}) are the corresponding eigenvectors, which are pairwise orthogonal; each neuron i then carries the row vector q_i = (q_{i1}, ..., q_{in}). Using the fact that every symmetric matrix can be decomposed this way, we might want to impose on the weights of the neural network the ‘restriction’ that they are generated this way, with d the number of retained components (equivalently, the dimension of the per-neuron vectors q_i), which in theory could be as large as n, the number of neurons:
w_ij = sum_{k=1}^d (q_{ik} * λ_k * q_{jk})
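As a quick numerical sanity check of this truncation (my own illustration; ordering the eigenpairs by |λ_k| is just one possible choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
W = (A + A.T) / 2                            # an arbitrary real symmetric matrix

lam, Q = np.linalg.eigh(W)                   # W = Q diag(lam) Q^T, Q orthogonal
order = np.argsort(-np.abs(lam))             # keep the largest |lambda_k| first
for d in (2, 4, 8):
    idx = order[:d]
    W_d = (Q[:, idx] * lam[idx]) @ Q[:, idx].T   # sum_{k=1}^d lam_k q_k q_k^T
    print(d, np.linalg.norm(W - W_d))        # error shrinks, ~0 once d = n
```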
Hence the neural network has a d-dimensional vector λ, and each neuron i has a d-dimensional vector q_i. The weight between neurons i and j is computed as described above. The vectors q_i and λ are learned through gradient descent and backpropagation, as is done in Multilayer Perceptrons. I do not have a proof that this setting allows the network to learn any function, but my vague idea goes like this:
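Here is a minimal PyTorch sketch of such a layer under my reading of the setup; the class name, the initialization scale, and the batched h ↦ hW application are my own assumptions, not part of the original idea:

```python
import torch
import torch.nn as nn

class SpectralLayer(nn.Module):
    """Symmetric weight matrix parameterized as W = Q diag(lam) Q^T with
    learnable Q (n x d) and lam (d): O(n*d) parameters instead of O(n^2)."""

    def __init__(self, n, d):
        super().__init__()
        self.Q = nn.Parameter(torch.randn(n, d) / n ** 0.5)  # row i is q_i
        self.lam = nn.Parameter(torch.randn(d))

    def forward(self, h):
        # h: (batch, n); computes h @ W without materializing the n x n matrix
        return ((h @ self.Q) * self.lam) @ self.Q.T

# usage: layer = SpectralLayer(n=256, d=16); out = layer(torch.randn(32, 256))
```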
If an ordinary MLP can learn any function, then in particular it can learn any symmetric weights w_ij. By allowing the ‘Spectral Neural Network’ to express any symmetric weights w_ij through the spectral theorem, we could argue that it can adapt its parameters to match any symmetric weights. But such weights are exactly what an MLP learns for a given specific function f, and so it should be possible to learn any function with the spectral neural network.
Modified question: Is it possible to make the idea of universal learning with the spectral neural network more concrete, perhaps with a proof?
Comment: Of course, for d ≥ n the savings in memory are lost, but I can imagine that there are problems where d << n, and there we get not only savings in memory from O(n^2) to O(n*d) but also a kind of dimensionality reduction. The training process could start with d = 1 and increase it quickly or gradually, depending on the problem.
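For a rough sense of scale (my own illustrative numbers): with n = 10,000 neurons and d = 16, a dense weight matrix stores n^2 = 10^8 entries, while the spectral parameterization stores n*d + d ≈ 1.6 * 10^5 parameters.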