Humans are fundamentally visual creatures. From the moment we're born, we’re wired to search for patterns.
“All perceiving is also thinking, all reasoning is also intuition, all observation is also invention.”
— Johann Wolfgang von Goethe
This drive shaped our evolution and continues to define how we interpret the world. During the 19th century, the invention of the camera enabled picture-perfect representation — surpassing even the most skilled artists. And yet, the Impressionists held their ground, capturing something deeper than realism: human perception and emotional resonance.

Claude Monet, Impression, Sunrise (1872)
Musée Marmottan Monet, Paris
We are designed to see connectedness.
Even if we’re not consciously searching for it, patterns are everywhere, and everything — at some level — can be reduced to the same underlying structures. Chemistry, physics, engineering, and even artificial intelligence all converge when we peel back their surfaces: they share fundamental mechanisms. The deeper we go, the more "connection" reveals itself.
Today’s AI revolution reflects this idea. Modern AI doesn’t just classify static data — it uncovers relationships between data points.
Let’s say we have a class: a cat. It can be described through various properties — fur color, eye shape, body size, ear length, and so on. These properties form a feature vector. Similarly, another class, a dog, has its own properties and its own feature vector. If we consider 5 properties, we're in a 5-dimensional space.
But since humans struggle to visualize more than 3 dimensions, we often project this high-dimensional data into 2D — plotting red dots for cat instances and blue dots for dog instances. These dots represent much more than they appear to — they encode complex features folded into a lower-dimensional plane.
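As a rough sketch of this setup (the five features and all the numbers below are invented purely for illustration), here is how those feature vectors and their 2D projection might look in Python:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 5-dimensional feature vectors: fur color, eye shape,
# body size, ear length, tail length (values are made up for illustration).
rng = np.random.default_rng(0)
cats = rng.normal(loc=[0.2, 0.8, 0.3, 0.4, 0.6], scale=0.1, size=(20, 5))
dogs = rng.normal(loc=[0.7, 0.3, 0.8, 0.6, 0.2], scale=0.1, size=(20, 5))

X = np.vstack([cats, dogs])              # 40 samples x 5 features
labels = np.array([0] * 20 + [1] * 20)   # 0 = cat, 1 = dog

# Project the 5-dimensional points onto a 2D plane for plotting
# (red dots for cats, blue dots for dogs).
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (40, 2)
```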
When new input data arrives (unknown whether it's a cat or dog), we need a way to classify it. But drawing a straight line to separate these compressed 2D dots is often too crude — the projection distorts the structure. That’s where the kernel function comes in. Kernels lift the data back into higher dimensions where its internal structure — which was "folded" — becomes unfolded, revealing clusters and separability based on similarity.
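Here is a minimal, hedged illustration of the kernel idea, using a standard toy dataset of concentric rings rather than the cat/dog example; the lifted coordinate z = x^2 + y^2 is just one possible choice of feature map:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes "folded" into concentric rings: not separable by a line in 2D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear classifier struggles on the raw 2D points...
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# ...but an RBF kernel implicitly lifts the data into a higher-dimensional
# space where the rings become separable.
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

# The same idea made explicit: add a third coordinate z = x^2 + y^2, so the
# inner ring sits below the outer ring and a flat plane can separate them.
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
print(X_lifted.shape)                      # (200, 3)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```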
But we don't just deal with two categories. In real life, we might have many: cats, dogs, dolphins, rabbits, and more. That’s when Graph Neural Networks (GNNs) come into play.
In GNNs, each node — say A, B, C, D — represents a data point (like a cat or dog), and each has its own set of features (Layer 0). Instead of treating them as just categories on a 2D plane, GNNs allow each node to "speak" to its neighbors. At Layer 1, a node updates itself by reflecting the features of its first-hop neighbors, like spectators becoming active participants. This process — using a transformation matrix W and applying a nonlinearity σ (like ReLU) — resembles the forward propagation of a signal.
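A minimal sketch of one such layer, assuming a toy four-node graph and made-up features, might look like this:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Four nodes A, B, C, D with hypothetical 3-dimensional features (Layer 0).
H0 = np.array([[1.0, 0.0, 0.2],   # A
               [0.0, 1.0, 0.5],   # B
               [0.3, 0.3, 1.0],   # C
               [0.9, 0.1, 0.0]])  # D

# Adjacency with self-loops for an assumed toy graph: A-B, B-C, C-D.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Row-normalize so each node averages over itself and its first-hop neighbors.
A_hat = A / A.sum(axis=1, keepdims=True)

# Random transformation matrix W (3 input features -> 4 hidden features).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))

# One layer: each node reflects its neighbors, then W and sigma are applied.
H1 = relu(A_hat @ H0 @ W)
print(H1.shape)  # (4, 4): the updated Layer 1 node features
```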
Repeating this across layers — WσWσWσ — reminds me of Green’s function formalism in quantum physics. In physical systems, signals or particles propagate and influence surrounding molecules, including solvents. The forward direction mimics how information in GNNs flows outward to neighbors, while Green’s functions allow us to calculate how a disturbance or excitation (such as electron addition/removal) impacts the system — via repeated application and integration: GGG…
Similarly, cross-entropy loss, which compares predicted probabilities with true labels y, reminds me of the variational principle in quantum mechanics — where the estimated energy is always greater than or equal to the true ground state energy. In machine learning, the system "learns" by minimizing this loss, analogous to how wavefunctions are optimized to lower the energy.
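For concreteness, here is a small sketch of cross-entropy loss on invented three-class predictions (cat, dog, dolphin), showing how confident, correct probabilities drive the loss down toward its unreachable floor of zero:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Hypothetical 3-class problem: cat, dog, dolphin (one-hot true labels y).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

confident = np.array([[0.90, 0.05, 0.05],
                      [0.10, 0.80, 0.10],
                      [0.05, 0.05, 0.90]])
uncertain = np.array([[0.4, 0.3, 0.3],
                      [0.3, 0.4, 0.3],
                      [0.3, 0.3, 0.4]])

# Training lowers this loss, much as a trial wavefunction's energy is
# lowered toward (but never below) the true ground state.
print(cross_entropy(y_true, confident))  # lower loss
print(cross_entropy(y_true, uncertain))  # higher loss
```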
And when I see the word entropy, I naturally think of thermodynamic entropy S, and of course, Gibbs free energy:
ΔG = ΔH − TΔS
These models don’t have literal heat or temperature, but the idea of free energy carries over — it becomes a metaphor for the system’s available potential to improve. In optimization, a model lowers its "free energy" (loss) by finding a better configuration of weights — just like how molecules settle into low-energy states.
In fact, Density Functional Theory (DFT) also works this way. When doing quantum chemical calculations, we use the gradient of the potential energy surface to optimize toward a minimum. In DFT packages, there's typically a convergence threshold — the calculation stops when the energy gradient (first derivative) is close enough to zero.
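A toy sketch of that idea, with an assumed quadratic surface, step size, and threshold rather than anything taken from a real DFT package, might look like this:

```python
import numpy as np

def minimize(grad, x0, step=0.1, tol=1e-6, max_iter=10_000):
    """Follow the negative gradient until its norm falls below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # convergence threshold on the first derivative
            break
        x = x - step * g              # step downhill on the (toy) energy surface
    return x

# Toy "potential energy surface": E(x, y) = (x - 1)^2 + 2*(y + 0.5)^2
grad_E = lambda p: np.array([2 * (p[0] - 1), 4 * (p[1] + 0.5)])

print(minimize(grad_E, x0=[3.0, 2.0]))  # approaches the minimum at (1, -0.5)
```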
DFT also uses self-consistent field (SCF) iterations — just like GNNs. The system updates its estimate of electron density repeatedly, improving each time. And in more advanced wavefunction methods like CI (Configuration Interaction), the wavefunction evolves by "turning on and off" electron configurations — conceptually similar to adjusting parameters and activations in neural nets.
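In the same spirit, here is a minimal fixed-point loop with damped mixing; the update function below is a stand-in chosen only because it has an easy fixed point, not a real electron-density update:

```python
import numpy as np

def self_consistent(update, x0, mix=0.5, tol=1e-8, max_iter=1000):
    """Iterate x <- (1 - mix)*x + mix*update(x) until the change is tiny."""
    x = x0
    for i in range(max_iter):
        x_new = (1 - mix) * x + mix * update(x)  # damped mixing, as SCF codes do
        if abs(x_new - x) < tol:
            return x_new, i
        x = x_new
    return x, max_iter

# Stand-in for the density update: a contraction whose fixed point plays the
# role of the converged electron density (purely illustrative, not real DFT).
update = lambda x: np.cos(x)

density, n_steps = self_consistent(update, x0=1.0)
print(density, n_steps)  # settles at the fixed point of cos(x), about 0.739
```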
In the end, whether we’re looking at brush on canvas, electrons in orbitals, or data points in a neural network, we are always searching for structure — for something meaningful behind it. Impressionist painters didn’t just paint what they saw; they painted what they felt, turning warmth and cold into color, hardness into shape. Scientists build theories by simplifying the world into symbols: atoms become circles, systems and their surroundings become nested circles joined by solid or dotted lines. And AI doesn’t just classify — it learns from the connections we might not yet see.
All perceiving is thinking. All observation is invention. And all of these — art, science, learning — are not separate.
They are deeply, endlessly connected.