Safety neurons: World model tensors for general AI alignment
By Dave Steve
There is a recent feature in The Transmitter, Most neurons in mouse cortex defy functional categories, stating that “The majority of cells in the cerebral cortex are unspecialized. [Consider] cells in the visual cortex that selectively fire in response to an object moving in one direction or another, or neurons in the prefrontal cortex that tune in to confidence and reward and fire only when rats make decisions. Neuropixels recordings reveal that much of the mouse cerebral cortex comprises non-selective neurons.
Everywhere except in the primary sensory areas, the neuronal firing appeared as an undefined blob. The findings suggest that structure exists at the macroscale but that on a regional scale most neurons are multifunctional. Brain regions that respond in a less selective way form high-dimensional neural representations. [At issue is] how the brain encodes information—either in a low-dimensional way, in which the response of each neuron can be described using a linear combination of a few variables, or using a high-dimensional representation that requires many more parameters.”
If neurons are multifunctional, unspecialized, or non-selective, then individual neurons serve varying purposes. This could imply that the neurons for intelligence are also the neurons for caution, consequences, safety, or prudence. But how exactly are neurons multifunctional?
If neurons “selectively fire in response to an object moving in one direction or another,” what does it mean that neurons fire? If firing is correlated with the function of neurons, can the function of neurons be understood through the elements of their firing?
If the objective is to understand how the brain encodes information, then to zero in on neurons is to zero in on their firing, and it becomes reasonable to ask what it means that neurons fire.
How would neurons anatomically represent or encode sensory information? What would a neuron have to become, anatomically, to represent a smell differently from a sound, or a feeling differently from an emotion? If nothing anatomical could explain how neurons represent information, then it is possible to focus instead on the elements of their firing and to define how those elements encode information.
Electrical signals [or ions] are responsible for neural firing. This means that whenever a neuron is active or delivering on a function, it is doing so through electrical signals. However, electrical signals rarely act alone; they work alongside chemical signals. So when neurons fire or are active, it is signals that dominate. This supports the postulation that electrical and chemical signals are the elements the brain uses to encode information.
Also, whether or not neurons seem selective, they have to fire to carry out functions; hence the responsibility of electrical signals, which emerge from chemical signals and return to chemical signals.
Neurons often occur in clusters. It can be theorized that these clusters allow signals to operate in sets or loops, so that they can encode information, providing summaries to, and receiving information from, other sets.
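As a rough illustration of that theorized exchange, here is a minimal sketch in Python, assuming a “set” can be reduced to a state vector, a “summary” is a simple linear readout, and loops amount to repeated rounds of exchange; the class SignalSet and all of its names are hypothetical, not a claim about the brain's actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

class SignalSet:
    """Hypothetical cluster of signals: keeps a state vector,
    emits a compact summary, and absorbs summaries from other sets."""

    def __init__(self, size, summary_dim):
        self.state = rng.standard_normal(size)
        # Random linear maps standing in for "providing summaries"
        # and "receiving information from other sets".
        self.readout = rng.standard_normal((summary_dim, size)) / np.sqrt(size)
        self.write_in = rng.standard_normal((size, summary_dim)) / np.sqrt(summary_dim)

    def summary(self):
        # Compact summary of this set's current state.
        return np.tanh(self.readout @ self.state)

    def receive(self, incoming_summary):
        # Loop-like update: fold another set's summary into the state.
        self.state = np.tanh(self.state + self.write_in @ incoming_summary)

# Two sets passing summaries around a loop for a few rounds.
a, b = SignalSet(32, 4), SignalSet(32, 4)
for _ in range(3):
    sa, sb = a.summary(), b.summary()
    a.receive(sb)
    b.receive(sa)
print(a.summary(), b.summary())
```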
So, if these [electrical and chemical] signals are the focus for understanding the human brain, and are responsible for intelligence and for the safety of intelligence [or affect], how do they apply to building world models for AI and to general AI alignment?
Physical world models and general AI alignment
If AI is to model the physical world, it may have to leap from its neural basis to a signal basis, or at least have a signals block or group of electrical and chemical analogues. The same would be required for general AI alignment.
Simply, there would have to be a block with several layers where some tensors would act like electrical signals and others like chemical signals. The electrical segment would not use the total inputs, and the chemical segment would not fully interact, with some of it remaining constant.
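A minimal sketch of such a block follows, assuming PyTorch; the class name SignalsBlock and the parameters active_fraction and frozen_fraction are hypothetical choices for “not using the total inputs” and “some remaining constant,” a sketch rather than a definitive implementation.

```python
import torch
import torch.nn as nn

class SignalsBlock(nn.Module):
    """Hypothetical block in which some tensors play the role of
    electrical signals and others the role of chemical signals."""

    def __init__(self, dim, active_fraction=0.5, frozen_fraction=0.25):
        super().__init__()
        self.dim = dim
        self.active_fraction = active_fraction
        # "Electrical" segment: a learned map applied to only part of the input.
        self.electrical = nn.Linear(dim, dim)
        # "Chemical" segment: gating values, a slice of which stays constant.
        chem = torch.randn(dim)
        n_frozen = int(dim * frozen_fraction)
        self.register_buffer("chem_frozen", chem[:n_frozen].clone())  # never trained
        self.chem_free = nn.Parameter(chem[n_frozen:].clone())        # trained

    def forward(self, x):
        # Electrical signals do not use the total inputs: mask part of x.
        mask = torch.zeros(self.dim)
        mask[: int(self.dim * self.active_fraction)] = 1.0
        elec = torch.tanh(self.electrical(x * mask))
        # Chemical signals interact only partially; the frozen slice is constant.
        chem = torch.cat([self.chem_frozen, self.chem_free])
        return elec * torch.sigmoid(chem)

x = torch.randn(8, 64)
print(SignalsBlock(64)(x).shape)  # torch.Size([8, 64])
```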
Electrical signals, conceptually, often split, with some going ahead in the brain. This allows the initial ones to interact first with chemical signals: if they fit, processing goes ahead; if not, the incoming ones interact in the right direction instead. This is, conceptually, how the brain mechanizes what is labeled predictive coding, predictive processing, and prediction error correction.
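A toy version of that relay could look as follows, assuming the prediction lives in a “chemical” state vector and that “fitting” is a simple error-norm threshold; the function relay, the threshold, and the update rule are illustrative assumptions, not the brain's actual mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

# A "chemical" state holding a prediction of the incoming signal.
prediction = rng.standard_normal(16)
learning_rate = 0.1

def relay(incoming, prediction):
    """If the advance (split-off) signal fits the prediction, pass it through;
    otherwise let the error steer the correction (prediction error correction)."""
    error = incoming - prediction
    if np.linalg.norm(error) < 1.0:                  # the signals "fit": go ahead
        return incoming, prediction
    corrected = prediction + learning_rate * error   # correct toward the input
    return corrected, corrected

signal = rng.standard_normal(16)
for step in range(5):
    out, prediction = relay(signal, prediction)
    # The prediction error shrinks with each correction step.
    print(step, round(float(np.linalg.norm(signal - prediction)), 3))
```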
Also, the electrical segment could split in different directions within the block, not just toward the next chemical segment. This would be like skipping some hidden layers, going far deeper than dropout. This distribution, or relay randomness, is necessary because the world does not follow the same order all the time, even when it is predictable. There is also a need for a fair amount of intentionality, to control some incoming inputs. It would be possible to develop some control in this signals block, as well as the capability to shape memory in some directions, differently from an LSTM.
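The sketch below, again assuming PyTorch, illustrates relay randomness as random skips over later layers, which differs from dropout in that whole layers are bypassed rather than individual units being zeroed; RelaySkip and its jump range are hypothetical.

```python
import torch
import torch.nn as nn

class RelaySkip(nn.Module):
    """Hypothetical stack in which the 'electrical' output of each layer is
    randomly relayed to one of several later 'chemical' layers, rather than
    always to the next one."""

    def __init__(self, dim, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        i = 0
        while i < len(self.layers):
            x = torch.tanh(self.layers[i](x))
            # Relay randomness: jump ahead 1..3 layers instead of dropping units.
            i += int(torch.randint(1, 4, (1,)))
        return x

x = torch.randn(2, 32)
print(RelaySkip(32)(x).shape)  # torch.Size([2, 32])
```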
Distributions to some layers of chemical segments would also open the possibility of general alignment in the instance, just as humans are aware of what to be cautious about in any situation, by speech or action, knowing what the consequences would otherwise be.
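One hedged way to picture such an in-instance caution signal is a gating layer that scores a candidate output against a learned stand-in for consequences and attenuates it accordingly; CautionGate and its scoring rule are assumptions for illustration, not a proven alignment mechanism.

```python
import torch
import torch.nn as nn

class CautionGate(nn.Module):
    """Hypothetical 'chemical' layer for alignment: scores a candidate
    output against a learned caution direction (a stand-in for consequences)
    and damps the output when the score is high."""

    def __init__(self, dim):
        super().__init__()
        self.caution = nn.Linear(dim, 1)  # learned consequence score

    def forward(self, x):
        risk = torch.sigmoid(self.caution(x))  # 0 = safe, 1 = risky
        return x * (1.0 - risk)                # attenuate risky outputs

x = torch.randn(4, 16)
print(CautionGate(16)(x).shape)  # torch.Size([4, 16])
```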
Developing tensors and algorithms that can represent electrical and chemical signals in deep learning could become a way for models to better interpret the world as well as to be generally aligned, even when some of their outputs appear on social media, search engines, or app stores.
The research to develop this could be possible in months, exploring equivalents of electrical and chemical signals in the brain.
There is a new paper in Nature, A foundation model of transcription across human cell types, where the authors wrote, “Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks.”