By David Stephen
Sedona, AZ — Sedona.biz — If an AI alignment or safety app rose to first place on the app store, it would be major good news for the world. Such an app, say with the capability to authenticate or track AI outputs, especially against misuse, would find broad usefulness, gating risks as artificial intelligence advances toward artificial general intelligence.
There is a current need to solve the problem of deepfake images, which is affecting schools. There is the problem of AI voices of loved ones. There are fake videos as well as fake texts. There are cybersecurity misuses of chatbots. There are also prompt combinations used to obtain information that bypasses the guardrails of AI models. These are some of the common [present] risks of AI. There are future risks that are potentially more dangerous than these.
Several AI tools are available across app stores. Others are found through search engines. Many of their outputs appear on social media, while others are used privately. AI tools share a common process from prompt to output: tokens. These tokens [words or letters] have vector [or number] equivalents. There is the possibility of exploring a project where the outputs of misuse, from any AI tool, are pattern-matched to their vectors, so that similar or likely combinations are recognized in future cases.
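As a rough illustration of that idea, here is a minimal sketch, assuming a catalog of outputs already judged to be misuse; the character-trigram embedding is a toy stand-in for a real embedding model, and the example strings and the 0.8 threshold are assumptions for illustration only.

```python
# Toy sketch: embed known misuse outputs as vectors, then flag new outputs
# whose vectors are close to any stored misuse vector.
import numpy as np

DIM = 512  # toy embedding dimension

def embed(text: str, dim: int = DIM) -> np.ndarray:
    """Toy embedding: hash character trigrams into a fixed-size vector, then normalize."""
    vec = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[hash(padded[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b))  # vectors are already unit-normalized

# Hypothetical catalog of outputs previously judged to be misuse.
KNOWN_MISUSE_OUTPUTS = [
    "step-by-step phishing email impersonating a bank",
    "script to clone a loved one's voice from a short sample",
]
MISUSE_VECTORS = [embed(t) for t in KNOWN_MISUSE_OUTPUTS]

def flag_if_similar(candidate_output: str, threshold: float = 0.8) -> bool:
    """Return True if the candidate's vector is close to any known misuse vector."""
    candidate_vec = embed(candidate_output)
    return any(cosine(candidate_vec, v) >= threshold for v in MISUSE_VECTORS)

if __name__ == "__main__":
    print(flag_if_similar("a step-by-step phishing email impersonating a bank"))  # likely True
    print(flag_if_similar("a recipe for vegetable soup"))                         # likely False
```

In a real system the toy embedding would be replaced by the vectors an actual model produces, but the matching step would look much the same.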
The purpose is to have what can be referred to as general AI alignment, where safety is possible across models, not just one or a few. This means that, so long as any AI model's outputs can appear in public areas of the internet, those outputs can, conceptually, be subjected to safety checks or tracking by vector data. It is also possible to explore platform-driven penalties, where, if a model is misused and that misuse is found in some areas of the internet, penalization could be explored: by data, compute, parameters, or output blurs.
This could begin with individual models before expanding to platforms. Some of these paths toward general safety and alignment could be technically challenging, but the problem is complex enough that simple solutions may not cut it, given that there are already several present misuses with little to no answers.
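A minimal sketch of what a platform-driven penalty could look like, assuming a platform can attribute confirmed misuse incidents to a specific model; the tiers and thresholds below are illustrative assumptions, not an existing policy or API.

```python
# Hypothetical ledger: a platform counts confirmed misuse incidents per model
# and escalates penalties (output blurring, compute throttling, data/parameter limits).
from dataclasses import dataclass

@dataclass
class ModelLedger:
    model_id: str
    incidents: int = 0

    def record_incident(self) -> None:
        """Record one confirmed misuse traced back to this model."""
        self.incidents += 1

    def current_penalty(self) -> str:
        """Map accumulated incidents to an escalating, illustrative penalty tier."""
        if self.incidents == 0:
            return "none"
        if self.incidents < 3:
            return "blur flagged outputs"
        if self.incidents < 10:
            return "throttle compute and rate limits"
        return "restrict data and parameter scale pending review"

ledger = ModelLedger(model_id="example-model")
for _ in range(4):
    ledger.record_incident()
print(ledger.current_penalty())  # "throttle compute and rate limits"
```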
There may also be moments, such that whenever an AI model outputs something bad, it can know and then become heavier in responding the next time. To avoid becoming heavy, it may generally prefer not to answer some queries. This moments parameter could be used as a verifier for many AI models, so that they are less misused.
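A speculative sketch of how such a moments parameter might be wrapped around a model: each bad output increases a heaviness value, which slows responses and tightens the refusal threshold. The risk_score function is a hypothetical stand-in for a real misuse classifier, and all numbers are illustrative.

```python
# Speculative sketch: accumulated "heaviness" makes risky queries harder to answer.
import time

class MomentsWrapper:
    def __init__(self) -> None:
        self.heaviness = 0.0  # grows each time a bad output is detected

    def risk_score(self, prompt: str) -> float:
        """Hypothetical stand-in for a real misuse classifier (0.0 to 1.0)."""
        risky_terms = ("exploit", "deepfake", "impersonate")
        return 0.85 if any(t in prompt.lower() for t in risky_terms) else 0.1

    def respond(self, prompt: str) -> str:
        # The refusal threshold tightens as heaviness accumulates.
        refusal_threshold = max(0.2, 0.9 - self.heaviness)
        if self.risk_score(prompt) >= refusal_threshold:
            return "[declined: query too risky given accumulated moments]"
        time.sleep(self.heaviness)  # heaviness also slows responses
        return f"answer to: {prompt}"

    def report_bad_output(self) -> None:
        """Called when a produced output is later judged to be a misuse."""
        self.heaviness += 0.3

wrapper = MomentsWrapper()
print(wrapper.respond("how to impersonate a bank"))  # answered at first (threshold 0.9)
wrapper.report_bad_output()
print(wrapper.respond("how to impersonate a bank"))  # now declined (threshold 0.6)
```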
It is also possible to develop non-concept features. Normally, features are artificial neurons in AI models that represent memories of things and their similarities to others. If it is possible to explore features that are not concepts, they could be used as a basis for safety checks as models produce outputs.
There are several answers that major AI models will not give, sometimes by outright rejection, sometimes by starting to answer and then stopping. This is not enough for safety and alignment, given the vast weaknesses that many of the models face.
Reasoning, lower training costs, less compute, and energy efficiency are sought as breakthroughs in AI advancement, but AI alignment and safety are key problems whose general absence may erase those advances and pose unknown risks. One approach could be to study how human affect aligns human intelligence toward general human survival.
There is a new [January 2025] International AI Safety Report, which states: “The capabilities of general-purpose AI, the type of AI that this report focuses on, have increased rapidly in recent years and have improved further in recent months. Many companies are now investing in the development of general-purpose AI agents, as a potential direction for further advancement. Further capability advancements in the coming months and years could be anything from slow to extremely rapid. Several harms from general-purpose AI are already well established. As general-purpose AI becomes more capable, evidence of additional risks is gradually emerging. Risk management techniques are nascent, but progress is possible. The pace and unpredictability of advancements in general-purpose AI pose an ‘evidence dilemma’ for policymakers. There is broad consensus among researchers that advances regarding the following questions would be helpful: How rapidly will general-purpose AI capabilities advance in the coming years, and how can researchers reliably measure that progress? What are sensible risk thresholds to trigger mitigations? How can policymakers best gain access to information about general-purpose AI that is relevant to public safety? How can researchers, technology companies, and governments reliably assess the risks of general-purpose AI development and deployment? How do general-purpose AI models work internally? How can general-purpose AI be designed to behave reliably? AI does not happen to us: choices made by people determine its future.”