Sedona.Biz – The Voice of Sedona and The Verde Valley
    Opinion

    Why pre-deployment testing and evaluation is not general AI safety

    January 1, 2025
    By David Stephen
    Sedona, AZ – The adoption of the pharmaceutical approach to AI safety can be described as niche. Pre-deployment testing and evaluation of OpenAI's o1, Anthropic's Claude 3.5 Sonnet, or other frontier models implies approximate safety only for those models, not for AI in general.
    The misuses possible with AI tools across the internet often have nothing to do with OpenAI's or Anthropic's models. AI tools, built for all kinds of purposes, are too numerous to be actively pre-tested or evaluated by the US or UK AI Safety Institute.
    The pharmaceutical industry in most countries ensures that medications are approved, that their ads are reasonably accurate, and that brands are identifiable, among other regulations, with the possibility of litigation or worse for those that fail to comply or that mislead. In several countries counterfeit medications are available, but it is unlikely that a counterfeit medication group shows up in the mainstream.
    This is the problem with AI: tools for all kinds of purposes are available on the first pages of search engines, on social media, in app stores, and so forth.
    While the owners of the platforms have policies for safety, the digital world is not the physical one; evasion and replication are possible at short intervals. The question is how AI can be made generally safe from harmful or problematic synthetic content and several other misuses.
    How can AI also be understood through its mind? What components of AI can be described as its mind, and what are their features, as a way to prospect how AI might be working and how to improve safety in general?
    General AI safety is not a case for just two major models, but a matter of ways that AI can be made safe across sources. If AI is a mind for digital content, or AI has a mind, what would the components of that mind be? Layers? Nodes? If the components were extricated, how do they relay to define the outputs of AI? (The relays between the components of AI can be described as the math and compute that underpin AI.) Also, what are the channels through which alignment can be further introduced, across platforms, for general AI safety? The human mind can be used to draw some parallels while seeking out the mind of AI.
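    To make the "layers and nodes" framing concrete, here is a minimal illustrative sketch, not any vendor's architecture: in a neural network, each node computes a weighted sum of its inputs plus a bias, passed through a nonlinearity, and a layer is a set of such nodes. These weighted sums are, in miniature, the math and compute that relay inputs to outputs.

```python
# Minimal sketch of a "node" and a "layer" in a neural network.
# Weights and inputs below are arbitrary illustrative values.

def relu(x):
    """A common nonlinearity: pass positives, zero out negatives."""
    return max(0.0, x)

def node_output(inputs, weights, bias):
    """One node: weighted sum of its inputs plus a bias, then a nonlinearity."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(total)

def layer_output(inputs, weight_rows, biases):
    """One layer: a set of nodes, each seeing the same inputs."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two inputs relayed through a layer of three nodes.
outputs = layer_output(
    inputs=[1.0, -2.0],
    weight_rows=[[0.5, 0.1], [-0.3, 0.8], [1.0, 1.0]],
    biases=[0.0, 0.2, 0.5],
)
print(outputs)
```

    A deployed model stacks many such layers with billions of weights, but the relay is the same in kind: numbers in, weighted sums and nonlinearities through, numbers out.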
    There is a recent report by NIST, Pre-Deployment Evaluation of Anthropic’s Upgraded Claude 3.5 Sonnet, stating that, “The U.S. Artificial Intelligence Safety Institute (US AISI) and the UK Artificial Intelligence Safety Institute (UK AISI) conducted a joint pre-deployment evaluation of Anthropic’s latest model – the upgraded Claude 3.5 Sonnet (released October 22, 2024). US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across four domains: (1) biological capabilities, (2) cyber capabilities, (3) software and AI development, and (4) safeguard efficacy. To assess the model’s relative capabilities and evaluate the potential real-world impacts of the upgraded Sonnet 3.5 across these four areas, US AISI and UK AISI compared its performance to a series of similar reference models: the prior version of Anthropic’s Sonnet 3.5, OpenAI’s o1-preview, and OpenAI’s GPT-4o.”
    There is a similar report by NIST, Pre-Deployment Evaluation of OpenAI’s o1 Model, stating that, “The U.S. Artificial Intelligence Safety Institute (US AISI) and the UK Artificial Intelligence Safety Institute (UK AISI) conducted a joint pre-deployment evaluation of OpenAI’s latest model, o1 (released December 5, 2024). US AISI and UK AISI conducted testing during a limited period of pre-deployment access to the o1 model. Testing was conducted by expert engineers, scientists, and subject matter specialists from staff at both Institutes, and the findings were shared with OpenAI before the model was publicly released. US AISI and UK AISI ran separate but complementary tests to assess the model’s capabilities across three domains: (1) cyber capabilities, (2) biological capabilities, (3) and software and AI development. To assess the model’s relative capabilities and evaluate the potential real-world impacts of o1 across these areas, US AISI and UK AISI compared its performance to a series of similar reference models: OpenAI’s o1-preview, OpenAI’s GPT-4o, and both the upgraded and earlier version of Anthropic’s Claude 3.5 Sonnet.”
    There is a recent announcement by OpenAI, Early access for safety testing, stating that, “We’re inviting safety researchers to apply for early access to our next frontier models. This early access program complements our existing frontier model testing process, which includes rigorous internal safety testing, external red teaming such as our Red Teaming Network⁠ and collaborations with third-party testing organizations, as well the U.S. AI Safety Institute and the UK AI Safety Institute. As models become more capable, we are hopeful that insights from the broader safety community can bring fresh perspectives, deepen our understanding of emerging risks, develop new evaluations, and highlight areas to advance safety research. As part of 12 Days of OpenAI⁠, we’re opening an application process for safety researchers to explore and surface the potential safety and security implications of the next frontier models.”
