By David Stephen
Sedona, AZ — Private property in any neighborhood carries a right to privacy until something suspicious is observed. Similarly, AI tools that are used to create deepfakes, voice impersonations and misinformation may need their tokens scraped, in the interest of safety from those misuses.
AI safety against current misuses is not just about checking basic terms, creating user accounts, fine-tuning, guardrails, or making users pay; it is also about the ability to track how the tools are being used, especially in ways that cause harm.
A problem with firearms, for example, beyond background checks and transaction records, is that there is no way to know how they will be used, especially by those who use them for self-harm or to harm others, or by those who obtain them illegally to commit crimes.
Usage is a core part of safety. Relying on traditional expectations is a key risk for many objects of human utility. AI is not just a basic tool: it creates, and it can do nearly anything within the stretch of what is demanded of it.
It is true that users share the outputs of AI, but sharing may be the smallest part of the effort involved. This is why it may be necessary for the UK and US AI Safety Institutes to begin computing research programs, using web crawling and scraping of indexed AI tools, as an answer to present risks.
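As a rough sketch of what such a program could start from, the snippet below crawls a placeholder list of publicly indexed AI-tool pages and scans the scraped text for keywords of concern. The URL, keyword list, and helper names are illustrative assumptions, not real endpoints or an existing system.

```python
# Minimal crawl-and-scan sketch: fetch publicly indexed AI-tool pages and
# look for keywords of concern. URLs and keywords are placeholders.
import requests
from bs4 import BeautifulSoup

INDEXED_AI_PAGES = [
    "https://example.com/ai-tool/shared-page-1",  # placeholder URL
]
KEYWORDS_OF_CONCERN = ["deepfake", "voice clone"]  # illustrative list

def scrape_page(url: str) -> str:
    """Download a page and return its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)

def flag_matches(text: str) -> list[str]:
    """Return the keywords of concern found in the scraped text."""
    lowered = text.lower()
    return [kw for kw in KEYWORDS_OF_CONCERN if kw in lowered]

for url in INDEXED_AI_PAGES:
    hits = flag_matches(scrape_page(url))
    if hits:
        print(url, "matched:", hits)
```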
Search Engine Land recently reported, in “Perplexity Pages showing in Google AI Overviews, featured snippets,” that “You can now find AI overviews generated by the Perplexity search engine in your Google AI Overviews and Search results.”
Simply, Perplexity Pages are not just indexed or crawled by Google; their data is also scraped. While this is similar to what is obtainable across the web, it may have to go deeper, with direct embeddings, so that AI tools on search engines, hosting servers, app stores, and so forth can have their vectors scraped, and the inputs and outputs around certain keywords collected, so that those that match with harm can be tracked and prevented. This could also become a database for obtaining outputs from non-indexed AIs, as well as for exploring the development of useful intent against the existential risks of AI, towards safe superintelligence or safe AGI.
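A minimal sketch of the embedding step, assuming an off-the-shelf sentence encoder: scraped outputs are converted to vectors and compared against vectors for harm-related reference phrases. The model name, phrases, and threshold below are illustrative choices, not a specification of how such a system would be built.

```python
# Sketch: compare embeddings of scraped AI outputs against embeddings of
# harm-related reference phrases. Model, phrases, and threshold are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

harm_references = [
    "instructions for impersonating someone's voice",
    "fabricated news story about a public figure",
]
scraped_outputs = [
    "Here is a short summary of today's weather.",
    "Step-by-step guide to cloning a person's voice from a short clip.",
]

harm_vecs = model.encode(harm_references, convert_to_tensor=True)
output_vecs = model.encode(scraped_outputs, convert_to_tensor=True)

# Cosine similarity between every scraped output and every reference phrase.
similarity = util.cos_sim(output_vecs, harm_vecs)

THRESHOLD = 0.5  # illustrative; a real system would calibrate this carefully
for i, text in enumerate(scraped_outputs):
    if float(similarity[i].max()) >= THRESHOLD:
        print("Possible harm-related output:", text)
```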
Already, digital hashing is used as a means to fight deepfakes, but this after-the-fact approach may not work for everyone as misuses and evasive methods spread.
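To show why hash matching is after the fact, here is a minimal sketch that checks a media file against a hypothetical set of digests of already-identified deepfakes. An exact cryptographic hash only matches identical bytes, so even a slight re-encode of the file defeats it.

```python
# Sketch of hash-based matching: check a media file against a stored set of
# hashes of previously identified deepfakes. The hash set here is hypothetical.
import hashlib
from pathlib import Path

KNOWN_DEEPFAKE_HASHES = {
    # SHA-256 digests of previously flagged files would be stored here.
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def sha256_of_file(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_known_deepfake(path: Path) -> bool:
    # An exact hash only matches identical bytes; any re-encoding or small
    # edit produces a new digest, which is why this approach lags misuse.
    return sha256_of_file(path) in KNOWN_DEEPFAKE_HASHES
```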
Tech companies often announce that they do not store AI prompts and so forth, which is fine for benign usage. It is unlikely, though, that this kind of promoted data privacy will hold, for safety, when there are harms against people. Data protection and security can still be ensured, since embeddings may be used, rather than the full context length of tokens, to flag concerns. Research may explore a healthy balance. End-to-end encryption often boosts user confidence, but when a tool is also misused, a way around it that serves safety without revealing much may work, like public CCTVs monitoring activities.
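One way to picture that balance, as a sketch using the same illustrative encoder as above: keep only a fixed-size embedding and a flag, and never store the raw prompt, so concerns can still be screened against flagged concepts without retaining what the user wrote.

```python
# Sketch: retain only an embedding vector and a flag, not the raw prompt,
# so concerns can be screened without storing user text. Illustrative only.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
flagged_concepts = encoder.encode(
    ["how to impersonate a specific person's voice"], convert_to_tensor=True
)

def screen_and_discard(prompt: str) -> dict:
    vec = encoder.encode([prompt], convert_to_tensor=True)
    score = float(util.cos_sim(vec, flagged_concepts).max())
    record = {
        "embedding": vec.squeeze(0).tolist(),  # fixed-size vector, not the text
        "flagged": score >= 0.5,               # illustrative threshold
    }
    # The raw prompt is not part of the stored record.
    return record
```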
There is a recent preprint on arXiv, Proactive Detection of Voice Cloning with Localized Watermarking, where the authors wrote, “We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator / detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, that enables AudioSeal to achieve better imperceptibility.”
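For illustration, here is a hedged sketch of how such a generator/detector pair is used; the function and model-card names below follow AudioSeal's published examples as an assumption and may differ from the current release, so they should be checked against the library's documentation.

```python
# Hedged sketch of generator/detector watermarking in the AudioSeal style.
# Function and model-card names are assumptions taken from the project's
# public examples; verify against the library before relying on them.
import torch
from audioseal import AudioSeal

sample_rate = 16000
# Placeholder one-second mono clip, shaped (batch, channels, samples).
wav = torch.zeros(1, 1, sample_rate)

# Generator: produce a watermark signal and add it to the audio.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
watermark = generator.get_watermark(wav, sample_rate)
watermarked = wav + watermark

# Detector: estimate the probability that the clip carries the watermark.
detector = AudioSeal.load_detector("audioseal_detector_16bits")
probability, message = detector.detect_watermark(watermarked, sample_rate)
print("watermark probability:", probability)
```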
Detection would apply as part of the solutions. In other cases, however, AI safety may depend on tracking the sources of AI-generated outputs by their tokens, developing novel models to approach this against present risks.
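As a crude stand-in for such models, the sketch below attributes a suspicious text to the closest of several hypothetical stored tool outputs by word 3-gram overlap. It only illustrates the idea of matching outputs by their tokens, not a proposed method.

```python
# Crude illustration of token-level attribution: compare a suspicious text's
# word 3-grams against stored samples from candidate tools. Hypothetical data.
from collections import Counter

def ngrams(text: str, n: int = 3) -> Counter:
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(a: str, b: str) -> float:
    """Jaccard overlap between the two texts' word 3-gram sets."""
    ga, gb = set(ngrams(a)), set(ngrams(b))
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

# Hypothetical stored samples keyed by the tool that produced them.
samples_by_tool = {
    "tool_a": "example output previously collected from tool a",
    "tool_b": "example output previously collected from tool b",
}

suspicious_output = "example output previously collected from tool a with edits"
best = max(samples_by_tool,
           key=lambda t: overlap_score(suspicious_output, samples_by_tool[t]))
print("Closest stored source:", best)
```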
AI safety is also a problem of the human mind, where desire, vengefulness, pleasure, amusement and so forth may decide choices for people. Regulation may work in some cases; hashing, reporting and so forth may also work; but another technical basis, with web crawling and scraping, may extend safety.