Classification of AI safety risks
2025-02-06
Safety risks exist at every stage of the AI chain, from system design to research and development (R&D), training, testing, deployment, utilization, and maintenance. These risks stem from inherent technical flaws as well as the misuse, abuse, and malicious use of AI.
3.1 AI's inherent safety risks
3.1.1 Risks from models and algorithms
(a) Risks of explainability
AI algorithms, represented by deep learning, have complex internal workings. Their black-box or grey-box inference processes produce unpredictable and untraceable outputs, making it challenging to rectify anomalies quickly or trace their origins for accountability should any arise.
(b) Risks of bias and discrimination
During the algorithm design and training process, personal biases may be introduced, either intentionally or unintentionally. Additionally, poor-quality datasets can lead to biased or discriminatory outcomes in the algorithm's design and outputs, including discriminatory content regarding ethnicity, religion, nationality, and region.
(c) Risks of robustness
Because deep neural networks are typically non-linear and large in scale, AI systems are susceptible to complex and changing operational environments and to malicious interference and manipulation, which can cause problems such as degraded performance and decision-making errors.
(d) Risks of stealing and tampering
Core algorithm information, including parameters, structures, and functions, faces risks of inversion attacks, theft, modification, and even backdoor injection. These can lead to infringement of intellectual property rights (IPR) and leakage of business secrets, as well as unreliable inference, erroneous decision outputs, and even operational failures.
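To make the stealing risk concrete, the toy sketch below shows how repeated queries to a black-box prediction endpoint can reconstruct a model's proprietary parameters. Everything here is illustrative: the victim is a simple linear model, and `victim_api` is a hypothetical endpoint, not an API from any real system.

```python
# Toy sketch of query-based model extraction; all values illustrative.
import numpy as np

rng = np.random.default_rng(0)
w_secret = rng.normal(size=5)   # proprietary model parameters (illustrative)

def victim_api(X):
    """Black-box prediction endpoint an attacker can query but not inspect."""
    return X @ w_secret

# The attacker sends random probe queries and records the responses,
X_probe = rng.normal(size=(50, 5))
y_probe = victim_api(X_probe)

# then fits a surrogate model that recovers the secret parameters.
w_stolen, *_ = np.linalg.lstsq(X_probe, y_probe, rcond=None)
print("parameter recovery error:", np.linalg.norm(w_stolen - w_secret))
```

With a noiseless linear victim the recovery is essentially exact; real extraction attacks against non-linear models require far more queries but follow the same query-and-fit pattern.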
(e) Risks of unreliable output
Generative AI is prone to hallucination: a model generates untruthful or unreasonable content but presents it as fact, producing biased and misleading information.
(f) Risks of adversarial attack
Attackers can craft adversarial examples that subtly mislead, influence, or even manipulate AI models, causing incorrect outputs and potentially leading to operational failures.
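A minimal sketch of the adversarial-example idea, using the fast gradient sign method (FGSM, a standard technique not named in the text above) against a toy logistic classifier; all weights and numbers are illustrative:

```python
# Toy FGSM adversarial example against a hypothetical logistic classifier.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # hypothetical trained weights
b = 0.1
x = rng.normal(size=8)   # a benign input

def predict(x):
    """Toy logistic model: probability that x belongs to class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Gradient of the log-loss w.r.t. the input, for true label y = 1:
# dL/dx = (p - 1) * w
p = predict(x)
grad_x = (p - 1.0) * w

# FGSM step: a small perturbation in the gradient's sign direction
# pushes the score toward the wrong class.
eps = 0.25
x_adv = x + eps * np.sign(grad_x)

print(f"clean score:       {predict(x):.3f}")
print(f"adversarial score: {predict(x_adv):.3f}")
```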
3.1.2 Risks from data
(a) Risks of illegal collection and use of data
The collection of AI training data and the interaction with users during service provision pose security risks, including collecting data without consent and improper use of data and personal information.
(b) Risks of improper content and poisoning in training data
If the training data includes illegal or harmful information such as false, biased, or IPR-infringing content, or lacks diversity in its sources, the output may include harmful content such as illegal, malicious, or extreme information.
Training data is also at risk of being poisoned through tampering, error injection, or misleading actions by attackers. This can interfere with the model's probability distribution, reducing its accuracy and reliability.
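The sketch below illustrates poisoning by error injection on a toy nearest-centroid classifier: a handful of deliberately mislabeled points shifts the learned decision boundary and degrades accuracy on clean data. All data and numbers are synthetic and illustrative.

```python
# Toy data-poisoning demonstration; synthetic data, illustrative numbers.
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated classes of training points.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def fit_and_score(X_train, y_train, X_test, y_test):
    """Nearest-centroid classifier: accuracy against true labels."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    pred = (np.linalg.norm(X_test - c1, axis=1) <
            np.linalg.norm(X_test - c0, axis=1)).astype(int)
    return (pred == y_test).mean()

print(f"clean accuracy:    {fit_and_score(X, y, X, y):.2f}")

# The attacker injects mislabeled points deep inside class-1 territory,
# dragging the learned class-0 centroid toward class 1.
X_bad = rng.normal(3.0, 0.1, (100, 2))
X_poisoned = np.vstack([X, X_bad])
y_poisoned = np.concatenate([y, np.zeros(100, dtype=int)])

print(f"poisoned accuracy: {fit_and_score(X_poisoned, y_poisoned, X, y):.2f}")
```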
(c) Risks of unregulated training data annotation
Issues with training data annotation, such as incomplete annotation guidelines, incapable annotators, and errors in annotation, can affect the accuracy, reliability, and effectiveness of models and algorithms. Moreover, they can introduce training biases, amplify discrimination, reduce generalization abilities, and result in incorrect outputs.
(d) Risks of data leakage
In AI research, development, and applications, issues such as improper data processing, unauthorized access, malicious attacks, and deceptive interactions can lead to data and personal information leaks.
3.1.3 Risks from AI systems
(a) Risks of exploitation through defects and backdoors
The standardized APIs, feature libraries, and toolkits used in the design, training, and verification stages of AI algorithms and models, as well as development interfaces and execution platforms, may contain logical flaws and vulnerabilities. These weaknesses can be exploited, and in some cases backdoors can be intentionally embedded, posing significant risks of being triggered and used for attacks.
(b) Risks of computing infrastructure security
The computing infrastructure underpinning AI training and operations relies on diverse and ubiquitous computing nodes and various types of computing resources. It faces risks such as malicious consumption of computing resources and cross-boundary transmission of security threats at the infrastructure layer.
(c) Risks of supply chain security
The AI industry relies on a highly globalized supply chain. However, certain countries may use unilateral coercive measures, such as technology barriers and export restrictions, to create development obstacles and maliciously disrupt the global AI supply chain. This can lead to significant risks of supply disruptions for chips, software, and tools.
3.2 Safety risks in AI applications
3.2.1 Cyberspace risks
(a) Risks of information and content safety
AI-generated or synthesized content can lead to the spread of false information, discrimination and bias, privacy leakage, and infringement of rights, threatening citizens' lives and property, national security, and ideological security, and raising ethical risks. Without robust security mechanisms, a model may output illegal or harmful information when users' inputs contain harmful content.
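One way to picture the kind of security mechanisms the paragraph refers to is screening both the user's input and the model's output before anything is returned. The sketch below is a deliberately minimal illustration: the blocklist, the `safe_generate` wrapper, and the `model` callable are hypothetical, and a real system would rely on trained safety classifiers and human review rather than keyword matching.

```python
# Toy input/output content-safety screen; blocklist and wrapper hypothetical.
BLOCKLIST = {"bomb recipe", "stolen card numbers"}   # illustrative terms

def passes_check(text: str) -> bool:
    """Toy content-safety check: True if no blocked term appears."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def safe_generate(prompt: str, model) -> str:
    """Screen both the user's input and the model's output."""
    if not passes_check(prompt):       # screen user input
        return "[request refused]"
    reply = model(prompt)              # `model` is a hypothetical callable
    if not passes_check(reply):        # screen model output
        return "[response withheld]"
    return reply
```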
(b) Risks of confusing facts, misleading users, and bypassing authentication
If AI systems and their outputs are not clearly labeled, users may find it difficult to discern whether they are interacting with AI and to identify the source of generated content. This can impede users' ability to judge the authenticity of information, leading to misjudgment and misunderstanding. Additionally, highly realistic AI-generated images, audio, and video may circumvent existing identity verification mechanisms, such as facial recognition and voice recognition, rendering these authentication processes ineffective.
(c) Risks of information leakage due to improper usage
If staff of government agencies and enterprises fail to use AI services in a regulated and proper manner, they may input internal data and industrial information into AI models, leading to leakage of work secrets, trade secrets, and other sensitive business data.
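A common mitigation is to redact sensitive identifiers before text leaves the organization for an external AI service. The sketch below is a minimal illustration; the patterns and example strings are hypothetical, and a real deployment would need far broader coverage.

```python
# Toy pre-submission redaction filter; patterns are illustrative only.
import re

PATTERNS = [
    (re.compile(r"\b\d{16}\b"), "[CARD]"),                     # 16-digit numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),   # email addresses
    (re.compile(r"(?i)\bproject\s+\w+\b"), "[PROJECT]"),       # internal codenames
]

def redact(text: str) -> str:
    """Replace sensitive identifiers before text is sent to an AI service."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Ask alice@corp.example about Project Falcon; card 4111111111111111."))
# -> "Ask [EMAIL] about [PROJECT]; card [CARD]."
```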
(d) Risks of abuse for cyberattacks
AI can be used to launch automated cyberattacks or to increase attack efficiency, including discovering and exploiting vulnerabilities, cracking passwords, generating malicious code, sending phishing emails, scanning networks, and conducting social engineering attacks. All of these lower the threshold for cyberattacks and increase the difficulty of security protection.
(e) Risks of security flaw transmission caused by model reuse
Re-engineering or fine-tuning foundation models is common practice in AI applications. If a foundation model contains security flaws, those flaws are transmitted to downstream models.
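One way downstream developers try to limit this transmission is to pin the foundation-model artifact to a provider-published checksum before fine-tuning begins, so that a tampered or unvetted base model is rejected outright. A minimal sketch, assuming a hypothetical file name and digest:

```python
# Toy integrity check on a foundation-model artifact before reuse.
import hashlib

# Digest published by the model provider (hypothetical placeholder value).
EXPECTED_SHA256 = "<provider-published digest>"

def sha256_of(path: str) -> str:
    """Stream the file so arbitrarily large model artifacts fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("foundation-model.bin") != EXPECTED_SHA256:
    raise RuntimeError("model artifact does not match the pinned checksum")
```

A checksum only verifies provenance, not safety: flaws present in the provider's own release still propagate, so downstream evaluation remains necessary.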
3.2.2 Real-world risks
(a) Inducing traditional economic and social security risks
AI is used in finance, energy, telecommunications, transportation, and services affecting people's livelihoods, such as self-driving vehicles and smart medical diagnosis and treatment. Hallucinations and erroneous decisions by models and algorithms, along with system performance degradation, interruption, and loss of control caused by improper use or external attacks, pose threats to users' personal safety and property and to socioeconomic security and stability.
(b) Risks of using AI in illegal and criminal activities
AI can be used in traditional illegal and criminal activities related to terrorism, violence, gambling, and drugs, for example by teaching criminal techniques, concealing illicit acts, and creating tools for illegal and criminal activities.
(c) Risks of misuse of dual-use items and technologies
Through improper use or abuse, AI can pose serious risks to national security, economic security, and public health security. For example, it can greatly reduce the capability requirements for non-experts to design, synthesize, acquire, and use nuclear, biological, and chemical weapons and missiles, or be used to design cyber weapons that attack a wide range of potential targets through automated vulnerability discovery and exploitation.
3.2.3 Cognitive risks
(a) Risks of amplifying the effects of "information cocoons"
AI can be extensively used for customized information services: collecting user information and analyzing user types, needs, intentions, preferences, habits, and even mainstream public opinion over a given period. It can then deliver formulaic and tailored information and services, aggravating the effects of "information cocoons."
(b) Risks of usage in launching cognitive warfare
AI can be used to produce and spread fake news, images, audio, and video; propagate terrorist, extremist, and organized-crime content; interfere in other countries' internal affairs, social systems, and social order; and jeopardize their sovereignty. AI can also shape public values and cognition through social media bots that gain discourse power and agenda-setting power in cyberspace.
3.2.4 Ethical risks
(a) Risks of exacerbating social discrimination and prejudice, and widening the intelligence divide
AI can be used to collect and analyze human behaviors, social status, economic status, and individual personalities, labeling and categorizing groups of people in order to treat them discriminatorily, causing systematic and structural social discrimination and prejudice. At the same time, the intelligence divide between regions may widen.
(b) Risks of challenging traditional social order
The development and application of AI may bring tremendous changes to production tools and production relations, accelerating the reconstruction of traditional industry modes, transforming traditional views on employment, fertility, and education, and challenging the stability of the traditional social order.
(c) Risks of AI becoming uncontrollable in the future
With the fast development of AI technologies, there is a risk of AI autonomously acquiring external resources, replicating itself, becoming self-aware, seeking external power, and attempting to seize control from humans.