This is the sixth minor update since version 3.0. This update has introduced a number of changes which have resulted in a slight incompatibility with previous updates of the 3.x series.
To assist users of earlier 3.x versions of the guidance in ensuring that their existing data safety arguments have not been impacted by this update, a version of this document is available which has been annotated with change bars.
Since the last edition, Artificial Intelligence (AI) systems, particularly those based on Large Language Models (LLMs), have received a lot of attention. These are very heavily data-driven systems, but most of the hazards that have been identified are at the societal, system, or algorithmic level and so are not within the scope of this guidance. However, issues of biasing, interpretation and, arguably, falsification may arise, and those sections have been reviewed to ensure they remain applicable to this rapidly changing field. Similarly, the Data Properties Completeness, Analysability, and Explainability have been reviewed to ensure their continued applicability.
Data is here: Data is becoming ever more important in our lives: influencing, managing and even controlling many critical aspects. The use of AI systems is a new, exciting but potentially hazardous use of data. Large Language Model (LLM) based systems are trained on vast amounts of data, and it is this data which enables them to be useful.
Data is growing: There are at least two reasons why the use of data has grown and, equally important, why it is expected to continue to grow. The first relates to the rapid expansion of the area loosely termed “Big Data”, including the use of large data sets to support machine learning and artificial intelligence applications. The second is the growing use of systems of systems, where data is the lifeblood that connects together disparate elements and allows a cohesive capability to be built. Put simply, the need to address data-related issues is a pressing problem and will continue to be so.
Data is causing harm: Strictly speaking, data can neither cause nor prevent harm. However, mistakes introduced in data, or the inappropriate use of data, within safety-related systems have been factors in a number of documented accidents and incidents. Examples include: aircraft attempting to take off from the wrong runway (and consequently crashing); ships running aground; and patients being exposed to higher than planned doses of radiation.