As privacy concerns mount, driven by both cyber threats and legal requirements, a clear, formal, standard model of data components and their history has become necessary. Here, we introduce the concept of the data bill of materials (DBoM), or personal data bill of materials, a comprehensive inventory of personal data used in software systems.
The DBoM records the ownership, sharing history, storage and collection purpose of a unit of data. It identifies personal data as an asset and an essential component of the software and system inventory, just as integral as programs, servers and other components. The aim is to maintain the integrity of personal or sensitive data; ensure the confidentiality of data throughout the life cycle; and provide transparency about the collection, use, storage, sharing and destruction of personal data. This will improve data security, privacy and user confidence in data systems, and will make compliance with international privacy laws simpler and more effective.
Bill of materials for software systems
Used frequently in supply chain organization, a bill of materials (BoM) is a “comprehensive inventory of the raw materials, assemblies, subassemblies, parts and components … needed to manufacture a product.” Essentially, the BoM consists of all the goods and resources involved in assembling a final product. Having a detailed BoM can help companies estimate material costs, plan purchases, control their inventory and maintain accurate records.
After the SolarWinds supply chain attack, where hackers inserted a backdoor into commercial software to access users’ computers, U.S. President Joe Biden issued an executive order to improve cybersecurity in the United States. As part of those improvements, the National Telecommunications and Information Administration published guidance on the software bill of materials (SBoM), a record of the individual components in a piece of software. In a world of software development with various open-source libraries and a highly interdependent ecosystem, SBoMs can help with software transparency, integrity and identity, allowing users to better inventory the components of their software to find and resolve vulnerabilities.
Introducing the data bill of materials
While the SBoM is a step in the right direction for transparency and integrity, it does not extend these benefits to every part of a system. The SBoM covers software components, but the data assets stored and processed within the system are not covered. Data, whether structured (e.g., databases) or unstructured (e.g., file shares), is an essential part of the software ecosystem. The data in the system is in fact a critical asset that needs to be protected. While all kinds of data exist, personal data has special importance for security and privacy. There is a need for a comprehensive inventory of personal data collected, used, processed and destroyed in the system life cycle, which we call the data bill of materials or personal data bill of materials.
The essential components of the DBoM are proposed below. This list is a detailed starting point for what a DBoM should record, and it can be extended with further factors in the future.
Overall, a DBoM should answer the following questions:
- Data use
  - What personal data, sensitive data, and nonpersonal data are you using?
  - When was the data collected?
  - How do you use the collected data?
  - Who is responsible for the data set?
  - Who is responsible for administering the data?
  - What other first and third parties have access to the data?
  - What applications use the data?
  - What mechanisms are used to protect the data?
- Data collection
  - Who are you collecting data from?
  - What categories of personal and nonpersonal data are you collecting?
  - How do you collect data?
- Data location
  - Do your data use, collection, and processing comply with applicable laws?
  - Where is your data stored?
  - Who are you sharing data with?
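The questions above can be read as the fields of a record. The following is a minimal sketch in Python, assuming one record per data set. The class name `DBoMEntry`, every field name and all sample values are hypothetical illustrations and are not drawn from any published DBoM specification.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class DBoMEntry:
    """One hypothetical DBoM record describing a single data set."""

    # Data use
    data_categories: List[str]        # what data is being used
    contains_personal_data: bool
    contains_sensitive_data: bool
    collected_on: date                # when the data was collected
    purpose: str                      # how the collected data is used
    owner: str                        # who is responsible for the data set
    administrator: str                # who administers the data
    parties_with_access: List[str]    # first and third parties with access
    applications: List[str]           # applications that use the data
    protection_mechanisms: List[str]  # e.g. encryption, access control

    # Data collection
    data_subjects: str                # who the data is collected from
    collection_method: str            # how the data is collected

    # Data location
    storage_location: str             # where the data is stored
    shared_with: List[str] = field(default_factory=list)
    applicable_laws: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record to a machine-readable JSON string."""
        record = asdict(self)
        # json cannot encode date objects, so store an ISO 8601 string.
        record["collected_on"] = self.collected_on.isoformat()
        return json.dumps(record, indent=2, sort_keys=True)


# Sample values below are invented for illustration only.
entry = DBoMEntry(
    data_categories=["email address", "postal address"],
    contains_personal_data=True,
    contains_sensitive_data=False,
    collected_on=date(2024, 1, 15),
    purpose="order fulfilment",
    owner="Marketing team",
    administrator="Database operations",
    parties_with_access=["payment processor"],
    applications=["web shop"],
    protection_mechanisms=["encryption at rest", "role-based access"],
    data_subjects="customers",
    collection_method="checkout form",
    storage_location="EU data center",
)

print(entry.to_json())
```

Keeping the three groups of questions as commented sections of one record is only one possible design. A real inventory might instead split them into separate linked records or add a per-entry sharing history.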
Benefits of a DBoM
Just as an SBoM puts the digital components of a piece of software into a machine-readable format that simplifies their identification, a DBoM will help stakeholders and data collectors find sensitive data as well as vital information about that data, such as when, why and how it was collected. Having such a record built into the data collection process will increase the transparency of the data collection and cataloging process. This helps stakeholders take inventory of the data they possess more effectively, which can otherwise be difficult given the vast quantities of data companies might acquire.
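As a rough illustration of this kind of lookup, the sketch below filters a small in-memory inventory for sensitive data sets and reports when, why and how each was collected. The inventory layout, the key names and the sample entries are all assumptions made for the example; a real DBoM could be stored and queried quite differently.

```python
from datetime import date

# Hypothetical DBoM inventory: one dictionary per data set.
inventory = [
    {
        "data_set": "newsletter_subscribers",
        "sensitive": False,
        "collected_on": date(2023, 3, 1),
        "purpose": "email marketing",
        "collection_method": "web sign-up form",
    },
    {
        "data_set": "patient_intake_forms",
        "sensitive": True,
        "collected_on": date(2022, 11, 20),
        "purpose": "appointment scheduling",
        "collection_method": "clinic tablet",
    },
    {
        "data_set": "web_server_logs",
        "sensitive": False,
        "collected_on": date(2024, 2, 5),
        "purpose": "troubleshooting",
        "collection_method": "automatic logging",
    },
]


def find_sensitive(entries):
    """Return only the entries flagged as sensitive."""
    return [e for e in entries if e["sensitive"]]


def summarize(entry):
    """Describe when, how and why one data set was collected."""
    return (
        f"{entry['data_set']}: collected {entry['collected_on'].isoformat()} "
        f"via {entry['collection_method']} for {entry['purpose']}"
    )


for item in find_sensitive(inventory):
    print(summarize(item))
```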