ISA99 Cybersecurity

ISA/IEC 62443 – A Holistic Risk Management Approach to Cybersecurity

Cybersecurity is traditionally an IT concept. Industrial Automation and Control Systems (IACS) used to be connected via proprietary networks and isolated from the Internet. In recent years, more and more IACS have adopted standard IT networking technologies such as Ethernet and TCP/IP, and commercial off-the-shelf (COTS) IT products such as the Windows operating system, servers, and workstations. Standard IT systems and products improve connectivity and standardize system maintenance between IT and IACS, but they also increase the cybersecurity risk to the IACS and the processes being controlled.

Although cybersecurity has been well studied in IT systems, IACS have specific needs and requirements. For example, the availability of an IACS is measured in real time (milliseconds) to keep the process or equipment in a safe running state, and a cyber attack on an IACS may lead to equipment damage or personal injury. To address cybersecurity risk in IACS, ISA/IEC 62443 (formerly ISA 99) was developed specifically for IACS cybersecurity. For someone like me who works in IACS but has little knowledge of cybersecurity, this standard seems to be a good place to start. This article summarizes my study of the standard, highlighting its features, structure, and some limitations.

This standard has the following features:

It suggests a holistic and systematic approach by proposing a Cybersecurity Management System (CSMS) and a lifecycle management method.
It adopts a risk management approach for cybersecurity. This approach includes activities for risk assessment, risk tolerance selection, countermeasure selection, Security Level (SL) determination, and product certification.
It defines a list of Fundamental Requirements (FR) to characterize a system or component in terms of its security functions. The FRs expand the typical CIA (Confidentiality, Integrity, Availability) security requirements and provide a baseline for comparing security functions.
It defines zones and conduits to group assets and communication channels that have similar security requirements.

Standard Structure

The standard consists of multiple parts, grouped into four categories:

1) General

62443-1-1 Terminology and concepts
62443-1-2 Master Glossary
62443-1-3 System security compliance metrics
62443-1-4 Security lifecycle and use-case

2) Policy and Procedures

62443-2-1 Requirements for an IACS security management system
62443-2-2 Implementation guidance for an IACS security management system
62443-2-3 Patch management in the IACS environment
62443-2-4 Requirements for IACS solution suppliers

3) System

62443-3-1 Security technologies for IACS
62443-3-2 Security risk assessment and system design
62443-3-3 System security requirements and security levels

4) Components

62443-4-1 Product development requirements
62443-4-2 Technical security requirements for IACS components

The target audience includes asset owners and operators, system integrators, product suppliers, service providers, and government agencies and regulatory bodies. Each audience group can use the relevant parts of the standard and take the corresponding actions to manage cybersecurity risks. For example, asset owners assess risks and specify the level of protection required; product suppliers develop hardware and software; service providers and integrators deploy components and maintain the IACS.

Holistic Approach

When cybersecurity is discussed, it is often approached from a technical perspective, emphasizing the countermeasures that stop intruders. ISA/IEC 62443 suggests a holistic approach that combines organizational methods such as policies and procedures, system development activities such as security risk assessment and security technologies, lifecycle management, and component development to technical security requirements. The standard intends to establish a Cybersecurity Management System (CSMS) that covers the above aspects and is tailored to the specific needs of the organization. The CSMS includes three parts:

Initiate the CSMS and risk analysis. Establish the business rationale for a CSMS; identify, classify, and assess cybersecurity risks; define the scope of the CSMS; identify organizational support and resources.
Address risk with the CSMS. Create security policy, improve organizational processes, and set up staff training for awareness; select countermeasures such as network segmentation, account administration, authentication, and authorization; implement countermeasures through activities such as system development, document management, and incident planning and response.
Monitor and improve the CSMS. Review, maintain, and continuously improve the CSMS to ensure conformance.

The CSMS provides a framework for organizing cybersecurity activities over the whole life cycle of a facility. The standard describes processes for developing a CSMS and recognizes that doing so is a journey that may take months or years. It cautions against treating cybersecurity as a project with a start and end date.

Risk Management

In the cybersecurity context, risk is defined as the expected loss when threat agents exploit vulnerabilities to abuse or damage assets. Assets may be physical, logical, or human. Risk is a function of threat, vulnerability, and consequence. To reduce risk, asset owners can implement countermeasures that prevent the consequence or minimize its magnitude. To make a sound decision, asset owners should assess the risk, evaluate their risk tolerance, and select responses and countermeasures. The following risk responses can be considered:

Design the risk out.
Reduce the risk.
Accept the risk.
Transfer or share the risk.
Eliminate or redesign redundant or ineffective controls.

Risk depends on two factors: the likelihood of occurrence and the severity of the consequence. In process or machine safety, the likelihood of occurrence is driven by equipment failures or malfunctions and by human error. Historical data and failure mode analysis can be used to estimate equipment failure rates and human error rates. For similar types of equipment and operations, failure rates can be assumed identical, so historical data can be pooled for estimation. As the available failure data grows, the failure rate estimate becomes more accurate.

Cybersecurity incidents are carried out by threat agents, often with the intention to cause damage. Their likelihood of occurrence depends on the vulnerabilities in the system (networking devices and equipment) and on the motivation, skills, and capabilities of the threat agents. Estimating this likelihood is difficult for several reasons:

The two factors affecting occurrence are interrelated. When vulnerabilities are discovered or revealed in a system, threat agents launch more attacks.
It is hard to collect incident data across broad types of industries and use it to estimate the likelihood of occurrence. Organizations of different natures face vastly different threat agents. For example, potential monetary gain attracts more threat agents to banking systems, while fewer attacks are expected on manufacturing facilities. Even among manufacturing facilities of the same type, those located in areas subject to territorial or political dispute will see more incidents than those in peaceful areas. Incident data therefore has to be grouped and categorized before it is used, and for a given industry this specificity makes it difficult to collect enough valid data to estimate the likelihood of occurrence.
Cybersecurity incidents may not be reported and accounted for because of concerns such as loss of reputation and customers.

ISA/IEC 62443 does not discuss the above difficulties of estimating the likelihood of a cybersecurity incident. The method it presents is based on a qualitative risk matrix, in which likelihood and consequence are each classified into categories. For example, likelihood can be remote, unlikely, possible, likely, or certain; consequence can be minor, moderate, major, or catastrophic. A risk category and a security level requirement are then assigned by considering both the likelihood and the consequence categories.
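To make the matrix lookup concrete, here is a minimal Python sketch. The likelihood and consequence labels are the ones listed above; the scoring rule, the cut-offs, the names risk_category and target_security_level, and the mapping from risk category to target SL are all illustrative assumptions, not values taken from the standard. In practice an asset owner would set them according to its own risk tolerance.

```python
# Ordered category labels as listed in the article.
LIKELIHOOD = ["remote", "unlikely", "possible", "likely", "certain"]
CONSEQUENCE = ["minor", "moderate", "major", "catastrophic"]


def risk_category(likelihood: str, consequence: str) -> str:
    """Return a qualitative risk category from the two category labels."""
    # Rank each label by its position in the ordered list (1-based).
    l_rank = LIKELIHOOD.index(likelihood) + 1
    c_rank = CONSEQUENCE.index(consequence) + 1
    score = l_rank * c_rank  # ranges from 1 to 20
    # Hypothetical cut-offs, chosen only for illustration.
    if score >= 12:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def target_security_level(category: str) -> int:
    """Map a risk category to a target SL (assumed mapping, not from the standard)."""
    return {"low": 1, "medium": 2, "high": 3}[category]


if __name__ == "__main__":
    for lik, con in [("remote", "minor"), ("possible", "major"), ("likely", "catastrophic")]:
        cat = risk_category(lik, con)
        print(f"{lik} / {con}: risk {cat}, target SL {target_security_level(cat)}")
```

Multiplying the two ranks is only one way to fill in the cells of a matrix; many organizations write the matrix out cell by cell instead, which lets them treat catastrophic consequences as high risk even at low likelihood.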

Security Level (SL)

SL is an acronym very similar to SIL. SIL is short for Safety Integrity Level, a functional safety concept defined in terms of the probability of failure on demand. SIL describes the safety performance of a safety protection loop. When a hazardous event occurs (i.e. a demand), the safety protection loop activates and brings the process or system to a safe state. But any device, equipment, or system is subject to failure. The probability of failure on demand of a safety loop indicates its protection capability. Naturally, the lower the probability of failure, the higher the SIL and the cost. The SIL concept provides a consistent criterion for the development, selection, design, and verification of safety loops. In short, hazards pose risk, safety loops are countermeasures for hazards, and SIL is the measure of protection performance of safety loops.
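As a small illustration of how a probability of failure on demand translates into a SIL, the sketch below uses the average-PFD bands for low-demand mode from IEC 61508, which is not otherwise covered in this article. The function name sil_from_pfd is hypothetical.

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive) for average PFD,
# low-demand mode, as tabulated in IEC 61508.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_from_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None if no band matches."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


if __name__ == "__main__":
    for pfd in (5e-2, 5e-3, 2e-5, 0.5):
        print(pfd, sil_from_pfd(pfd))
```

The point of the comparison is that SIL rests on a single measurable number, whereas SL, as discussed next, is qualitative.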

SL can be understood in a similar way. Like hazards in a safety system, threat agents pose risk to security objectives; countermeasures are actions that reduce the risk to an acceptable level. Countermeasures can be technical, administrative, or physical.

SL describes how well a countermeasure can protect assets and ensure that security objectives are maintained. It is a qualitative index, often labeled from 1 to 4. The lowest level, SL-1, indicates the capability to prevent passive types of threat such as eavesdropping and casual exposure; the highest level, SL-4, indicates the capability to prevent active threats backed by extended resources, skills, and motivation.

For SIL, the protection of a safety function is measured by a single scalar, the probability of failure on demand. In contrast, cybersecurity has multiple objectives. In the simplest version, these are Confidentiality, Integrity, and Availability (CIA). ISA/IEC 62443 defines a more detailed list, called Fundamental Requirements (FR), which includes the following:

Identification and Authentication Control (IAC) – Control access to device, information, and assets.
Use Control (UC) – Control use of device, information, and assets.
Data Integrity (DI) – Ensure the integrity of data or communication channels.
Data Confidentiality (DC) – Ensure the confidentiality of data or communication channels.
Restrict Data Flow (RDF) – Restrict the flow of data on communication channels.
Timely Response to Event (TRE) – Respond to security violations in a timely manner.
Resource Availability (RA) – Ensure availability of network resources.

The standard then expands each FR into a series of System Requirements (SR), each with zero or more Requirement Enhancements (RE). A component or system must satisfy a specific list of SRs and REs to meet a particular SL. For example, FR 1 (IAC) includes the following SRs:

SR 1.1 – Human user identification and authentication.
SR 1.2 – Software process and device identification and authentication.
SR 1.3 – Account management.
…

As there are seven FRs, an SL is written as a vector of seven elements, one for each FR. For example, in {2 2 1 1 3 1 3} each number is the SL for the corresponding FR, in sequential order.
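The vector notation maps naturally onto a small data structure. The following is a minimal Python sketch, assuming the FR abbreviations and their order as listed in this article; the class name SecurityLevelVector and its methods are hypothetical. Allowing a level of 0 to mean that no requirement applies is also an assumption that goes beyond the 1 to 4 range mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# FR abbreviations in the order used in this article.
FR_CODES = ("IAC", "UC", "DI", "DC", "RDF", "TRE", "RA")


@dataclass(frozen=True)
class SecurityLevelVector:
    levels: Tuple[int, ...]

    def __post_init__(self):
        # One entry per FR, each within the assumed 0 to 4 range.
        if len(self.levels) != len(FR_CODES):
            raise ValueError("expected one level per FR (seven in total)")
        if any(not 0 <= sl <= 4 for sl in self.levels):
            raise ValueError("each level must be between 0 and 4")

    @classmethod
    def parse(cls, text: str) -> "SecurityLevelVector":
        """Parse the brace notation, e.g. '{2 2 1 1 3 1 3}'."""
        return cls(tuple(int(tok) for tok in text.strip("{} ").split()))

    def as_dict(self) -> Dict[str, int]:
        """Return the levels keyed by FR abbreviation."""
        return dict(zip(FR_CODES, self.levels))


if __name__ == "__main__":
    vector = SecurityLevelVector.parse("{2 2 1 1 3 1 3}")
    print(vector.as_dict())
```

Keying the levels by FR abbreviation makes it easier to see, for instance, that the fifth element of the example vector is the level for Restrict Data Flow.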

Zones and Conduits

For a complex system like an IACS, it may not be practical or necessary to apply the same level of security to all components. Zones and conduits provide a means to group assets and communication channels by security requirement. This is particularly useful for IACS, as many reference models already identify layers of systems such as:

Level 0 – Process (equipment under control).
Level 1 – Safety and protection; basic control.
Level 2 – Supervisory control.
Level 3 – Operations management.
Level 4 – Enterprise systems (business planning and logistics).

Organizations can use reference models, detailed asset models, and network architectures to identify assets that share common security requirements. By grouping assets and communication channels into zones and conduits, various countermeasures can be applied to each zone and conduit to achieve the target security levels.
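As a rough illustration of the grouping step, the Python sketch below places assets into zones by their reference-model level and derives conduits from the communication links that cross zone boundaries. Grouping by level alone is a simplifying assumption; a real zone model would also reflect criticality, location, and function. The asset names and the helper names build_zones and build_conduits are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    level: int  # reference-model level, 0 to 4


@dataclass(frozen=True)
class Link:
    src: str
    dst: str


def zone_name(asset: Asset) -> str:
    # Assumption: one zone per reference-model level.
    return f"zone-L{asset.level}"


def build_zones(assets):
    """Group asset names by zone."""
    zones = defaultdict(list)
    for asset in assets:
        zones[zone_name(asset)].append(asset.name)
    return dict(zones)


def build_conduits(assets, links):
    """Group links that cross a zone boundary by the pair of zones they join."""
    zone_of = {asset.name: zone_name(asset) for asset in assets}
    conduits = defaultdict(list)
    for link in links:
        pair = tuple(sorted((zone_of[link.src], zone_of[link.dst])))
        if pair[0] != pair[1]:  # links inside one zone need no conduit
            conduits[pair].append((link.src, link.dst))
    return dict(conduits)


if __name__ == "__main__":
    assets = [Asset("PLC-1", 1), Asset("PLC-2", 1), Asset("HMI-1", 2), Asset("Historian", 3)]
    links = [Link("PLC-1", "HMI-1"), Link("PLC-2", "HMI-1"),
             Link("PLC-1", "PLC-2"), Link("HMI-1", "Historian")]
    print(build_zones(assets))
    print(build_conduits(assets, links))
```

Once the conduits are explicit, each one becomes a natural place to attach a countermeasure such as a firewall, together with its own target security level.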

The achieved SL of a zone or conduit is related to multiple factors:

SL capability of countermeasures.
Achieved SL of the zones with which the current zone communicates.
Effectiveness of countermeasures.
Audit and testing interval of countermeasures.

The standard defines three types of SL: SL target, SL capability, and SL achieved. The distinction between capability and achieved highlights the importance of properly configuring networking devices and countermeasures. For example, a firewall may have an SL-2 capability, but if it is not configured to block unused ports and traffic, it does not provide the protection needed to achieve the required SL. The configured system must be functionally tested to verify correct configuration and implementation.
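The difference between the three types can be sketched as a per-FR comparison. The Python example below is a minimal illustration, assuming each SL is given as a seven-element tuple in the FR order used in this article; the function name classify_gaps and the two finding labels are hypothetical.

```python
# FR abbreviations in the order used in this article.
FR_CODES = ("IAC", "UC", "DI", "DC", "RDF", "TRE", "RA")


def classify_gaps(target, capability, achieved):
    """Return {FR: finding} for every FR where the achieved SL is below target.

    A 'capability shortfall' means the countermeasure cannot reach the target
    at all; a 'configuration shortfall' means it could, but does not as deployed.
    """
    if not (len(target) == len(capability) == len(achieved) == len(FR_CODES)):
        raise ValueError("each vector needs one level per FR")
    findings = {}
    for fr, t, c, a in zip(FR_CODES, target, capability, achieved):
        if a >= t:
            continue  # target met for this FR
        findings[fr] = "capability shortfall" if c < t else "configuration shortfall"
    return findings


if __name__ == "__main__":
    sl_target = (2, 2, 1, 1, 3, 1, 3)
    sl_capability = (2, 2, 2, 1, 3, 1, 2)
    sl_achieved = (2, 2, 1, 1, 2, 1, 2)
    print(classify_gaps(sl_target, sl_capability, sl_achieved))
```

In this made-up example the RDF gap is the firewall case described above: the device is capable enough but its configuration falls short, so the fix is configuration and testing rather than replacement.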

Summary

ISA/IEC 62443 is a comprehensive standard for managing cybersecurity risk in IACS. It mirrors the functional safety standards and presents the CSMS as a systematic, lifecycle-based management method; it proposes risk management methods, though the risk assessment is limited to qualitative methods; and it defines the concepts of SL, FR, zones, and conduits, which provide a consistent basis for specifying security requirements, selecting components, and verifying that the requirements are met. For IACS professionals interested in learning about cybersecurity, this standard is a good place to start.

References

[1] ISA-62443-1-1/2-1/3-3 Security for Industrial Automation and Control Systems, ISA, 2009.

[2] Cybersecurity of Industrial Systems, Jean-Marie Flaus, Hoboken: Wiley, 2019.