# Calejo Control Adapter - Safety Framework

## Overview

The Calejo Control Adapter implements a multi-layer safety framework designed to prevent equipment damage and operational hazards and to keep pump station operation reliable under all conditions, including system failures, communication loss, and cyber attacks.

**Safety Philosophy**: "Safety First" - All setpoints must pass through safety enforcement before reaching SCADA systems.

## Multi-Layer Safety Architecture

### Three-Layer Safety Model

```
┌─────────────────────────────────────────────────────────┐
│ Layer 3: Optimization Constraints (Calejo Optimize)     │
│ - Economic optimization bounds: 25-45 Hz                │
│ - Energy efficiency constraints                         │
│ - Production optimization limits                        │
└─────────────────────────────────────────────────────────┘
                             ↓
┌─────────────────────────────────────────────────────────┐
│ Layer 2: Station Safety Limits (Control Adapter)        │
│ - Database-enforced limits: 20-50 Hz                    │
│ - Rate of change limiting                               │
│ - Emergency stop integration                            │
│ - Failsafe mechanisms                                   │
└─────────────────────────────────────────────────────────┘
                             ↓
┌─────────────────────────────────────────────────────────┐
│ Layer 1: Physical Hard Limits (PLC/VFD)                 │
│ - Hardware-enforced limits: 15-55 Hz                    │
│ - Physical safety mechanisms                            │
│ - Equipment protection                                  │
└─────────────────────────────────────────────────────────┘
```
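
The layering in the diagram can be illustrated with a small sketch that passes a requested speed through each band in turn. The Hz ranges are taken from the diagram; the names `SpeedBand`, `LAYERS`, `apply_layers`, and `bands_are_nested` are hypothetical and not part of the adapter's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedBand:
    """Inclusive speed range in Hz for one safety layer (illustrative only)."""
    name: str
    min_hz: float
    max_hz: float

    def clamp(self, value: float) -> float:
        return max(self.min_hz, min(self.max_hz, value))


# Bands from the diagram, ordered from the optimizer down to the hardware.
LAYERS = [
    SpeedBand("optimization", 25.0, 45.0),
    SpeedBand("station", 20.0, 50.0),
    SpeedBand("physical", 15.0, 55.0),
]


def apply_layers(requested_hz: float) -> float:
    """Pass a requested speed through each layer in turn."""
    value = requested_hz
    for band in LAYERS:
        value = band.clamp(value)
    return value


def bands_are_nested() -> bool:
    """Check that each layer's band sits inside the band of the layer below it."""
    return all(
        outer.min_hz <= inner.min_hz and inner.max_hz <= outer.max_hz
        for inner, outer in zip(LAYERS, LAYERS[1:])
    )
```

Because each band is contained in the next, a value accepted by Layer 3 is never altered by Layers 2 or 1; the lower layers only matter when an upstream layer fails or is bypassed.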

## Safety Components

### 1. Safety Limit Enforcer (`src/core/safety.py`)

#### Purpose
The Safety Limit Enforcer is the **LAST line of defense** before setpoints are exposed to SCADA systems. ALL setpoints MUST pass through this enforcer.

#### Key Features

- **Multi-Layer Limit Enforcement**:
  - Hard operational limits (speed, level, power, flow)
  - Rate of change limiting
  - Emergency stop integration
  - Failsafe mode activation

- **Safety Limit Types**:
  ```python
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class SafetyLimits:
      hard_min_speed_hz: float            # Minimum speed limit (Hz)
      hard_max_speed_hz: float            # Maximum speed limit (Hz)
      hard_min_level_m: Optional[float]   # Minimum level limit (meters)
      hard_max_level_m: Optional[float]   # Maximum level limit (meters)
      hard_max_power_kw: Optional[float]  # Maximum power limit (kW)
      max_speed_change_hz_per_min: float  # Rate of change limit
  ```

#### Enforcement Process

```python
from typing import List, Tuple

def enforce_setpoint(station_id: str, pump_id: str, setpoint: float) -> Tuple[float, List[str]]:
    """
    Enforce safety limits on setpoint.

    The limits, emergency stop state, and previous setpoint are looked up
    for the given station and pump (lookup omitted here).

    Returns:
        Tuple of (enforced_setpoint, violations)
        - enforced_setpoint: Safe setpoint (clamped if necessary)
        - violations: List of safety violations (for logging/alerting)
    """
    # Start from the requested value with no violations recorded
    enforced_setpoint = setpoint
    violations: List[str] = []

    # 1. Check emergency stop first (highest priority)
    if emergency_stop_active:
        return (0.0, ["EMERGENCY_STOP_ACTIVE"])

    # 2. Enforce hard speed limits
    if setpoint < hard_min_speed_hz:
        enforced_setpoint = hard_min_speed_hz
        violations.append("BELOW_MIN_SPEED")
    elif setpoint > hard_max_speed_hz:
        enforced_setpoint = hard_max_speed_hz
        violations.append("ABOVE_MAX_SPEED")

    # 3. Enforce rate of change limits
    rate_violation = check_rate_of_change(previous_setpoint, enforced_setpoint)
    if rate_violation:
        enforced_setpoint = limit_rate_of_change(previous_setpoint, enforced_setpoint)
        violations.append("RATE_OF_CHANGE_VIOLATION")

    # 4. Return safe setpoint
    return (enforced_setpoint, violations)
```
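
The enforcement order above (emergency stop, then hard limits, then rate limiting) can be exercised with a self-contained sketch. This is an illustration under stated assumptions, not the code in `src/core/safety.py`: `SpeedLimits`, `max_step_hz`, and `enforce` are hypothetical names, and the previous setpoint and emergency stop state are passed in explicitly rather than looked up.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeedLimits:
    hard_min_speed_hz: float
    hard_max_speed_hz: float
    max_step_hz: float  # hypothetical: largest allowed change per call


def enforce(
    setpoint: float,
    previous: float,
    limits: SpeedLimits,
    emergency_stop_active: bool = False,
) -> Tuple[float, List[str]]:
    """Return (safe_setpoint, violations) following the documented order."""
    # 1. Emergency stop wins over everything else.
    if emergency_stop_active:
        return 0.0, ["EMERGENCY_STOP_ACTIVE"]

    violations: List[str] = []
    enforced = setpoint

    # 2. Clamp to the hard speed limits.
    if enforced < limits.hard_min_speed_hz:
        enforced = limits.hard_min_speed_hz
        violations.append("BELOW_MIN_SPEED")
    elif enforced > limits.hard_max_speed_hz:
        enforced = limits.hard_max_speed_hz
        violations.append("ABOVE_MAX_SPEED")

    # 3. Limit the step relative to the previous setpoint.
    step = enforced - previous
    if abs(step) > limits.max_step_hz:
        enforced = previous + math.copysign(limits.max_step_hz, step)
        violations.append("RATE_OF_CHANGE_VIOLATION")

    return enforced, violations
```

Note that with this ordering, a previous setpoint that already lies outside the hard limits can yield a rate-limited value that is also outside them; a final clamp after step 3 would be one way to close that gap.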

### 2. Emergency Stop Manager (`src/core/emergency_stop.py`)

#### Purpose
Provides a manual override for emergency situations that takes priority over all other controls.

#### Emergency Stop Levels

1. **Station-Level Emergency Stop**:
   - Stops all pumps in a station
   - Activated by station operators
   - Requires manual reset

2. **Pump-Level Emergency Stop**:
   - Stops individual pumps
   - Activated for specific equipment issues
   - Individual reset capability

#### Emergency Stop Features

- **Immediate Action**: Setpoints forced to 0 Hz immediately
- **Audit Logging**: All emergency operations logged
- **Manual Reset**: Requires explicit operator action to clear
- **Status Monitoring**: Real-time emergency stop status
- **Integration**: Seamless integration with safety framework

#### Emergency Stop API

```python
from typing import Optional

class EmergencyStopManager:
    def activate_emergency_stop(self, station_id: str, pump_id: Optional[str] = None):
        """Activate emergency stop for station or specific pump."""

    def clear_emergency_stop(self, station_id: str, pump_id: Optional[str] = None):
        """Clear emergency stop condition."""

    def is_emergency_stop_active(self, station_id: str, pump_id: Optional[str] = None) -> bool:
        """Check if emergency stop is active."""
```
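
To show how the station-level and pump-level stops could interact behind this API, here is a minimal in-memory sketch. `InMemoryEmergencyStop` is a hypothetical stand-in: it keeps state in a set and omits the persistence, audit logging, and alerting described above.

```python
from typing import Optional, Set, Tuple


class InMemoryEmergencyStop:
    """Illustrative stand-in that mirrors the method names of the API above."""

    def __init__(self) -> None:
        # (station_id, pump_id) pairs; pump_id None marks a station-level stop.
        self._active: Set[Tuple[str, Optional[str]]] = set()

    def activate_emergency_stop(self, station_id: str, pump_id: Optional[str] = None) -> None:
        self._active.add((station_id, pump_id))

    def clear_emergency_stop(self, station_id: str, pump_id: Optional[str] = None) -> None:
        # Clearing a station-level stop leaves pump-level stops in place.
        self._active.discard((station_id, pump_id))

    def is_emergency_stop_active(self, station_id: str, pump_id: Optional[str] = None) -> bool:
        # A station-level stop applies to every pump in that station.
        if (station_id, None) in self._active:
            return True
        return pump_id is not None and (station_id, pump_id) in self._active
```

In this sketch a pump is considered stopped if either its own stop or its station's stop is active, which matches the two levels listed above.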

### 3. Database Watchdog (`src/monitoring/watchdog.py`)

#### Purpose
Ensures database connectivity and activates failsafe mode if updates stop, preventing stale or unsafe setpoints.

#### Watchdog Features

- **Periodic Health Checks**: Continuous database connectivity monitoring
- **Failsafe Activation**: Automatic activation on connectivity loss
- **Graceful Degradation**: Safe fallback to default setpoints
- **Alert Generation**: Immediate notification on watchdog activation
- **Auto-Recovery**: Automatic recovery when connectivity restored

#### Watchdog Configuration

```python
class DatabaseWatchdog:
    def __init__(self, db_client, alert_manager, timeout_seconds: int):
        """
        Args:
            timeout_seconds: Time without updates before failsafe activation
        """
```
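
The timeout logic can be sketched independently of the database and alerting. `UpdateWatchdog` below is a hypothetical simplification: it takes an injectable clock so the behaviour can be checked without waiting, and it models only failsafe activation and auto-recovery.

```python
import time
from typing import Callable


class UpdateWatchdog:
    """Tracks time since the last successful update (illustrative only)."""

    def __init__(self, timeout_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_update = clock()
        self.failsafe_active = False

    def record_update(self) -> None:
        """Call after each successful database update; also clears failsafe."""
        self._last_update = self._clock()
        self.failsafe_active = False

    def check(self) -> bool:
        """Return True if failsafe mode is (now) active."""
        elapsed = self._clock() - self._last_update
        if elapsed > self.timeout_seconds:
            self.failsafe_active = True
        return self.failsafe_active
```

A monotonic clock is used as the default so that wall-clock adjustments cannot trigger or mask a timeout.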

### 4. Rate of Change Limiting

#### Purpose
Prevents sudden speed changes that could damage pumps or cause operational issues.

#### Implementation

```python
def check_rate_of_change(self, previous_setpoint: float, new_setpoint: float) -> bool:
    """Check if rate of change exceeds limits."""
    # The factor of 60 implies setpoints are evaluated once per second,
    # so a per-step change is scaled up to a per-minute rate.
    change_per_minute = abs(new_setpoint - previous_setpoint) * 60
    return change_per_minute > self.max_speed_change_hz_per_min

def limit_rate_of_change(self, previous_setpoint: float, new_setpoint: float) -> float:
    """Limit setpoint change to safe rate."""
    max_change = self.max_speed_change_hz_per_min / 60  # Convert to per-second
    if new_setpoint > previous_setpoint:
        return min(new_setpoint, previous_setpoint + max_change)
    else:
        return max(new_setpoint, previous_setpoint - max_change)
```
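
As a worked example of the arithmetic above: with a limit of 30 Hz/min and one evaluation per second, each step may move at most 30 / 60 = 0.5 Hz, so a request to go from 30 Hz to 40 Hz takes 20 steps. The helper `ramp` below is hypothetical and the one-second step is an assumption.

```python
from typing import List


def ramp(
    start_hz: float,
    target_hz: float,
    max_change_hz_per_min: float,
    step_seconds: float = 1.0,
) -> List[float]:
    """Return the sequence of rate-limited setpoints from start to target."""
    max_step = max_change_hz_per_min * step_seconds / 60.0
    if max_step <= 0:
        raise ValueError("rate limit and step duration must be positive")

    values: List[float] = []
    current = start_hz
    while current != target_hz:
        delta = target_hz - current
        if abs(delta) <= max_step:
            current = target_hz  # final step lands exactly on the target
        else:
            current += max_step if delta > 0 else -max_step
        values.append(current)
    return values


steps = ramp(30.0, 40.0, max_change_hz_per_min=30.0)
print(len(steps), steps[0], steps[-1])
```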

## Safety Configuration

### Database Schema for Safety Limits

```sql
-- Safety limits table
CREATE TABLE safety_limits (
    station_id VARCHAR(50) NOT NULL,
    pump_id VARCHAR(50) NOT NULL,
    hard_min_speed_hz DECIMAL(5,2) NOT NULL,
    hard_max_speed_hz DECIMAL(5,2) NOT NULL,
    hard_min_level_m DECIMAL(6,2),
    hard_max_level_m DECIMAL(6,2),
    hard_max_power_kw DECIMAL(8,2),
    max_speed_change_hz_per_min DECIMAL(5,2) NOT NULL,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (station_id, pump_id)
);

-- Emergency stop status table
CREATE TABLE emergency_stop_status (
    station_id VARCHAR(50) NOT NULL,
    pump_id VARCHAR(50),  -- NULL for a station-level stop
    active BOOLEAN NOT NULL DEFAULT FALSE,
    activated_at TIMESTAMP,
    activated_by VARCHAR(100),
    reason TEXT
);

-- A PRIMARY KEY cannot contain an expression such as COALESCE(...),
-- so uniqueness is enforced with an expression index instead
-- (syntax as supported by PostgreSQL and SQLite).
CREATE UNIQUE INDEX emergency_stop_status_key
    ON emergency_stop_status (station_id, COALESCE(pump_id, 'STATION'));
```
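
The emergency stop table can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the production database engine is not stated here, the schema is trimmed to the columns used, `0`/`1` stand in for `FALSE`/`TRUE`, uniqueness per station or pump is enforced with an expression index (an assumption of this sketch), and the helper names are hypothetical.

```python
import sqlite3
from typing import Optional

# Trimmed copy of emergency_stop_status for demonstration purposes.
SCHEMA = """
CREATE TABLE emergency_stop_status (
    station_id VARCHAR(50) NOT NULL,
    pump_id VARCHAR(50),
    active BOOLEAN NOT NULL DEFAULT 0,
    activated_by VARCHAR(100),
    reason TEXT
);
CREATE UNIQUE INDEX emergency_stop_status_key
    ON emergency_stop_status (station_id, COALESCE(pump_id, 'STATION'));
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def activate(conn: sqlite3.Connection, station_id: str, pump_id: Optional[str],
             operator: str, reason: str) -> None:
    """Insert an active stop; pump_id None means the whole station."""
    conn.execute(
        "INSERT INTO emergency_stop_status "
        "(station_id, pump_id, active, activated_by, reason) VALUES (?, ?, 1, ?, ?)",
        (station_id, pump_id, operator, reason),
    )


def is_stopped(conn: sqlite3.Connection, station_id: str, pump_id: str) -> bool:
    """A pump is stopped if its own row or the station-level row is active."""
    row = conn.execute(
        "SELECT 1 FROM emergency_stop_status "
        "WHERE station_id = ? AND active = 1 "
        "AND (pump_id IS NULL OR pump_id = ?) LIMIT 1",
        (station_id, pump_id),
    ).fetchone()
    return row is not None
```

Inserting a second station-level row for the same station raises `sqlite3.IntegrityError`, which is the behaviour the `COALESCE(pump_id, 'STATION')` key is meant to provide.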

### Configuration Parameters

#### Safety Limits Configuration

```yaml
safety_limits:
  default_hard_min_speed_hz: 20.0
  default_hard_max_speed_hz: 50.0
  default_max_speed_change_hz_per_min: 30.0

  # Per-station overrides
  station_overrides:
    station_001:
      hard_min_speed_hz: 25.0
      hard_max_speed_hz: 48.0
    station_002:
      hard_min_speed_hz: 22.0
      hard_max_speed_hz: 52.0
```
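
One plausible way to combine the defaults with per-station overrides is sketched below. The configuration is written as a Python dict mirroring the YAML so the example needs no YAML parser; `resolve_limits` and the rule that `default_` keys map to the same name without the prefix are assumptions, not documented adapter behaviour.

```python
from typing import Any, Dict

# Mirrors the contents of the `safety_limits` YAML section above.
CONFIG: Dict[str, Any] = {
    "default_hard_min_speed_hz": 20.0,
    "default_hard_max_speed_hz": 50.0,
    "default_max_speed_change_hz_per_min": 30.0,
    "station_overrides": {
        "station_001": {"hard_min_speed_hz": 25.0, "hard_max_speed_hz": 48.0},
        "station_002": {"hard_min_speed_hz": 22.0, "hard_max_speed_hz": 52.0},
    },
}


def resolve_limits(config: Dict[str, Any], station_id: str) -> Dict[str, float]:
    """Start from the defaults, then apply any overrides for the station."""
    prefix = "default_"
    limits = {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }
    limits.update(config.get("station_overrides", {}).get(station_id, {}))

    # Reject inverted ranges early rather than at enforcement time.
    if limits["hard_min_speed_hz"] >= limits["hard_max_speed_hz"]:
        raise ValueError(f"invalid speed range for {station_id}")
    return limits
```

With these values, `station_002` resolves to a 52 Hz maximum, which is above the 20-50 Hz station band in the three-layer diagram but still inside the 15-55 Hz physical band; whether that is intended is worth confirming per site.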

#### Watchdog Configuration

```yaml
watchdog:
  timeout_seconds: 1200  # 20 minutes
  check_interval_seconds: 60
  failsafe_setpoints:
    default_speed_hz: 30.0
    station_overrides:
      station_001: 35.0
      station_002: 28.0
```
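
The sketch below shows how a failsafe setpoint might be selected from this configuration, and what the two timing values imply together. It assumes `station_overrides` is nested under `failsafe_setpoints` and that the watchdog polls at a fixed interval; `failsafe_speed_hz` and `worst_case_detection_seconds` are hypothetical helpers.

```python
from typing import Any, Dict

# Mirrors the contents of the `watchdog` YAML section above.
WATCHDOG: Dict[str, Any] = {
    "timeout_seconds": 1200,
    "check_interval_seconds": 60,
    "failsafe_setpoints": {
        "default_speed_hz": 30.0,
        "station_overrides": {"station_001": 35.0, "station_002": 28.0},
    },
}


def failsafe_speed_hz(config: Dict[str, Any], station_id: str) -> float:
    """Use the station override if present, else the default failsafe speed."""
    setpoints = config["failsafe_setpoints"]
    overrides = setpoints.get("station_overrides", {})
    return overrides.get(station_id, setpoints["default_speed_hz"])


def worst_case_detection_seconds(config: Dict[str, Any]) -> int:
    """With fixed-interval polling, a timeout that elapses just after one
    check is only noticed at the next check."""
    return config["timeout_seconds"] + config["check_interval_seconds"]
```

Under these assumptions, failsafe mode would start at most 1260 seconds (21 minutes) after the last successful update.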

## Safety Procedures

### Emergency Stop Procedures

#### Activation Procedure

1. **Operator Action**:
   - Access emergency stop control via REST API or dashboard
   - Select station and/or specific pump
   - Provide reason for emergency stop
   - Confirm activation

2. **System Response**:
   - Immediate setpoint override to 0 Hz
   - Audit log entry with timestamp and operator
   - Alert notification to configured channels
   - Safety status update in all protocol servers

#### Clearance Procedure

1. **Operator Action**:
   - Access emergency stop control
   - Verify safe conditions for restart
   - Clear emergency stop condition
   - Confirm clearance

2. **System Response**:
   - Resume normal setpoint calculation
   - Audit log entry for clearance
   - Alert notification of system restoration
   - Safety status update

### Failsafe Mode Activation

#### Automatic Activation Conditions

1. **Database Connectivity Loss**:
   - Watchdog timeout exceeded
   - No successful database updates
   - Automatic failsafe activation

2. **Safety Framework Failure**:
   - Safety limit enforcer unresponsive
   - Emergency stop manager failure
   - Component health check failures

#### Failsafe Behavior

- **Default Setpoints**: Pre-configured safe setpoints
- **Limited Functionality**: Basic operational mode
- **Alert Generation**: Immediate notification of failsafe activation
- **Auto-Recovery**: Automatic return to normal operation when safe

## Safety Testing & Validation

### Unit Testing

```python
class TestSafetyFramework:
    def test_emergency_stop_override(self):
        """Test that emergency stop overrides all other controls."""

    def test_speed_limit_enforcement(self):
        """Test that speed limits are properly enforced."""

    def test_rate_of_change_limiting(self):
        """Test that rate of change limits are enforced."""

    def test_failsafe_activation(self):
        """Test failsafe mode activation on watchdog timeout."""
```
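
As an illustration of what two of these test bodies might assert, the sketch below uses the standard library's `unittest` against a small stand-in function. `clamp_speed` is hypothetical and much simpler than the real enforcer; the 20-50 Hz range is borrowed from the default configuration.

```python
import unittest


def clamp_speed(setpoint_hz: float, min_hz: float, max_hz: float,
                emergency_stop: bool = False) -> float:
    """Hypothetical stand-in: 0 Hz on emergency stop, otherwise clamp."""
    if emergency_stop:
        return 0.0
    return max(min_hz, min(max_hz, setpoint_hz))


class TestSafetySketch(unittest.TestCase):
    def test_emergency_stop_override(self):
        # An otherwise valid setpoint must still be forced to 0 Hz.
        self.assertEqual(clamp_speed(40.0, 20.0, 50.0, emergency_stop=True), 0.0)

    def test_speed_limit_enforcement(self):
        # (requested, expected) pairs: below, inside, and above the limits.
        cases = [(10.0, 20.0), (35.0, 35.0), (60.0, 50.0)]
        for requested, expected in cases:
            with self.subTest(requested=requested):
                self.assertEqual(clamp_speed(requested, 20.0, 50.0), expected)
```

The cases can be run with `python -m unittest` from the directory containing the file.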

### Integration Testing

```python
class TestSafetyIntegration:
    def test_end_to_end_safety_workflow(self):
        """Test complete safety workflow from optimization to SCADA."""

    def test_emergency_stop_integration(self):
        """Test emergency stop integration with all components."""

    def test_watchdog_integration(self):
        """Test watchdog integration with alert system."""
```

### Validation Procedures

#### Safety Validation Checklist

- [ ] All setpoints pass through safety enforcer
- [ ] Emergency stop overrides all controls
- [ ] Rate of change limits are enforced
- [ ] Failsafe mode activates on connectivity loss
- [ ] Audit logging captures all safety events
- [ ] Alert system notifies on safety violations

#### Performance Validation

- **Response Time**: Safety enforcement < 10ms per setpoint
- **Emergency Stop**: Immediate activation (< 100ms)
- **Watchdog**: Connectivity loss detected within the configured `timeout_seconds`
- **Recovery**: Graceful recovery from failure conditions

## Safety Compliance & Certification

### Regulatory Compliance

#### IEC 61508 / IEC 61511
- **Safety Integrity Level (SIL)**: Designed for SIL 2 requirements
- **Fault Tolerance**: Redundant safety mechanisms
- **Failure Analysis**: Comprehensive failure mode analysis
- **Safety Validation**: Rigorous testing and validation

#### Industry Standards
- **Water/Wastewater**: Compliance with industry safety standards
- **Municipal Operations**: Alignment with municipal safety requirements
- **Equipment Protection**: Protection of pump and motor equipment

### Safety Certification Process

#### Documentation Requirements
- Safety Requirements Specification (SRS)
- Safety Manual
- Validation Test Reports
- Safety Case Documentation

#### Testing & Validation
- Safety Function Testing
- Failure Mode Testing
- Integration Testing
- Operational Testing

## Safety Monitoring & Reporting

### Real-Time Safety Monitoring

#### Safety Status Dashboard
- Current safety limits for each pump
- Emergency stop status
- Rate of change monitoring
- Watchdog status
- Safety violation history

#### Safety Metrics
- Safety enforcement statistics
- Emergency stop activations
- Rate of change violations
- Failsafe mode activations
- Response time metrics

### Safety Reporting

#### Daily Safety Reports
- Safety violations summary
- Emergency stop events
- System health status
- Compliance metrics

#### Compliance Reports
- Safety framework performance
- Regulatory compliance status
- Certification maintenance
- Audit trail verification

## Incident Response & Recovery

### Safety Incident Response

#### Incident Classification
- **Critical**: Equipment damage risk or safety hazard
- **Major**: Operational impact or safety violation
- **Minor**: Safety system warnings or alerts

#### Response Procedures
1. **Immediate Action**: Activate emergency stop if required
2. **Investigation**: Analyze safety violation details
3. **Correction**: Implement corrective actions
4. **Documentation**: Complete incident report
5. **Prevention**: Update safety procedures if needed

### System Recovery

#### Recovery Procedures
- Verify safety system integrity
- Clear emergency stop conditions
- Resume normal operations
- Monitor system performance
- Validate safety enforcement

---

*This safety framework documentation provides comprehensive guidance on the safety mechanisms, procedures, and compliance requirements for the Calejo Control Adapter. All safety-critical operations must follow these documented procedures.*