# Calejo Control Adapter - Safety Framework

## Overview

The Calejo Control Adapter implements a comprehensive multi-layer safety framework designed to prevent equipment damage and operational hazards and to ensure reliable pump station operation under all conditions, including system failures, communication loss, and cyber attacks.

**Safety Philosophy**: "Safety First" - All setpoints must pass through safety enforcement before reaching SCADA systems.

## Multi-Layer Safety Architecture

### Three-Layer Safety Model

```
┌─────────────────────────────────────────────────────────┐
│ Layer 3: Optimization Constraints (Calejo Optimize)     │
│   - Economic optimization bounds: 25-45 Hz              │
│   - Energy efficiency constraints                       │
│   - Production optimization limits                      │
└─────────────────────────────────────────────────────────┘
                             ↓
┌─────────────────────────────────────────────────────────┐
│ Layer 2: Station Safety Limits (Control Adapter)        │
│   - Database-enforced limits: 20-50 Hz                  │
│   - Rate of change limiting                             │
│   - Emergency stop integration                          │
│   - Failsafe mechanisms                                 │
└─────────────────────────────────────────────────────────┘
                             ↓
┌─────────────────────────────────────────────────────────┐
│ Layer 1: Physical Hard Limits (PLC/VFD)                 │
│   - Hardware-enforced limits: 15-55 Hz                  │
│   - Physical safety mechanisms                          │
│   - Equipment protection                                │
└─────────────────────────────────────────────────────────┘
```

## Safety Components

### 1. Safety Limit Enforcer (`src/core/safety.py`)

#### Purpose

The Safety Limit Enforcer is the **LAST line of defense** before setpoints are exposed to SCADA systems. ALL setpoints MUST pass through this enforcer.

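As a rough illustration of how the three layers nest, the sketch below encodes the frequency ranges from the diagram and checks that each layer's range sits inside the one beneath it. The names `LayerRange`, `LAYERS`, `is_nested`, and `clamp_to_layer` are hypothetical and are not taken from `src/core/safety.py`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerRange:
    """Frequency range enforced by one safety layer (hypothetical helper)."""
    name: str
    min_hz: float
    max_hz: float


# Ranges copied from the three-layer diagram, outermost (physical) first.
LAYERS: List[LayerRange] = [
    LayerRange("Layer 1: Physical Hard Limits", 15.0, 55.0),
    LayerRange("Layer 2: Station Safety Limits", 20.0, 50.0),
    LayerRange("Layer 3: Optimization Constraints", 25.0, 45.0),
]


def is_nested(layers: List[LayerRange]) -> bool:
    """Return True if every layer's range lies within the previous layer's range."""
    return all(
        outer.min_hz <= inner.min_hz and inner.max_hz <= outer.max_hz
        for outer, inner in zip(layers, layers[1:])
    )


def clamp_to_layer(setpoint_hz: float, layer: LayerRange) -> float:
    """Clamp a setpoint into a single layer's range."""
    return max(layer.min_hz, min(layer.max_hz, setpoint_hz))
```

Under these assumptions `is_nested(LAYERS)` holds for the documented ranges, and `clamp_to_layer(53.0, LAYERS[1])` yields 50.0: a 53 Hz request that escaped the optimization bounds would still be held to the station limit before the PLC/VFD limit is ever reached.
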
#### Key Features

- **Multi-Layer Limit Enforcement**:
  - Hard operational limits (speed, level, power, flow)
  - Rate of change limiting
  - Emergency stop integration
  - Failsafe mode activation

- **Safety Limit Types**:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SafetyLimits:
    hard_min_speed_hz: float            # Minimum speed limit (Hz)
    hard_max_speed_hz: float            # Maximum speed limit (Hz)
    hard_min_level_m: Optional[float]   # Minimum level limit (meters)
    hard_max_level_m: Optional[float]   # Maximum level limit (meters)
    hard_max_power_kw: Optional[float]  # Maximum power limit (kW)
    max_speed_change_hz_per_min: float  # Rate of change limit (Hz per minute)
```

#### Enforcement Process

```python
from typing import List, Tuple


def enforce_setpoint(station_id: str, pump_id: str, setpoint: float) -> Tuple[float, List[str]]:
    """
    Enforce safety limits on setpoint.

    Returns:
        Tuple of (enforced_setpoint, violations)
        - enforced_setpoint: Safe setpoint (clamped if necessary)
        - violations: List of safety violations (for logging/alerting)
    """
    # Simplified outline: emergency_stop_active, the hard limits, and
    # previous_setpoint are resolved for (station_id, pump_id) by the enforcer.
    violations: List[str] = []
    enforced_setpoint = setpoint  # Unchanged unless a limit is violated

    # 1. Check emergency stop first (highest priority)
    if emergency_stop_active:
        return (0.0, ["EMERGENCY_STOP_ACTIVE"])

    # 2. Enforce hard speed limits
    if setpoint < hard_min_speed_hz:
        enforced_setpoint = hard_min_speed_hz
        violations.append("BELOW_MIN_SPEED")
    elif setpoint > hard_max_speed_hz:
        enforced_setpoint = hard_max_speed_hz
        violations.append("ABOVE_MAX_SPEED")

    # 3. Enforce rate of change limits
    rate_violation = check_rate_of_change(previous_setpoint, enforced_setpoint)
    if rate_violation:
        enforced_setpoint = limit_rate_of_change(previous_setpoint, enforced_setpoint)
        violations.append("RATE_OF_CHANGE_VIOLATION")

    # 4. Return safe setpoint
    return (enforced_setpoint, violations)
```

### 2. Emergency Stop Manager (`src/core/emergency_stop.py`)

#### Purpose

Provides a manual override for emergency situations. An active emergency stop takes priority over all other controls.

#### Emergency Stop Levels

1. **Station-Level Emergency Stop**:
   - Stops all pumps in a station
   - Activated by station operators
   - Requires manual reset

2. **Pump-Level Emergency Stop**:
   - Stops individual pumps
   - Activated for specific equipment issues
   - Individual reset capability

#### Emergency Stop Features

- **Immediate Action**: Setpoints forced to 0 Hz immediately
- **Audit Logging**: All emergency operations logged
- **Manual Reset**: Requires explicit operator action to clear
- **Status Monitoring**: Real-time emergency stop status
- **Integration**: Checked by the Safety Limit Enforcer before any other limit

#### Emergency Stop API

```python
from typing import Optional


class EmergencyStopManager:
    def activate_emergency_stop(self, station_id: str, pump_id: Optional[str] = None):
        """Activate emergency stop for station or specific pump."""

    def clear_emergency_stop(self, station_id: str, pump_id: Optional[str] = None):
        """Clear emergency stop condition."""

    def is_emergency_stop_active(self, station_id: str, pump_id: Optional[str] = None) -> bool:
        """Check if emergency stop is active."""
```

### 3. Database Watchdog (`src/monitoring/watchdog.py`)

#### Purpose

Ensures database connectivity and activates failsafe mode if updates stop, preventing stale or unsafe setpoints.

#### Watchdog Features

- **Periodic Health Checks**: Continuous database connectivity monitoring
- **Failsafe Activation**: Automatic activation on connectivity loss
- **Graceful Degradation**: Safe fallback to default setpoints
- **Alert Generation**: Immediate notification on watchdog activation
- **Auto-Recovery**: Automatic recovery when connectivity is restored

#### Watchdog Configuration

```python
class DatabaseWatchdog:
    def __init__(self, db_client, alert_manager, timeout_seconds: int):
        """
        Args:
            timeout_seconds: Time without updates before failsafe activation
        """
```

### 4. Rate of Change Limiting

#### Purpose

Prevents sudden speed changes that could damage pumps or cause operational issues.

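To make the effect concrete, here is a minimal sketch of a setpoint ramp under a rate limit. It assumes setpoints are re-evaluated at a fixed interval; the function name `ramp_setpoints` and the `interval_s` parameter are illustrative and do not come from the adapter's code.

```python
from typing import List


def ramp_setpoints(
    current_hz: float,
    target_hz: float,
    max_change_hz_per_min: float,
    interval_s: float,
) -> List[float]:
    """Return the successive setpoints issued while ramping to target_hz.

    Each step may move at most max_change_hz_per_min * interval_s / 60 Hz.
    """
    if max_change_hz_per_min <= 0 or interval_s <= 0:
        raise ValueError("rate limit and interval must be positive")

    max_step = max_change_hz_per_min * interval_s / 60.0
    steps: List[float] = []
    while current_hz != target_hz:
        delta = target_hz - current_hz
        # Take the whole remaining delta if it fits, otherwise one maximum step.
        if abs(delta) <= max_step:
            current_hz = target_hz
        else:
            current_hz += max_step if delta > 0 else -max_step
        steps.append(current_hz)
    return steps
```

With a limit of 30 Hz per minute and an assumed 10-second interval, the largest allowed step is 5 Hz, so `ramp_setpoints(30.0, 45.0, 30.0, 10.0)` returns `[35.0, 40.0, 45.0]`: the 15 Hz change is spread over three updates instead of being applied at once.
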
#### Implementation

```python
# Both methods compare consecutive setpoints and therefore assume that
# enforcement runs once per second.

def check_rate_of_change(self, previous_setpoint: float, new_setpoint: float) -> bool:
    """Check if rate of change exceeds limits."""
    # A per-second difference scaled to Hz per minute
    change_per_minute = abs(new_setpoint - previous_setpoint) * 60
    return change_per_minute > self.max_speed_change_hz_per_min


def limit_rate_of_change(self, previous_setpoint: float, new_setpoint: float) -> float:
    """Limit setpoint change to safe rate."""
    max_change = self.max_speed_change_hz_per_min / 60  # Convert to per-second
    if new_setpoint > previous_setpoint:
        return min(new_setpoint, previous_setpoint + max_change)
    else:
        return max(new_setpoint, previous_setpoint - max_change)
```

## Safety Configuration

### Database Schema for Safety Limits

```sql
-- Safety limits table
CREATE TABLE safety_limits (
    station_id VARCHAR(50) NOT NULL,
    pump_id VARCHAR(50) NOT NULL,
    hard_min_speed_hz DECIMAL(5,2) NOT NULL,
    hard_max_speed_hz DECIMAL(5,2) NOT NULL,
    hard_min_level_m DECIMAL(6,2),
    hard_max_level_m DECIMAL(6,2),
    hard_max_power_kw DECIMAL(8,2),
    max_speed_change_hz_per_min DECIMAL(5,2) NOT NULL,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (station_id, pump_id)
);

-- Emergency stop status table
CREATE TABLE emergency_stop_status (
    station_id VARCHAR(50) NOT NULL,
    pump_id VARCHAR(50),  -- NULL means a station-level emergency stop
    active BOOLEAN NOT NULL DEFAULT FALSE,
    activated_at TIMESTAMP,
    activated_by VARCHAR(100),
    reason TEXT
);

-- A PRIMARY KEY cannot contain an expression, so uniqueness per station/pump
-- is enforced with a unique expression index (PostgreSQL syntax; the index
-- name is illustrative).
CREATE UNIQUE INDEX emergency_stop_status_key
    ON emergency_stop_status (station_id, COALESCE(pump_id, 'STATION'));
```

### Configuration Parameters

#### Safety Limits Configuration

```yaml
safety_limits:
  default_hard_min_speed_hz: 20.0
  default_hard_max_speed_hz: 50.0
  default_max_speed_change_hz_per_min: 30.0

  # Per-station overrides
  station_overrides:
    station_001:
      hard_min_speed_hz: 25.0
      hard_max_speed_hz: 48.0
    station_002:
      hard_min_speed_hz: 22.0
      hard_max_speed_hz: 52.0
```

#### Watchdog Configuration

```yaml
watchdog:
  timeout_seconds: 1200  # 20 minutes
  check_interval_seconds: 60
  failsafe_setpoints:
    default_speed_hz: 30.0
    station_overrides:
      station_001: 35.0
      station_002: 28.0
```

## Safety Procedures

### Emergency Stop Procedures

#### Activation Procedure

1. **Operator Action**:
   - Access emergency stop control via REST API or dashboard
   - Select station and/or specific pump
   - Provide reason for emergency stop
   - Confirm activation

2. **System Response**:
   - Immediate setpoint override to 0 Hz
   - Audit log entry with timestamp and operator
   - Alert notification to configured channels
   - Safety status update in all protocol servers

#### Clearance Procedure

1. **Operator Action**:
   - Access emergency stop control
   - Verify safe conditions for restart
   - Clear emergency stop condition
   - Confirm clearance

2. **System Response**:
   - Resume normal setpoint calculation
   - Audit log entry for clearance
   - Alert notification of system restoration
   - Safety status update

### Failsafe Mode Activation

#### Automatic Activation Conditions

1. **Database Connectivity Loss**:
   - Watchdog timeout exceeded
   - No successful database updates
   - Automatic failsafe activation

2. **Safety Framework Failure**:
   - Safety limit enforcer unresponsive
   - Emergency stop manager failure
   - Component health check failures

#### Failsafe Behavior

- **Default Setpoints**: Pre-configured safe setpoints
- **Limited Functionality**: Basic operational mode
- **Alert Generation**: Immediate notification of failsafe activation
- **Auto-Recovery**: Automatic return to normal operation when safe

## Safety Testing & Validation

### Unit Testing

```python
class TestSafetyFramework:
    def test_emergency_stop_override(self):
        """Test that emergency stop overrides all other controls."""

    def test_speed_limit_enforcement(self):
        """Test that speed limits are properly enforced."""

    def test_rate_of_change_limiting(self):
        """Test that rate of change limits are enforced."""

    def test_failsafe_activation(self):
        """Test failsafe mode activation on watchdog timeout."""
```

### Integration Testing

```python
class TestSafetyIntegration:
    def test_end_to_end_safety_workflow(self):
        """Test complete safety workflow from optimization to SCADA."""

    def test_emergency_stop_integration(self):
        """Test emergency stop integration with all components."""

    def test_watchdog_integration(self):
        """Test watchdog integration with alert system."""
```

### Validation Procedures

#### Safety Validation Checklist

- [ ] All setpoints pass through safety enforcer
- [ ] Emergency stop overrides all controls
- [ ] Rate of change limits are enforced
- [ ] Failsafe mode activates on connectivity loss
- [ ] Audit logging captures all safety events
- [ ] Alert system notifies on safety violations

#### Performance Validation

- **Response Time**: Safety enforcement < 10 ms per setpoint
- **Emergency Stop**: Immediate activation (< 100 ms)
- **Watchdog**: Connectivity loss detected within the configured `timeout_seconds`
- **Recovery**: Graceful recovery from failure conditions

## Safety Compliance & Certification

### Regulatory Compliance

#### IEC 61508 / IEC 61511

- **Safety Integrity Level (SIL)**: Designed for SIL 2 requirements
- **Fault Tolerance**: Redundant safety mechanisms
- **Failure Analysis**: Comprehensive failure mode analysis
- **Safety Validation**: Rigorous testing and validation

#### Industry Standards

- **Water/Wastewater**: Compliance with industry safety standards
- **Municipal Operations**: Alignment with municipal safety requirements
- **Equipment Protection**: Protection of pump and motor equipment

### Safety Certification Process

#### Documentation Requirements

- Safety Requirements Specification (SRS)
- Safety Manual
- Validation Test Reports
- Safety Case Documentation

#### Testing & Validation

- Safety Function Testing
- Failure Mode Testing
- Integration Testing
- Operational Testing

## Safety Monitoring & Reporting

### Real-Time Safety Monitoring

#### Safety Status Dashboard

- Current safety limits for each pump
- Emergency stop status
- Rate of change monitoring
- Watchdog status
- Safety violation history

#### Safety Metrics

- Safety enforcement statistics
- Emergency stop activations
- Rate of change violations
- Failsafe mode activations
- Response time metrics

### Safety Reporting

#### Daily Safety Reports

- Safety violations summary
- Emergency stop events
- System health status
- Compliance metrics

#### Compliance Reports

- Safety framework performance
- Regulatory compliance status
- Certification maintenance
- Audit trail verification

## Incident Response & Recovery

### Safety Incident Response

#### Incident Classification

- **Critical**: Equipment damage risk or safety hazard
- **Major**: Operational impact or safety violation
- **Minor**: Safety system warnings or alerts

#### Response Procedures

1. **Immediate Action**: Activate emergency stop if required
2. **Investigation**: Analyze safety violation details
3. **Correction**: Implement corrective actions
4. **Documentation**: Complete incident report
5. **Prevention**: Update safety procedures if needed

### System Recovery

#### Recovery Procedures

- Verify safety system integrity
- Clear emergency stop conditions
- Resume normal operations
- Monitor system performance
- Validate safety enforcement

---

*This safety framework documentation provides comprehensive guidance on the safety mechanisms, procedures, and compliance requirements for the Calejo Control Adapter. All safety-critical operations must follow these documented procedures.*