# Calejo Control Adapter - Implementation Plan

## Overview

This document outlines the comprehensive step-by-step implementation plan for the Calejo Control Adapter v2.0 with Safety & Security Framework. The plan is organized into 7 phases with detailed tasks, testing strategies, and acceptance criteria.

## Project Timeline & Phases

### Phase 1: Core Infrastructure & Database Setup (Week 1-2)

**Objective**: Establish the foundation with database schema, core infrastructure, and basic components.

#### TASK-1.1: Set up PostgreSQL database with complete schema
- **Description**: Create all database tables as specified in the specification
- **Database Tables**:
  - `pump_stations` - Station metadata
  - `pumps` - Pump configuration and control parameters
  - `pump_plans` - Optimization plans from Calejo Optimize
  - `pump_feedback` - Real-time feedback from pumps
  - `pump_safety_limits` - Hard operational limits
  - `safety_limit_violations` - Audit trail of limit violations
  - `failsafe_events` - Failsafe mode activations
  - `emergency_stop_events` - Emergency stop events
  - `audit_log` - Immutable compliance audit trail
- **Acceptance Criteria**:
  - All tables created with correct constraints and indexes
  - Read-only user `control_reader` with appropriate permissions
  - Test data inserted for validation
  - Database connection successful from application

#### TASK-1.2: Implement database client with connection pooling
- **Description**: Enhance database client with async support and robust error handling
- **Features**:
  - Connection pooling for performance
  - Async/await support for non-blocking operations
  - Comprehensive error handling and retry logic
  - Query timeout management
  - Connection health monitoring
- **Acceptance Criteria**:
  - Database operations complete within 100ms
  - Connection failures handled gracefully
  - Connection pool recovers automatically
  - All queries execute without blocking

#### TASK-1.3: Complete auto-discovery module
- **Description**: Implement full
auto-discovery of stations and pumps from database
- **Features**:
  - Automatic discovery on startup
  - Periodic refresh of discovered assets
  - Filtering by station and active status
  - Integration with configuration
- **Acceptance Criteria**:
  - All active stations and pumps discovered on startup
  - Discovery completes within 30 seconds
  - Configuration changes trigger rediscovery
  - Invalid stations/pumps handled gracefully

#### TASK-1.4: Implement configuration management
- **Description**: Complete settings.py with comprehensive environment variable support
- **Configuration Areas**:
  - Database connection parameters
  - Protocol endpoints and ports
  - Safety timeout settings
  - Security settings (JWT, TLS)
  - Alert configuration (email, SMS, webhook)
  - Logging configuration
- **Acceptance Criteria**:
  - All settings loaded from environment variables
  - Type validation for all configuration values
  - Sensitive values properly secured
  - Configuration errors provide clear messages

#### TASK-1.5: Set up structured logging and audit system
- **Description**: Implement structlog with JSON formatting and audit trail
- **Features**:
  - Structured logging in JSON format
  - Correlation IDs for request tracing
  - Audit trail for compliance requirements
  - Log levels configurable at runtime
  - Log rotation and retention policies
- **Acceptance Criteria**:
  - All log entries include correlation IDs
  - Audit events logged to database
  - Logs searchable and filterable
  - Performance impact < 5% on operations

### Phase 2: Safety Framework Implementation (Week 3-4)

**Objective**: Implement comprehensive safety mechanisms to prevent equipment damage and operational hazards.

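
As a rough illustration of the enforcement idea the tasks in this phase build on, the sketch below clamps a requested speed setpoint to hard limits and a rate-of-change limit and records which limits were hit. It is a minimal sketch, not the plan's design: `SafetyLimits`, `enforce_setpoint`, the field names, the Hz units, and the violation labels are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical per-pump limits; field names and units are illustrative."""
    hard_min_speed_hz: float
    hard_max_speed_hz: float
    max_speed_change_hz: float  # largest allowed step per update


def _clamp(value, low, high):
    return min(max(value, low), high)


def enforce_setpoint(requested_hz, previous_hz, limits):
    """Return (enforced_hz, violations) for one requested speed setpoint."""
    violations = []

    # Record hard-limit violations against the original request.
    if requested_hz > limits.hard_max_speed_hz:
        violations.append("ABOVE_HARD_MAX")
    elif requested_hz < limits.hard_min_speed_hz:
        violations.append("BELOW_HARD_MIN")
    enforced = _clamp(requested_hz, limits.hard_min_speed_hz, limits.hard_max_speed_hz)

    # Limit the step size relative to the last applied setpoint.
    step = enforced - previous_hz
    if abs(step) > limits.max_speed_change_hz:
        enforced = previous_hz + math.copysign(limits.max_speed_change_hz, step)
        violations.append("RATE_OF_CHANGE")

    # Final clamp so the hard limits win even if previous_hz was out of range.
    enforced = _clamp(enforced, limits.hard_min_speed_hz, limits.hard_max_speed_hz)
    return enforced, violations


if __name__ == "__main__":
    limits = SafetyLimits(hard_min_speed_hz=20.0, hard_max_speed_hz=50.0,
                          max_speed_change_hz=5.0)
    print(enforce_setpoint(60.0, 40.0, limits))
```

In this sketch the returned violation list is what a caller would log and report; how violations map onto the `safety_limit_violations` table is left open here.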
#### TASK-2.1: Complete SafetyLimitEnforcer with all limit types
- **Description**: Implement multi-layer safety limits enforcement
- **Limit Types**:
  - Speed limits (hard min/max)
  - Level limits (min/max, emergency stop, dry run protection)
  - Power and flow limits
  - Rate of change limits
  - Operational limits (starts per hour, run times)
- **Acceptance Criteria**:
  - All setpoints pass through safety enforcer
  - Violations logged and reported
  - Rate of change limits prevent sudden changes
  - Emergency stop levels trigger immediate action

#### TASK-2.2: Implement DatabaseWatchdog with failsafe mode
- **Description**: Monitor database updates and trigger failsafe when updates stop
- **Features**:
  - 20-minute timeout detection
  - Automatic revert to default setpoints
  - Alert generation on failsafe activation
  - Automatic recovery when updates resume
- **Acceptance Criteria**:
  - Failsafe triggered within 20 minutes of no updates
  - Default setpoints applied correctly
  - Alerts sent to operators
  - System recovers automatically when updates resume

#### TASK-2.3: Implement EmergencyStopManager with big red button
- **Description**: System-wide and targeted emergency stop functionality
- **Features**:
  - Single pump emergency stop
  - Station-wide emergency stop
  - System-wide emergency stop
  - Manual clearance with audit trail
  - Integration with all protocol interfaces
- **Acceptance Criteria**:
  - Emergency stop triggers within 1 second
  - All affected pumps set to default setpoints
  - Clear audit trail of stop/clear events
  - REST API endpoints functional

#### TASK-2.4: Implement AlertManager with multi-channel alerts
- **Description**: Email, SMS, webhook, and SCADA alarm integration
- **Alert Channels**:
  - Email alerts with configurable recipients
  - SMS alerts for critical events
  - Webhook integration for external systems
  - SCADA HMI alarm integration via OPC UA
- **Acceptance Criteria**:
  - Alerts delivered within 30 seconds
  - Multiple delivery attempts for failed alerts
  - Alert
content includes all relevant context
  - Alert history maintained

#### TASK-2.5: Create comprehensive safety tests
- **Description**: Test all safety scenarios including edge cases and failure modes
- **Test Scenarios**:
  - Normal operation within limits
  - Safety limit violations
  - Failsafe mode activation and recovery
  - Emergency stop functionality
  - Alert delivery verification
- **Acceptance Criteria**:
  - 100% test coverage for safety components
  - All failure modes tested and handled
  - Performance under load validated
  - Integration with other components verified

### Phase 3: Plan-to-Setpoint Logic Engine (Week 5-6)

**Objective**: Implement control logic for different pump types with safety integration.

#### TASK-3.1: Implement SetpointManager with safety integration
- **Description**: Coordinate safety checks and setpoint calculation
- **Integration Points**:
  - Emergency stop status checking
  - Failsafe mode detection
  - Safety limit enforcement
  - Control type-specific calculation
- **Acceptance Criteria**:
  - Safety checks performed before setpoint calculation
  - Emergency stop overrides all other logic
  - Failsafe mode uses default setpoints
  - Performance: setpoint calculation < 10ms

#### TASK-3.2: Create control calculators for different pump types
- **Description**: Implement calculators for DIRECT_SPEED, LEVEL_CONTROLLED, POWER_CONTROLLED
- **Calculator Types**:
  - DirectSpeedCalculator: Direct speed control
  - LevelControlledCalculator: Level-based control with PID
  - PowerControlledCalculator: Power-based optimization
- **Acceptance Criteria**:
  - Each calculator produces valid setpoints
  - Control parameters configurable per pump
  - Feedback integration for adaptive control
  - Smooth transitions between setpoints

#### TASK-3.3: Implement feedback integration
- **Description**: Use real-time feedback for adaptive control
- **Feedback Sources**:
  - Actual speed measurements
  - Power consumption
  - Flow rates
  - Wet well levels
  - Pump running status
- **Acceptance Criteria**:
  - Feedback used to validate setpoint effectiveness
  - Adaptive control based on actual performance
  - Feedback delays handled appropriately
  - Invalid feedback data rejected

#### TASK-3.4: Create plan-to-setpoint integration tests
- **Description**: Test all control scenarios with safety integration
- **Test Scenarios**:
  - Normal optimization plan execution
  - Control type-specific calculations
  - Safety limit integration
  - Emergency stop override
  - Failsafe mode operation
- **Acceptance Criteria**:
  - All control scenarios tested
  - Safety integration verified
  - Performance requirements met
  - Edge cases handled correctly

### Phase 4: Multi-Protocol Server Implementation (Week 7-8)

**Objective**: Implement OPC UA, Modbus TCP, and REST API servers with security.

#### TASK-4.1: Implement OPC UA Server with asyncua
- **Description**: Create OPC UA server with pump data nodes and alarms
- **OPC UA Features**:
  - Pump setpoint nodes (read/write)
  - Status and feedback nodes (read-only)
  - Alarm and event notifications
  - Security with certificates
  - Historical data access
- **Acceptance Criteria**:
  - OPC UA clients can connect and read data
  - Setpoint changes processed through safety layer
  - Alarms generated for safety events
  - Performance: < 100ms response time

#### TASK-4.2: Implement Modbus TCP Server with pymodbus
- **Description**: Create Modbus server with holding registers for setpoints
- **Modbus Features**:
  - Holding registers for setpoints
  - Input registers for status and feedback
  - Coils for control commands
  - Multiple slave support
  - Error handling and validation
- **Acceptance Criteria**:
  - Modbus clients can read/write setpoints
  - Data mapping correct and consistent
  - Error responses for invalid requests
  - Performance: < 50ms response time

#### TASK-4.3: Implement REST API with FastAPI
- **Description**: Create REST endpoints for monitoring and emergency stop
- **API Endpoints**:
  - Emergency stop management
  - Safety status and violations
  - Pump and station
information
  - System health and metrics
  - Configuration management
- **Acceptance Criteria**:
  - All endpoints functional and documented
  - Authentication and authorization working
  - OpenAPI documentation generated
  - Performance: < 200ms response time

#### TASK-4.4: Implement security layer for all protocols
- **Description**: Authentication, authorization, and encryption for all interfaces
- **Security Features**:
  - JWT token authentication for REST API
  - Certificate-based authentication for OPC UA
  - IP-based access control for Modbus
  - Role-based authorization
  - TLS/SSL encryption
- **Acceptance Criteria**:
  - Unauthorized access blocked
  - Authentication required for sensitive operations
  - Encryption active for all external communications
  - Security events logged to audit trail

#### TASK-4.5: Create protocol integration tests
- **Description**: Test all protocol interfaces with simulated SCADA clients
- **Test Scenarios**:
  - OPC UA client connectivity and data access
  - Modbus TCP register mapping and updates
  - REST API endpoint functionality
  - Security and authentication testing
  - Performance under concurrent connections
- **Acceptance Criteria**:
  - All protocols functional with real clients
  - Security controls effective
  - Performance requirements met under load
  - Error conditions handled gracefully

### Phase 5: Security & Compliance Implementation (Week 9)

**Objective**: Implement security features and compliance with IEC 62443, ISO 27001, NIS2.

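
One way the tamper-protection goal for the audit trail in this phase could be approached is hash chaining, where each entry stores a hash of its predecessor so that any later edit breaks the chain. The plan does not specify this mechanism; the sketch below is an assumption for illustration, and `append_entry`, `verify_chain`, and the entry layout are hypothetical names. Note that this makes tampering detectable rather than impossible.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash, event):
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an audit event (a JSON-serializable dict) to the in-memory chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, event),
    }
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True only if every entry still matches its recorded hashes."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"action": "EMERGENCY_STOP", "user": "operator1"})
    append_entry(log, {"action": "EMERGENCY_STOP_CLEARED", "user": "operator1"})
    print(verify_chain(log))
    log[0]["event"]["user"] = "someone_else"  # simulate tampering
    print(verify_chain(log))
```

In a real deployment the chain would live in the `audit_log` table rather than a Python list, with database permissions restricting updates and deletes; those details are outside this sketch.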
#### TASK-5.1: Implement authentication and authorization
- **Description**: JWT tokens, role-based access control, and certificate auth
- **Security Controls**:
  - Multi-factor authentication support
  - Role-based access control (RBAC)
  - Certificate pinning for OPC UA
  - Session management and timeout
  - Password policy enforcement
- **Acceptance Criteria**:
  - All access properly authenticated
  - Authorization rules enforced
  - Session security maintained
  - Security events monitored and alerted

#### TASK-5.2: Implement audit logging for compliance
- **Description**: Immutable audit trail for IEC 62443, ISO 27001, NIS2
- **Audit Requirements**:
  - All security events logged
  - Configuration changes tracked
  - User actions recorded
  - System events captured
  - Immutable log storage
- **Acceptance Criteria**:
  - Audit trail complete and searchable
  - Logs protected from tampering
  - Compliance reports generatable
  - Retention policies enforced

#### TASK-5.3: Implement TLS/SSL encryption
- **Description**: Secure communications for all protocols
- **Encryption Implementation**:
  - TLS 1.3 for REST API
  - OPC UA Secure Conversation
  - Certificate management and rotation
  - Cipher suite configuration
  - Perfect forward secrecy
- **Acceptance Criteria**:
  - All external communications encrypted
  - Certificates properly validated
  - Encryption performance acceptable
  - Certificate expiration monitored

#### TASK-5.4: Create security compliance documentation
- **Description**: Document compliance with standards and security controls
- **Documentation Areas**:
  - Security architecture documentation
  - Compliance matrix for standards
  - Security control implementation details
  - Risk assessment documentation
  - Incident response procedures
- **Acceptance Criteria**:
  - Documentation complete and accurate
  - Compliance evidence documented
  - Security controls mapped to requirements
  - Documentation maintained and versioned

### Phase 6: Integration & System Testing (Week 10-11)

**Objective**: End-to-end
testing and validation of the complete system.

#### TASK-6.1: Set up test database with realistic data
- **Description**: Create test data for multiple stations and pump scenarios
- **Test Data**:
  - Multiple pump stations with different configurations
  - Various pump types and control strategies
  - Historical optimization plans
  - Safety limit configurations
  - Realistic feedback data
- **Acceptance Criteria**:
  - Test data covers all scenarios
  - Data relationships maintained
  - Performance testing possible
  - Edge cases represented

#### TASK-6.2: Create end-to-end integration tests
- **Description**: Test full system workflow from optimization to SCADA
- **Test Workflows**:
  - Normal optimization control flow
  - Safety limit violation handling
  - Emergency stop activation and clearance
  - Failsafe mode operation
  - Protocol integration testing
- **Acceptance Criteria**:
  - All workflows function correctly
  - Data flows through entire system
  - Performance meets requirements
  - Error conditions handled appropriately

#### TASK-6.3: Implement performance and load testing
- **Description**: Test system under load with multiple pumps and protocols
- **Load Testing**:
  - Concurrent protocol connections
  - High-frequency setpoint updates
  - Multiple safety limit checks
  - Database query performance
  - Memory and CPU utilization
- **Acceptance Criteria**:
  - System handles expected load
  - Response times within requirements
  - Resource utilization acceptable
  - No memory leaks or performance degradation

#### TASK-6.4: Create failure mode and recovery tests
- **Description**: Test system behavior during failures and recovery
- **Failure Scenarios**:
  - Database connection loss
  - Network connectivity issues
  - Protocol server failures
  - Safety system failures
  - Resource exhaustion
- **Acceptance Criteria**:
  - System fails safely
  - Recovery automatic where possible
  - Alerts generated for failures
  - Data integrity maintained

#### TASK-6.5: Implement health monitoring and metrics
- **Description**:
Prometheus metrics and health checks
- **Monitoring Areas**:
  - System health and availability
  - Performance metrics
  - Safety system status
  - Protocol connectivity
  - Resource utilization
- **Acceptance Criteria**:
  - All critical metrics monitored
  - Health checks functional
  - Alert thresholds configured
  - Dashboard available for visualization

### Phase 7: Deployment & Production Readiness (Week 12)

**Objective**: Prepare for production deployment with operational support.

#### TASK-7.1: Complete Docker containerization
- **Description**: Optimize Dockerfile and create docker-compose for production
- **Containerization**:
  - Multi-stage Docker build
  - Security scanning and vulnerability assessment
  - Resource limits and constraints
  - Health check implementation
  - Logging configuration
- **Acceptance Criteria**:
  - Container builds successfully
  - Security vulnerabilities addressed
  - Resource usage optimized
  - Logging functional in container

#### TASK-7.2: Create deployment documentation
- **Description**: Deployment guides, configuration examples, and troubleshooting
- **Documentation**:
  - Installation and setup guide
  - Configuration reference
  - Troubleshooting guide
  - Upgrade procedures
  - Backup and recovery procedures
- **Acceptance Criteria**:
  - Documentation complete and accurate
  - Step-by-step procedures validated
  - Common issues documented
  - Maintenance procedures clear

#### TASK-7.3: Implement monitoring and alerting
- **Description**: Grafana dashboards, alert rules, and operational monitoring
- **Monitoring Setup**:
  - Grafana dashboards for all metrics
  - Alert rules for critical conditions
  - Log aggregation and analysis
  - Performance trending
  - Capacity planning data
- **Acceptance Criteria**:
  - Dashboards provide operational visibility
  - Alerts generated for critical conditions
  - Logs searchable and analyzable
  - Performance baselines established

#### TASK-7.4: Create backup and recovery procedures
- **Description**: Database backup, configuration backup, and
disaster recovery
- **Backup Strategy**:
  - Database backup procedures
  - Configuration backup
  - Certificate and key backup
  - Recovery procedures
  - Testing of backup restoration
- **Acceptance Criteria**:
  - Backup procedures documented and tested
  - Recovery time objectives met
  - Data integrity maintained
  - Backup success monitored

#### TASK-7.5: Final security review and hardening
- **Description**: Security audit, vulnerability assessment, and hardening
- **Security Activities**:
  - Penetration testing
  - Vulnerability scanning
  - Security configuration review
  - Access control validation
  - Security incident response testing
- **Acceptance Criteria**:
  - All security vulnerabilities addressed
  - Security controls validated
  - Incident response procedures tested
  - Production security posture established

## Testing Strategy

### Unit Testing
- **Coverage**: 90%+ code coverage for all components
- **Focus**: Individual component functionality
- **Tools**: pytest, pytest-asyncio, pytest-cov

### Integration Testing
- **Coverage**: All component interactions
- **Focus**: Data flow between components
- **Tools**: pytest with test database

### System Testing
- **Coverage**: End-to-end workflows
- **Focus**: Complete system functionality
- **Tools**: Docker Compose, test automation

### Performance Testing
- **Coverage**: Load and stress testing
- **Focus**: Response times and resource usage
- **Tools**: Locust, k6, custom load generators

### Security Testing
- **Coverage**: All security controls
- **Focus**: Vulnerability assessment
- **Tools**: OWASP ZAP, security scanners

## Risk Management

### Technical Risks
- Database performance under load
- Protocol compatibility with SCADA systems
- Safety system reliability
- Security vulnerabilities

### Mitigation Strategies
- Performance testing early and often
- Protocol testing with real SCADA systems
- Redundant safety mechanisms
- Regular security assessments

## Success Criteria

### Functional Requirements
- All safety mechanisms
operational
- Multi-protocol support functional
- Real-time performance requirements met
- Compliance with standards achieved

### Non-Functional Requirements
- 99.9% system availability
- Sub-second response times
- Secure operation validated
- Comprehensive documentation

## Conclusion

This implementation plan provides a comprehensive roadmap for developing the Calejo Control Adapter v2.0 with Safety & Security Framework. The phased approach ensures systematic development with thorough testing at each stage, resulting in a robust, secure, and reliable system for municipal wastewater pump station control.