# Calejo Control Adapter - Implementation Plan

## Overview

This document outlines the step-by-step implementation plan for the Calejo Control Adapter v2.0 with Safety & Security Framework. The plan is organized into seven phases over 12 weeks, each broken into tasks with acceptance criteria, followed by the overall testing strategy, risks, and success criteria.

## Project Timeline & Phases

### Phase 1: Core Infrastructure & Database Setup (Week 1-2)

**Objective**: Establish the foundation with database schema, core infrastructure, and basic components.

#### TASK-1.1: Set up PostgreSQL database with complete schema
- **Description**: Create all database tables defined in the specification
- **Database Tables**:
  - `pump_stations` - Station metadata
  - `pumps` - Pump configuration and control parameters
  - `pump_plans` - Optimization plans from Calejo Optimize
  - `pump_feedback` - Real-time feedback from pumps
  - `pump_safety_limits` - Hard operational limits
  - `safety_limit_violations` - Audit trail of limit violations
  - `failsafe_events` - Failsafe mode activations
  - `emergency_stop_events` - Emergency stop events
  - `audit_log` - Immutable compliance audit trail
- **Acceptance Criteria**:
  - All tables created with correct constraints and indexes
  - Read-only user `control_reader` with appropriate permissions
  - Test data inserted for validation
  - Database connection successful from application

#### TASK-1.2: Implement database client with connection pooling
- **Description**: Enhance database client with async support and robust error handling
- **Features**:
  - Connection pooling for performance
  - Async/await support for non-blocking operations
  - Comprehensive error handling and retry logic
  - Query timeout management
  - Connection health monitoring
- **Acceptance Criteria**:
  - Database operations complete within 100ms
  - Connection failures handled gracefully
  - Connection pool recovers automatically
  - All queries execute without blocking
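
A minimal sketch of the retry and query-timeout behaviour listed above, using only `asyncio` from the standard library. The function name `run_with_retry`, its parameters, and the choice of exponential backoff with jitter are illustrative assumptions; the plan does not prescribe a retry policy or a database driver.

```python
import asyncio
import random


async def run_with_retry(operation, *, attempts=3, timeout_s=0.1, base_delay_s=0.05):
    """Await `operation()` with a per-attempt timeout and exponential backoff.

    `operation` is any zero-argument coroutine function, for example a
    wrapper around a single query (hypothetical; no driver is assumed here).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # The timeout bounds each attempt, not the whole retry sequence.
            return await asyncio.wait_for(operation(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off 1x, 2x, 4x ... the base delay, plus some jitter.
                delay = base_delay_s * (2 ** attempt)
                await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise last_error
```

The default `timeout_s=0.1` mirrors the 100ms target above. Only timeouts and connection errors are retried; other exceptions propagate immediately so that programming errors are not hidden.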

#### TASK-1.3: Complete auto-discovery module
- **Description**: Implement full auto-discovery of stations and pumps from database
- **Features**:
  - Automatic discovery on startup
  - Periodic refresh of discovered assets
  - Filtering by station and active status
  - Integration with configuration
- **Acceptance Criteria**:
  - All active stations and pumps discovered on startup
  - Discovery completes within 30 seconds
  - Configuration changes trigger rediscovery
  - Invalid stations/pumps handled gracefully

#### TASK-1.4: Implement configuration management
- **Description**: Complete `settings.py` with comprehensive environment variable support
- **Configuration Areas**:
  - Database connection parameters
  - Protocol endpoints and ports
  - Safety timeout settings
  - Security settings (JWT, TLS)
  - Alert configuration (email, SMS, webhook)
  - Logging configuration
- **Acceptance Criteria**:
  - All settings loaded from environment variables
  - Type validation for all configuration values
  - Sensitive values properly secured
  - Configuration errors provide clear messages
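
The acceptance criteria above (environment loading, type validation, secured secrets, clear errors) could look roughly like this. It is a standard-library sketch only; the `CALEJO_*` variable names, the `Settings` fields, and the defaults are assumptions for illustration, not the actual contents of `settings.py`.

```python
import os
from dataclasses import dataclass


class ConfigError(ValueError):
    """Raised when an environment variable is missing or malformed."""


def _get_int(env, name, default):
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ConfigError(f"{name} must be an integer, got {raw!r}") from None


@dataclass(frozen=True, repr=False)
class Settings:
    db_host: str
    db_port: int
    failsafe_timeout_s: int
    jwt_secret: str

    def __repr__(self):
        # Keep the secret out of logs and tracebacks.
        return (f"Settings(db_host={self.db_host!r}, db_port={self.db_port}, "
                f"failsafe_timeout_s={self.failsafe_timeout_s}, jwt_secret='***')")


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables."""
    secret = env.get("CALEJO_JWT_SECRET")  # hypothetical variable name
    if not secret:
        raise ConfigError("CALEJO_JWT_SECRET is required but not set")
    return Settings(
        db_host=env.get("CALEJO_DB_HOST", "localhost"),
        db_port=_get_int(env, "CALEJO_DB_PORT", 5432),
        failsafe_timeout_s=_get_int(env, "CALEJO_FAILSAFE_TIMEOUT_S", 20 * 60),
        jwt_secret=secret,
    )
```

Passing the environment as a parameter keeps the loader testable without touching the real process environment.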

#### TASK-1.5: Set up structured logging and audit system
- **Description**: Implement structlog with JSON formatting and audit trail
- **Features**:
  - Structured logging in JSON format
  - Correlation IDs for request tracing
  - Audit trail for compliance requirements
  - Log levels configurable at runtime
  - Log rotation and retention policies
- **Acceptance Criteria**:
  - All log entries include correlation IDs
  - Audit events logged to database
  - Logs searchable and filterable
  - Performance impact < 5% on operations
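
To illustrate JSON log lines that always carry a correlation ID, here is a sketch built on the standard `logging` and `contextvars` modules. It is a stand-in for the planned structlog setup, not a structlog configuration; the names `correlation_id`, `new_correlation_id`, and `JsonFormatter` are hypothetical.

```python
import contextvars
import json
import logging
import uuid

# One correlation ID per request or task; "-" marks "outside any request".
correlation_id = contextvars.ContextVar("correlation_id", default="-")


def new_correlation_id():
    """Start a new trace: generate an ID and bind it to the current context."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "ts": record.created,
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })
```

Attaching the formatter to a handler (`handler.setFormatter(JsonFormatter())`) makes every entry emitted through that handler include the ID bound in the current context, which is what the first acceptance criterion asks for.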

### Phase 2: Safety Framework Implementation (Week 3-4)

**Objective**: Implement comprehensive safety mechanisms to prevent equipment damage and operational hazards.

#### TASK-2.1: Complete SafetyLimitEnforcer with all limit types
- **Description**: Implement multi-layer safety limits enforcement
- **Limit Types**:
  - Speed limits (hard min/max)
  - Level limits (min/max, emergency stop, dry run protection)
  - Power and flow limits
  - Rate of change limits
  - Operational limits (starts per hour, run times)
- **Acceptance Criteria**:
  - All setpoints pass through safety enforcer
  - Violations logged and reported
  - Rate of change limits prevent sudden changes
  - Emergency stop levels trigger immediate action
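
A sketch of how two of the limit types above (hard speed limits and rate of change) might combine in a single enforcement step. The field names, the use of Hz as the speed unit, and the violation codes are assumptions; the real `SafetyLimitEnforcer` covers more limit types than shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    hard_min_speed_hz: float
    hard_max_speed_hz: float
    max_change_hz_per_min: float


def enforce_setpoint(requested_hz, previous_hz, elapsed_s, limits):
    """Return (enforced_hz, violations) for one requested setpoint."""
    violations = []
    enforced = requested_hz

    # Rate of change: limit the step relative to the previous setpoint.
    max_step = limits.max_change_hz_per_min * elapsed_s / 60.0
    if abs(enforced - previous_hz) > max_step:
        violations.append("RATE_OF_CHANGE")
        if enforced > previous_hz:
            enforced = previous_hz + max_step
        else:
            enforced = previous_hz - max_step

    # Hard limits are applied last so that they always win.
    if enforced < limits.hard_min_speed_hz:
        violations.append("BELOW_MIN_SPEED")
        enforced = limits.hard_min_speed_hz
    elif enforced > limits.hard_max_speed_hz:
        violations.append("ABOVE_MAX_SPEED")
        enforced = limits.hard_max_speed_hz

    return enforced, violations
```

Returning the violation list alongside the enforced value lets the caller log and report each violation, as required above, without the enforcer needing database access.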

#### TASK-2.2: Implement DatabaseWatchdog with failsafe mode
- **Description**: Monitor database updates and trigger failsafe when updates stop
- **Features**:
  - 20-minute timeout detection
  - Automatic revert to default setpoints
  - Alert generation on failsafe activation
  - Automatic recovery when updates resume
- **Acceptance Criteria**:
  - Failsafe triggered once no updates have been received for 20 minutes
  - Default setpoints applied correctly
  - Alerts sent to operators
  - System recovers automatically when updates resume
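
The timeout detection and automatic recovery described above can be sketched as a small state machine. The class shape, the injected `clock`, and the returned event names are assumptions made for illustration; applying default setpoints and sending alerts are left to the caller.

```python
import time


class DatabaseWatchdog:
    """Tracks the age of the last database update and flags failsafe mode."""

    def __init__(self, timeout_s=20 * 60, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so tests do not wait 20 minutes
        self._last_update = clock()
        self.failsafe_active = False

    def record_update(self):
        """Call whenever a fresh update is observed in the database."""
        self._last_update = self._clock()

    def check(self):
        """Return "FAILSAFE_ENTERED", "FAILSAFE_CLEARED", or None (no change)."""
        stale = self._clock() - self._last_update >= self._timeout_s
        if stale and not self.failsafe_active:
            self.failsafe_active = True
            return "FAILSAFE_ENTERED"
        if not stale and self.failsafe_active:
            self.failsafe_active = False
            return "FAILSAFE_CLEARED"
        return None
```

Because `check()` reports only transitions, a periodic caller can send exactly one alert on entry and one on recovery. Using a monotonic clock avoids false triggers when the wall clock is adjusted.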

#### TASK-2.3: Implement EmergencyStopManager with big red button
- **Description**: System-wide and targeted emergency stop functionality
- **Features**:
  - Single pump emergency stop
  - Station-wide emergency stop
  - System-wide emergency stop
  - Manual clearance with audit trail
  - Integration with all protocol interfaces
- **Acceptance Criteria**:
  - Emergency stop triggers within 1 second
  - All affected pumps set to default setpoints
  - Clear audit trail of stop/clear events
  - REST API endpoints functional
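
One possible way to model the three stop scopes (pump, station, system) with manual clearance. This is an in-memory sketch: the method names and the `events` list are hypothetical, and a real implementation would persist each event to `emergency_stop_events`.

```python
class EmergencyStopManager:
    """Tracks emergency stops at pump, station, and system scope."""

    def __init__(self):
        self._system = False
        self._stations = set()
        self._pumps = set()
        self.events = []  # in-memory stand-in for the audit trail

    def stop(self, triggered_by, station_id=None, pump_id=None):
        """No station means system-wide; no pump means station-wide."""
        if station_id is None:
            self._system = True
        elif pump_id is None:
            self._stations.add(station_id)
        else:
            self._pumps.add((station_id, pump_id))
        self.events.append(("STOP", station_id, pump_id, triggered_by))

    def clear(self, cleared_by, station_id=None, pump_id=None):
        """Manually clear a stop at the same scope it was raised at."""
        if station_id is None:
            self._system = False
        elif pump_id is None:
            self._stations.discard(station_id)
        else:
            self._pumps.discard((station_id, pump_id))
        self.events.append(("CLEAR", station_id, pump_id, cleared_by))

    def is_stopped(self, station_id, pump_id):
        """A pump is stopped if any enclosing scope is stopped."""
        return (self._system
                or station_id in self._stations
                or (station_id, pump_id) in self._pumps)
```

Clearing a wider scope deliberately leaves narrower stops in place, so a pump that was stopped individually stays stopped after a system-wide clear.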

#### TASK-2.4: Implement AlertManager with multi-channel alerts
- **Description**: Email, SMS, webhook, and SCADA alarm integration
- **Alert Channels**:
  - Email alerts with configurable recipients
  - SMS alerts for critical events
  - Webhook integration for external systems
  - SCADA HMI alarm integration via OPC UA
- **Acceptance Criteria**:
  - Alerts delivered within 30 seconds
  - Multiple delivery attempts for failed alerts
  - Alert content includes all relevant context
  - Alert history maintained
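
The "multiple delivery attempts" criterion above might be handled by a loop like the following. The `deliver_alert` name and the convention that a channel is a callable which raises on failure are assumptions; actual email, SMS, webhook, and OPC UA senders are out of scope for the sketch.

```python
def deliver_alert(message, channels, max_attempts=3):
    """Try every channel; retry each failed channel up to max_attempts times.

    `channels` maps a channel name to a callable that raises on failure.
    Returns {name: attempts_used}, with None for channels that never succeeded.
    """
    results = {}
    for name, send in channels.items():
        results[name] = None
        for attempt in range(1, max_attempts + 1):
            try:
                send(message)
            except Exception:
                continue  # a real sender would log and back off here
            results[name] = attempt
            break
    return results
```

A failure on one channel does not block the others, and the returned mapping can be stored as part of the alert history.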

#### TASK-2.5: Create comprehensive safety tests
- **Description**: Test all safety scenarios including edge cases and failure modes
- **Test Scenarios**:
  - Normal operation within limits
  - Safety limit violations
  - Failsafe mode activation and recovery
  - Emergency stop functionality
  - Alert delivery verification
- **Acceptance Criteria**:
  - 100% test coverage for safety components
  - All failure modes tested and handled
  - Performance under load validated
  - Integration with other components verified

### Phase 3: Plan-to-Setpoint Logic Engine (Week 5-6)

**Objective**: Implement control logic for different pump types with safety integration.

#### TASK-3.1: Implement SetpointManager with safety integration
- **Description**: Coordinate safety checks and setpoint calculation
- **Integration Points**:
  - Emergency stop status checking
  - Failsafe mode detection
  - Safety limit enforcement
  - Control type-specific calculation
- **Acceptance Criteria**:
  - Safety checks performed before setpoint calculation
  - Emergency stop overrides all other logic
  - Failsafe mode uses default setpoints
  - Performance: setpoint calculation < 10ms
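
The precedence stated in the acceptance criteria (emergency stop first, then failsafe, then the enforced plan value) reduces to a short decision function. `resolve_setpoint`, its parameters, and the returned source labels are hypothetical names used only for this sketch.

```python
def resolve_setpoint(planned_hz, default_hz, *, emergency_stop, failsafe, enforce):
    """Return (setpoint_hz, source) following the safety precedence order.

    `enforce` is a callable applying safety limits to a planned value.
    """
    if emergency_stop:
        # Emergency stop overrides all other logic.
        return default_hz, "EMERGENCY_STOP"
    if failsafe:
        # No fresh plan data: fall back to the default setpoint.
        return default_hz, "FAILSAFE"
    # Normal operation: the plan value still passes through the safety layer.
    return enforce(planned_hz), "PLAN"
```

Returning the source alongside the value makes it straightforward to expose why a pump is running at a given setpoint.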

#### TASK-3.2: Create control calculators for different pump types
- **Description**: Implement calculators for the `DIRECT_SPEED`, `LEVEL_CONTROLLED`, and `POWER_CONTROLLED` control types
- **Calculator Types**:
  - `DirectSpeedCalculator`: Direct speed control
  - `LevelControlledCalculator`: Level-based control with PID
  - `PowerControlledCalculator`: Power-based optimization
- **Acceptance Criteria**:
  - Each calculator produces valid setpoints
  - Control parameters configurable per pump
  - Feedback integration for adaptive control
  - Smooth transitions between setpoints
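
As an illustration of two of the calculator types, the sketch below shows a pass-through calculator and a PID loop on wet well level with a clamped output. The dictionary keys (`suggested_speed_hz`, `target_level_m`, `level_m`), the gains, and the sign convention are all assumptions; the power-based calculator is omitted because the plan does not describe its optimization method.

```python
class DirectSpeedCalculator:
    """DIRECT_SPEED: the plan already carries the speed to apply."""

    def calculate(self, plan, feedback):
        return plan["suggested_speed_hz"]


class LevelControlledCalculator:
    """LEVEL_CONTROLLED: PID on wet well level, output clamped to a speed range."""

    def __init__(self, kp, ki, kd, out_min, out_max, dt_s):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.dt_s = dt_s  # fixed control period in seconds
        self._integral = 0.0
        self._prev_error = None

    def calculate(self, plan, feedback):
        # Level above target means the well is filling, so pump faster.
        error = feedback["level_m"] - plan["target_level_m"]
        self._integral += error * self.dt_s
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / self.dt_s
        self._prev_error = error

        raw = self.kp * error + self.ki * self._integral + self.kd * derivative
        output = min(max(raw, self.out_min), self.out_max)
        if output != raw:
            # Anti-windup: undo this step's integration while saturated.
            self._integral -= error * self.dt_s
        return output
```

Both classes share the same `calculate(plan, feedback)` signature, so a per-pump control type could select the calculator without the caller knowing which one it got.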

#### TASK-3.3: Implement feedback integration
- **Description**: Use real-time feedback for adaptive control
- **Feedback Sources**:
  - Actual speed measurements
  - Power consumption
  - Flow rates
  - Wet well levels
  - Pump running status
- **Acceptance Criteria**:
  - Feedback used to validate setpoint effectiveness
  - Adaptive control based on actual performance
  - Feedback delays handled appropriately
  - Invalid feedback data rejected
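
A sketch of how stale or invalid feedback might be rejected before it reaches the control logic. The sample keys, the 60-second maximum age, the 60 Hz plausibility bound, and the reason codes are placeholder assumptions, not values from the specification.

```python
import math


def validate_feedback(sample, now_s, max_age_s=60.0, max_speed_hz=60.0):
    """Return a list of rejection reasons; an empty list means usable."""
    reasons = []

    # Delayed feedback: too old to reflect the current pump state.
    if now_s - sample["timestamp_s"] > max_age_s:
        reasons.append("STALE")

    speed = sample.get("actual_speed_hz")
    if not isinstance(speed, (int, float)) or not math.isfinite(speed):
        reasons.append("SPEED_INVALID")
    elif not 0.0 <= speed <= max_speed_hz:
        reasons.append("SPEED_OUT_OF_RANGE")
    elif speed > 0.0 and sample.get("running") is False:
        # A stopped pump should not report a non-zero speed.
        reasons.append("STATUS_MISMATCH")

    return reasons
```

Returning reasons rather than a boolean keeps rejected samples explainable in logs.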

#### TASK-3.4: Create plan-to-setpoint integration tests
- **Description**: Test all control scenarios with safety integration
- **Test Scenarios**:
  - Normal optimization plan execution
  - Control type-specific calculations
  - Safety limit integration
  - Emergency stop override
  - Failsafe mode operation
- **Acceptance Criteria**:
  - All control scenarios tested
  - Safety integration verified
  - Performance requirements met
  - Edge cases handled correctly

### Phase 4: Multi-Protocol Server Implementation (Week 7-8)

**Objective**: Implement OPC UA, Modbus TCP, and REST API servers with security.

#### TASK-4.1: Implement OPC UA Server with asyncua
- **Description**: Create OPC UA server with pump data nodes and alarms
- **OPC UA Features**:
  - Pump setpoint nodes (read/write)
  - Status and feedback nodes (read-only)
  - Alarm and event notifications
  - Security with certificates
  - Historical data access
- **Acceptance Criteria**:
  - OPC UA clients can connect and read data
  - Setpoint changes processed through safety layer
  - Alarms generated for safety events
  - Performance: < 100ms response time

#### TASK-4.2: Implement Modbus TCP Server with pymodbus
- **Description**: Create Modbus server with holding registers for setpoints
- **Modbus Features**:
  - Holding registers for setpoints
  - Input registers for status and feedback
  - Coils for control commands
  - Multiple slave support
  - Error handling and validation
- **Acceptance Criteria**:
  - Modbus clients can read/write setpoints
  - Data mapping correct and consistent
  - Error responses for invalid requests
  - Performance: < 50ms response time
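
Modbus holding registers are 16-bit words, so the "data mapping" criterion implies a fixed scaling between engineering values and register counts. The sketch below assumes a resolution of 0.1 Hz per count and one register per pump; both are illustrative choices. It is plain Python and does not use the pymodbus API.

```python
SCALE = 10  # assumed resolution: one register count = 0.1 Hz


def to_register(value_hz):
    """Encode a setpoint as an unsigned 16-bit register value."""
    raw = round(value_hz * SCALE)
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"setpoint {value_hz} Hz does not fit a 16-bit register")
    return raw


def from_register(raw):
    """Decode a register value back to Hz."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"{raw} is not a valid 16-bit register value")
    return raw / SCALE


def setpoint_address(pump_index, base_address=0):
    """Hypothetical layout: one holding register per pump, contiguous."""
    return base_address + pump_index
```

Rejecting out-of-range values at the mapping layer gives the server a clear place to return an error response for invalid writes.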

#### TASK-4.3: Implement REST API with FastAPI
- **Description**: Create REST endpoints for monitoring and emergency stop
- **API Endpoints**:
  - Emergency stop management
  - Safety status and violations
  - Pump and station information
  - System health and metrics
  - Configuration management
- **Acceptance Criteria**:
  - All endpoints functional and documented
  - Authentication and authorization working
  - OpenAPI documentation generated
  - Performance: < 200ms response time

#### TASK-4.4: Implement security layer for all protocols
- **Description**: Authentication, authorization, and encryption for all interfaces
- **Security Features**:
  - JWT token authentication for REST API
  - Certificate-based authentication for OPC UA
  - IP-based access control for Modbus
  - Role-based authorization
  - TLS/SSL encryption
- **Acceptance Criteria**:
  - Unauthorized access blocked
  - Authentication required for sensitive operations
  - Encryption active for all external communications
  - Security events logged to audit trail
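
Of the controls above, IP-based access control for Modbus is the simplest to sketch, since Modbus TCP itself carries no authentication. The example uses the standard `ipaddress` module; the function names and the CIDR ranges in the usage note are made up.

```python
import ipaddress


def build_allowlist(cidrs):
    """Parse CIDR strings once at startup; malformed entries raise ValueError."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_allowed(client_ip, allowlist):
    """Return True if the client address falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowlist)
```

For example, `is_allowed("10.0.0.5", build_allowlist(["10.0.0.0/24"]))` returns `True`, while an address outside that range is refused. Parsing the list at startup means a typo in the configuration fails fast instead of silently admitting or blocking clients.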

#### TASK-4.5: Create protocol integration tests
- **Description**: Test all protocol interfaces with simulated SCADA clients
- **Test Scenarios**:
  - OPC UA client connectivity and data access
  - Modbus TCP register mapping and updates
  - REST API endpoint functionality
  - Security and authentication testing
  - Performance under concurrent connections
- **Acceptance Criteria**:
  - All protocols functional with the simulated SCADA clients
  - Security controls effective
  - Performance requirements met under load
  - Error conditions handled gracefully

### Phase 5: Security & Compliance Implementation (Week 9)

**Objective**: Implement security features and compliance with IEC 62443, ISO 27001, NIS2.

#### TASK-5.1: Implement authentication and authorization
- **Description**: JWT tokens, role-based access control, and certificate auth
- **Security Controls**:
  - Multi-factor authentication support
  - Role-based access control (RBAC)
  - Certificate pinning for OPC UA
  - Session management and timeout
  - Password policy enforcement
- **Acceptance Criteria**:
  - All access properly authenticated
  - Authorization rules enforced
  - Session security maintained
  - Security events monitored and alerted

#### TASK-5.2: Implement audit logging for compliance
- **Description**: Immutable audit trail for IEC 62443, ISO 27001, NIS2
- **Audit Requirements**:
  - All security events logged
  - Configuration changes tracked
  - User actions recorded
  - System events captured
  - Immutable log storage
- **Acceptance Criteria**:
  - Audit trail complete and searchable
  - Logs protected from tampering
  - Compliance reports can be generated from the audit trail
  - Retention policies enforced
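
One common way to make an audit trail tamper-evident is to chain entries by hash, so that changing any earlier entry invalidates every later one. The sketch below shows the idea in memory; it is not the mechanism specified for `audit_log`, and the entry layout is an assumption. Hash chaining detects tampering but does not prevent it, so database permissions are still needed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash; False means an entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Serializing with `sort_keys=True` keeps the hash stable regardless of key order in the event dictionary.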

#### TASK-5.3: Implement TLS/SSL encryption
- **Description**: Secure communications for all protocols
- **Encryption Implementation**:
  - TLS 1.3 for REST API
  - OPC UA Secure Conversation
  - Certificate management and rotation
  - Cipher suite configuration
  - Perfect forward secrecy
- **Acceptance Criteria**:
  - All external communications encrypted
  - Certificates properly validated
  - Encryption performance acceptable
  - Certificate expiration monitored

#### TASK-5.4: Create security compliance documentation
- **Description**: Document compliance with standards and security controls
- **Documentation Areas**:
  - Security architecture documentation
  - Compliance matrix for standards
  - Security control implementation details
  - Risk assessment documentation
  - Incident response procedures
- **Acceptance Criteria**:
  - Documentation complete and accurate
  - Compliance evidence documented
  - Security controls mapped to requirements
  - Documentation maintained and versioned

### Phase 6: Integration & System Testing (Week 10-11)

**Objective**: End-to-end testing and validation of the complete system.

#### TASK-6.1: Set up test database with realistic data
- **Description**: Create test data for multiple stations and pump scenarios
- **Test Data**:
  - Multiple pump stations with different configurations
  - Various pump types and control strategies
  - Historical optimization plans
  - Safety limit configurations
  - Realistic feedback data
- **Acceptance Criteria**:
  - Test data covers all scenarios
  - Data relationships maintained
  - Performance testing possible
  - Edge cases represented

#### TASK-6.2: Create end-to-end integration tests
- **Description**: Test full system workflow from optimization to SCADA
- **Test Workflows**:
  - Normal optimization control flow
  - Safety limit violation handling
  - Emergency stop activation and clearance
  - Failsafe mode operation
  - Protocol integration testing
- **Acceptance Criteria**:
  - All workflows function correctly
  - Data flows through entire system
  - Performance meets requirements
  - Error conditions handled appropriately

#### TASK-6.3: Implement performance and load testing
- **Description**: Test system under load with multiple pumps and protocols
- **Load Testing**:
  - Concurrent protocol connections
  - High-frequency setpoint updates
  - Multiple safety limit checks
  - Database query performance
  - Memory and CPU utilization
- **Acceptance Criteria**:
  - System handles expected load
  - Response times within requirements
  - Resource utilization acceptable
  - No memory leaks or performance degradation

#### TASK-6.4: Create failure mode and recovery tests
- **Description**: Test system behavior during failures and recovery
- **Failure Scenarios**:
  - Database connection loss
  - Network connectivity issues
  - Protocol server failures
  - Safety system failures
  - Resource exhaustion
- **Acceptance Criteria**:
  - System fails safely
  - Recovery automatic where possible
  - Alerts generated for failures
  - Data integrity maintained

#### TASK-6.5: Implement health monitoring and metrics
- **Description**: Prometheus metrics and health checks
- **Monitoring Areas**:
  - System health and availability
  - Performance metrics
  - Safety system status
  - Protocol connectivity
  - Resource utilization
- **Acceptance Criteria**:
  - All critical metrics monitored
  - Health checks functional
  - Alert thresholds configured
  - Dashboard available for visualization

### Phase 7: Deployment & Production Readiness (Week 12)

**Objective**: Prepare for production deployment with operational support.

#### TASK-7.1: Complete Docker containerization
- **Description**: Optimize Dockerfile and create docker-compose for production
- **Containerization**:
  - Multi-stage Docker build
  - Security scanning and vulnerability assessment
  - Resource limits and constraints
  - Health check implementation
  - Logging configuration
- **Acceptance Criteria**:
  - Container builds successfully
  - Security vulnerabilities addressed
  - Resource usage optimized
  - Logging functional in container

#### TASK-7.2: Create deployment documentation
- **Description**: Deployment guides, configuration examples, and troubleshooting
- **Documentation**:
  - Installation and setup guide
  - Configuration reference
  - Troubleshooting guide
  - Upgrade procedures
  - Backup and recovery procedures
- **Acceptance Criteria**:
  - Documentation complete and accurate
  - Step-by-step procedures validated
  - Common issues documented
  - Maintenance procedures clear

#### TASK-7.3: Implement monitoring and alerting
- **Description**: Grafana dashboards, alert rules, and operational monitoring
- **Monitoring Setup**:
  - Grafana dashboards for all metrics
  - Alert rules for critical conditions
  - Log aggregation and analysis
  - Performance trending
  - Capacity planning data
- **Acceptance Criteria**:
  - Dashboards provide operational visibility
  - Alerts generated for critical conditions
  - Logs searchable and analyzable
  - Performance baselines established

#### TASK-7.4: Create backup and recovery procedures
- **Description**: Database backup, configuration backup, and disaster recovery
- **Backup Strategy**:
  - Database backup procedures
  - Configuration backup
  - Certificate and key backup
  - Recovery procedures
  - Testing of backup restoration
- **Acceptance Criteria**:
  - Backup procedures documented and tested
  - Recovery time objectives met
  - Data integrity maintained
  - Backup success monitored

#### TASK-7.5: Final security review and hardening
- **Description**: Security audit, vulnerability assessment, and hardening
- **Security Activities**:
  - Penetration testing
  - Vulnerability scanning
  - Security configuration review
  - Access control validation
  - Security incident response testing
- **Acceptance Criteria**:
  - All security vulnerabilities addressed
  - Security controls validated
  - Incident response procedures tested
  - Production security posture established

## Testing Strategy

### Unit Testing
- **Coverage**: 90%+ code coverage for all components (100% for safety components, per TASK-2.5)
- **Focus**: Individual component functionality
- **Tools**: pytest, pytest-asyncio, pytest-cov

### Integration Testing
- **Coverage**: All component interactions
- **Focus**: Data flow between components
- **Tools**: pytest with test database

### System Testing
- **Coverage**: End-to-end workflows
- **Focus**: Complete system functionality
- **Tools**: Docker Compose, test automation

### Performance Testing
- **Coverage**: Load and stress testing
- **Focus**: Response times and resource usage
- **Tools**: Locust, k6, custom load generators

### Security Testing
- **Coverage**: All security controls
- **Focus**: Vulnerability assessment
- **Tools**: OWASP ZAP, security scanners

## Risk Management

### Technical Risks
- Database performance under load
- Protocol compatibility with SCADA systems
- Safety system reliability
- Security vulnerabilities

### Mitigation Strategies
- Performance testing early and often
- Protocol testing with real SCADA systems
- Redundant safety mechanisms
- Regular security assessments

## Success Criteria

### Functional Requirements
- All safety mechanisms operational
- Multi-protocol support functional
- Real-time performance requirements met
- Compliance with standards achieved

### Non-Functional Requirements
- 99.9% system availability
- Sub-second response times (per-protocol targets: OPC UA < 100ms, Modbus TCP < 50ms, REST API < 200ms)
- Secure operation validated
- Comprehensive documentation

## Conclusion

This implementation plan provides a comprehensive roadmap for developing the Calejo Control Adapter v2.0 with Safety & Security Framework. The phased approach ensures systematic development with thorough testing at each stage, resulting in a robust, secure, and reliable system for municipal wastewater pump station control.