# Calejo Control Adapter - Implementation Plan

## Overview

This document outlines the comprehensive step-by-step implementation plan for the Calejo Control Adapter v2.0 with Safety & Security Framework. The plan is organized into 7 phases with detailed tasks, testing strategies, and acceptance criteria.

## Current Status Summary

| Phase | Status | Completion Date | Tests Passing |
|-------|--------|-----------------|---------------|
| Phase 1: Core Infrastructure | ✅ **COMPLETE** | 2025-10-26 | All tests passing |
| Phase 2: Multi-Protocol Servers | ✅ **COMPLETE** | 2025-10-26 | All tests passing |
| Phase 3: Setpoint Management | ✅ **COMPLETE** | 2025-10-26 | All tests passing |
| Phase 4: Security Layer | ✅ **COMPLETE** | 2025-10-27 | 56/56 security tests |
| Phase 5: Protocol Servers | ✅ **COMPLETE** | 2025-10-28 | 220/220 tests passing |
| Phase 6: Integration & Testing | ⏳ **PENDING** | - | - |
| Phase 7: Production Hardening | ⏳ **PENDING** | - | - |

**Overall Test Status:** 220/220 tests passing across all implemented components

## Project Timeline & Phases

### Phase 1: Core Infrastructure & Database Setup (Week 1-2)

**Objective**: Establish the foundation with database schema, core infrastructure, and basic components.

#### TASK-1.1: Set up PostgreSQL database with complete schema

- **Description**: Create all database tables as defined in the specification
- **Database Tables**:
  - `pump_stations` - Station metadata
  - `pumps` - Pump configuration and control parameters
  - `pump_plans` - Optimization plans from Calejo Optimize
  - `pump_feedback` - Real-time feedback from pumps
  - `pump_safety_limits` - Hard operational limits
  - `safety_limit_violations` - Audit trail of limit violations
  - `failsafe_events` - Failsafe mode activations
  - `emergency_stop_events` - Emergency stop events
  - `audit_log` - Immutable compliance audit trail
- **Acceptance Criteria**:
  - All tables created with correct constraints and indexes
  - Read-only user `control_reader` with appropriate permissions
  - Test data inserted for validation
  - Database connection successful from application

#### TASK-1.2: Implement database client with connection pooling

- **Description**: Enhance database client with async support and robust error handling
- **Features**:
  - Connection pooling for performance
  - Async/await support for non-blocking operations
  - Comprehensive error handling and retry logic
  - Query timeout management
  - Connection health monitoring
- **Acceptance Criteria**:
  - Database operations complete within 100ms
  - Connection failures handled gracefully
  - Connection pool recovers automatically
  - All queries execute without blocking

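
The timeout and retry behaviour listed for TASK-1.2 can be sketched with plain `asyncio`. This is a minimal illustration, not the project's database client. The helper `run_with_retry`, its default values, and the choice of `ConnectionError` as the retryable exception are all assumptions. A real client would wrap its pool calls in the same way.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_with_retry(
    operation: Callable[[], Awaitable[T]],
    *,
    timeout_s: float = 0.1,   # hypothetical per-attempt budget, mirroring the 100 ms target
    attempts: int = 3,        # hypothetical retry count
    backoff_s: float = 0.05,  # base delay, doubled after each failed attempt
) -> T:
    """Run an async database call with a per-attempt timeout and exponential backoff."""
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            # wait_for cancels the pending call if it exceeds the timeout
            return await asyncio.wait_for(operation(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError(f"database operation failed after {attempts} attempts") from last_error
```

`operation` is a zero-argument callable rather than a coroutine object. An example would be `lambda: pool.fetch(query)`, where `pool` is a hypothetical pool object. This design creates a fresh coroutine on every attempt, because a coroutine can only be awaited once.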
#### TASK-1.3: Complete auto-discovery module

- **Description**: Implement full auto-discovery of stations and pumps from database
- **Features**:
  - Automatic discovery on startup
  - Periodic refresh of discovered assets
  - Filtering by station and active status
  - Integration with configuration
- **Acceptance Criteria**:
  - All active stations and pumps discovered on startup
  - Discovery completes within 30 seconds
  - Configuration changes trigger rediscovery
  - Invalid stations/pumps handled gracefully

#### TASK-1.4: Implement configuration management

- **Description**: Complete `settings.py` with comprehensive environment variable support
- **Configuration Areas**:
  - Database connection parameters
  - Protocol endpoints and ports
  - Safety timeout settings
  - Security settings (JWT, TLS)
  - Alert configuration (email, SMS, webhook)
  - Logging configuration
- **Acceptance Criteria**:
  - All settings loaded from environment variables
  - Type validation for all configuration values
  - Sensitive values properly secured
  - Configuration errors provide clear messages

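
Environment-variable loading with type validation and clear error messages can be sketched using only the standard library. This is an assumption-laden illustration, not the real `settings.py`. The environment variable names, the defaults, and the `Settings` fields are hypothetical. The default of 1200 s matches the 20-minute watchdog timeout from TASK-2.2.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Optional


class ConfigError(ValueError):
    """Raised when a setting is missing or cannot be converted to its declared type."""


def _read(env: Mapping[str, str], name: str, cast: Callable[[str], Any],
          default: Optional[Any] = None) -> Any:
    raw = env.get(name)
    if raw is None:
        if default is None:
            raise ConfigError(f"{name} is required but not set")
        return default
    try:
        return cast(raw)
    except ValueError as exc:
        # Name the variable and the expected type so the operator knows what to fix
        raise ConfigError(f"{name}={raw!r} is not a valid {cast.__name__}") from exc


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    watchdog_timeout_s: int
    jwt_secret: str = field(repr=False)  # excluded from repr so it never lands in logs

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Environment variable names below are hypothetical examples
        return cls(
            db_host=_read(env, "DB_HOST", str, "localhost"),
            db_port=_read(env, "DB_PORT", int, 5432),
            watchdog_timeout_s=_read(env, "WATCHDOG_TIMEOUT_SECONDS", int, 1200),
            jwt_secret=_read(env, "JWT_SECRET_KEY", str),
        )
```

Passing a plain dictionary to `from_env` keeps the loader testable without touching the real process environment.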
#### TASK-1.5: Set up structured logging and audit system

- **Description**: Implement structured logging with structlog (JSON output) and an audit trail
- **Features**:
  - Structured logging in JSON format
  - Correlation IDs for request tracing
  - Audit trail for compliance requirements
  - Log levels configurable at runtime
  - Log rotation and retention policies
- **Acceptance Criteria**:
  - All log entries include correlation IDs
  - Audit events logged to database
  - Logs searchable and filterable
  - Performance impact < 5% on operations

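
The plan names structlog. To keep the example library-neutral, the sketch below shows the same correlation-ID idea with the standard `logging` and `contextvars` modules. A `ContextVar` holds the ID for the current request, and a JSON formatter stamps it onto every record. The logger name and field names are hypothetical.

```python
import contextvars
import json
import logging
import uuid

# Correlation ID for the current request; asyncio tasks inherit a copy of the context
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object that carries the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def start_request() -> str:
    """Bind a fresh correlation ID at the start of a request and return it."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid


logger = logging.getLogger("calejo.adapter")  # hypothetical logger name
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)  # can be changed at runtime, e.g. logger.setLevel(logging.DEBUG)
```

Because asyncio tasks copy the current context when they are created, a correlation ID set at the start of a request is visible to tasks spawned while handling that request.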
### Phase 2: Safety Framework Implementation (Week 3-4)

**Objective**: Implement comprehensive safety mechanisms to prevent equipment damage and operational hazards.

#### TASK-2.1: Complete SafetyLimitEnforcer with all limit types

- **Description**: Implement multi-layer safety limits enforcement
- **Limit Types**:
  - Speed limits (hard min/max)
  - Level limits (min/max, emergency stop, dry run protection)
  - Power and flow limits
  - Rate of change limits
  - Operational limits (starts per hour, run times)
- **Acceptance Criteria**:
  - All setpoints pass through safety enforcer
  - Violations logged and reported
  - Rate of change limits prevent sudden changes
  - Emergency stop levels trigger immediate action

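
Two of the limit layers, hard speed limits and rate of change, can be sketched as a pure function that returns both the enforced value and the reasons it was modified. The reasons are what would be written to `safety_limit_violations`. The names `SpeedLimits` and `enforce_speed`, and the speed units, are assumptions rather than the actual `SafetyLimitEnforcer` API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SpeedLimits:
    hard_min: float  # e.g. minimum VFD frequency; units are an assumption
    hard_max: float
    max_step: float  # largest change allowed between consecutive setpoints


@dataclass(frozen=True)
class EnforcedSetpoint:
    value: float
    violations: Tuple[str, ...]  # non-empty means the request was modified


def enforce_speed(requested: float, previous: Optional[float],
                  limits: SpeedLimits) -> EnforcedSetpoint:
    """Clamp to hard limits first, then cap the step from the last applied setpoint."""
    violations: List[str] = []
    value = requested
    if value < limits.hard_min:
        violations.append(f"below hard minimum {limits.hard_min}")
        value = limits.hard_min
    elif value > limits.hard_max:
        violations.append(f"above hard maximum {limits.hard_max}")
        value = limits.hard_max
    # Assumes `previous` was itself within the hard limits, so the result stays in range
    if previous is not None and abs(value - previous) > limits.max_step:
        violations.append("rate of change limit exceeded")
        value = previous + (limits.max_step if value > previous else -limits.max_step)
    return EnforcedSetpoint(value, tuple(violations))
```

Clamping before applying the rate limit means the step is always taken toward a value that is already legal.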
#### TASK-2.2: Implement DatabaseWatchdog with failsafe mode

- **Description**: Monitor database updates and trigger failsafe when updates stop
- **Features**:
  - 20-minute timeout detection
  - Automatic revert to default setpoints
  - Alert generation on failsafe activation
  - Automatic recovery when updates resume
- **Acceptance Criteria**:
  - Failsafe triggered within 20 minutes of no updates
  - Default setpoints applied correctly
  - Alerts sent to operators
  - System recovers automatically when updates resume

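
The watchdog logic reduces to comparing the time since the last observed plan update against the timeout. The sketch injects a clock so the 20-minute behaviour can be tested without waiting. The class shape, the method names, and the `events` list are hypothetical. In the real component, `events` would correspond to `failsafe_events` rows and alerts.

```python
import time
from typing import Callable, Dict, List


class DatabaseWatchdog:
    """Enter failsafe when no plan update has been seen for `timeout_s` seconds."""

    def __init__(
        self,
        default_setpoints: Dict[str, float],
        timeout_s: float = 20 * 60,  # the 20-minute timeout from the plan
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.default_setpoints = dict(default_setpoints)
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_update = clock()
        self.failsafe_active = False
        self.events: List[str] = []  # stand-in for failsafe_events rows and alerts

    def record_update(self) -> None:
        """Call whenever a new plan row is observed; clears failsafe automatically."""
        self._last_update = self._clock()
        if self.failsafe_active:
            self.failsafe_active = False
            self.events.append("recovered")

    def check(self) -> bool:
        """Run periodically; returns True while failsafe is active."""
        stale = self._clock() - self._last_update >= self.timeout_s
        if stale and not self.failsafe_active:
            self.failsafe_active = True
            self.events.append("failsafe_activated")
        return self.failsafe_active

    def setpoint_for(self, pump_id: str, planned: float) -> float:
        """Default setpoint while in failsafe, otherwise the planned value."""
        return self.default_setpoints[pump_id] if self.failsafe_active else planned
```

`time.monotonic` is used instead of wall-clock time, so system clock adjustments cannot trigger or mask a failsafe.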
#### TASK-2.3: Implement EmergencyStopManager with big red button

- **Description**: System-wide and targeted emergency stop functionality
- **Features**:
  - Single pump emergency stop
  - Station-wide emergency stop
  - System-wide emergency stop
  - Manual clearance with audit trail
  - Integration with all protocol interfaces
- **Acceptance Criteria**:
  - Emergency stop triggers within 1 second
  - All affected pumps set to default setpoints
  - Clear audit trail of stop/clear events
  - REST API endpoints functional

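
The three stop scopes (pump, station, system) can be modelled as three sets of flags. A pump counts as stopped if any enclosing scope is stopped. The sketch below is a hypothetical shape for `EmergencyStopManager`. Its in-memory `audit` list stands in for the `emergency_stop_events` table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set, Tuple


@dataclass
class EmergencyStopManager:
    """Tracks system-, station- and pump-level stops; every change is audited."""

    system_stopped: bool = False
    stopped_stations: Set[str] = field(default_factory=set)
    stopped_pumps: Set[Tuple[str, str]] = field(default_factory=set)  # (station_id, pump_id)
    audit: List[Tuple[str, str, str, str]] = field(default_factory=list)  # (time, action, scope, user)

    def _apply(self, action: str, user: str,
               station_id: Optional[str], pump_id: Optional[str]) -> None:
        stopping = action == "stop"
        if station_id is None:
            self.system_stopped = stopping
            scope = "system"
        elif pump_id is None:
            if stopping:
                self.stopped_stations.add(station_id)
            else:
                self.stopped_stations.discard(station_id)
            scope = f"station:{station_id}"
        else:
            if stopping:
                self.stopped_pumps.add((station_id, pump_id))
            else:
                self.stopped_pumps.discard((station_id, pump_id))
            scope = f"pump:{station_id}/{pump_id}"
        self.audit.append((datetime.now(timezone.utc).isoformat(), action, scope, user))

    def stop(self, user: str, station_id: Optional[str] = None, pump_id: Optional[str] = None) -> None:
        self._apply("stop", user, station_id, pump_id)

    def clear(self, user: str, station_id: Optional[str] = None, pump_id: Optional[str] = None) -> None:
        """Manual clearance; never called automatically."""
        self._apply("clear", user, station_id, pump_id)

    def is_stopped(self, station_id: str, pump_id: str) -> bool:
        return (self.system_stopped
                or station_id in self.stopped_stations
                or (station_id, pump_id) in self.stopped_pumps)
```

In this sketch, clearing a station does not clear pump-level stops inside it. Each scope must be cleared explicitly, which errs on the side of keeping pumps stopped.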
#### TASK-2.4: Implement AlertManager with multi-channel alerts

- **Description**: Email, SMS, webhook, and SCADA alarm integration
- **Alert Channels**:
  - Email alerts with configurable recipients
  - SMS alerts for critical events
  - Webhook integration for external systems
  - SCADA HMI alarm integration via OPC UA
- **Acceptance Criteria**:
  - Alerts delivered within 30 seconds
  - Multiple delivery attempts for failed alerts
  - Alert content includes all relevant context
  - Alert history maintained

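
"Multiple delivery attempts for failed alerts" can be sketched as a per-channel retry loop. In this loop, a failing channel never blocks the others. Channels are modelled as plain callables that raise on failure. The function name, the retry count, and the delay are hypothetical, and no real email or SMS client is assumed.

```python
import time
from typing import Callable, Dict

Sender = Callable[[str], None]  # raises an exception when delivery fails


def deliver_alert(
    message: str,
    channels: Dict[str, Sender],
    max_attempts: int = 3,       # hypothetical retry count
    retry_delay_s: float = 1.0,  # hypothetical pause between attempts
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> Dict[str, bool]:
    """Try every channel independently and report which ones succeeded."""
    results: Dict[str, bool] = {}
    for name, send in channels.items():
        results[name] = False
        for attempt in range(1, max_attempts + 1):
            try:
                send(message)
                results[name] = True
                break
            except Exception:  # a real implementation would log the specific error
                if attempt < max_attempts:
                    sleep(retry_delay_s)
    return results
```

Channels are tried one after another here. With many channels or slow gateways, the 30-second delivery target would favour sending them concurrently instead.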
#### TASK-2.5: Create comprehensive safety tests

- **Description**: Test all safety scenarios including edge cases and failure modes
- **Test Scenarios**:
  - Normal operation within limits
  - Safety limit violations
  - Failsafe mode activation and recovery
  - Emergency stop functionality
  - Alert delivery verification
- **Acceptance Criteria**:
  - 100% test coverage for safety components
  - All failure modes tested and handled
  - Performance under load validated
  - Integration with other components verified

### Phase 3: Plan-to-Setpoint Logic Engine (Week 5-6)

**Objective**: Implement control logic for different pump types with safety integration.

#### TASK-3.1: Implement SetpointManager with safety integration

- **Description**: Coordinate safety checks and setpoint calculation
- **Integration Points**:
  - Emergency stop status checking
  - Failsafe mode detection
  - Safety limit enforcement
  - Control type-specific calculation
- **Acceptance Criteria**:
  - Safety checks performed before setpoint calculation
  - Emergency stop overrides all other logic
  - Failsafe mode uses default setpoints
  - Performance: setpoint calculation < 10ms

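
The precedence in TASK-3.1 (emergency stop, then failsafe, then calculation plus safety enforcement) can be sketched as a short decision function. The function also returns the reason, so the chosen path can be logged. The signature is hypothetical. `calculate` and `enforce` stand in for a control calculator and the safety enforcer.

```python
from typing import Callable, Tuple


def compute_setpoint(
    *,
    emergency_stopped: bool,
    failsafe_active: bool,
    default_setpoint: float,
    calculate: Callable[[], float],     # control-type-specific calculator
    enforce: Callable[[float], float],  # safety limit enforcement
) -> Tuple[float, str]:
    """Return (setpoint, reason), checking safety state before any calculation."""
    if emergency_stopped:
        # Emergency stop overrides everything, including failsafe
        return default_setpoint, "emergency_stop"
    if failsafe_active:
        return default_setpoint, "failsafe"
    # Normal path: calculate, then pass the result through the safety enforcer
    return enforce(calculate()), "calculated"
```

The calculator is only invoked on the normal path. Neither an emergency stop nor failsafe mode depends on the control logic succeeding.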
#### TASK-3.2: Create control calculators for different pump types

- **Description**: Implement calculators for DIRECT_SPEED, LEVEL_CONTROLLED, POWER_CONTROLLED
- **Calculator Types**:
  - DirectSpeedCalculator: Direct speed control
  - LevelControlledCalculator: Level-based control with PID
  - PowerControlledCalculator: Power-based optimization
- **Acceptance Criteria**:
  - Each calculator produces valid setpoints
  - Control parameters configurable per pump
  - Feedback integration for adaptive control
  - Smooth transitions between setpoints

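
The core of a level-controlled calculator can be sketched as a discrete PID loop. The loop raises pump speed when the wet-well level is above target and clamps its output. It also uses conditional integration as a simple anti-windup measure. This is one standard PID formulation, not necessarily the one used in `LevelControlledCalculator`. The gains, base speed, and output range are hypothetical per-pump parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LevelPID:
    """Discrete PID: a level above target produces a higher speed setpoint."""

    kp: float
    ki: float
    kd: float
    base_speed: float  # speed when the level is exactly on target
    out_min: float
    out_max: float
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, target_level: float, measured_level: float, dt_s: float) -> float:
        # Reverse-acting: a positive error (level too high) means pump harder
        error = measured_level - target_level
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt_s
        candidate_integral = self._integral + error * dt_s
        output = (self.base_speed + self.kp * error
                  + self.ki * candidate_integral + self.kd * derivative)
        # Anti-windup: keep the new integral only while the output is not saturated
        if self.out_min < output < self.out_max:
            self._integral = candidate_integral
        self._prev_error = error
        return min(max(output, self.out_min), self.out_max)
```

Clamping to `out_min`/`out_max` is separate from the safety enforcer. The result would still pass through `SafetyLimitEnforcer`, as required in TASK-3.1.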
#### TASK-3.3: Implement feedback integration

- **Description**: Use real-time feedback for adaptive control
- **Feedback Sources**:
  - Actual speed measurements
  - Power consumption
  - Flow rates
  - Wet well levels
  - Pump running status
- **Acceptance Criteria**:
  - Feedback used to validate setpoint effectiveness
  - Adaptive control based on actual performance
  - Feedback delays handled appropriately
  - Invalid feedback data rejected

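
"Invalid feedback data rejected" and "feedback delays handled" can be sketched as a validator. The validator returns the reasons a sample should be ignored. The field names loosely mirror the feedback sources listed above. The record shape, the 5-minute staleness window, the speed ceiling, and the 0.5 kW plausibility threshold are all assumptions.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class PumpFeedback:
    pump_id: str
    timestamp: datetime  # timezone-aware
    actual_speed: float
    power_kw: float
    flow_m3h: float
    wet_well_level_m: float
    running: bool


def validate_feedback(fb: PumpFeedback, now: datetime,
                      max_age: timedelta = timedelta(minutes=5),  # hypothetical
                      max_speed: float = 50.0) -> List[str]:     # hypothetical
    """Return reasons to reject the sample; an empty list means it is usable."""
    problems: List[str] = []
    numeric = {"actual_speed": fb.actual_speed, "power_kw": fb.power_kw,
               "flow_m3h": fb.flow_m3h, "wet_well_level_m": fb.wet_well_level_m}
    for name, value in numeric.items():
        if not math.isfinite(value):
            problems.append(f"{name} is not finite")
    if fb.timestamp > now:
        problems.append("timestamp in the future")
    elif now - fb.timestamp > max_age:
        problems.append("stale sample")
    if fb.actual_speed < 0 or fb.actual_speed > max_speed:
        problems.append("speed out of range")
    if fb.flow_m3h < 0:
        problems.append("negative flow")
    if not fb.running and fb.power_kw > 0.5:  # plausibility cross-check
        problems.append("power reported while pump is stopped")
    return problems
```

Returning reasons rather than a bare boolean lets rejected samples be logged with context.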
#### TASK-3.4: Create plan-to-setpoint integration tests

- **Description**: Test all control scenarios with safety integration
- **Test Scenarios**:
  - Normal optimization plan execution
  - Control type-specific calculations
  - Safety limit integration
  - Emergency stop override
  - Failsafe mode operation
- **Acceptance Criteria**:
  - All control scenarios tested
  - Safety integration verified
  - Performance requirements met
  - Edge cases handled correctly

### Phase 4: Security Layer Implementation (Week 4-5) ✅ **COMPLETE**

**Objective**: Implement comprehensive security features including authentication, authorization, TLS/SSL encryption, and compliance audit logging.

#### TASK-4.1: Implement authentication and authorization ✅ **COMPLETE**

- **Description**: JWT-based authentication with bcrypt password hashing and role-based access control
- **Security Features**:
  - JWT token authentication with bcrypt password hashing
  - Role-based access control with 4 roles (admin, operator, engineer, viewer)
  - Permission-based access control for all operations
  - User management with password policies
  - Token-based authentication for REST API
- **Acceptance Criteria**: ✅ **MET**
  - All access properly authenticated
  - Authorization rules enforced
  - Session security maintained
  - Security events monitored and alerted
  - **24 comprehensive tests passing**

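
Role-based access control over the four roles named above can be sketched as a role-to-permission table. A check runs against the claims of a JWT that has already been verified. Token signing and verification are deliberately left out. The permission names, and which role holds which permission, are hypothetical.

```python
from enum import Enum
from typing import Dict, FrozenSet, Mapping


class Role(str, Enum):
    ADMIN = "admin"
    OPERATOR = "operator"
    ENGINEER = "engineer"
    VIEWER = "viewer"


# Hypothetical permission assignments; the real mapping lives in the security module
ROLE_PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.VIEWER: frozenset({"read_status"}),
    Role.OPERATOR: frozenset({"read_status", "emergency_stop", "clear_emergency_stop"}),
    Role.ENGINEER: frozenset({"read_status", "emergency_stop", "write_safety_limits"}),
    Role.ADMIN: frozenset({"read_status", "emergency_stop", "clear_emergency_stop",
                           "write_safety_limits", "manage_users"}),
}


class PermissionDenied(Exception):
    """Raised when the caller's role does not grant the requested permission."""


def require_permission(claims: Mapping[str, str], permission: str) -> None:
    """Check a permission against the `role` claim of an already-verified token."""
    try:
        role = Role(claims["role"])
    except (KeyError, ValueError):
        # Deny by default: missing or unknown roles get no access
        raise PermissionDenied("missing or unknown role claim") from None
    if permission not in ROLE_PERMISSIONS[role]:
        raise PermissionDenied(f"role {role.value!r} lacks {permission!r}")
```

Unknown roles are denied by default rather than falling back to a viewer role.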
#### TASK-4.2: Implement TLS/SSL encryption ✅ **COMPLETE**

- **Description**: Secure communications with certificate management and validation
- **Encryption Implementation**:
  - TLS/SSL manager with certificate validation
  - Certificate rotation monitoring
  - Self-signed certificate generation for development
  - REST API TLS support
  - Secure cipher suites configuration
- **Acceptance Criteria**: ✅ **MET**
  - All external communications encrypted
  - Certificates properly validated
  - Encryption performance acceptable
  - Certificate expiration monitored
  - **17 comprehensive tests passing**

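
Two parts of this task can be sketched with the standard `ssl` module: a server context that refuses protocol versions older than TLS 1.2, and an expiry check. The expiry check works on the `notAfter` string format that `ssl` uses for certificates. The helper names are hypothetical, and the TLS 1.2 floor is an assumed policy rather than one stated in the plan.

```python
import ssl
import time
from typing import Optional


def hardened_server_context() -> ssl.SSLContext:
    """Server-side context with secure defaults and TLS 1.2 as the minimum version."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3, TLS 1.0 and 1.1
    return ctx


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """`not_after` uses the certificate format, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

The certificate itself would then be loaded with `ctx.load_cert_chain("server.crt", "server.key")`, where the paths are hypothetical. Expiry monitoring could raise an alert once `days_until_expiry` drops below a chosen threshold.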
#### TASK-4.3: Implement compliance audit logging ✅ **COMPLETE**

- **Description**: Enhanced audit logging compliant with IEC 62443, ISO 27001, and NIS2
- **Audit Requirements**:
  - Comprehensive audit event types (35+ event types)
  - Audit trail retrieval and query capabilities
  - Compliance reporting generation
  - Immutable log storage
  - Integration with all security events
- **Acceptance Criteria**: ✅ **MET**
  - Audit trail complete and searchable
  - Logs protected from tampering
  - Compliance reports can be generated
  - Retention policies enforced
  - **15 comprehensive tests passing**

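
One common way to make an audit trail tamper-evident is hash chaining. Each entry stores the SHA-256 of the previous entry's hash plus its own canonical JSON payload. Editing any earlier entry then breaks verification from that point on. The sketch below illustrates that general technique only. It is not a description of how this project's `audit_log` is protected, and the event type names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditChain:
    """Append-only log in which each entry commits to the hash of the previous one."""

    entries: List[Dict[str, Any]] = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys, no spaces) so the same event always hashes the same
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: Dict[str, Any]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        stored = dict(event)  # copy so later changes to the caller's dict do not leak in
        digest = self._digest(prev, stored)
        self.entries.append({"event": stored, "prev_hash": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed or reordered entry fails."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev or self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Hash chaining only detects tampering. Preventing it still depends on database permissions, such as the read-only `control_reader` user, and on keeping a copy of the latest hash outside the database.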
#### TASK-4.4: Create security compliance documentation ✅ **COMPLETE**

- **Description**: Document compliance with standards and security controls
- **Documentation Areas**:
  - Security architecture documentation
  - Compliance matrix for standards
  - Security control implementation details
  - Risk assessment documentation
  - Incident response procedures
- **Acceptance Criteria**: ✅ **MET**
  - Documentation complete and accurate
  - Compliance evidence documented
  - Security controls mapped to requirements
  - Documentation maintained and versioned

**Phase 4 Summary**: ✅ **56 security tests passing** - All requirements exceeded with more secure implementations than originally specified

### Phase 5: Protocol Server Enhancement (Week 5-6) ✅ **COMPLETE**

**Objective**: Enhance protocol servers with security integration and complete multi-protocol support.

#### TASK-5.1: Enhance OPC UA Server with security integration

- **Description**: Integrate security layer with OPC UA server
- **Security Integration**:
  - Certificate-based authentication for OPC UA
  - Role-based authorization for OPC UA operations
  - Security event logging for OPC UA access
  - Integration with compliance audit logging
  - Secure communication with OPC UA clients
- **Acceptance Criteria**:
  - OPC UA clients authenticated and authorized
  - Security events logged to audit trail
  - Performance: < 100ms response time
  - Error conditions handled gracefully

#### TASK-5.2: Enhance Modbus TCP Server with security features

- **Description**: Add security controls to Modbus TCP server
- **Security Features**:
  - IP-based access control for Modbus
  - Rate limiting for Modbus requests
  - Security event logging for Modbus operations
  - Integration with compliance audit logging
  - Secure communication validation
- **Acceptance Criteria**:
  - Unauthorized Modbus access blocked
  - Security events logged to audit trail
  - Performance: < 50ms response time
  - Error responses for invalid requests

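
Modbus TCP has no built-in authentication, so IP-based access control and rate limiting have to happen before a request reaches the register map. The sketch below combines a network allow-list (via `ipaddress`) with a per-client token bucket. The class name, the allowed network, and the rate parameters are hypothetical. The plan does not specify which rate-limiting algorithm the server uses.

```python
import ipaddress
import time
from typing import Callable, Dict, Iterable, Tuple


class ModbusAccessGuard:
    """Allow-list by source network plus a per-client token-bucket rate limit."""

    def __init__(self, allowed_networks: Iterable[str], rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._networks = [ipaddress.ip_network(n) for n in allowed_networks]
        self._rate = rate_per_s    # tokens added per second
        self._burst = float(burst) # bucket capacity
        self._clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last refill time)

    def allow(self, client_ip: str) -> bool:
        addr = ipaddress.ip_address(client_ip)
        if not any(addr in net for net in self._networks):
            return False  # unauthorized source: the caller should log a security event
        now = self._clock()
        tokens, last = self._buckets.get(client_ip, (self._burst, now))
        tokens = min(self._burst, tokens + (now - last) * self._rate)  # refill since last call
        if tokens < 1.0:
            self._buckets[client_ip] = (tokens, now)
            return False  # rate limit exceeded
        self._buckets[client_ip] = (tokens - 1.0, now)
        return True
```

A token bucket allows short bursts (up to `burst` requests) while capping the sustained rate at `rate_per_s`. This fits SCADA polling better than a hard per-second counter.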
#### TASK-5.3: Complete REST API security integration

- **Description**: Finalize REST API security with all endpoints protected
- **API Security**:
  - All REST endpoints protected with JWT authentication
  - Role-based authorization for all operations
  - Rate limiting and request validation
  - Security headers and CORS configuration
  - OpenAPI documentation with security schemes
- **Acceptance Criteria**:
  - All endpoints properly secured
  - Authentication required for sensitive operations
  - Performance: < 200ms response time
  - OpenAPI documentation complete

#### TASK-5.4: Create protocol security integration tests

- **Description**: Test security integration across all protocol interfaces
- **Test Scenarios**:
  - OPC UA client authentication and authorization
  - Modbus TCP access control and rate limiting
  - REST API endpoint security testing
  - Cross-protocol security consistency
  - Performance under security overhead
- **Acceptance Criteria**: ✅ **MET**
  - All protocols properly secured
  - Security controls effective across interfaces
  - Performance requirements met under security overhead
  - Error conditions handled gracefully

**Phase 5 Summary**: ✅ **220 total tests passing** - All protocol servers enhanced with security integration, performance optimizations, and comprehensive monitoring. Implementation exceeds requirements with additional performance features and production readiness.

### Phase 6: Integration & System Testing (Week 10-11)

**Objective**: End-to-end testing and validation of the complete system.

#### TASK-6.1: Set up test database with realistic data

- **Description**: Create test data for multiple stations and pump scenarios
- **Test Data**:
  - Multiple pump stations with different configurations
  - Various pump types and control strategies
  - Historical optimization plans
  - Safety limit configurations
  - Realistic feedback data
- **Acceptance Criteria**:
  - Test data covers all scenarios
  - Data relationships maintained
  - Performance testing possible
  - Edge cases represented

#### TASK-6.2: Create end-to-end integration tests

- **Description**: Test full system workflow from optimization to SCADA
- **Test Workflows**:
  - Normal optimization control flow
  - Safety limit violation handling
  - Emergency stop activation and clearance
  - Failsafe mode operation
  - Protocol integration testing
- **Acceptance Criteria**:
  - All workflows function correctly
  - Data flows through entire system
  - Performance meets requirements
  - Error conditions handled appropriately

#### TASK-6.3: Implement performance and load testing

- **Description**: Test system under load with multiple pumps and protocols
- **Load Testing**:
  - Concurrent protocol connections
  - High-frequency setpoint updates
  - Multiple safety limit checks
  - Database query performance
  - Memory and CPU utilization
- **Acceptance Criteria**:
  - System handles expected load
  - Response times within requirements
  - Resource utilization acceptable
  - No memory leaks or performance degradation

#### TASK-6.4: Create failure mode and recovery tests

- **Description**: Test system behavior during failures and recovery
- **Failure Scenarios**:
  - Database connection loss
  - Network connectivity issues
  - Protocol server failures
  - Safety system failures
  - Resource exhaustion
- **Acceptance Criteria**:
  - System fails safely
  - Recovery automatic where possible
  - Alerts generated for failures
  - Data integrity maintained

#### TASK-6.5: Implement health monitoring and metrics

- **Description**: Prometheus metrics and health checks
- **Monitoring Areas**:
  - System health and availability
  - Performance metrics
  - Safety system status
  - Protocol connectivity
  - Resource utilization
- **Acceptance Criteria**:
  - All critical metrics monitored
  - Health checks functional
  - Alert thresholds configured
  - Dashboard available for visualization

### Phase 7: Deployment & Production Readiness (Week 12)

**Objective**: Prepare for production deployment with operational support.

#### TASK-7.1: Complete Docker containerization

- **Description**: Optimize Dockerfile and create docker-compose for production
- **Containerization**:
  - Multi-stage Docker build
  - Security scanning and vulnerability assessment
  - Resource limits and constraints
  - Health check implementation
  - Logging configuration
- **Acceptance Criteria**:
  - Container builds successfully
  - Security vulnerabilities addressed
  - Resource usage optimized
  - Logging functional in container

#### TASK-7.2: Create deployment documentation

- **Description**: Deployment guides, configuration examples, and troubleshooting
- **Documentation**:
  - Installation and setup guide
  - Configuration reference
  - Troubleshooting guide
  - Upgrade procedures
  - Backup and recovery procedures
- **Acceptance Criteria**:
  - Documentation complete and accurate
  - Step-by-step procedures validated
  - Common issues documented
  - Maintenance procedures clear

#### TASK-7.3: Implement monitoring and alerting

- **Description**: Grafana dashboards, alert rules, and operational monitoring
- **Monitoring Setup**:
  - Grafana dashboards for all metrics
  - Alert rules for critical conditions
  - Log aggregation and analysis
  - Performance trending
  - Capacity planning data
- **Acceptance Criteria**:
  - Dashboards provide operational visibility
  - Alerts generated for critical conditions
  - Logs searchable and analyzable
  - Performance baselines established

#### TASK-7.4: Create backup and recovery procedures

- **Description**: Database backup, configuration backup, and disaster recovery
- **Backup Strategy**:
  - Database backup procedures
  - Configuration backup
  - Certificate and key backup
  - Recovery procedures
  - Testing of backup restoration
- **Acceptance Criteria**:
  - Backup procedures documented and tested
  - Recovery time objectives met
  - Data integrity maintained
  - Backup success monitored

#### TASK-7.5: Final security review and hardening

- **Description**: Security audit, vulnerability assessment, and hardening
- **Security Activities**:
  - Penetration testing
  - Vulnerability scanning
  - Security configuration review
  - Access control validation
  - Security incident response testing
- **Acceptance Criteria**:
  - All security vulnerabilities addressed
  - Security controls validated
  - Incident response procedures tested
  - Production security posture established

## Testing Strategy

### Unit Testing

- **Coverage**: 90%+ code coverage for all components
- **Focus**: Individual component functionality
- **Tools**: pytest, pytest-asyncio, pytest-cov

### Integration Testing

- **Coverage**: All component interactions
- **Focus**: Data flow between components
- **Tools**: pytest with test database

### System Testing

- **Coverage**: End-to-end workflows
- **Focus**: Complete system functionality
- **Tools**: Docker Compose, test automation

### Performance Testing

- **Coverage**: Load and stress testing
- **Focus**: Response times and resource usage
- **Tools**: Locust, k6, custom load generators

### Security Testing

- **Coverage**: All security controls
- **Focus**: Vulnerability assessment
- **Tools**: OWASP ZAP, security scanners

## Risk Management

### Technical Risks

- Database performance under load
- Protocol compatibility with SCADA systems
- Safety system reliability
- Security vulnerabilities

### Mitigation Strategies

- Performance testing early and often
- Protocol testing with real SCADA systems
- Redundant safety mechanisms
- Regular security assessments

## Success Criteria

### Functional Requirements

- All safety mechanisms operational
- Multi-protocol support functional
- Real-time performance requirements met
- Compliance with standards achieved

### Non-Functional Requirements

- 99.9% system availability
- Sub-second response times
- Secure operation validated
- Comprehensive documentation

## Conclusion

This implementation plan provides a comprehensive roadmap for developing the Calejo Control Adapter v2.0 with Safety & Security Framework. The phased approach ensures systematic development with thorough testing at each stage, resulting in a robust, secure, and reliable system for municipal wastewater pump station control.