# Calejo Control Adapter - Operations & Maintenance Guide ## Overview This guide provides comprehensive procedures for daily operations, monitoring, troubleshooting, and maintenance of the Calejo Control Adapter system. ## Daily Operations ### System Startup and Shutdown #### Normal Startup Procedure ```bash # Start all services docker-compose up -d # Verify services are running docker-compose ps # Check health status curl http://localhost:8080/api/v1/health ``` #### Graceful Shutdown Procedure ```bash # Stop services gracefully docker-compose down # Verify all services stopped docker-compose ps ``` #### Emergency Shutdown ```bash # Immediate shutdown (use only in emergencies) docker-compose down --timeout 0 ``` ### Daily Health Checks #### Automated Health Monitoring ```bash # Run automated health check ./scripts/health-check.sh # Check specific components curl http://localhost:8080/api/v1/health/detailed ``` #### Manual Health Verification ```python # Check database connectivity psql "${DATABASE_URL}" -c "SELECT 1;" # Check protocol servers opcua-client connect opc.tcp://localhost:4840 modbus-tcp read 127.0.0.1 502 40001 10 curl http://localhost:8080/api/v1/status ``` ### Performance Monitoring #### Key Performance Indicators | Metric | Target | Alert Threshold | |--------|--------|-----------------| | **Response Time** | < 100ms | > 500ms | | **CPU Usage** | < 70% | > 90% | | **Memory Usage** | < 80% | > 95% | | **Database Connections** | < 50% of max | > 80% of max | | **Network Latency** | < 10ms | > 50ms | #### Performance Monitoring Commands ```bash # Monitor system resources docker stats # Check application performance curl http://localhost:8080/api/v1/metrics # Monitor database performance psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_activity;" ``` ## Monitoring & Alerting ### Real-time Monitoring #### Application Monitoring ```bash # View application logs in real-time docker-compose logs -f control-adapter # Monitor specific components docker-compose logs -f control-adapter | grep -E "(ERROR|WARNING|CRITICAL)" # Check service status systemctl status calejo-control-adapter ``` #### Database Monitoring ```bash # Monitor database performance psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_database WHERE datname='calejo';" # Check connection pool psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity WHERE datname='calejo';" ``` ### Alert Configuration #### Email Alerts ```yaml # Email alert configuration alerts: email: enabled: true smtp_server: smtp.example.com smtp_port: 587 from_address: alerts@calejo.com to_addresses: - operations@calejo.com - engineering@calejo.com ``` #### SMS Alerts ```yaml # SMS alert configuration alerts: sms: enabled: true provider: twilio account_sid: ${TWILIO_ACCOUNT_SID} auth_token: ${TWILIO_AUTH_TOKEN} from_number: +1234567890 to_numbers: - +1234567891 - +1234567892 ``` #### Webhook Alerts ```yaml # Webhook alert configuration alerts: webhook: enabled: true url: https://monitoring.example.com/webhook secret: ${WEBHOOK_SECRET} ``` ### Alert Severity Levels | Severity | Description | Response Time | Notification Channels | |----------|-------------|---------------|----------------------| | **Critical** | System failure, safety violation | Immediate (< 15 min) | SMS, Email, Webhook | | **High** | Performance degradation, security event | Urgent (< 1 hour) | Email, Webhook | | **Medium** | Configuration issues, warnings | Standard (< 4 hours) | Email | | **Low** | Informational events | Routine (< 24 hours) | Dashboard only | ## Maintenance Procedures ### Regular Maintenance Tasks #### Daily Tasks ```bash # Check system health ./scripts/health-check.sh # Review error logs docker-compose logs control-adapter --since "24h" | grep ERROR # Verify backups ls -la /var/backup/calejo/ ``` #### Weekly Tasks ```bash # Database maintenance psql "${DATABASE_URL}" -c "VACUUM ANALYZE;" # Log rotation find /var/log/calejo -name "*.log" -mtime +7 -delete # Backup verification ./scripts/verify-backup.sh latest-backup.tar.gz ``` #### Monthly Tasks ```bash # Security updates docker-compose pull docker-compose build --no-cache # Performance analysis ./scripts/performance-analysis.sh # Compliance audit ./scripts/compliance-audit.sh ``` ### Backup and Recovery #### Automated Backups ```bash # Create full backup ./scripts/backup-full.sh # Create configuration-only backup ./scripts/backup-config.sh # Create database-only backup ./scripts/backup-database.sh ``` #### Backup Schedule | Backup Type | Frequency | Retention | Location | |-------------|-----------|-----------|----------| | **Full System** | Daily | 7 days | /var/backup/calejo/ | | **Database** | Hourly | 24 hours | /var/backup/calejo/database/ | | **Configuration** | Weekly | 4 weeks | /var/backup/calejo/config/ | #### Recovery Procedures ```bash # Full system recovery ./scripts/restore-full.sh /var/backup/calejo/calejo-backup-20231026.tar.gz # Database recovery ./scripts/restore-database.sh /var/backup/calejo/database/backup.sql # Configuration recovery ./scripts/restore-config.sh /var/backup/calejo/config/config-backup.tar.gz ``` ### Software Updates #### Update Procedure ```bash # 1. Create backup ./scripts/backup-full.sh # 2. Stop services docker-compose down # 3. Update application git pull origin main # 4. Rebuild services docker-compose build --no-cache # 5. Start services docker-compose up -d # 6. Verify update ./scripts/health-check.sh ``` #### Rollback Procedure ```bash # 1. Stop services docker-compose down # 2. Restore from backup ./scripts/restore-full.sh /var/backup/calejo/calejo-backup-pre-update.tar.gz # 3. Start services docker-compose up -d # 4. Verify rollback ./scripts/health-check.sh ``` ## Troubleshooting ### Common Issues and Solutions #### Database Connection Issues **Symptoms**: - "Connection refused" errors - Slow response times - Connection pool exhaustion **Solutions**: ```bash # Check PostgreSQL status systemctl status postgresql # Verify connection parameters psql "${DATABASE_URL}" -c "SELECT version();" # Check connection pool psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity;" ``` #### Protocol Server Issues **OPC UA Server Problems**: ```bash # Test OPC UA connectivity opcua-client connect opc.tcp://localhost:4840 # Check OPC UA logs docker-compose logs control-adapter | grep opcua # Verify certificate validity openssl x509 -in /app/certs/server.pem -text -noout ``` **Modbus TCP Issues**: ```bash # Test Modbus connectivity modbus-tcp read 127.0.0.1 502 40001 10 # Check Modbus logs docker-compose logs control-adapter | grep modbus # Verify port availability netstat -tulpn | grep :502 ``` #### Performance Issues **High CPU Usage**: ```bash # Identify resource usage docker stats # Check for runaway processes ps aux | grep python # Analyze database queries psql "${DATABASE_URL}" -c "SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;" ``` **Memory Issues**: ```bash # Check memory usage free -h # Monitor application memory docker stats control-adapter # Check for memory leaks journalctl -u docker --since "1 hour ago" | grep -i memory ``` ### Diagnostic Tools #### Log Analysis ```bash # View recent errors docker-compose logs control-adapter --since "1h" | grep -E "(ERROR|CRITICAL)" # Search for specific patterns docker-compose logs control-adapter | grep -i "connection" # Export logs for analysis docker-compose logs control-adapter > application-logs-$(date +%Y%m%d).log ``` #### Performance Analysis ```bash # Run performance tests ./scripts/performance-test.sh # Generate performance report ./scripts/performance-report.sh # Monitor real-time performance ./scripts/monitor-performance.sh ``` #### Security Analysis ```bash # Run security scan ./scripts/security-scan.sh # Check compliance status ./scripts/compliance-check.sh # Audit user activity ./scripts/audit-report.sh ``` ## Security Operations ### Access Control #### User Management ```bash # List current users curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/users # Create new user curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \ -d '{"username":"newuser","role":"operator","email":"user@example.com"}' \ http://localhost:8080/api/v1/users # Deactivate user curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \ http://localhost:8080/api/v1/users/user123 ``` #### Role Management ```bash # View role permissions curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/roles # Update role permissions curl -X PUT -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \ -d '{"permissions":["read_pump_status","emergency_stop"]}' \ http://localhost:8080/api/v1/roles/operator ``` ### Security Monitoring #### Audit Log Review ```bash # View recent security events psql "${DATABASE_URL}" -c "SELECT * FROM compliance_audit_log WHERE severity IN ('HIGH','CRITICAL') ORDER BY timestamp DESC LIMIT 10;" # Generate security report ./scripts/security-report.sh # Monitor failed login attempts psql "${DATABASE_URL}" -c "SELECT COUNT(*) FROM compliance_audit_log WHERE event_type='INVALID_AUTHENTICATION' AND timestamp > NOW() - INTERVAL '1 hour';" ``` #### Certificate Management ```bash # Check certificate expiration openssl x509 -in /app/certs/server.pem -enddate -noout # Rotate certificates ./scripts/rotate-certificates.sh # Verify certificate chain openssl verify -CAfile /app/certs/ca.crt /app/certs/server.pem ``` ## Compliance Operations ### Regulatory Compliance #### IEC 62443 Compliance ```bash # Generate compliance report ./scripts/iec62443-report.sh # Verify security controls ./scripts/security-controls-check.sh # Audit trail verification ./scripts/audit-trail-verification.sh ``` #### ISO 27001 Compliance ```bash # ISO 27001 controls check ./scripts/iso27001-check.sh # Risk assessment ./scripts/risk-assessment.sh # Security policy compliance ./scripts/security-policy-check.sh ``` ### Documentation and Reporting #### Compliance Reports ```bash # Generate monthly compliance report ./scripts/generate-compliance-report.sh # Export audit logs ./scripts/export-audit-logs.sh # Create security assessment ./scripts/security-assessment.sh ``` ## Emergency Procedures ### Emergency Stop Operations #### Manual Emergency Stop ```bash # Activate emergency stop for station curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \ -d '{"reason":"Emergency maintenance","operator":"operator001"}' \ http://localhost:8080/api/v1/pump-stations/station001/emergency-stop # Clear emergency stop curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \ http://localhost:8080/api/v1/pump-stations/station001/emergency-stop ``` #### System Recovery ```bash # Check emergency stop status curl -H "Authorization: Bearer ${TOKEN}" \ http://localhost:8080/api/v1/pump-stations/station001/emergency-stop-status # Verify system recovery ./scripts/emergency-recovery-check.sh ``` ### Disaster Recovery #### Full System Recovery ```bash # 1. Stop all services docker-compose down # 2. Restore from latest backup ./scripts/restore-full.sh /var/backup/calejo/calejo-backup-latest.tar.gz # 3. Start services docker-compose up -d # 4. Verify recovery ./scripts/health-check.sh ./scripts/emergency-recovery-verification.sh ``` #### Database Recovery ```bash # 1. Stop database-dependent services docker-compose stop control-adapter # 2. Restore database ./scripts/restore-database.sh /var/backup/calejo/database/backup-latest.sql # 3. Start services docker-compose up -d # 4. Verify data integrity ./scripts/database-integrity-check.sh ``` --- *This operations and maintenance guide provides comprehensive procedures for managing the Calejo Control Adapter system. Always follow documented procedures and maintain proper change control for all operational activities.*