Calejo Control Adapter - Operations & Maintenance Guide
Overview
This guide provides comprehensive procedures for daily operations, monitoring, troubleshooting, and maintenance of the Calejo Control Adapter system.
Daily Operations
System Startup and Shutdown
Normal Startup Procedure
# Start all services
docker-compose up -d
# Verify services are running
docker-compose ps
# Check health status
curl http://localhost:8080/api/v1/health
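Containers started with `up -d` can take a few seconds to become ready, so a single `curl` issued right after startup may fail even though nothing is wrong. The following is a minimal sketch of a retry loop, assuming the health endpoint refuses connections or returns an HTTP error until the adapter is ready. The helper name `wait_until_ok` and the retry values are illustrative and not part of the project's scripts.

```shell
# wait_until_ok is a hypothetical helper, not one of the project's scripts.
# usage: wait_until_ok <attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
wait_until_ok() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (needs the running stack); -f makes curl fail on HTTP errors:
#   wait_until_ok 30 2 curl -fsS http://localhost:8080/api/v1/health
```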
Graceful Shutdown Procedure
# Stop services gracefully
docker-compose down
# Verify all services stopped
docker-compose ps
Emergency Shutdown
# Immediate shutdown (use only in emergencies)
docker-compose down --timeout 0
Daily Health Checks
Automated Health Monitoring
# Run automated health check
./scripts/health-check.sh
# Check specific components
curl http://localhost:8080/api/v1/health/detailed
Manual Health Verification
# Check database connectivity
psql "${DATABASE_URL}" -c "SELECT 1;"
# Check protocol servers
opcua-client connect opc.tcp://localhost:4840
modbus-tcp read 127.0.0.1 502 40001 10
curl http://localhost:8080/api/v1/status
Performance Monitoring
Key Performance Indicators
| Metric | Target | Alert Threshold |
|---|---|---|
| Response Time | < 100ms | > 500ms |
| CPU Usage | < 70% | > 90% |
| Memory Usage | < 80% | > 95% |
| Database Connections | < 50% of max | > 80% of max |
| Network Latency | < 10ms | > 50ms |
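One way to read the table: a value below the target is healthy, a value above the alert threshold needs action, and anything in between deserves attention. A small sketch of that classification, assuming this three-state reading is what the table intends (the helper `kpi_state` and the state names are hypothetical):

```shell
# kpi_state is a hypothetical helper; the thresholds come from the KPI table.
# usage: kpi_state <value> <target> <alert_threshold>
# Prints OK below the target, ALERT above the alert threshold, WARN otherwise.
kpi_state() {
  awk -v v="$1" -v t="$2" -v a="$3" 'BEGIN {
    if (v + 0 > a + 0)      print "ALERT"
    else if (v + 0 < t + 0) print "OK"
    else                    print "WARN"
  }'
}

# Example: response time in ms against the 100 / 500 thresholds
#   kpi_state 250 100 500
```

awk is used for the comparison because shell arithmetic handles integers only, and measured values such as latency are often fractional.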
Performance Monitoring Commands
# Monitor system resources
docker stats
# Check application performance
curl http://localhost:8080/api/v1/metrics
# Monitor database performance
psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_activity;"
Monitoring & Alerting
Real-time Monitoring
Application Monitoring
# View application logs in real-time
docker-compose logs -f control-adapter
# Show only warnings and errors
docker-compose logs -f control-adapter | grep -E "(ERROR|WARNING|CRITICAL)"
# Check service status
systemctl status calejo-control-adapter
Database Monitoring
# Monitor database performance
psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_database WHERE datname='calejo';"
# Check connection pool
psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity WHERE datname='calejo';"
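The connection count above is an absolute number, while the KPI thresholds are expressed as a percentage of the maximum. A sketch of the missing conversion, assuming "max" refers to the PostgreSQL server's `max_connections` setting rather than an application-side pool size (the helper `percent_of` is hypothetical):

```shell
# percent_of is a hypothetical helper: integer percentage of used vs. max.
# usage: percent_of <used> <max>
percent_of() {
  if [ "$2" -le 0 ]; then
    echo "max must be positive" >&2
    return 1
  fi
  echo $(($1 * 100 / $2))
}

# Example (needs a reachable database); -At prints the bare value only:
#   used=$(psql "${DATABASE_URL}" -At -c "SELECT count(*) FROM pg_stat_activity WHERE datname='calejo';")
#   max=$(psql "${DATABASE_URL}" -At -c "SHOW max_connections;")
#   percent_of "$used" "$max"
```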
Alert Configuration
Email Alerts
# Email alert configuration
alerts:
  email:
    enabled: true
    smtp_server: smtp.example.com
    smtp_port: 587
    from_address: alerts@calejo.com
    to_addresses:
      - operations@calejo.com
      - engineering@calejo.com
SMS Alerts
# SMS alert configuration
alerts:
  sms:
    enabled: true
    provider: twilio
    account_sid: ${TWILIO_ACCOUNT_SID}
    auth_token: ${TWILIO_AUTH_TOKEN}
    # Quote phone numbers so YAML does not parse them as integers and drop the "+"
    from_number: "+1234567890"
    to_numbers:
      - "+1234567891"
      - "+1234567892"
Webhook Alerts
# Webhook alert configuration
alerts:
  webhook:
    enabled: true
    url: https://monitoring.example.com/webhook
    secret: ${WEBHOOK_SECRET}
Alert Severity Levels
| Severity | Description | Response Time | Notification Channels |
|---|---|---|---|
| Critical | System failure, safety violation | Immediate (< 15 min) | SMS, Email, Webhook |
| High | Performance degradation, security event | Urgent (< 1 hour) | Email, Webhook |
| Medium | Configuration issues, warnings | Standard (< 4 hours) | |
| Low | Informational events | Routine (< 24 hours) | Dashboard only |
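For scripts that need to compute an escalation deadline, the response times in the table can be encoded as minutes. This is only a sketch: the helper `response_minutes` is hypothetical, and it accepts upper-case names as well because the audit log queries later in this guide use values such as 'HIGH' and 'CRITICAL'.

```shell
# response_minutes is a hypothetical helper that encodes the response times
# from the severity table; it is not part of the adapter.
# usage: response_minutes <severity>   (case-insensitive)
response_minutes() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    critical) echo 15 ;;
    high)     echo 60 ;;
    medium)   echo 240 ;;
    low)      echo 1440 ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```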
Maintenance Procedures
Regular Maintenance Tasks
Daily Tasks
# Check system health
./scripts/health-check.sh
# Review error logs
docker-compose logs control-adapter --since "24h" | grep ERROR
# Verify backups
ls -la /var/backup/calejo/
Weekly Tasks
# Database maintenance
psql "${DATABASE_URL}" -c "VACUUM ANALYZE;"
# Log cleanup (deletes log files older than 7 days)
find /var/log/calejo -name "*.log" -mtime +7 -delete
# Backup verification
./scripts/verify-backup.sh latest-backup.tar.gz
Monthly Tasks
# Security updates
docker-compose pull
docker-compose build --no-cache
# Performance analysis
./scripts/performance-analysis.sh
# Compliance audit
./scripts/compliance-audit.sh
Backup and Recovery
Automated Backups
# Create full backup
./scripts/backup-full.sh
# Create configuration-only backup
./scripts/backup-config.sh
# Create database-only backup
./scripts/backup-database.sh
Backup Schedule
| Backup Type | Frequency | Retention | Location |
|---|---|---|---|
| Full System | Daily | 7 days | /var/backup/calejo/ |
| Database | Hourly | 24 hours | /var/backup/calejo/database/ |
| Configuration | Weekly | 4 weeks | /var/backup/calejo/config/ |
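The retention periods above can be checked with `find`. The sketch below only lists files that have outlived their retention period and deletes nothing. The helper names are hypothetical, and it assumes each backup is a regular file sitting directly in the listed directory.

```shell
# Hypothetical helpers; retention periods are taken from the schedule above.
# retention_minutes <full|database|config> prints the retention in minutes.
retention_minutes() {
  case "$1" in
    full)     echo $((7 * 24 * 60)) ;;   # 7 days
    database) echo $((24 * 60)) ;;       # 24 hours
    config)   echo $((28 * 24 * 60)) ;;  # 4 weeks
    *)        echo "unknown backup type: $1" >&2; return 1 ;;
  esac
}

# list_expired <dir> <type> lists files in <dir> older than the retention
# period for <type>. It only prints paths; nothing is deleted.
list_expired() {
  local minutes
  minutes="$(retention_minutes "$2")" || return 1
  # -maxdepth 1 keeps a scan of the full-backup directory out of the
  # database/ and config/ subdirectories
  find "$1" -maxdepth 1 -type f -mmin "+$minutes"
}

# Example:
#   list_expired /var/backup/calejo/database database
```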
Recovery Procedures
# Full system recovery
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-20231026.tar.gz
# Database recovery
./scripts/restore-database.sh /var/backup/calejo/database/backup.sql
# Configuration recovery
./scripts/restore-config.sh /var/backup/calejo/config/config-backup.tar.gz
Software Updates
Update Procedure
# 1. Create backup
./scripts/backup-full.sh
# 2. Stop services
docker-compose down
# 3. Update application
git pull origin main
# 4. Rebuild services
docker-compose build --no-cache
# 5. Start services
docker-compose up -d
# 6. Verify update
./scripts/health-check.sh
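The order of the update steps matters: if the backup in step 1 fails, continuing would leave no restore point for a rollback. A minimal sketch of a wrapper that stops at the first failing step is shown below. `run_steps` is a hypothetical helper, not one of the project's scripts.

```shell
# run_steps is a hypothetical wrapper: it runs each command string in order
# and stops at the first one that fails, reporting which step it was.
# usage: run_steps '<command 1>' '<command 2>' ...
run_steps() {
  local n=0 step
  for step in "$@"; do
    n=$((n + 1))
    echo "[$n/$#] $step"
    if ! sh -c "$step"; then
      echo "step $n failed: $step" >&2
      return 1
    fi
  done
}

# Example, using the update steps above:
#   run_steps './scripts/backup-full.sh' 'docker-compose down' \
#     'git pull origin main' 'docker-compose build --no-cache' \
#     'docker-compose up -d' './scripts/health-check.sh'
```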
Rollback Procedure
# 1. Stop services
docker-compose down
# 2. Restore from backup
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-pre-update.tar.gz
# 3. Start services
docker-compose up -d
# 4. Verify rollback
./scripts/health-check.sh
Troubleshooting
Common Issues and Solutions
Database Connection Issues
Symptoms:
- "Connection refused" errors
- Slow response times
- Connection pool exhaustion
Solutions:
# Check PostgreSQL status
systemctl status postgresql
# Verify connection parameters
psql "${DATABASE_URL}" -c "SELECT version();"
# Check connection pool
psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity;"
Protocol Server Issues
OPC UA Server Problems:
# Test OPC UA connectivity
opcua-client connect opc.tcp://localhost:4840
# Check OPC UA logs
docker-compose logs control-adapter | grep opcua
# Verify certificate validity
openssl x509 -in /app/certs/server.pem -text -noout
Modbus TCP Issues:
# Test Modbus connectivity
modbus-tcp read 127.0.0.1 502 40001 10
# Check Modbus logs
docker-compose logs control-adapter | grep modbus
# Verify port availability
netstat -tulpn | grep :502
Performance Issues
High CPU Usage:
# Identify resource usage
docker stats
# Check for runaway processes
ps aux | grep python
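# Analyze database queries (requires the pg_stat_statements extension;
# on PostgreSQL 13 and later the column is named total_exec_time instead of total_time)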
psql "${DATABASE_URL}" -c "SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"
Memory Issues:
# Check memory usage
free -h
# Monitor application memory
docker stats control-adapter
# Check Docker daemon logs for memory-related messages
journalctl -u docker --since "1 hour ago" | grep -i memory
Diagnostic Tools
Log Analysis
# View recent errors
docker-compose logs control-adapter --since "1h" | grep -E "(ERROR|CRITICAL)"
# Search for specific patterns
docker-compose logs control-adapter | grep -i "connection"
# Export logs for analysis
docker-compose logs control-adapter > application-logs-$(date +%Y%m%d).log
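When triaging an exported log file, a per-level count is often a quicker first look than reading the matches. A sketch, assuming the level names appear literally in each log line as the `grep` patterns above already expect (`count_levels` is a hypothetical helper):

```shell
# count_levels is a hypothetical helper: it reads log lines on stdin and
# prints how many lines mention each level. A line that mentions two levels
# is counted once for each.
count_levels() {
  awk '
    /CRITICAL/ { n["CRITICAL"]++ }
    /ERROR/    { n["ERROR"]++ }
    /WARNING/  { n["WARNING"]++ }
    END {
      printf "CRITICAL %d\nERROR %d\nWARNING %d\n",
             n["CRITICAL"], n["ERROR"], n["WARNING"]
    }'
}

# Example (needs the running stack):
#   docker-compose logs control-adapter | count_levels
```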
Performance Analysis
# Run performance tests
./scripts/performance-test.sh
# Generate performance report
./scripts/performance-report.sh
# Monitor real-time performance
./scripts/monitor-performance.sh
Security Analysis
# Run security scan
./scripts/security-scan.sh
# Check compliance status
./scripts/compliance-check.sh
# Audit user activity
./scripts/audit-report.sh
Security Operations
Access Control
User Management
# List current users
curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/users
# Create new user
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
-d '{"username":"newuser","role":"operator","email":"user@example.com"}' \
http://localhost:8080/api/v1/users
# Deactivate user
curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \
http://localhost:8080/api/v1/users/user123
Role Management
# View role permissions
curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/roles
# Update role permissions
curl -X PUT -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
-d '{"permissions":["read_pump_status","emergency_stop"]}' \
http://localhost:8080/api/v1/roles/operator
Security Monitoring
Audit Log Review
# View recent security events
psql "${DATABASE_URL}" -c "SELECT * FROM compliance_audit_log WHERE severity IN ('HIGH','CRITICAL') ORDER BY timestamp DESC LIMIT 10;"
# Generate security report
./scripts/security-report.sh
# Monitor failed login attempts
psql "${DATABASE_URL}" -c "SELECT COUNT(*) FROM compliance_audit_log WHERE event_type='INVALID_AUTHENTICATION' AND timestamp > NOW() - INTERVAL '1 hour';"
Certificate Management
# Check certificate expiration
openssl x509 -in /app/certs/server.pem -enddate -noout
# Rotate certificates
./scripts/rotate-certificates.sh
# Verify certificate chain
openssl verify -CAfile /app/certs/ca.crt /app/certs/server.pem
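The `-enddate` output has to be read by a person. For a scheduled check, `openssl x509 -checkend` reports through its exit status whether a certificate expires within a given number of seconds. A sketch built on that flag follows. The helper name and the 30-day window in the example are assumptions, not project policy.

```shell
# cert_valid_for_days is a hypothetical helper built on `openssl x509 -checkend`.
# usage: cert_valid_for_days <cert.pem> <days>
# Succeeds if the certificate is still valid <days> from now. Fails if it
# expires sooner, or if the file cannot be read as a certificate.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(($2 * 86400))" >/dev/null 2>&1
}

# Example:
#   cert_valid_for_days /app/certs/server.pem 30 || echo "rotate certificate soon"
```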
Compliance Operations
Regulatory Compliance
IEC 62443 Compliance
# Generate compliance report
./scripts/iec62443-report.sh
# Verify security controls
./scripts/security-controls-check.sh
# Audit trail verification
./scripts/audit-trail-verification.sh
ISO 27001 Compliance
# ISO 27001 controls check
./scripts/iso27001-check.sh
# Risk assessment
./scripts/risk-assessment.sh
# Security policy compliance
./scripts/security-policy-check.sh
Documentation and Reporting
Compliance Reports
# Generate monthly compliance report
./scripts/generate-compliance-report.sh
# Export audit logs
./scripts/export-audit-logs.sh
# Create security assessment
./scripts/security-assessment.sh
Emergency Procedures
Emergency Stop Operations
Manual Emergency Stop
# Activate emergency stop for station
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
-d '{"reason":"Emergency maintenance","operator":"operator001"}' \
http://localhost:8080/api/v1/pump-stations/station001/emergency-stop
# Clear emergency stop
curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \
http://localhost:8080/api/v1/pump-stations/station001/emergency-stop
System Recovery
# Check emergency stop status
curl -H "Authorization: Bearer ${TOKEN}" \
http://localhost:8080/api/v1/pump-stations/station001/emergency-stop-status
# Verify system recovery
./scripts/emergency-recovery-check.sh
Disaster Recovery
Full System Recovery
# 1. Stop all services
docker-compose down
# 2. Restore from latest backup
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-latest.tar.gz
# 3. Start services
docker-compose up -d
# 4. Verify recovery
./scripts/health-check.sh
./scripts/emergency-recovery-verification.sh
Database Recovery
# 1. Stop database-dependent services
docker-compose stop control-adapter
# 2. Restore database
./scripts/restore-database.sh /var/backup/calejo/database/backup-latest.sql
# 3. Start services
docker-compose up -d
# 4. Verify data integrity
./scripts/database-integrity-check.sh
Always follow the procedures documented in this guide and maintain proper change control for all operational activities on the Calejo Control Adapter system.