# Calejo Control Adapter - Operations & Maintenance Guide

## Overview

This guide provides comprehensive procedures for daily operations, monitoring, troubleshooting, and maintenance of the Calejo Control Adapter system.

## Daily Operations

### System Startup and Shutdown

#### Normal Startup Procedure

```bash
# Start all services
docker-compose up -d

# Verify services are running
docker-compose ps

# Check health status
curl http://localhost:8080/api/v1/health
```

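Services started with `docker-compose up -d` may need a few seconds before the health endpoint answers, so a single `curl` right after startup can fail spuriously. The following is a minimal sketch of a retry wrapper. The helper name `wait_until` is hypothetical and not part of the Calejo scripts, and it assumes that a non-zero exit status from the probe command means "not ready yet".

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: reruns COMMAND until it succeeds or ATTEMPTS is
# exhausted. Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the health endpoint used in this guide):
#   wait_until 30 2 curl -fsS http://localhost:8080/api/v1/health
```

With `curl -f`, HTTP error responses also count as failures, so the loop keeps waiting until the endpoint returns a success status.
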
#### Graceful Shutdown Procedure

```bash
# Stop services gracefully
docker-compose down

# Verify all services stopped
docker-compose ps
```

#### Emergency Shutdown

```bash
# Immediate shutdown (use only in emergencies)
docker-compose down --timeout 0
```

### Daily Health Checks

#### Automated Health Monitoring

```bash
# Run automated health check
./scripts/health-check.sh

# Check specific components
curl http://localhost:8080/api/v1/health/detailed
```

#### Manual Health Verification

```bash
# Check database connectivity
psql "${DATABASE_URL}" -c "SELECT 1;"

# Check protocol servers
opcua-client connect opc.tcp://localhost:4840
modbus-tcp read 127.0.0.1 502 40001 10
curl http://localhost:8080/api/v1/status
```

### Performance Monitoring

#### Key Performance Indicators

| Metric | Target | Alert Threshold |
|--------|--------|-----------------|
| **Response Time** | < 100ms | > 500ms |
| **CPU Usage** | < 70% | > 90% |
| **Memory Usage** | < 80% | > 95% |
| **Database Connections** | < 50% of max | > 80% of max |
| **Network Latency** | < 10ms | > 50ms |

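Each row of the table defines two boundaries, which leaves a band between the target and the alert threshold. The sketch below shows one way to classify an integer reading against a row. The function name `kpi_status` and the `WARN` label for the in-between band are assumptions made for illustration; the table itself only names the target and the alert threshold.

```bash
# kpi_status VALUE TARGET ALERT
# Hypothetical helper: classifies an integer reading where lower is better.
#   VALUE <  TARGET  -> OK     (within target)
#   VALUE >  ALERT   -> ALERT  (past the alert threshold)
#   otherwise        -> WARN   (between target and alert threshold)
kpi_status() {
  local value="$1" target="$2" alert="$3"
  if [ "$value" -lt "$target" ]; then
    echo "OK"
  elif [ "$value" -gt "$alert" ]; then
    echo "ALERT"
  else
    echo "WARN"
  fi
}

# Examples using thresholds from the table:
#   kpi_status 85 70 90      # CPU usage in percent
#   kpi_status 120 100 500   # response time in milliseconds
```
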
#### Performance Monitoring Commands

```bash
# Monitor system resources
docker stats

# Check application performance
curl http://localhost:8080/api/v1/metrics

# Monitor database performance
psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_activity;"
```

## Monitoring & Alerting

### Real-time Monitoring

#### Application Monitoring

```bash
# View application logs in real-time
docker-compose logs -f control-adapter

# Monitor specific components
docker-compose logs -f control-adapter | grep -E "(ERROR|WARNING|CRITICAL)"

# Check service status
systemctl status calejo-control-adapter
```

#### Database Monitoring

```bash
# Monitor database performance
psql "${DATABASE_URL}" -c "SELECT * FROM pg_stat_database WHERE datname='calejo';"

# Check connection pool
psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity WHERE datname='calejo';"
```

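The Database Connections indicator is expressed as a share of the maximum, so the raw count from `pg_stat_activity` has to be divided by the server's `max_connections`. A minimal sketch of that calculation follows. The helper name `conn_usage_pct` is hypothetical, and the example wiring assumes `DATABASE_URL` is set as in the rest of this guide.

```bash
# conn_usage_pct CURRENT MAX
# Hypothetical helper: prints connection usage as a whole-number percentage
# (integer division, rounded down). Returns 1 if MAX is not positive.
conn_usage_pct() {
  local current="$1" max="$2"
  if [ "$max" -le 0 ]; then
    echo "max must be positive" >&2
    return 1
  fi
  echo $((current * 100 / max))
}

# Example wiring (psql -A -t prints the bare value without headers):
#   current=$(psql "${DATABASE_URL}" -Atc "SELECT count(*) FROM pg_stat_activity WHERE datname='calejo';")
#   max=$(psql "${DATABASE_URL}" -Atc "SHOW max_connections;")
#   conn_usage_pct "$current" "$max"
```
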
### Alert Configuration

#### Email Alerts

```yaml
# Email alert configuration
alerts:
  email:
    enabled: true
    smtp_server: smtp.example.com
    smtp_port: 587
    from_address: alerts@calejo.com
    to_addresses:
      - operations@calejo.com
      - engineering@calejo.com
```

#### SMS Alerts

```yaml
# SMS alert configuration
alerts:
  sms:
    enabled: true
    provider: twilio
    account_sid: ${TWILIO_ACCOUNT_SID}
    auth_token: ${TWILIO_AUTH_TOKEN}
    # Quote phone numbers so YAML keeps them as strings, not integers
    from_number: "+1234567890"
    to_numbers:
      - "+1234567891"
      - "+1234567892"
```

#### Webhook Alerts

```yaml
# Webhook alert configuration
alerts:
  webhook:
    enabled: true
    url: https://monitoring.example.com/webhook
    secret: ${WEBHOOK_SECRET}
```

### Alert Severity Levels

| Severity | Description | Response Time | Notification Channels |
|----------|-------------|---------------|----------------------|
| **Critical** | System failure, safety violation | Immediate (< 15 min) | SMS, Email, Webhook |
| **High** | Performance degradation, security event | Urgent (< 1 hour) | Email, Webhook |
| **Medium** | Configuration issues, warnings | Standard (< 4 hours) | Email |
| **Low** | Informational events | Routine (< 24 hours) | Dashboard only |

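The routing column of the table can be expressed as a simple lookup, which is useful when a script has to decide where to send a notification. This is only a sketch: the function name `channels_for_severity` and the lowercase channel identifiers are assumptions, not names taken from the adapter's configuration.

```bash
# channels_for_severity SEVERITY
# Hypothetical helper: prints the notification channels listed in the table
# for a severity level (case-insensitive). Unknown levels return 1.
channels_for_severity() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    critical) echo "sms email webhook" ;;
    high)     echo "email webhook" ;;
    medium)   echo "email" ;;
    low)      echo "dashboard" ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example:
#   for channel in $(channels_for_severity High); do echo "notify via $channel"; done
```
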
## Maintenance Procedures

### Regular Maintenance Tasks

#### Daily Tasks

```bash
# Check system health
./scripts/health-check.sh

# Review error logs
docker-compose logs control-adapter --since "24h" | grep ERROR

# Confirm that recent backups exist
ls -la /var/backup/calejo/
```

#### Weekly Tasks

```bash
# Database maintenance
psql "${DATABASE_URL}" -c "VACUUM ANALYZE;"

# Delete application logs older than 7 days
find /var/log/calejo -name "*.log" -mtime +7 -delete

# Backup verification
./scripts/verify-backup.sh latest-backup.tar.gz
```

#### Monthly Tasks

```bash
# Security updates
docker-compose pull
docker-compose build --no-cache

# Performance analysis
./scripts/performance-analysis.sh

# Compliance audit
./scripts/compliance-audit.sh
```

### Backup and Recovery

#### Automated Backups

```bash
# Create full backup
./scripts/backup-full.sh

# Create configuration-only backup
./scripts/backup-config.sh

# Create database-only backup
./scripts/backup-database.sh
```

#### Backup Schedule

| Backup Type | Frequency | Retention | Location |
|-------------|-----------|-----------|----------|
| **Full System** | Daily | 7 days | /var/backup/calejo/ |
| **Database** | Hourly | 24 hours | /var/backup/calejo/database/ |
| **Configuration** | Weekly | 4 weeks | /var/backup/calejo/config/ |

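The retention column implies that older backup files become candidates for removal. The sketch below lists such candidates without deleting anything. The helper name `list_expired` is hypothetical, and the sketch assumes that backups are plain files stored directly in the listed directories and that a `find` supporting `-maxdepth` (GNU or BSD) is available.

```bash
# list_expired DIR DAYS
# Hypothetical helper: prints regular files directly inside DIR whose
# modification time is more than DAYS full 24-hour periods in the past.
# It only lists candidates; deletion is left to the operator.
list_expired() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -mtime +"$days" -print
}

# Examples with retention values from the table:
#   list_expired /var/backup/calejo 7          # full system, 7 days
#   list_expired /var/backup/calejo/config 28  # configuration, 4 weeks
```

Because `-mtime` counts whole days, the 24-hour retention of the hourly database backups would need a minute-based test such as `-mmin +1440` instead.
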
#### Recovery Procedures

```bash
# Full system recovery
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-20231026.tar.gz

# Database recovery
./scripts/restore-database.sh /var/backup/calejo/database/backup.sql

# Configuration recovery
./scripts/restore-config.sh /var/backup/calejo/config/config-backup.tar.gz
```

### Software Updates

#### Update Procedure

```bash
# 1. Create backup
./scripts/backup-full.sh

# 2. Stop services
docker-compose down

# 3. Update application
git pull origin main

# 4. Rebuild services
docker-compose build --no-cache

# 5. Start services
docker-compose up -d

# 6. Verify update
./scripts/health-check.sh
```

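The update steps are order-dependent: for example, services should not be stopped if the backup in step 1 failed. When the procedure is scripted, it helps to stop at the first failing step and report which one it was. A minimal sketch, assuming a hypothetical helper named `run_steps` that takes each step as a quoted command string:

```bash
# run_steps "COMMAND 1" "COMMAND 2" ...
# Hypothetical helper: runs each command string in order with `sh -c` and
# stops at the first failure. Returns 0 if every step succeeded, otherwise
# the 1-based number of the step that failed.
run_steps() {
  local n=0 step
  for step in "$@"; do
    n=$((n + 1))
    echo "step ${n}: ${step}"
    if ! sh -c "$step"; then
      echo "step ${n} failed: ${step}" >&2
      return "$n"
    fi
  done
  return 0
}

# Example with the steps listed in this procedure:
#   run_steps \
#     "./scripts/backup-full.sh" \
#     "docker-compose down" \
#     "git pull origin main" \
#     "docker-compose build --no-cache" \
#     "docker-compose up -d" \
#     "./scripts/health-check.sh"
```

A non-zero return value identifies the failed step, which an operator can use to decide whether a rollback is needed.
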
#### Rollback Procedure

```bash
# 1. Stop services
docker-compose down

# 2. Restore from backup
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-pre-update.tar.gz

# 3. Start services
docker-compose up -d

# 4. Verify rollback
./scripts/health-check.sh
```

## Troubleshooting

### Common Issues and Solutions

#### Database Connection Issues

**Symptoms**:
- "Connection refused" errors
- Slow response times
- Connection pool exhaustion

**Solutions**:
```bash
# Check PostgreSQL status
systemctl status postgresql

# Verify connection parameters
psql "${DATABASE_URL}" -c "SELECT version();"

# Check connection pool
psql "${DATABASE_URL}" -c "SELECT count(*) FROM pg_stat_activity;"
```

#### Protocol Server Issues

**OPC UA Server Problems**:
```bash
# Test OPC UA connectivity
opcua-client connect opc.tcp://localhost:4840

# Check OPC UA logs
docker-compose logs control-adapter | grep opcua

# Verify certificate validity
openssl x509 -in /app/certs/server.pem -text -noout
```

**Modbus TCP Issues**:
```bash
# Test Modbus connectivity
modbus-tcp read 127.0.0.1 502 40001 10

# Check Modbus logs
docker-compose logs control-adapter | grep modbus

# Verify port availability
netstat -tulpn | grep :502
```

#### Performance Issues

**High CPU Usage**:
```bash
# Identify resource usage
docker stats

# Check for runaway processes
ps aux | grep python

# Analyze database queries (requires the pg_stat_statements extension;
# on PostgreSQL 13 and later the column is total_exec_time, not total_time)
psql "${DATABASE_URL}" -c "SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;"
```

**Memory Issues**:
```bash
# Check memory usage
free -h

# Monitor application memory
docker stats control-adapter

# Search Docker daemon logs for memory-related messages
journalctl -u docker --since "1 hour ago" | grep -i memory
```

### Diagnostic Tools

#### Log Analysis

```bash
# View recent errors
docker-compose logs control-adapter --since "1h" | grep -E "(ERROR|CRITICAL)"

# Search for specific patterns
docker-compose logs control-adapter | grep -i "connection"

# Export logs for analysis
docker-compose logs control-adapter > application-logs-$(date +%Y%m%d).log
```

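When a log export is large, a per-level count gives a quicker overview than reading the filtered lines. The sketch below summarizes log text read from standard input. The helper name `count_levels` is hypothetical, and it assumes the level names appear literally in the log lines, as the `grep` filters above already do.

```bash
# count_levels
# Hypothetical helper: reads log text on stdin and prints one LEVEL=COUNT
# line per level found, for the levels ERROR, WARNING and CRITICAL,
# in alphabetical order. Levels with no matches are omitted.
count_levels() {
  grep -oE 'ERROR|WARNING|CRITICAL' | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Example:
#   docker-compose logs control-adapter | count_levels
```
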
#### Performance Analysis

```bash
# Run performance tests
./scripts/performance-test.sh

# Generate performance report
./scripts/performance-report.sh

# Monitor real-time performance
./scripts/monitor-performance.sh
```

#### Security Analysis

```bash
# Run security scan
./scripts/security-scan.sh

# Check compliance status
./scripts/compliance-check.sh

# Audit user activity
./scripts/audit-report.sh
```

## Security Operations

### Access Control

#### User Management

```bash
# List current users
curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/users

# Create new user
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
     -d '{"username":"newuser","role":"operator","email":"user@example.com"}' \
     http://localhost:8080/api/v1/users

# Deactivate user
curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \
     http://localhost:8080/api/v1/users/user123
```

#### Role Management

```bash
# View role permissions
curl -H "Authorization: Bearer ${TOKEN}" http://localhost:8080/api/v1/roles

# Update role permissions
curl -X PUT -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
     -d '{"permissions":["read_pump_status","emergency_stop"]}' \
     http://localhost:8080/api/v1/roles/operator
```

### Security Monitoring

#### Audit Log Review

```bash
# View recent security events
psql "${DATABASE_URL}" -c "SELECT * FROM compliance_audit_log WHERE severity IN ('HIGH','CRITICAL') ORDER BY timestamp DESC LIMIT 10;"

# Generate security report
./scripts/security-report.sh

# Monitor failed login attempts
psql "${DATABASE_URL}" -c "SELECT COUNT(*) FROM compliance_audit_log WHERE event_type='INVALID_AUTHENTICATION' AND timestamp > NOW() - INTERVAL '1 hour';"
```

#### Certificate Management

```bash
# Check certificate expiration
openssl x509 -in /app/certs/server.pem -enddate -noout

# Rotate certificates
./scripts/rotate-certificates.sh

# Verify certificate chain
openssl verify -CAfile /app/certs/ca.crt /app/certs/server.pem
```

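Reading the `-enddate` output by eye works for a single check, but a scheduled job needs a yes/no answer. OpenSSL's `-checkend` option provides one. The sketch below wraps it. The helper name `cert_expires_within` and the 30-day window in the example are assumptions, not values taken from the adapter's certificate policy.

```bash
# cert_expires_within CERT_PATH DAYS
# Hypothetical helper. Return codes:
#   0  the certificate expires within DAYS days (or has already expired)
#   1  the certificate stays valid beyond that window
#   2  the certificate file cannot be read
cert_expires_within() {
  local cert="$1" days="$2"
  if [ ! -r "$cert" ]; then
    echo "cannot read ${cert}" >&2
    return 2
  fi
  # `openssl x509 -checkend N` exits 0 when the certificate is still valid
  # N seconds from now, and non-zero when it will have expired by then.
  if openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Example (30 days is an illustrative window):
#   if cert_expires_within /app/certs/server.pem 30; then
#     echo "certificate needs rotation soon"
#   fi
```
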
## Compliance Operations

### Regulatory Compliance

#### IEC 62443 Compliance

```bash
# Generate compliance report
./scripts/iec62443-report.sh

# Verify security controls
./scripts/security-controls-check.sh

# Audit trail verification
./scripts/audit-trail-verification.sh
```

#### ISO 27001 Compliance

```bash
# ISO 27001 controls check
./scripts/iso27001-check.sh

# Risk assessment
./scripts/risk-assessment.sh

# Security policy compliance
./scripts/security-policy-check.sh
```

### Documentation and Reporting

#### Compliance Reports

```bash
# Generate monthly compliance report
./scripts/generate-compliance-report.sh

# Export audit logs
./scripts/export-audit-logs.sh

# Create security assessment
./scripts/security-assessment.sh
```

## Emergency Procedures

### Emergency Stop Operations

#### Manual Emergency Stop

```bash
# Activate emergency stop for station
curl -X POST -H "Authorization: Bearer ${TOKEN}" -H "Content-Type: application/json" \
     -d '{"reason":"Emergency maintenance","operator":"operator001"}' \
     http://localhost:8080/api/v1/pump-stations/station001/emergency-stop

# Clear emergency stop
curl -X DELETE -H "Authorization: Bearer ${TOKEN}" \
     http://localhost:8080/api/v1/pump-stations/station001/emergency-stop
```

#### System Recovery

```bash
# Check emergency stop status
curl -H "Authorization: Bearer ${TOKEN}" \
     http://localhost:8080/api/v1/pump-stations/station001/emergency-stop-status

# Verify system recovery
./scripts/emergency-recovery-check.sh
```

### Disaster Recovery

#### Full System Recovery

```bash
# 1. Stop all services
docker-compose down

# 2. Restore from latest backup
./scripts/restore-full.sh /var/backup/calejo/calejo-backup-latest.tar.gz

# 3. Start services
docker-compose up -d

# 4. Verify recovery
./scripts/health-check.sh
./scripts/emergency-recovery-verification.sh
```

#### Database Recovery

```bash
# 1. Stop database-dependent services
docker-compose stop control-adapter

# 2. Restore database
./scripts/restore-database.sh /var/backup/calejo/database/backup-latest.sql

# 3. Start services
docker-compose up -d

# 4. Verify data integrity
./scripts/database-integrity-check.sh
```

---

*This operations and maintenance guide provides comprehensive procedures for managing the Calejo Control Adapter system. Always follow documented procedures and maintain proper change control for all operational activities.*