Operational Excellence
Leadership and Prioritization Responsibilities
- Define and communicate priorities based on business needs and customer value.
- Coordinate cross-team collaboration for operational goals.
- Ensure teams are empowered to make timely decisions and escalate when necessary.
- Align team responsibilities with operational ownership and performance outcomes.
Roles:
Product Owner, Product Manager, Business Analyst, Project Manager, Team Lead, Executive Sponsor, Organizational Leadership
Engineering and Workload Management Responsibilities
- Implement observability through telemetry, logging, and distributed tracing.
- Enable frequent, small, and reversible changes to reduce deployment risks.
- Use deployment automation and rollback strategies to improve release quality.
- Continuously test workloads for operational readiness and resilience.
- Maintain version control, configuration management, and deployment consistency.
Roles:
DevOps Engineer, Site Reliability Engineer (SRE), Software Developer, Release Manager, Quality Assurance Engineer, Cloud Architect
Metrics, Alerts, and Incident Management Responsibilities
- Define key performance indicators (KPIs) and operational metrics.
- Analyze workload behavior, performance, and alert patterns.
- Use dashboards and analytics to monitor system health and identify trends.
- Automate alerts and responses to reduce time to resolution.
- Establish escalation paths and communication plans for operational incidents.
Roles:
Operations Manager, DevOps Engineer, Data Analyst, Site Reliability Engineer, Incident Manager, Communication Specialist
Change, Risk, and Readiness Responsibilities
- Conduct readiness reviews and maintain operational runbooks.
- Plan and test for rollback and failure scenarios.
- Mitigate deployment risks through controlled release strategies.
- Assess impact of changes on workloads and business operations.
- Review configuration and ownership of all operational components.
Roles:
System Administrator, DevOps Engineer, Operations Analyst, Support Engineer, Release Manager, Quality Assurance Analyst
Post-Incident and Continuous Improvement Responsibilities
- Perform post-incident analysis to identify improvement areas.
- Capture lessons learned and embed them into future operations.
- Conduct metrics reviews to guide prioritization of improvements.
- Facilitate knowledge sharing and feedback loops across teams.
- Organize and lead game days to test response and recovery.
Roles:
Incident Response Manager, Data Analyst, Operations Manager, Continuous Improvement Specialist, Training Coordinator, Team Lead
Cost Optimization
Financial & Cost Management Responsibilities
- Establish and maintain cloud budgets and forecasts.
- Provide cost transparency and accountability across teams.
- Monitor actual vs. forecast spend and analyze cost drivers.
- Drive cloud financial governance and optimization efforts.
- Promote a culture of cost awareness and ownership.
These responsibilities could be fulfilled by the following or similar roles in your organization:
Cloud Financial Manager, Cloud Financial Analyst, IT Finance Officer, Cost Analyst, FinOps Specialist, Finance Manager, Finance Partner
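As an illustration of monitoring actual vs. forecast spend, the following minimal sketch flags teams whose spend exceeds forecast by more than a tolerance. The function names, the team names used when calling it, and the 10% default threshold are assumptions made for this example, not part of any billing tool or API.

```python
def spend_variance(forecast: float, actual: float) -> float:
    """Return actual spend as a fractional deviation from forecast."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return (actual - forecast) / forecast


def over_budget_teams(forecasts, actuals, threshold=0.10):
    """Return {team: variance} for teams more than `threshold` over forecast.

    `forecasts` and `actuals` map a team name to a spend amount in the
    same currency. The 10% default threshold is an illustrative assumption.
    """
    flagged = {}
    for team, forecast in forecasts.items():
        variance = spend_variance(forecast, actuals.get(team, 0.0))
        if variance > threshold:
            flagged[team] = round(variance, 4)
    return flagged
```

In practice the inputs would come from whatever billing export or cost tool the organization already uses; the comparison itself stays this simple.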
Architecture & Engineering Responsibilities
- Design cloud architectures for cost efficiency and elasticity.
- Right-size resources and select pricing models appropriately.
- Implement automation to reduce manual cost overhead.
- Monitor and manage service usage dynamically.
- Ensure architecture aligns with organizational cost targets.
These responsibilities could be fulfilled by the following or similar roles in your organization:
Cloud Architect, Solutions Architect, DevOps Engineer, AWS Solutions Architect, Cloud Administrator
Governance, Analytics & Operational Responsibilities
- Provide actionable cost reporting and analysis.
- Track usage against KPIs and optimize based on data trends.
- Enforce data retention and decommissioning policies.
- Configure tagging, account structure, and billing tools.
- Ensure compliance with internal cost governance standards.
These responsibilities could be fulfilled by the following or similar roles in your organization:
Data Analyst, IT Operations Manager, Project Manager, Product Owner, Compliance Officer, Data Governance Officer, Department Head, Support Staff
Strategic Planning & Procurement Responsibilities
- Align cloud spending with business value and objectives.
- Drive initiatives that embed cost optimization into planning.
- Negotiate third-party contracts and optimize licensing terms.
- Advocate for cross-functional collaboration on cost decisions.
These responsibilities could be fulfilled by the following or similar roles in your organization:
Business Stakeholder, Business Unit Leader, Operations Manager, Procurement Specialist, Team Lead
Performance Efficiency
Architecture and Resource Optimization Responsibilities
- Select cloud services, regions, and architectures based on performance and sustainability trade-offs.
- Benchmark and evaluate architecture decisions using performance data.
- Use guidance from cloud providers and partners to inform design patterns.
- Regularly review and update architecture to adapt to changes in usage and technology.
Roles:
Cloud Architect, Solutions Architect, Performance Engineer, DevOps Engineer, Technical Consultant, Project Manager
Compute and Storage Efficiency Responsibilities
- Select and configure compute resources to match workload requirements.
- Use dynamic scaling and right-sizing to optimize resource utilization.
- Collect and analyze compute-related metrics to support decision-making.
- Apply hardware acceleration and managed services where beneficial.
- Align compute and storage strategies with sustainability goals.
Roles:
Cloud Architect, DevOps Engineer, Application Developer, Systems Administrator, Financial Analyst, Cost Management Analyst
Data Access and Management Responsibilities
- Use purpose-built data stores for workload-specific access and storage patterns.
- Minimize data movement and evaluate caching or replication strategies.
- Collect performance metrics and manage the data lifecycle effectively.
- Implement shared access where possible to reduce redundancy and overhead.
Roles:
Data Architect, Database Administrator, Cloud Engineer, DevOps Engineer, Data Analyst, Application Developer
Networking Efficiency Responsibilities
- Evaluate and configure networking components for performance and reliability.
- Implement load balancing, optimized routing, and appropriate network protocols.
- Select the best connectivity options (e.g., VPN, AWS Direct Connect) based on workload requirements.
- Monitor and improve network configuration based on observed metrics.
Roles:
Network Architect, Cloud Network Architect, Network Engineer, Security Engineer, DevOps Engineer, Cloud Solutions Architect
Performance Monitoring and Tuning Responsibilities
- Define KPIs and use monitoring tools to track workload performance.
- Review metrics regularly to identify bottlenecks and inefficiencies.
- Implement automation to remediate performance issues proactively.
- Conduct load testing and performance testing to validate system responsiveness.
Roles:
Performance Engineer, DevOps Engineer, Cloud Architect, Product Manager, QA Engineer, Data Analyst, SRE
Reliability
Quotas, Constraints, and Capacity Planning Responsibilities
- Monitor and manage service quotas across accounts and regions.
- Design architectures that account for service limits and fixed constraints.
- Automate quota monitoring and management where possible.
- Ensure sufficient buffer between usage and limits to support failover.
Roles:
Cloud Architect, DevOps Engineer, Compliance Officer, Infrastructure Manager, AWS Administrator, Project Manager, Operations Manager, Technical Support Engineer
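One way to reason about the buffer between usage and limits is sketched below. It assumes a simple model in which resources in a failed zone still count against the quota while their replacements are being provisioned elsewhere, so the quota must cover current usage plus the largest zone's usage. The function name and this model are assumptions for illustration; real quotas and failure modes vary by service.

```python
from typing import Mapping


def failover_headroom(quota: int, usage_by_zone: Mapping[str, int]) -> int:
    """Return the remaining quota after replacing the largest zone's resources.

    Assumed model: failed resources keep counting against the quota until
    they are cleaned up, so replacements are provisioned on top of current
    usage. A negative result means the quota cannot absorb the failover.
    """
    total = sum(usage_by_zone.values())
    largest = max(usage_by_zone.values(), default=0)
    return quota - (total + largest)
```

A check like this could run on a schedule and raise an alert when the result drops below zero, or below a margin the team chooses.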
Network Design and Topology Responsibilities
- Design resilient network topologies for high availability.
- Plan IP subnets to support future growth and avoid overlap.
- Use redundant connectivity between cloud and on-prem environments.
- Prefer hub-and-spoke over mesh network designs for simplicity and control.
Roles:
Network Architect, Cloud Engineer, Security Engineer, IT Operations Manager, DevOps Engineer
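Overlapping address ranges can be caught before they are deployed. The sketch below uses Python's standard `ipaddress` module to list overlapping pairs among planned CIDR blocks; the function name and the CIDR values used when calling it are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return pairs of CIDR blocks from `cidrs` whose address ranges overlap."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]
```

For example, `find_overlaps(["10.0.0.0/16", "10.1.0.0/16", "10.0.128.0/17"])` reports the first and third blocks as a conflicting pair, because the /17 sits inside the first /16.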
Service Architecture and Dependency Responsibilities
- Segment workloads into independent components.
- Design APIs with contracts and enforce idempotency.
- Apply fault isolation through bulkheads and stateless designs.
- Limit retries, control timeouts, and fail fast where necessary.
Roles:
Cloud Architect, Solution Architect, Software Developer, API Developer, DevOps Engineer, Product Owner, QA Engineer, Site Reliability Engineer (SRE)
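The retry and fail-fast guidance above can be sketched as a small wrapper that caps the number of attempts, backs off with jitter, and stops once an overall deadline has passed. All names and defaults here are assumptions for illustration. The sketch treats only `ConnectionError` as transient and lets every other exception propagate immediately, and it is safe only when the wrapped operation is idempotent.

```python
import random
import time


class RetryBudgetExceeded(Exception):
    """Raised when the attempt limit or the overall deadline is exhausted."""


def call_with_retries(operation, max_attempts=3, base_delay=0.1,
                      deadline=2.0, sleep=time.sleep):
    """Call `operation` with bounded retries and exponential backoff.

    The deadline is checked between attempts only; it does not interrupt
    an attempt that is already running, so `operation` should enforce
    its own per-call timeout.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = time.monotonic()
    attempts = 0
    last_error = None
    while attempts < max_attempts and time.monotonic() - start < deadline:
        attempts += 1
        try:
            return operation()
        except ConnectionError as exc:  # assumed to be the transient case
            last_error = exc
            if attempts < max_attempts:
                # Full jitter: wait a random time up to the backoff ceiling.
                sleep(random.uniform(0, base_delay * 2 ** (attempts - 1)))
    raise RetryBudgetExceeded(
        f"gave up after {attempts} attempt(s)"
    ) from last_error
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.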
Monitoring, Alerts, and Recovery Responsibilities
- Define and monitor workload metrics and key indicators.
- Implement distributed tracing, logging, and alerting.
- Automate responses to reduce recovery time.
- Use analytics to detect anomalies and trigger remediation.
Roles:
DevOps Engineer, Site Reliability Engineer (SRE), Cloud Operations Engineer, Data Analyst, Incident Response Team Member
Demand Adaptation and Auto Scaling Responsibilities
- Scale workloads dynamically based on load and demand.
- Test for performance degradation under high load.
- Detect impairments and automatically provision replacement resources for failover.
- Validate autoscaling configuration in production-like conditions.
Roles:
Cloud Architect, DevOps Engineer, Product Owner, Systems Administrator, SRE, Project Manager
Change Management and Resilience Testing Responsibilities
- Use automation, playbooks, and runbooks for predictable changes.
- Integrate testing (functional, scaling, resilience) into CI/CD pipelines.
- Deploy immutable infrastructure and validate rollback strategies.
- Conduct game days and post-incident reviews to improve readiness.
Roles:
DevOps Engineer, QA Engineer, System Administrator, Release Manager, SRE, Incident Response Manager, Communication Lead
Backup and Disaster Recovery Responsibilities
- Identify data to back up or reconstruct from source.
- Secure and validate backup integrity regularly.
- Define and test disaster recovery plans with clear objectives.
- Automate recovery to meet the recovery time objective (RTO) and prevent configuration drift.
Roles:
Disaster Recovery Manager, Cloud Architect, Backup Administrator, System Administrator, Compliance Officer, Business Continuity Planner, Network Engineer
Security
Secure Operation Responsibilities
- Separate workloads using appropriate AWS accounts or organizational units.
- Secure root account credentials and apply strict access control policies.
- Keep up to date with security threats and recommendations.
- Identify, implement, and validate security control objectives.
- Establish cost-effective, automated methods for maintaining security posture.
Roles:
Cloud Security Architect, DevOps Engineer, Compliance Officer, Security Analyst, Security Operations Analyst, Incident Response Team Member, IT Manager
Identity and Access Management Responsibilities
- Enforce strong sign-in mechanisms and MFA.
- Use temporary credentials and centralized identity providers.
- Securely store and manage secrets.
- Define access requirements and implement least privilege principles.
- Audit and rotate credentials regularly and manage permissions throughout their lifecycle.
- Define permission guardrails and secure cross-account or third-party sharing.
Roles:
IAM Administrator, Cloud Security Architect, Security Architect, DevOps Engineer, Compliance Officer, Security Administrator, Application Developer, IT Operations Manager
Security Monitoring and Event Response Responsibilities
- Define actionable security events and implement alerting for them.
- Analyze logs, metrics, and findings centrally.
- Automate responses to detected threats.
- Conduct incident detection, triage, and escalation effectively.
- Maintain a security operations center or dedicated response team.
Roles:
Security Analyst, DevOps Engineer, Cloud Security Engineer, Incident Response Manager, Compliance Officer, IT Operations Manager, SOC Analyst
Network and Compute Protection Responsibilities
- Design network layers and control traffic at all levels.
- Implement inspection and automated protection mechanisms.
- Reduce attack surface and validate software integrity.
- Secure compute with managed services, automation, and remote administration.
Roles:
Network Architect, Security Engineer, Cloud Security Specialist, DevOps Engineer, System Administrator, IT Security Engineer
Data Classification and Protection Responsibilities
- Classify data and define data protection policies.
- Automate identification and lifecycle management of data.
- Secure data at rest using encryption and access controls.
- Secure data in transit through encryption and authentication.
- Minimize unnecessary data movement and visibility.
Roles:
Data Owner, Data Architect, Data Steward, Security and Compliance Manager, Cloud Architect, Data Analyst, Systems Administrator, Compliance Officer
Incident Preparedness and Recovery Responsibilities
- Identify key personnel and establish external support agreements.
- Develop and test incident response plans and playbooks.
- Prepare forensic capabilities and pre-deploy response tools.
- Run simulations and tabletop exercises.
- Establish feedback loops and a culture of learning from incidents.
Roles:
Incident Response Manager, Security Analyst, Forensic Specialist, Legal Advisor, Communications Officer, IT Operations Team, Training Coordinator, Executive Sponsor
Secure Development Lifecycle Responsibilities
- Embed security training and practices into development teams.
- Programmatically deploy secure and validated software.
- Automate testing and perform manual reviews throughout the pipeline.
- Centralize software dependency and artifact management.
- Assign security ownership within workload teams.
Roles:
DevOps Engineer, Software Developer, Security Analyst, Security Engineer, QA Engineer, Application Security Trainer, Security Champion, Compliance Officer, Project Manager
Sustainability
Region and Workload Placement Responsibilities
- Choose AWS Regions based on sustainability goals as well as business and latency requirements.
- Optimize geographic placement of workloads to reduce networking overhead and environmental impact.
Roles:
Cloud Architect, Business Analyst, Sustainability Officer, DevOps Engineer, Sustainability Lead, Infrastructure Architect, Network Engineer
Demand and Resource Alignment Responsibilities
- Dynamically scale infrastructure to align resource use with actual demand.
- Implement throttling or buffering strategies to flatten resource spikes.
- Stop creation or maintenance of unused assets and shadow resources.
- Align service-level agreements (SLAs) with sustainability objectives.
Roles:
Cloud Architect, DevOps Engineer, Operations Manager, Finance Manager, Product Owner, IT Procurement Manager, Governance and Compliance Lead, Data Analyst, Application Owner
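A token bucket is one common way to implement the throttling mentioned above: requests are admitted at a steady refill rate, with a bounded burst, so demand spikes are flattened instead of driving extra provisioning. The class below is a minimal single-threaded sketch. Its name, parameters, and injectable clock are assumptions for illustration.

```python
import time


class TokenBucket:
    """Admit requests at `rate` tokens per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Requests that are refused can be rejected outright or placed on a queue and processed later, which corresponds to the buffering strategy.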
Team and Process Efficiency Responsibilities
- Align team resourcing and activities with sustainability goals.
- Use managed services and tools to reduce overhead and carbon footprint.
- Increase utilization of build environments and shared resources.
- Keep workloads and infrastructure up to date to avoid waste.
Roles:
Sustainability Program Manager, Procurement Specialist, IT Administrator, Infrastructure Team, Solutions Architect, Cloud Architect, DevOps Engineer, Application Developer
Software and Architecture Optimization Responsibilities
- Refactor or remove underused workload components.
- Optimize code to reduce compute cycles and improve efficiency.
- Design asynchronous or scheduled workloads to run during low-impact times.
- Consider the sustainability impact of devices, hardware, and client-side performance.
Roles:
Solutions Architect, Software Developer, DevOps Engineer, Product Owner, Business Analyst, Systems Engineer, Sustainability Specialist, Product Manager
Data Management and Storage Responsibilities
- Classify data to ensure efficient lifecycle management.
- Remove redundant or obsolete data to minimize storage needs.
- Use shared storage systems for common datasets.
- Back up data only when difficult to recreate or when required.
- Use elasticity and automation to scale storage based on demand.
- Minimize unnecessary data movement across regions and networks.
Roles:
Data Architect, Data Engineer, DevOps Engineer, Compliance Officer, Cloud Administrator, Data Steward, Storage Engineer, Product Manager, Security and Compliance Manager, Data Classification Owner
Hardware and Service Efficiency Responsibilities
- Use the minimum amount of hardware required to meet workload needs.
- Select instance types and hardware accelerators with lower environmental impact.
- Prefer managed services to reduce operational and infrastructure overhead.
Roles:
Chief Technology Officer, Solutions Architect, DevOps Engineer, Infrastructure Manager, IT Operations Manager, Cloud Architect, Sustainability Manager
Continuous Improvement and Organizational Support Responsibilities
- Adopt agile methods that support rapid sustainability enhancements.
- Centralize sustainability practices in day-to-day operations.
- Conduct regular reviews to identify areas for operational efficiency.
- Use managed device farms and shared environments to reduce impact of test operations.
Roles:
Sustainability Lead, Product Owner, QA Specialist, Cloud Architect, DevOps Engineer, Sustainability Champion, Sustainability Manager, Quality Assurance Engineer