Responsibilities

Operational Excellence

Leadership and Prioritization Responsibilities

  • Define and communicate priorities based on business needs and customer value.
  • Coordinate cross-team collaboration for operational goals.
  • Ensure teams are empowered to make timely decisions and escalate when necessary.
  • Align team responsibilities with operational ownership and performance outcomes.

Roles:
Product Owner, Product Manager, Business Analyst, Project Manager, Team Lead, Executive Sponsor, Organizational Leadership

Engineering and Workload Management Responsibilities

  • Implement observability through telemetry, logging, and distributed tracing.
  • Enable frequent, small, and reversible changes to reduce deployment risks.
  • Use deployment automation and rollback strategies to improve release quality.
  • Continuously test workloads for operational readiness and resilience.
  • Maintain version control, configuration management, and deployment consistency.

Roles:
DevOps Engineer, Site Reliability Engineer (SRE), Software Developer, Release Manager, Quality Assurance Engineer, Cloud Architect

Metrics, Alerts, and Incident Management Responsibilities

  • Define key performance indicators (KPIs) and operational metrics.
  • Analyze workload behavior, performance, and alert patterns.
  • Use dashboards and analytics to monitor system health and identify trends.
  • Automate alerts and responses to reduce time to resolution.
  • Establish escalation paths and communication plans for operational incidents.

Roles:
Operations Manager, DevOps Engineer, Data Analyst, Site Reliability Engineer, Incident Manager, Communication Specialist

Change, Risk, and Readiness Responsibilities

  • Conduct readiness reviews and maintain operational runbooks.
  • Plan and test for rollback and failure scenarios.
  • Mitigate deployment risks through controlled release strategies.
  • Assess impact of changes on workloads and business operations.
  • Review configuration and ownership of all operational components.

Roles:
System Administrator, DevOps Engineer, Operations Analyst, Support Engineer, Release Manager, Quality Assurance Analyst

Post-Incident and Continuous Improvement Responsibilities

  • Perform post-incident analysis to identify improvement areas.
  • Capture lessons learned and embed them into future operations.
  • Conduct metrics reviews to guide prioritization of improvements.
  • Facilitate knowledge sharing and feedback loops across teams.
  • Organize and lead game days to test response and recovery.

Roles:
Incident Response Manager, Data Analyst, Operations Manager, Continuous Improvement Specialist, Training Coordinator, Team Lead

Cost Optimization

Financial & Cost Management Responsibilities

  • Establish and maintain cloud budgets and forecasts.
  • Provide cost transparency and accountability across teams.
  • Monitor actual vs. forecast spend and analyze cost drivers.
  • Drive cloud financial governance and optimization efforts.
  • Promote a culture of cost awareness and ownership.

These responsibilities could be fulfilled by the following or similar roles in your organization:
Cloud Financial Manager, Cloud Financial Analyst, IT Finance Officer, Cost Analyst, FinOps Specialist, Finance Manager, Finance Partner
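
As an illustration of monitoring actual vs. forecast spend, the sketch below flags teams whose spend exceeds forecast by more than a tolerance. The function name, the team names, and the 10% threshold are assumptions made for this example, not values taken from any billing tool.

```python
def teams_over_forecast(forecast, actual, threshold=0.10):
    """Return {team: overspend_ratio} for teams above forecast by more than threshold.

    forecast and actual map a team name to spend for the same period.
    The 10% default threshold is an illustrative assumption.
    """
    flagged = {}
    for team, planned in forecast.items():
        if planned <= 0:
            continue  # no baseline to compare against
        spent = actual.get(team, 0.0)
        ratio = (spent - planned) / planned
        if ratio > threshold:
            flagged[team] = round(ratio, 4)
    return flagged
```

A report like this only identifies where to look; analyzing the underlying cost drivers remains a manual step.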

Architecture & Engineering Responsibilities

  • Design cloud architectures for cost efficiency and elasticity.
  • Right-size resources and select pricing models appropriately.
  • Implement automation to reduce manual cost overhead.
  • Monitor and manage service usage dynamically.
  • Ensure architecture aligns with organizational cost targets.

These responsibilities could be fulfilled by the following or similar roles in your organization:
Cloud Architect, Solutions Architect, DevOps Engineer, AWS Solutions Architect, Cloud Administrator

Governance, Analytics & Operational Responsibilities

  • Provide actionable cost reporting and analysis.
  • Track usage against KPIs and optimize based on data trends.
  • Enforce data retention and decommissioning policies.
  • Configure tagging, account structure, and billing tools.
  • Ensure compliance with internal cost governance standards.

These responsibilities could be fulfilled by the following or similar roles in your organization:
Data Analyst, IT Operations Manager, Project Manager, Product Owner, Compliance Officer, Data Governance Officer, Department Head, Support Staff
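
Tagging only supports cost reporting if it is applied consistently. A minimal sketch of a tag compliance check follows; the required tag keys and the resource identifiers are hypothetical, and a real check would read resources from an inventory or billing export.

```python
# Hypothetical tagging policy used only for this example.
REQUIRED_TAGS = frozenset({"owner", "cost-center", "environment"})


def missing_tags(resources, required=REQUIRED_TAGS):
    """Return {resource_id: [missing tag keys]} for non-compliant resources.

    resources maps a resource id to its tag dictionary. A tag with an
    empty value is treated as missing.
    """
    report = {}
    for resource_id, tags in resources.items():
        present = {key for key, value in tags.items() if value}
        absent = sorted(required - present)
        if absent:
            report[resource_id] = absent
    return report
```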

Strategic Planning & Procurement Responsibilities

  • Align cloud spending with business value and objectives.
  • Drive initiatives that embed cost optimization into planning.
  • Negotiate third-party contracts and optimize licensing terms.
  • Advocate for cross-functional collaboration on cost decisions.

These responsibilities could be fulfilled by the following or similar roles in your organization:
Business Stakeholder, Business Unit Leader, Operations Manager, Procurement Specialist, Team Lead

Performance Efficiency

Architecture and Resource Optimization Responsibilities

  • Select cloud services, regions, and architectures based on performance and sustainability trade-offs.
  • Benchmark and evaluate architecture decisions using performance data.
  • Use guidance from cloud providers and partners to inform design patterns.
  • Regularly review and update architecture to adapt to changes in usage and technology.

Roles:
Cloud Architect, Solutions Architect, Performance Engineer, DevOps Engineer, Technical Consultant, Project Manager

Compute and Storage Efficiency Responsibilities

  • Select and configure compute resources to match workload requirements.
  • Use dynamic scaling and right-sizing to optimize resource utilization.
  • Collect and analyze compute-related metrics to support decision-making.
  • Apply hardware acceleration and managed services where beneficial.
  • Align compute and storage strategies with sustainability goals.

Roles:
Cloud Architect, DevOps Engineer, Application Developer, Systems Administrator, Financial Analyst, Cost Management Analyst

Data Access and Management Responsibilities

  • Use purpose-built data stores for workload-specific access and storage patterns.
  • Minimize data movement and evaluate caching or replication strategies.
  • Collect performance metrics and manage the data lifecycle effectively.
  • Implement shared access where possible to reduce redundancy and overhead.

Roles:
Data Architect, Database Administrator, Cloud Engineer, DevOps Engineer, Data Analyst, Application Developer

Networking Efficiency Responsibilities

  • Evaluate and configure networking components for performance and reliability.
  • Implement load balancing, optimized routing, and appropriate network protocols.
  • Select the best connectivity options (e.g., VPN, Direct Connect) based on workload requirements.
  • Monitor and improve network configuration based on observed metrics.

Roles:
Network Architect, Cloud Network Architect, Network Engineer, Security Engineer, DevOps Engineer, Cloud Solutions Architect

Performance Monitoring and Tuning Responsibilities

  • Define KPIs and use monitoring tools to track workload performance.
  • Review metrics regularly to identify bottlenecks and inefficiencies.
  • Implement automation to remediate performance issues proactively.
  • Conduct load testing and performance testing to validate system responsiveness.

Roles:
Performance Engineer, DevOps Engineer, Cloud Architect, Product Manager, QA Engineer, Data Analyst, SRE
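
A latency KPI is often expressed as a percentile compared against a target. The sketch below uses the nearest-rank method; the function names and the choice of the 95th percentile are assumptions for illustration, and monitoring tools may compute percentiles differently.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_ms, target_ms, pct=95):
    """Return True if the chosen latency percentile is within the target."""
    return percentile(samples_ms, pct) <= target_ms
```

Reviewing a percentile rather than an average keeps slow outliers visible, which is usually where bottlenecks first show up.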

Reliability

Quotas, Constraints, and Capacity Planning Responsibilities

  • Monitor and manage service quotas across accounts and regions.
  • Design architectures that account for service limits and fixed constraints.
  • Automate quota monitoring and management where possible.
  • Ensure sufficient buffer between usage and limits to support failover.

Roles:
Cloud Architect, DevOps Engineer, Compliance Officer, Infrastructure Manager, AWS Administrator, Project Manager, Operations Manager, Technical Support Engineer
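
The buffer between usage and limits can be checked with simple arithmetic. The sketch below assumes usage is spread evenly across zones and that impaired resources still count against the quota while their replacements are launched; both are simplifying assumptions, and the function name is illustrative.

```python
import math


def failover_headroom(current_usage, quota, zones, zones_lost=1):
    """Return the quota left over at peak usage during a failover.

    A negative result means replacements could not be provisioned
    without a quota increase.
    """
    if zones <= 0 or not 0 <= zones_lost <= zones:
        raise ValueError("zones_lost must be between 0 and zones")
    per_zone = current_usage / zones
    replacements = math.ceil(per_zone * zones_lost)
    peak_usage = current_usage + replacements
    return quota - peak_usage
```

For example, 90 instances across 3 zones against a quota of 100 leaves no room to replace a lost zone, even though steady-state usage is under the limit.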

Network Design and Topology Responsibilities

  • Design resilient network topologies for high availability.
  • Plan IP subnets to support future growth and avoid overlap.
  • Use redundant connectivity between cloud and on-prem environments.
  • Prefer hub-and-spoke over mesh network designs for simplicity and control.

Roles:
Network Architect, Cloud Engineer, Security Engineer, IT Operations Manager, DevOps Engineer
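
Overlapping address ranges can be detected before they are allocated. A minimal sketch using Python's standard ipaddress module follows; the network names and CIDR blocks are made up for the example.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return pairs of network names whose CIDR blocks overlap.

    cidrs maps a network name to a CIDR string such as "10.0.0.0/16".
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (first, second)
        for (first, net_a), (second, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan for illustration only.
plan = {
    "vpc-a": "10.0.0.0/16",
    "vpc-b": "10.0.128.0/17",
    "on-prem": "10.1.0.0/16",
}
conflicts = find_overlaps(plan)
```

Running a check like this against the full address plan, including on-premises ranges, catches conflicts that would otherwise surface only when networks are connected.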

Service Architecture and Dependency Responsibilities

  • Segment workloads into independent components.
  • Design APIs with contracts and enforce idempotency.
  • Apply fault isolation through bulkheads and stateless designs.
  • Limit retries, control timeouts, and fail fast where necessary.

Roles:
Cloud Architect, Solution Architect, Software Developer, API Developer, DevOps Engineer, Product Owner, QA Engineer, Site Reliability Engineer (SRE)
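
Limiting retries and failing fast can be sketched as a small wrapper. The example below bounds the number of attempts, applies exponential backoff with jitter, and gives up once an overall deadline would be exceeded. All names and default values are assumptions for illustration, and the wrapped operation is assumed to be idempotent so that repeating it is safe.

```python
import random
import time


class RetriesExhausted(Exception):
    """Raised when an operation still fails after the allowed attempts."""


def call_with_retries(operation, max_attempts=3, base_delay=0.1,
                      deadline=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation() with bounded retries and an overall deadline in seconds."""
    start = clock()
    last_error = None
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            return operation()
        except Exception as error:  # a real client would catch only transient errors
            last_error = error
        # Exponential backoff with full jitter to avoid synchronized retries.
        delay = random.uniform(0, base_delay * 2 ** (attempts - 1))
        out_of_time = clock() - start + delay > deadline
        if attempts == max_attempts or out_of_time:
            break  # fail fast instead of waiting past the budget
        sleep(delay)
    raise RetriesExhausted(f"gave up after {attempts} attempt(s)") from last_error
```

The sleep and clock parameters exist so the behavior can be tested without real waiting.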

Monitoring, Alerts, and Recovery Responsibilities

  • Define and monitor workload metrics and key indicators.
  • Implement distributed tracing, logging, and alerting.
  • Automate responses to reduce recovery time.
  • Use analytics to detect anomalies and trigger remediation.

Roles:
DevOps Engineer, Site Reliability Engineer (SRE), Cloud Operations Engineer, Data Analyst, Incident Response Team Member

Demand Adaptation and Auto Scaling Responsibilities

  • Scale workloads dynamically based on load and demand.
  • Test for performance degradation under high load.
  • Detect impairments and provision failover automatically.
  • Validate autoscaling configuration in production-like conditions.

Roles:
Cloud Architect, DevOps Engineer, Product Owner, Systems Administrator, SRE, Project Manager

Change Management and Resilience Testing Responsibilities

  • Use automation, playbooks, and runbooks for predictable changes.
  • Integrate testing (functional, scaling, resilience) into CI/CD pipelines.
  • Deploy immutable infrastructure and validate rollback strategies.
  • Conduct game days and post-incident reviews to improve readiness.

Roles:
DevOps Engineer, QA Engineer, System Administrator, Release Manager, SRE, Incident Response Manager, Communication Lead

Backup and Disaster Recovery Responsibilities

  • Identify data to back up or reconstruct from source.
  • Secure and validate backup integrity regularly.
  • Define and test disaster recovery plans with clear objectives.
  • Automate recovery to reduce RTO and configuration drift.

Roles:
Disaster Recovery Manager, Cloud Architect, Backup Administrator, System Administrator, Compliance Officer, Business Continuity Planner, Network Engineer
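
One part of validating backup integrity is comparing a backup file against a checksum recorded when it was created. The sketch below uses SHA-256 from Python's standard library; the function names are illustrative, and where the expected digest is stored is left as an assumption.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_digest):
    """Return True if the file matches the digest recorded at backup time."""
    return sha256_of(path) == expected_digest
```

A matching checksum shows only that the file is unchanged. It does not show that the data can be restored, so periodic restore tests are still needed.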

Security

Secure Operation Responsibilities

  • Separate workloads using appropriate AWS accounts or organizational units.
  • Secure root account credentials and apply strict access control policies.
  • Keep up to date with security threats and recommendations.
  • Identify, implement, and validate security control objectives.
  • Establish cost-effective, automated methods for maintaining security posture.

Roles:
Cloud Security Architect, DevOps Engineer, Compliance Officer, Security Analyst, Security Operations Analyst, Incident Response Team Member, IT Manager

Identity and Access Management Responsibilities

  • Enforce strong sign-in mechanisms and MFA.
  • Use temporary credentials and centralized identity providers.
  • Securely store and manage secrets.
  • Define access requirements and implement least privilege principles.
  • Audit and rotate credentials regularly and manage permissions throughout their lifecycle.
  • Define permission guardrails and secure cross-account or third-party sharing.

Roles:
IAM Administrator, Cloud Security Architect, Security Architect, DevOps Engineer, Compliance Officer, Security Administrator, Application Developer, IT Operations Manager
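
Regular credential audits can start from a list of keys and their creation times. The sketch below reports keys older than a maximum age; the 90-day limit and the key identifiers are assumptions for illustration, and the creation times would come from whatever identity system is in use.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys, max_age_days=90, now=None):
    """Return the sorted ids of keys created at least max_age_days ago.

    keys maps a key id to a timezone-aware creation datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(key_id for key_id, created in keys.items() if created <= cutoff)
```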

Security Monitoring and Event Response Responsibilities

  • Configure actionable security event detection and alerting.
  • Analyze logs, metrics, and findings centrally.
  • Automate responses to detected threats.
  • Conduct incident detection, triage, and escalation effectively.
  • Maintain a security operations center or dedicated response team.

Roles:
Security Analyst, DevOps Engineer, Cloud Security Engineer, Incident Response Manager, Compliance Officer, IT Operations Manager, SOC Analyst

Network and Compute Protection Responsibilities

  • Design network layers and control traffic at all levels.
  • Implement inspection and automated protection mechanisms.
  • Reduce attack surface and validate software integrity.
  • Secure compute with managed services, automation, and remote administration.

Roles:
Network Architect, Security Engineer, Cloud Security Specialist, DevOps Engineer, System Administrator, IT Security Engineer

Data Classification and Protection Responsibilities

  • Classify data and define data protection policies.
  • Automate identification and lifecycle management of data.
  • Secure data at rest using encryption and access controls.
  • Secure data in transit through encryption and authentication.
  • Minimize unnecessary data movement and visibility.

Roles:
Data Owner, Data Architect, Data Steward, Security and Compliance Manager, Cloud Architect, Data Analyst, Systems Administrator, Compliance Officer

Incident Preparedness and Recovery Responsibilities

  • Identify key personnel and establish external support agreements.
  • Develop and test incident response plans and playbooks.
  • Prepare forensic capabilities and pre-deploy response tools.
  • Run simulations and tabletop exercises.
  • Establish feedback loops and a culture of learning from incidents.

Roles:
Incident Response Manager, Security Analyst, Forensic Specialist, Legal Advisor, Communications Officer, IT Operations Team, Training Coordinator, Executive Sponsor

Secure Development Lifecycle Responsibilities

  • Embed security training and practices into development teams.
  • Programmatically deploy secure and validated software.
  • Automate testing and conduct manual code reviews throughout the pipeline.
  • Centralize software dependency and artifact management.
  • Assign security ownership within workload teams.

Roles:
DevOps Engineer, Software Developer, Security Analyst, Security Engineer, QA Engineer, Application Security Trainer, Security Champion, Compliance Officer, Project Manager

Sustainability

Region and Workload Placement Responsibilities

  • Choose AWS Regions based on sustainability goals as well as business and latency requirements.
  • Optimize geographic placement of workloads to reduce networking overhead and environmental impact.

Roles:
Cloud Architect, Business Analyst, Sustainability Officer, DevOps Engineer, Sustainability Lead, Infrastructure Architect, Network Engineer

Demand and Resource Alignment Responsibilities

  • Dynamically scale infrastructure to align resource use with actual demand.
  • Implement throttling or buffering strategies to flatten resource spikes.
  • Stop creating or maintaining unused assets and shadow resources.
  • Align service-level agreements (SLAs) with sustainability objectives.

Roles:
Cloud Architect, DevOps Engineer, Operations Manager, Finance Manager, Product Owner, IT Procurement Manager, Governance and Compliance Lead, Data Analyst, Application Owner
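
Throttling to flatten resource spikes is commonly implemented with a token bucket. The sketch below is one minimal version under stated assumptions: the class name, the rate, and the capacity are illustrative, and rejected requests are expected to be queued or retried later by the caller.

```python
import time


class TokenBucket:
    """Allow up to `rate` requests per second with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def allow(self, cost=1):
        """Return True and consume tokens if the request fits the budget."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Smoothing demand this way lets infrastructure be sized closer to average load rather than peak load.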

Team and Process Efficiency Responsibilities

  • Align team resourcing and activities with sustainability goals.
  • Use managed services and tools to reduce overhead and carbon footprint.
  • Increase utilization of build environments and shared resources.
  • Keep workloads and infrastructure up to date to avoid waste.

Roles:
Sustainability Program Manager, Procurement Specialist, IT Administrator, Infrastructure Team, Solutions Architect, Cloud Architect, DevOps Engineer, Application Developer

Software and Architecture Optimization Responsibilities

  • Refactor or remove underused workload components.
  • Optimize code to reduce compute cycles and improve efficiency.
  • Design asynchronous or scheduled workloads to run during low-impact times.
  • Consider the sustainability impact of devices, hardware, and client-side performance.

Roles:
Solutions Architect, Software Developer, DevOps Engineer, Product Owner, Business Analyst, Systems Engineer, Sustainability Specialist, Product Manager

Data Management and Storage Responsibilities

  • Classify data to ensure efficient lifecycle management.
  • Remove redundant or obsolete data to minimize storage needs.
  • Use shared storage systems for common datasets.
  • Back up data only when difficult to recreate or when required.
  • Use elasticity and automation to scale storage based on demand.
  • Minimize unnecessary data movement across regions and networks.

Roles:
Data Architect, Data Engineer, DevOps Engineer, Compliance Officer, Cloud Administrator, Data Steward, Storage Engineer, Product Manager, Security and Compliance Manager, Data Classification Owner

Hardware and Service Efficiency Responsibilities

  • Use the minimum amount of hardware required to meet workload needs.
  • Select instance types and hardware accelerators with lower environmental impact.
  • Prefer managed services to reduce operational and infrastructure overhead.

Roles:
Chief Technology Officer, Solutions Architect, DevOps Engineer, Infrastructure Manager, IT Operations Manager, Cloud Architect, Sustainability Manager

Continuous Improvement and Organizational Support Responsibilities

  • Adopt agile methods that support rapid sustainability enhancements.
  • Centralize sustainability practices in day-to-day operations.
  • Conduct regular reviews to identify areas for operational efficiency.
  • Use managed device farms and shared environments to reduce impact of test operations.

Roles:
Sustainability Lead, Product Owner, QA Specialist, Cloud Architect, DevOps Engineer, Sustainability Champion, Sustainability Manager, Quality Assurance Engineer