SRE Architect

3 weeks ago


St Louis, United States Advantage Solutions Full time
As an SRE Architect specializing in DevOps, monitoring, and diagnostics, you will play a critical role in ensuring the reliability, availability, and performance of our mission-critical services. You will design and implement end-to-end monitoring solutions, build observability pipelines, and help create scalable systems for proactive incident detection, diagnostics, and root cause analysis. In this role, you will work closely with engineering, product, and operations teams to drive a culture of reliability and continuous improvement.

Monitoring & Observability: Design and implement comprehensive monitoring and alerting solutions for production systems across multiple environments (cloud, on-prem, hybrid). Develop and refine metrics collection and visualization strategies using tools like Prometheus, Grafana, OpenTelemetry, and others. Build dashboards and custom monitoring solutions to ensure system health, performance, and security. Establish SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) to align with business goals.

Incident Management & Diagnostics: Develop and implement tools and systems for real-time diagnostics and root cause analysis during incidents. Lead post-mortem analysis and drive remediation of systemic issues to prevent future incidents. Design diagnostic tools and automation to reduce mean time to detection (MTTD) and mean time to resolution (MTTR). Collaborate with engineering teams to define monitoring standards and ensure that new features and services meet reliability and observability requirements.

System Design & Architecture: Architect scalable, resilient, and highly available systems with observability baked in from the start. Apply SRE principles to design and optimize services for reliability, availability, and performance. Identify and address single points of failure, bottlenecks, and other operational risks in production environments.

Automation & Tooling: Create, maintain, and improve automation tools that enhance monitoring, diagnostics, and incident response. Integrate monitoring and observability tools into CI/CD pipelines for proactive issue detection and remediation. Contribute to the development of custom diagnostic tools for troubleshooting complex, distributed systems.

Collaboration & Knowledge Sharing: Collaborate with software engineering, platform engineering, and DevOps teams to ensure seamless integration of monitoring and diagnostics practices. Mentor and coach junior SREs and other team members on best practices for observability and incident management. Stay up-to-date with the latest industry trends and innovations in monitoring, diagnostics, and reliability engineering.

Education & Training Experience: Experience with advanced observability techniques, such as synthetic monitoring, canary deployments, and feature flags. Certification in cloud platforms (AWS, GCP, Azure) or monitoring tools (e.g., Prometheus Certified Associate). Previous experience in an SRE or DevOps leadership role. Knowledge of serverless architecture, microservices, and edge computing environments. Strong experience in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes, Docker). Deep knowledge of monitoring tools such as Datadog and Cloud Monitoring. Proficient in instrumentation techniques (e.g., OpenTelemetry, StatsD, custom metrics). Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar. Expertise in alerting and notification systems, including PagerDuty, Opsgenie, or VictorOps.

Architect position

This position is an individual contributor.

Travel required: 5%

  • St Louis, United States Adaptive Technology Insights Full time

    Job Overview We are seeking a highly skilled CI/CD Architect/Expert to lead the design, implementation, and management of continuous integration and continuous deployment pipelines. This role is pivotal in ensuring efficient, reliable, and scalable software delivery processes across the organization. Key Responsibilities Design and implement robust CI/CD...


  • St Louis, United States LSEG (London Stock Exchange Group) Full time

    LSEG (London Stock Exchange Group) is a world-leading financial markets infrastructure and data business. We are dedicated, open-access partners with a commitment to excellence in delivering services across Data & Analytics, Capital Markets, and Post Trade. Backed by three hundred years of experience, innovative technologies, and a team of over 23,000...


  • Saint Louis, MO, United States Adaptive Technology Insights Full time

    Job Overview We are seeking a highly skilled CI/CD Architect/Expert to lead the design, implementation, and management of continuous integration and continuous deployment pipelines. This role is pivotal in ensuring efficient, reliable, and scalable software delivery processes across the organization. Key Responsibilities Design and implement robust CI/CD...


  • St Louis, United States Technology Partners Full time

    Technology Partners is currently seeking a talented Site Reliability Engineer. Do you have experience building and scaling highly available AWS cloud architectures with IaC? Let us help you make your next big career move a reality! Why Join Us? Make a Real Impact: Collaborate directly with the Department of Treasury's Bureau of the Fiscal Service on a...


  • St Louis, United States Federal Reserve Bank Full time

    Company: Federal Reserve Bank of St. Louis. Reporting directly to the operations manager, the analyst works closely with a dedicated team comprised of operational support staff, developers, testers, business analysts, product owners, and scrum masters to prioritize, refine, and deliver quality solutions for the Central Accounting Reporting business line. The...