SRE Architect
3 weeks ago
Monitoring & Observability: Design and implement comprehensive monitoring and alerting solutions for production systems across multiple environments (cloud, on-prem, hybrid). Develop and refine metrics collection and visualization strategies using tools like Prometheus, Grafana, OpenTelemetry, and others. Build dashboards and custom monitoring solutions to ensure system health, performance, and security. Establish SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) to align with business goals.
Incident Management & Diagnostics: Develop and implement tools and systems for real-time diagnostics and root cause analysis during incidents. Lead post-mortem analysis and drive remediation of systemic issues to prevent future incidents. Design diagnostic tools and automation to reduce mean time to detection (MTTD) and mean time to resolution (MTTR). Collaborate with engineering teams to define monitoring standards and ensure that new features and services meet reliability and observability requirements.
System Design & Architecture: Architect scalable, resilient, and highly available systems with observability baked in from the start. Apply SRE principles to design and optimize services for reliability, availability, and performance. Identify and address single points of failure, bottlenecks, and other operational risks in production environments.
Automation & Tooling: Create, maintain, and improve automation tools that enhance monitoring, diagnostics, and incident response. Integrate monitoring and observability tools into CI/CD pipelines for proactive issue detection and remediation. Contribute to the development of custom diagnostic tools for troubleshooting complex, distributed systems.
Collaboration & Knowledge Sharing: Collaborate with software engineering, platform engineering, and DevOps teams to ensure seamless integration of monitoring and diagnostics practices. Mentor and coach junior SREs and other team members on best practices for observability and incident management. Stay up-to-date with the latest industry trends and innovations in monitoring, diagnostics, and reliability engineering.
Education & Training Experience: Experience with advanced observability techniques, such as synthetic monitoring, canary deployments, and feature flags. Certification in cloud platforms (AWS, GCP, Azure) or monitoring tools (e.g., Prometheus Certified Associate). Previous experience in an SRE or DevOps leadership role. Knowledge of serverless architecture, microservices, and edge computing environments. Strong experience in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes, Docker). Deep knowledge of monitoring tools such as Datadog and Cloud Monitoring. Proficiency in instrumentation techniques (e.g., OpenTelemetry, StatsD, custom metrics). Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar. Expertise in alerting and notification systems, including PagerDuty, Opsgenie, or VictorOps.
Architect position
This position is an individual contributor.
Travel required: 5%
-
DevOps Architect with CI/CD Expertise
3 weeks ago
St Louis, United States Adaptive Technology Insights Full time Job Overview We are seeking a highly skilled CI/CD Architect/Expert to lead the design, implementation, and management of continuous integration and continuous deployment pipelines. This role is pivotal in ensuring efficient, reliable, and scalable software delivery processes across the organization. Key Responsibilities Design and implement robust CI/CD...
-
DevOps Architect with CI/CD Expertise
2 days ago
St Louis, United States Adaptive Technology Insights Full time Job Overview We are seeking a highly skilled CI/CD Architect/Expert to lead the design, implementation, and management of continuous integration and continuous deployment pipelines. This role is pivotal in ensuring efficient, reliable, and scalable software delivery processes across the organization. Key Responsibilities Design and implement robust CI/CD...
-
St Louis, United States LSEG (London Stock Exchange Group) Full time LSEG (London Stock Exchange Group) is a world-leading financial markets infrastructure and data business. We are dedicated, open-access partners with a commitment to excellence in delivering services across Data & Analytics, Capital Markets, and Post Trade. Backed by three hundred years of experience, innovative technologies, and a team of over 23,000...
-
DevOps Architect with CI/CD Expertise
2 days ago
Saint Louis, MO, United States Adaptive Technology Insights Full timeJob Overview We are seeking a highly skilled CI/CD Architect/Expert to lead the design, implementation, and management of continuous integration and continuous deployment pipelines. This role is pivotal in ensuring efficient, reliable, and scalable software delivery processes across the organization. Key Responsibilities Design and implement robust CI/CD...
-
Site Reliability Engineer
3 weeks ago
St Louis, United States Technology Partners Full time Technology Partners is currently seeking a talented Site Reliability Engineer. Do you have experience building and scaling highly available AWS cloud architectures with IaC? Let us help you make your next big career move a reality! Why Join Us? Make a Real Impact: Collaborate directly with the Department of Treasury's Bureau of the Fiscal Service on a...
-
Site Reliability Engineer
4 weeks ago
St Louis, United States Technology Partners Full time Technology Partners is currently seeking a talented Site Reliability Engineer. Do you have experience building and scaling highly available AWS cloud architectures with IaC? Let us help you make your next big career move a reality! Why Join Us? Make a Real Impact: Collaborate directly with the Department of Treasury's Bureau of the Fiscal Service on a...
-
Senior Site Reliability Analyst
4 weeks ago
St Louis, United States Federal Reserve Bank Full time Company Federal Reserve Bank of St. Louis Reporting directly to the operations manager, the analyst works closely with a dedicated team comprised of operational support staff, developers, testers, business analysts, product owners, and scrum masters to prioritize, refine, and deliver quality solutions for the Central Accounting Reporting business line. The...