Senior Site Reliability Engineer
1 month ago
We are seeking a skilled Site Reliability Engineer to ensure the reliability, availability, and performance of our production systems. As an SRE, you will work closely with cross-functional teams to design and implement tools and processes to automate deployment, observability, and troubleshooting of our applications and infrastructure.
This individual must be skilled and have professional experience with the core functions of Site Reliability Engineering, including deployments, observability, monitoring, telemetry, and automation.
Please be sure to highlight your experience in these areas and how your technical experience matches the requirements below in your resume.
Responsibilities:
Ensure the reliability, availability, and performance of our production systems as we scale
Develop and maintain monitoring and alerting systems to detect and respond to incidents in a timely manner
Occasionally support planned deployment rollouts that may require working off-hours during store closure
Work with cross-functional teams to plan and execute scaling initiatives
Develop and maintain documentation of processes, procedures, and technical configurations
Requirements:
Strong written and verbal communication skills with peers, technical leads, project managers, and product owners
Must be able to collaborate with customers and cross-functional teams to design, test, and validate deliverables that meet or exceed expectations
Highly motivated, well-organized self-starter
Bachelor's degree in Computer Science or related field
5+ years of experience as a Site Reliability Engineer
Strong experience with automation tools and with automation scripting in Python
Experience with containerization technologies such as Docker and Kubernetes
Experience with cloud platforms such as Azure or AWS
Experience with monitoring and logging tools such as Datadog, Prometheus, Grafana, or Splunk
Strong understanding of networking, security, and systems administration
Excellent problem-solving skills and attention to detail
Must be available to work core hours in the Pacific time zone (PST)
Preferred qualifications:
Experience with distributed systems and supporting a large retail business
Experience with infrastructure as code tools such as Terraform or CloudFormation
Experience with CI/CD tools such as Jenkins
Experience with incident ticketing systems such as ServiceNow, and with Jira for tracking stories
Familiarity with Agile/Scrum methodologies and DevOps principles
If you are passionate about ensuring the reliability and availability of systems in our stores and enjoy collaborating with cross-functional teams to solve complex problems, we encourage you to apply for this exciting opportunity as an SRE.
-
Senior Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Veradigm Full time
Welcome to Veradigm, where our mission is to transform health through innovative solutions. We are seeking a highly skilled Senior Site Reliability Engineer to join our team and help us achieve our goals. As a Senior Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining robust, scalable, and reliable systems. You will...
-
Senior Site Reliability Engineer
3 weeks ago
Dallas, Texas, United States Capgemini Full time
Site Reliability Engineer Job Description: We're seeking an experienced Site Reliability Engineer to join our team at Capgemini. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud...
-
Site Reliability Engineer
1 month ago
Dallas, Texas, United States Diverse Lynx Full time
Job Title: Site Reliability Engineer. We are seeking a skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Glow Networks Full time
Site Reliability Engineer (SRE for Datacenter). At Glow Networks, we are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability and performance of our datacenter infrastructure. Responsibilities: Data monitoring and alerting, data quality assurance, and anomaly...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Mastech Digital Full time
About the Role: We are seeking a skilled Site Reliability Engineer to join our team at Mastech Digital. As a Site Reliability Engineer, you will be responsible for ensuring the smooth operation of our IT systems and infrastructure. Key Responsibilities: Administration and troubleshooting in Linux and Windows. Patching and basic scripting skills (PowerShell,...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Diamondpick Full time
The role: Diamondpick is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our services and platforms in a highly transactional 24x7 environment. Key Responsibilities: Monitor application performance and take steps to improve...
-
Senior Associate, Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Kyndryl Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Kyndryl. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and security of our systems and services. Key Responsibilities: Design and implement automated solutions to enhance the stability and security of our...
-
Site Reliability Engineer
1 month ago
Dallas, Texas, United States Motion Recruitment Partners LLC Full time
Job Title: Site Reliability Engineer - Azure. Job Description: Motion Recruitment Partners LLC is seeking a highly skilled Site Reliability Engineer - Azure to join their team. The ideal candidate will have a strong background in monitoring and recovery of data systems, with experience in Azure and cloud infrastructure. Key Responsibilities: Develop and utilize...
-
Site Reliability Engineer, VP
4 weeks ago
Dallas, Texas, United States The Goldman Sachs Group Full time
Job Summary: As a Site Reliability Engineer, VP at The Goldman Sachs Group, you will be responsible for ensuring the reliability and scalability of our Procmon Platform. This platform is a highly scalable and reliable ecosystem for scheduling business-critical jobs across the firm. Key Responsibilities: Own technical operations for systems that manage hundreds of...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Bayone Full time
Job Title: Site Reliability Engineer - Cloud Expert. Overview: Bayone is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining highly available and scalable applications deployed in Azure. You will work closely with development teams to ensure...
-
Senior InfoSec Site Reliability Engineer
3 weeks ago
Dallas, Texas, United States Aurora Innovation Full time
Aurora Innovation is seeking a highly skilled Senior InfoSec Site Reliability Engineer to join our Identity and Access Management team. About the Role: This is an exciting opportunity to design and support a variety of infrastructure solutions with the engineering and security teams, manage and evolve our public key infrastructure, and develop and utilize...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Goldman Sachs Full time
About the Role: We are seeking a talented Site Reliability Engineer to join our team at Goldman Sachs. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the firm's cloud infrastructure. You will work closely with our development team to ensure the smooth operation of our systems and services. Key...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Goldman Sachs Full time
About This Role: At Goldman Sachs, we're committed to building and running large-scale, massively distributed, fault-tolerant systems. As a Site Reliability Engineer, you'll play a critical role in ensuring the availability and reliability of our firm's most critical platform services. Responsibilities: Develop and support automation tooling to improve the...
-
Senior InfoSec Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Aurora Innovation Full time
About the Role: Aurora Innovation is seeking a highly skilled Senior InfoSec Site Reliability Engineer to join our team. As a key member of our InfoSec and Enterprise infrastructure services, you will be responsible for designing and supporting a variety of infrastructure solutions with the engineering and security teams. Key Responsibilities: Manage and evolve...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Diverse Lynx Full time
Job Title: Site Reliability Manager. Job Summary: We are seeking a Site Reliability Manager with 8 to 12 years of experience to manage geospatial data projects, ensure data integrity, and leverage advanced technologies to drive business outcomes. Key Responsibilities: • Make monitoring and alerting notify on symptoms and not on outages. • Document findings to...
-
Principal Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States CARE Full time
About CARE: CARE is a consumer tech company with heart. We're on a mission to solve a human challenge we all face: finding great care for the ones we love. We're moms and dads and pet parents. Our culture and our products reflect that. Here, entrepreneurs, self-starters, team players, and big thinkers unite behind a common cause. We're applying data analytics,...
-
Principal Site Reliability Engineer
1 month ago
Dallas, Texas, United States Care Full time
Job Overview: Care.com is a leading provider of online services for finding family care and care jobs. We're seeking a highly skilled Principal Site Reliability Engineer to join our team and ensure the reliability, scalability, and performance of our critical systems. This is a leadership role that requires strong technical expertise and excellent communication...
-
Infrastructure Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States CVS Health Full time
Job Summary: As a Site Reliability Engineer at CVS Health, you will play a critical role in designing, implementing, and managing the infrastructure systems and tools that enable reliability and performance of our technology platforms. This position requires a strong background in infrastructure engineering and a commitment to proactive monitoring,...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Motion Recruitment Full time
Job Description: Our client, a leading digital solutions provider, is seeking a Site Reliability Engineer to join their team in Dallas, Texas. This individual will be responsible for ensuring the stability and performance of their application, identifying areas for improvement, and implementing solutions to increase scalability and efficiency. The ideal...
-
Site Reliability Engineer
4 weeks ago
Dallas, Texas, United States Analytic Partners Full time
Analytic Partners is a global leader in commercial measurement and optimization, turning data into expertise for the world's largest brands. Our holistic approach to decision-making is powered by our industry-leading platform and team of experts, who help leaders make better decisions, faster - unlocking business growth and creating powerful customer...