Site Reliability Engineering Manager

4 weeks ago


Dallas, United States Sharp Decisions Full time

NO 3RD PARTIES, NO C2C, NO H1B, NO RELOCATION


**CONTRACT TO HIRE**


Job Title: Manager, Site Reliability


Job Summary: As the Manager, Site Reliability Engineering (SRE), you will lead a team of SREs responsible for the availability, performance, and scalability of our services. You will work closely with development, operations, and product teams to build and maintain reliable systems, implement best practices, and ensure seamless deployment processes. Your leadership will be pivotal in fostering a culture of reliability and continuous improvement.

Key Responsibilities:

  • Team Leadership:
      • Manage and mentor a team of SREs, providing guidance, performance feedback, and professional development opportunities.
      • Foster a collaborative and inclusive team environment, encouraging innovation and knowledge sharing.
  • System Reliability:
      • Design, implement, and maintain scalable, resilient, and high-performance systems.
      • Develop and enforce reliability standards, best practices, and processes across the organization.
      • Monitor and analyze system performance and reliability metrics, identifying areas for improvement.
  • Incident Management:
      • Lead incident response efforts, ensuring timely resolution of production issues.
      • Conduct root cause analysis and post-mortems to prevent recurrence and improve system robustness.
      • Develop and maintain incident response plans, including documentation and communication protocols.
  • Automation and Tooling:
      • Drive automation initiatives to reduce manual intervention, improve efficiency, and minimize downtime.
      • Implement and maintain monitoring, alerting, and logging tools to ensure visibility into system health.
      • Develop and maintain CI/CD pipelines to streamline deployment processes.
  • Collaboration and Communication:
      • Work closely with development teams to design and implement reliable and scalable applications.
      • Collaborate with product teams to understand requirements and ensure reliability considerations are integrated into the development process.
      • Communicate effectively with stakeholders, providing regular updates on system reliability and performance.
  • Security and Compliance:
      • Ensure systems adhere to security best practices and compliance requirements.
      • Conduct regular security assessments and audits, implementing necessary improvements.
      • Stay informed about emerging security threats and technologies, adapting practices as needed.


Qualifications:

  • Education and Experience:
      • Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
      • 7+ years of experience in Site Reliability Engineering, DevOps, or related roles.
      • 3+ years of experience in a leadership or management position.
  • Technical Skills:
      • Proficiency in cloud platforms (AWS, Google Cloud Platform, Azure) and in containers and container orchestration (Docker, Kubernetes).
      • Strong scripting and programming skills (Python, Go, Bash, etc.).
      • Experience with infrastructure as code (Terraform, Ansible, etc.) and configuration management tools.
      • Knowledge of networking, security, and database management.
  • Soft Skills:
      • Excellent leadership and team management abilities.
      • Strong problem-solving and analytical skills.
      • Effective communication and interpersonal skills.
      • Ability to work in a fast-paced, dynamic environment and manage multiple priorities.



  • Dallas, United States Creospan Full time

    Job Title: Site Reliability Engineer (SRE) and Java Developer Location: Dallas Job Summary: We are seeking a versatile and skilled professional who excels in both Site Reliability Engineering (SRE) and Java development. The ideal candidate will be responsible for ensuring the reliability, performance, and scalability of our platform while also contributing...


  • Dallas, United States OnwardPath Full time

    SRE (Site Reliability Engineer) Dallas, TX – Hybrid (Local Candidates Only) 6+ Months Contract Job Description (SRE) Collaborating closely with engineering teams on building and enhancing tooling and automation solutions for faster resolution of issues impacting SLOs and averting incidents altogether when possible. Collaborating with the customers to...


  • Dallas, United States Creospan Inc. Full time

    Job Title: Site Reliability Engineer (SRE) and Java Developer Location: Dallas Job Summary: We are seeking a versatile and skilled professional who excels in both Site Reliability Engineering (SRE) and Java development. The ideal candidate will be responsible for ensuring the reliability, performance, and scalability of our platform while also contributing to...


  • Dallas, United States Donato Technologies Inc Full time

    Job Title: Site Reliability Engineer - Databricks & Snowflake Job Summary: We are looking for a skilled Site Reliability Engineer with expertise in optimizing Databricks jobs and Snowflake queries. As an SRE, you will play a critical role in ensuring the performance, reliability, and scalability of our data processing systems....


  • Dallas, United States Collabera Full time

    Site Reliability Engineer Contract: Dallas, Texas, US Salary: $60.00 Per Hour Job Code: 350552 End Date: 2024-07-14 Days Left: 28 days, 3 hours left Job Title: Cloud DevOps Engineer/Site Reliability Engineer Duration of project: 6+ Months + possible Extension Location: Remote Role Description: ...


  • Dallas, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer Location: Dallas, TX (Onsite) Duration: Full Time Only Job Description Responsible for ensuring the reliability of systems, minimizing downtime, and maintaining service-level objectives (SLOs). Developing and implementing automation tools to streamline processes, deploy applications, and manage...


  • Dallas, United States Saxon Global Full time

    Job Summary: We are looking for a Site Reliability Engineer (SRE) who will be responsible for ensuring the reliability, availability, and performance of our production systems. As an SRE, you will work closely with cross-functional development and engineering teams to design and implement tools and processes to automate deployment, observability, and troubleshooting...


  • Dallas, United States Saxon Global Full time

    As a member of the Production Support/SRE team you will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You'll excel if you have enthusiasm for digging deep and a flair for technical communication and prioritization. You will work directly...


  • Dallas, United States Redwood Software Full time

    Important: We have been made aware that individuals are posing as Redwood recruiters in an attempt to deceive candidates into sharing personal information. Redwood employees will only contact you from an “@redwood.com” email domain. If you have questions or suspect an email is fraudulent, please contact us at recruitment@redwood.com. For this role, we...


  • Dallas, United States VIZIO Full time

    About the Team: VIZIO releases firmware & software for millions of customers in a time-efficient manner. Our goal is to maintain 99.9% uptime for our customers. We are seeking a Site Reliability Engineer to join our expanding organization. The Site Reliability Engineer will report to the Manager, DevOps Security and will play a crucial role in enhancing the...


  • Dallas, United States Tech Holding Full time

    About us: Working at Tech Holding isn't just a job, it's an opportunity to be a part of something bigger. We are a full-service consulting firm that was founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have industry experience and have held...


  • Dallas, United States Dice Full time

    Dice is the leading career destination for tech experts at every stage of their careers. Our client, Galaxy i Technologies, Inc., is seeking the following. Apply via Dice today! Site Reliability Engineer Dallas TX Onsite Full Time Skill: Site Reliability Engineer Ensures supported applications are functioning and available by minimizing downtime and...