Other current jobs related to this field are listed below:


  • Dallas, United States Themesoft Inc. Full time

    Role: Site Reliability Engineer. Location: Dallas, Texas. Full Time. Salary: $140,000 + Bonus + Benefits. The Site Reliability Engineer is a fundamental piece of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment....


  • Dallas, United States Themesoft Inc. Full time

    The Site Reliability Engineer is a fundamental piece of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment. The role: Monitor application performance, take steps to improve overall application performance...


  • Dallas, United States Diamondpick Full time

    Hi, Hope you are doing well. Please find the below JD. Title: SRE Engineer. Location: Dallas, TX. Type of Hire: Full Time. Job Description: The Site Reliability Engineer is a fundamental piece of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms in a...


  • Dallas, United States Appspace Full time

    Your Role as a Site Reliability Engineer: Our Cloud Operations team seeks a Site Reliability Engineer who is passionate about problem-solving, automating, and maintaining Appspace’s Cloud Platform to support the needs of our Engineering and Customer Care teams. The ideal candidate will see manual work as an opportunity to exercise automation, will...


  • Dallas, United States VDart Inc Full time

    Job Description. Title: SRE / Site Reliability Engineer. Location: TX/Dallas Hybrid/Onsite. Duration: 1 Year. Skills: Help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams. Apply automation and software to any tasks or parts of the system that would benefit from...


  • Dallas, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer Location: Dallas, TX//Onsite Duration: Full Time-Only Job Description Responsible for ensuring the reliability of systems, minimizing downtime, and maintaining service-level objectives (SLOs). Developing, automating, and implementing automation tools to streamline processes, deploy applications, and manage...


  • Dallas, United States Motion Recruitment Full time

    Job Description Our client, an independent services business that focuses on delivering a unified operating model for cloud, data, IoT and managed services, is looking for a Site Reliability Engineer who will be accountable for the availability, reliability, and performance of the services and platforms in a highly transactional 24x7 environment. This...


  • Dallas, United States Signify Health Full time

    How will this role have an Impact? Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We're seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team member's unique contribution and...


  • Dallas, United States Saxon Global Full time

    As a member of the Production Support/SRE team you will work cross-functionally amongst a variety of teams and be a core contributor in every significant engineering service or solution that we deliver to our stakeholders. You'll excel if you have enthusiasm for digging deep, and a flair for technical communication and prioritization. You will work directly...


  • Dallas, United States Signify Health Full time

    Job Description. How will this role have an Impact? Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We're seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team...


  • Dallas, United States Dice Full time

    Dice is the leading career destination for tech experts at every stage of their careers. Our client, Galaxy i Technologies, Inc., is seeking the following. Apply via Dice today! Site Reliability Engineer Location: Dallas TX Onsite Full Time Skill: Site Reliability Engineer Ensures supported applications are functioning and available by minimizing downtime...


  • Dallas, United States VIZIO Full time

    About the Team: VIZIO releases firmware & software for millions of customers in a time-efficient manner. Our goal is to maintain 99.9% uptime for our customers. We are seeking a Site Reliability Engineer to join our expanding organization. The Site Reliability Engineer will report to the Manager, DevOps Security and will play a crucial role in enhancing the...


  • Dallas, United States Motion Recruitment Partners LLC Full time

    Our client, a large managed service provider focused on digital solutions and transformation, is looking for a Site Reliability Engineer to join their team. This person will be responsible for monitoring their application performance, making suggestions to improve performance and stability, and taking the lead on implementing those improvements. The ideal...


  • Dallas, United States Diverse Lynx Full time

    Role : Site Reliability Engineer/Devops Engineer Location : Dallas TX (Onsite) Duration: Full-time Job Description Skill: Site Reliability Engineer Ensures supported applications are functioning and available by minimizing downtime and maximizing performance. Provides technical expertise to the stakeholders and end user ensuring continuous...


  • Dallas, United States JPMorganChase Full time

    Job Description: There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise technology, Infrastructure platforms team, you...


  • Dallas, Texas, United States JPMorganChase Full time

    Job Description: There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems. As a Site Reliability Engineer III at JPMorgan Chase within the Enterprise technology, Infrastructure platforms team, you will solve...


  • Dallas, United States Apple Full time

    Site Reliability Engineering (SRE) Manager - Apple Service Engineering Austin, Texas, United States Software and Services Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish! Join...


  • Dallas, United States Motion Recruitment Full time

    Our client, a large managed service provider focused on digital solutions and transformation, is looking for a Site Reliability Engineer to join their team. This person will be responsible for monitoring their application performance, making suggestions to improve performance and stability, and taking the lead on implementing those improvements. The ideal...


  • Dallas, United States Motion Recruitment Full time

    Job Description: Our client, a large managed service provider focused on digital solutions and transformation, is looking for a Site Reliability Engineer to join their team. This individual will oversee the functionality and performance of their application, coming up with ideas to make it more stable and efficient, and leading the implementation of those...

Site Reliability Engineering Manager

2 months ago


Dallas, United States Sharp Decisions Full time

NO 3RD PARTIES, NO C2C, NO H1B, NO RELOCATION


**CONTRACT TO HIRE**


Job Title: Manager, Site Reliability


Job Summary: As the Manager, Site Reliability Engineering (SRE), you will lead a team of SREs responsible for the availability, performance, and scalability of our services. You will work closely with development, operations, and product teams to build and maintain reliable systems, implement best practices, and ensure seamless deployment processes. Your leadership will be pivotal in fostering a culture of reliability and continuous improvement.

Key Responsibilities:

  • Team Leadership:
      • Manage and mentor a team of SREs, providing guidance, performance feedback, and professional development opportunities.
      • Foster a collaborative and inclusive team environment, encouraging innovation and knowledge sharing.
  • System Reliability:
      • Design, implement, and maintain scalable, resilient, and high-performance systems.
      • Develop and enforce reliability standards, best practices, and processes across the organization.
      • Monitor and analyze system performance and reliability metrics, identifying areas for improvement.
  • Incident Management:
      • Lead incident response efforts, ensuring timely resolution of production issues.
      • Conduct root cause analysis and post-mortems to prevent recurrence and improve system robustness.
      • Develop and maintain incident response plans, including documentation and communication protocols.
  • Automation and Tooling:
      • Drive automation initiatives to reduce manual intervention, improve efficiency, and minimize downtime.
      • Implement and maintain monitoring, alerting, and logging tools to ensure visibility into system health.
      • Develop and maintain CI/CD pipelines to streamline deployment processes.
  • Collaboration and Communication:
      • Work closely with development teams to design and implement reliable and scalable applications.
      • Collaborate with product teams to understand requirements and ensure reliability considerations are integrated into the development process.
      • Communicate effectively with stakeholders, providing regular updates on system reliability and performance.
  • Security and Compliance:
      • Ensure systems adhere to security best practices and compliance requirements.
      • Conduct regular security assessments and audits, implementing necessary improvements.
      • Stay informed about emerging security threats and technologies, adapting practices as needed.


Qualifications:

  • Education and Experience:
      • Bachelor's degree in Computer Science, Engineering, or a related field; Master's degree preferred.
      • 7+ years of experience in Site Reliability Engineering, DevOps, or related roles.
      • 3+ years of experience in a leadership or management position.
  • Technical Skills:
      • Proficiency in cloud platforms (AWS, Google Cloud Platform, Azure) and container orchestration (Kubernetes, Docker).
      • Strong scripting and programming skills (Python, Go, Bash, etc.).
      • Experience with infrastructure as code (Terraform, Ansible, etc.) and configuration management tools.
      • Knowledge of networking, security, and database management.
  • Soft Skills:
      • Excellent leadership and team management abilities.
      • Strong problem-solving and analytical skills.
      • Effective communication and interpersonal skills.
      • Ability to work in a fast-paced, dynamic environment and manage multiple priorities.