Senior Site Reliability Engineer

2 months ago


Dallas, United States Tech Holding Full time
Job Description

About us:

Working at Tech Holding isn't just a job, it's an opportunity to be a part of something bigger. We are a full-service consulting firm that was founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have industry experience and have held senior positions in a wide variety of companies – from emerging startups to large Fortune 50 firms – and we have taken our combined experiences and developed a unique approach that is supported by the principles of deep expertise, integrity, transparency, and dependability.

The Role:

We are seeking a highly skilled and experienced Senior Site Reliability Engineer to join our growing team. You will play a central role in ensuring the reliability, scalability, and performance of our critical infrastructure and applications. Beyond core SRE responsibilities, you will also serve as a key liaison across various teams, fostering collaboration and ensuring seamless operations.

Responsibilities:

Site Reliability Engineering:

  • Proactively identify and mitigate potential issues impacting infrastructure and applications.
  • Partner with development teams to implement best practices for building reliable and scalable systems.
  • Stay up to date on the latest SRE trends and technologies.

Monitoring and Observability:

  • Design, implement, and maintain robust monitoring solutions using tools like Prometheus and Grafana.
  • Develop and configure alerts within tools like PagerDuty to ensure timely notification of potential issues.
  • Analyze and troubleshoot issues using collected application and infrastructure metrics.

Incident Management:

  • Lead incident response, ensuring timely resolution and minimizing downtime.
  • Document and communicate incident details effectively to stakeholders.
  • Conduct post-incident reviews to identify root causes and implement preventative measures.

Service Level Agreements (SLAs):

  • Collaborate with product and engineering teams to define clear and measurable SLAs for our SaaS offerings.
  • Establish Service Level Objectives (SLOs) for key metrics based on SLA requirements.
  • Define Service Level Indicators (SLIs) to track progress towards achieving SLOs.
  • Monitor SLO compliance and proactively identify potential SLA breaches.
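
As a rough illustration of how the SLI, SLO, and SLA items above fit together, the following minimal Python sketch computes an availability SLI from request counts, compares it with an SLO target, and reports how much of the error budget remains. The function name, field names, and sample numbers are hypothetical and are not taken from any Tech Holding system; real SLOs are usually evaluated over a rolling window from monitoring data.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests
    slo_target: float        # objective, e.g. 0.999 for "three nines"
    error_budget: float      # number of bad requests the SLO allows
    budget_remaining: float  # fraction of the budget left (negative if overspent)
    compliant: bool          # True when the SLI meets the SLO target


def evaluate_slo(good_requests: int, total_requests: int, slo_target: float) -> SloReport:
    """Compare a request-based availability SLI against an SLO target."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_requests / total_requests
    allowed_bad = (1.0 - slo_target) * total_requests  # the error budget
    actual_bad = total_requests - good_requests
    budget_remaining = 1.0 - actual_bad / allowed_bad

    return SloReport(
        sli=sli,
        slo_target=slo_target,
        error_budget=allowed_bad,
        budget_remaining=budget_remaining,
        compliant=sli >= slo_target,
    )


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 500 of them failed.
    report = evaluate_slo(good_requests=999_500, total_requests=1_000_000, slo_target=0.999)
    print(f"SLI={report.sli:.4%} compliant={report.compliant} "
          f"budget remaining={report.budget_remaining:.0%}")
```

A shrinking `budget_remaining` is one possible early signal of a potential SLA breach, since the SLO is normally set tighter than the SLA it protects.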

Automation:

  • Identify opportunities for automation to improve efficiency and reliability.
  • Develop and implement automation scripts in languages like Python or Bash.
  • Automate routine tasks and incident response workflows.
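
For a sense of what a small automation script of this kind might look like, here is a minimal Python sketch of a routine disk-usage check. The thresholds and function names are assumptions chosen for illustration only; the posting does not describe any specific script, and a production version would typically report to a monitoring or paging system rather than print.

```python
import shutil


def classify_usage(used_percent: float, warn_at: float = 80.0, crit_at: float = 90.0) -> str:
    """Map a disk usage percentage to a status label (thresholds are illustrative)."""
    if used_percent >= crit_at:
        return "critical"
    if used_percent >= warn_at:
        return "warning"
    return "ok"


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_used_percent("/")
    print(f"/ is {pct:.1f}% full: {classify_usage(pct)}")
```

Keeping the classification separate from the measurement makes the thresholds easy to test without touching a real filesystem.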

Cross-Team Collaboration:

  • Act as a liaison between SRE, Product, Security, Application Engineering, and Customer Operations teams.
  • Facilitate communication and information sharing across teams to ensure smooth operations.
  • Work collaboratively to define and implement solutions that meet the needs of all stakeholders.

Mentorship and Knowledge Sharing:

  • Mentor and collaborate with junior site reliability engineers.
  • Share knowledge and best practices within the team.
  • Contribute to the development and documentation of internal SRE processes.

Required Skills:

  • 5-8 years of experience as a Site Reliability Engineer (SRE) or in a related role.
  • Experience with Google Cloud Platform (GCP).
  • Proven experience with monitoring tools like Prometheus and Grafana.
  • Strong understanding of incident management best practices.
  • Experience with alerting tools like PagerDuty.
  • Experience with scripting languages like Python or Bash for automation.
  • Excellent communication and collaboration skills.
  • Ability to work independently and as part of a team.
  • Strong problem-solving and analytical skills.
  • Passion for building reliable and scalable systems.

Nice to Have:

  • Experience with container orchestration platforms like Kubernetes.
  • Experience with chaos engineering principles.
  • Experience with configuration management tools like Ansible or Chef.

What we offer:

  • Remote Work Opportunities
  • Flexible Work Hours