Reliability Engineer

2 months ago


Dallas, Texas, United States | Hearst | Full time

About Us

Homecare Homebase, a subsidiary of Hearst Corporation, is a leading provider of healthcare software solutions. Our mission is to deliver innovative, cloud-based technologies that improve clinical, operational, and financial outcomes for homecare and hospice agencies across the United States.

Our Culture

We value a culture of caring, action, respect, excellence, and a positive attitude. Our employees are passionate about making a difference in patient care and work in a collaborative environment that fosters growth and innovation.

Job Summary

We are seeking a highly skilled System Administrator to join our mission-critical Reliability team. The successful candidate will be responsible for documenting systems, analyzing impacts of new requirements, and delivering technical solutions that align with operational needs.

Key Responsibilities

  • Be available outside business hours to respond to service incidents as part of an on-call rotation.
  • Leverage configuration management tools for infrastructure in a hybrid cloud model.
  • Support Service Operations, including incident, problem, change, and request fulfillment.
  • Monitor, administer, upgrade, and patch production infrastructure and applications per standard procedures and runbooks.
  • Contribute to standard operating procedures and documentation, and support the operations tempo.
  • Support compliance program requirements, including audits.
  • Tune monitoring systems to maximize detection and reduce alert noise.

Requirements

  • Proficient with scripting (e.g., Bash, PowerShell).
  • Proficient with security best practices in server configuration, tool development, and access controls.
  • Proficient with administration of Linux or other Unix variants (Ubuntu, CentOS, RedHat, Solaris, etc.) in a production environment.
  • Proficient with networking and troubleshooting (TCP/IP, DNS, HTTP, routing, switching, firewalls, LAN/WAN, traceroute, iperf, dig, cURL, or related).
  • Proficient with administration, automation, and orchestration of large-scale Windows and Linux environments using configuration management solutions such as DSC, Ivanti, Ansible, Puppet, or Chef.
  • Able to leverage systems management and automation with self-repair rather than relying on alerting and human intervention.
  • Proficient with correlation and monitoring solutions such as Splunk, Application Insights, Azure Monitor, or SCOM.
  • Proficient with Active Directory administration and able to support access management operations.
  • Strong written and verbal interpersonal skills.
  • Strong customer focus, ownership, bias for action, and the ability to dive deep.
  • Excellent problem-solving and analytical skills, with attention to detail and a drive to bring issues to resolution.
  • Demonstrated ability to learn new skills and apply learned knowledge.
  • Demonstrated ability to prioritize and execute multiple tasks.
  • Support team continuous improvement by looking for ways to streamline and automate processes and improve customer satisfaction.
  • DevOps mindset practitioner and change agent.

Experience

  • 3+ years of experience in 24x7 production environments.
  • 3+ years of Windows and/or Linux administration and enterprise production experience.
  • 3+ years of Kubernetes and Cloud administration experience.
  • Experience with Healthcare industry HIPAA regulations (similar regulated industry experience considered, e.g., PCI, SOX).
  • Familiarity with process and IT service management (ITSM) concepts such as ITIL.
  • Experience using the ServiceNow platform in an IT Service Management or Service Operations role is desired.
  • Experience working in an Agile and/or SAFe environment.

Education/Certification/Training

  • Bachelor's degree in Computer Science, Engineering, Math, or related (equivalent experience considered).
  • Candidates with relevant certifications are preferred, including but not limited to the following:
      • ITIL Foundations
      • Configuration: RHCE-Ansible
      • Kubernetes: CKA, KCSP
      • Linux: RHCE, CompTIA Linux+, GCUX, LPI
      • Microsoft: Azure Administrator, Azure DevOps Engineer, Azure Architect, MCSE

