Reliability Engineer

2 weeks ago


Orlando, United States OrangePeople Full time

The Systems Engineer is a critical member of the Technical Operations Team. They are responsible for end-to-end technical support of complex enterprise-scale applications that use a variety of technologies both on-premises and in the cloud. Their work includes day-to-day operations working with business units to plan, design, and implement systems as well as monitoring ongoing maintenance, enhancements, and automation. This role will apply systems reliability engineering principles, DevOps practices, and ITSM service operation disciplines that facilitate a highly efficient, highly available production environment.

Responsibilities:

- Utilizes skills and experience to provision enterprise-scale software and services both on-premises and in the cloud. This includes choosing the right combination of cloud and on-prem resources for any given product/solution.
- Utilizes skills and experience to provide technical leadership with OS performance monitoring, tuning, and troubleshooting.
- Utilizes skills and experience to provide technical leadership assistance with web/application server configuration, performance monitoring, tuning, clustering, and debugging.
- Utilizes skills and experience to act as a liaison to the DevOps process with the project delivery team.
- Evaluates new applications/systems for both operational best practices and technical feasibility against current operational standards.
- Participates in major incidents by providing technical leadership to interpret data from OS, applications, middleware stacks, and performance management tools. When engaged, takes responsibility for identifying the point of failure and restoring normal service operation.
- Works in a team responsible for Incident Management, Request Fulfillment, Problem Management, IT Operations Control, Change Evaluation, and Change Fulfillment.
- Assists in creating concise and accurate documentation for Level 1 and Level 2 teams so they can resolve simple to moderate incidents/issues without escalation.
- Part of a 24x7 on-call rotation.

Basic Qualifications:

- Bachelor's degree in Computer Science, Information Technology, or a similar field, or related work experience.
- 3 years of progressively more in-depth experience in a role supporting and/or deploying enterprise-scale solutions that demonstrates strong analytical and problem-solving skills.
- 2 years' experience deploying and/or supporting systems and applications in a cloud environment, with Amazon AWS and Microsoft Azure strongly preferred.
- Proven expertise in setting up, operating, and tuning a variety of performance management and monitoring tools such as AppDynamics, SiteScope, Splunk, New Relic, Grafana, etc.
- Proven experience working with multiple operating systems, including a variety of Linux distros, as well as containerized application deployment strategies such as Docker, ECS, AKS/Kubernetes, etc.
- Demonstrated understanding of how to configure and use code management, configuration, and deployment tools, including Chef, Rundeck, Jenkins, git, GitHub, Terraform, CloudFormation, Azure Resource Manager, etc.
- Demonstrated understanding of certificate management for a variety of solutions and use cases, including SSL/TLS and client certificates, for both on-prem and cloud solutions.
- Demonstrated understanding of full-stack application operational concepts such as Java applications & middleware, NodeJS, Angular, React, etc.
- Demonstrated understanding of computer networks and network infrastructure, including HTTP, TCP/IP, SNMP, DNS, routing, switching, and load balancing.
- Familiarity with current software development lifecycle (SDLC) concepts and best practices and CI/CD pipelines.
- Familiarity with IT Service Management (ITSM) processes, especially incident management, problem management, and knowledge management. ITIL Certification is desired.
- Familiarity with problem analysis best practices, especially Kepner-Tregoe.
- Strong interpersonal and communication skills with a track record that demonstrates the ability to work effectively across a wide range of constituencies in a diverse corporate environment.
- Excellent organizational and time management skills that enable working in a fast-paced team that is self-motivated to independently complete tasks on multiple projects simultaneously.

Required Education: Bachelor's Degree.

Additional Responsibilities:

- Participate in OrangePeople monthly team meetings, and participate in team-building efforts.
- Contribute to OrangePeople technical discussions, peer reviews, etc.
- Contribute content and collaborate via the OP-Wiki/Knowledge Base.
- Provide status reports to OP Account Management as requested.

About us: OrangePeople is an Enterprise Architecture and Project Management solutions company. Our most valuable asset is our people: dynamic, creative thinkers who are passionate about doing quality work. As a member of the OrangePeople team, you will have access to industry-leading consulting practices, strategies & technologies, and innovative training & education. An ideal Orange Person is a technology leader with a proven track record of technical achievements and a strong process/methodology orientation.


  • Reliability Engineer

    4 weeks ago


    Orlando, United States Lockheed Martin Full time

    Job ID: 665908BR Date posted: May 06, 2024 Program: LM-STAR PBL Description: WHAT WE'RE DOING LM-STAR is a critical asset that ensures high mission capability rates for some of the military's most complex platforms and Lockheed Martin's PBL program is its service counterpart that sustains station availability. With our...

  • Reliability Engineer

    3 weeks ago


    Orlando, Florida, United States Lockheed Martin Full time

    Description: WHAT WE'RE DOING LM-STAR is a critical asset that ensures high mission capability rates for some of the military's most complex platforms and Lockheed Martin's PBL program is its service counterpart that sustains station availability. With our LM-STAR PBL program, the overall support responsibility rests with us as does working collaboratively...

  • Reliability Engineer

    1 month ago


    Orlando, Florida, United States Lockheed Martin Full time

    Description: WHAT WE'RE DOING LM-STAR is a critical asset that ensures high mission capability rates for some of the military's most complex platforms and Lockheed Martin's PBL program is its service counterpart that sustains station availability. With our LM-STAR PBL program, the overall support responsibility rests with us as does working collaboratively...


  • Orlando, United States HCLTech Full time

    - Design, build, and maintain scalable and resilient infrastructure using best practices and modern technologies. - Develop and implement automation tools and processes to improve efficiency and reliability. - Monitor system performance, identify bottlenecks, and implement improvements to ensure high availability. - Drive Major incidents, troubleshoot...


  • Orlando, United States HCLTech Full time

    - Design, build, and maintain scalable and resilient infrastructure using best practices and modern technologies. - Develop and implement automation tools and processes to improve efficiency and reliability. - Monitor system performance, identify bottlenecks, and implement improvements to ensure high availability. - Drive Major incidents, troubleshoot issues,...


  • Orlando, United States HCLTech Full time

    - Design, build, and maintain scalable and resilient infrastructure using best practices and modern technologies. - Develop and implement automation tools and processes to improve efficiency and reliability. - Monitor system performance, identify bottlenecks, and implement improvements to ensure high availability. - Drive Major incidents, troubleshoot issues,...


  • Orlando, United States HCLTech Full time

    - Design, build, and maintain scalable and resilient infrastructure using best practices and modern technologies. - Develop and implement automation tools and processes to improve efficiency and reliability. - Monitor system performance, identify bottlenecks, and implement improvements to ensure high availability. - Drive Major incidents, troubleshoot issues,...


  • Orlando, United States HCLTech Full time

    - Design, build, and maintain scalable and resilient infrastructure using best practices and modern technologies. - Develop and implement automation tools and processes to improve efficiency and reliability. - Monitor system performance, identify bottlenecks, and implement improvements to ensure high availability. - Drive Major incidents, troubleshoot...


  • Orlando, Florida, United States Amadeus Full time

    Job Title Senior Service Reliability Engineer Summary of the role: As part of the Amadeus Global Operations Americas organization, the Senior Service Reliability Engineer is responsible to support revenue generating systems in production environments. Increase your chances of an interview by reading the following overview of this role before making an...


  • Orlando, United States Electronic Arts Full time

    EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE) with a strong understanding of On-premises Infrastructure, cloud computing, Database, and Virtual technologies to join us. As Site Reliability Engineer (SRE) you will support us and our complex systems running VMware vSphere v7/8x and infrastructure/applications hosted on the EA...


  • Orlando, Florida, United States Electronic Arts Full time

    EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE) with a strong understanding of On-premises Infrastructure, cloud computing, Database, and Virtual technologies to join us. As Site Reliability Engineer (SRE) you will support us and our complex systems running VMware vSphere v7/8x and infrastructure/applications hosted on the EA...


  • Orlando, Florida, United States Electronic Arts Full time

    EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE) with a strong understanding of On-premises Infrastructure, cloud computing, Database, and Virtual technologies to join us. As Site Reliability Engineer (SRE) you will support us and our complex systems running VMware vSphere v7/8x and infrastructure/applications hosted on the EA...


  • Orlando, United States Electronic Arts Full time

    EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE) with a strong understanding of On-premises Infrastructure, cloud computing, Database, and Virtual technologies to join us. As Site Reliability Engineer (SRE) you will support us and our complex systems running VMware vSphere v7/8x and infrastructure/applications hosted on the EA...


  • Orlando, Florida, United States Electronic Arts (EA) Full time

    We are a global team of creators, storytellers, technologists, experience originators, innovators and so much more. We believe amazing games and experiences start with teams as diverse as the players and communities we serve. At Electronic Arts, the only limit is your imagination. EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE)...


  • Orlando, Florida, United States Electronic Arts (EA) Full time

    We are a global team of creators, storytellers, technologists, experience originators, innovators and so much more. We believe amazing games and experiences start with teams as diverse as the players and communities we serve. At Electronic Arts, the only limit is your imagination. EA is looking for an experienced Infrastructure Site Reliability Engineer (SRE)...


  • Orlando, Florida, United States Amadeus Full time

    Job Title Senior Service Reliability Engineer Summary of the role: As part of the Amadeus Global Operations Americas organization, the Senior Service Reliability Engineer is responsible to support revenue generating systems in production environments. The engineer works closely with other SREs in the monitoring, maintenance, and support related to incident...


  • Orlando, United States Electronic Arts (EA) Full time

    We are a global team of creators, storytellers, technologists, experience originators, innovators and so much more. We believe amazing games and experiences start with teams as diverse as the players and communities we serve. At Electronic Arts, the only limit is your imagination. EA is looking for an experienced Infrastructure Site Reliability Engineer...


  • Orlando, United States Electronic Arts (EA) Full time

    We are a global team of creators, storytellers, technologists, experience originators, innovators and so much more. We believe amazing games and experiences start with teams as diverse as the players and communities we serve. At Electronic Arts, the only limit is your imagination. EA is looking for an experienced Infrastructure Site Reliability Engineer...



  • Orlando, United States CROSSLINK Professional Tax Solutions Full time

    Description: A Site Reliability Engineer (SRE) is an advanced DevOps role that combines software engineering and systems administration to ensure the scalability, performance, and reliability of large-scale, cloud-based applications and infrastructure. An SRE has the overall responsibility of taking a proactive approach in...