Principal Site Reliability Engineer

1 week ago


Charlotte, United States Brightspeed Full time
Job Description

We are currently looking for a Principal Site Reliability Engineer to join our growing team. In this role, you will implement and maintain monitoring systems to track the performance and availability of business-critical systems and infrastructure using metrics to identify trends and potential issues. You will also work closely with development teams, operations, and other stakeholders to ensure that new services and features are reliable and scalable.

As a Principal Site Reliability Engineer, your duties and responsibilities will include:

  • Implement and maintain monitoring systems to track the performance and availability of business-critical systems and infrastructure. Use metrics to identify trends and potential issues.
  • Respond to system outages and performance issues, performing root cause analysis to prevent recurrence.
  • Develop scripts and tools to automate repetitive tasks, such as deployment, scaling, and monitoring.
  • Work closely with development teams, operations, and other stakeholders to ensure that new services and features are reliable and scalable.
  • Work on reducing latency and improving the speed of data transmission across the network.
  • Define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure services meet required performance and availability targets.
  • Conduct postmortems after incidents to identify what went wrong and what can be improved.
  • Work with Lead Application owners and internal Change Management to review code changes and support deployments.
  • Lead the onshore/offshore team of site reliability engineers and mentor them in the support activities required for system reliability.
  • Must have the ability to communicate and abstract the messaging to multiple target audiences, including Sr. business & IT leadership, technology, and business teams.

Qualifications

WHAT IT TAKES TO CATCH OUR EYE:

  • Master’s degree in computer science, telecommunications, or a similar area, with a minimum of 10 years of software engineering experience, including a minimum of 5 years as a site reliability engineer
  • Proven track record of managing mission-critical, customer-facing applications for reliability
  • 5+ years of experience supporting operations and maintenance for cloud-native applications in production that are fault-tolerant, self-healing, scalable, and highly available
  • Excellent troubleshooting and problem-solving skills, with keen attention to detail to identify and resolve complex production issues
  • Deep understanding of cloud computing platforms (GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Solid experience with core Kubernetes concepts such as Pods, Workloads, Services, Ingress/Egress, Deployments, ConfigMaps, HPA, liveness probes, and Secrets
  • Strong knowledge of infrastructure as code tools (e.g., Terraform, Ansible, ArgoCD) and CI/CD pipelines
  • Strong experience integrating code quality tools (SonarQube or Checkmarx) with CI/CD pipelines
  • Strong experience with monitoring, logging, and observability tools such as Splunk, GCP logging, and Dynatrace
  • Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders
  • Must have proven written and verbal communication skills, including presentations using tools like PowerPoint
  • Must have the ability to communicate and abstract the messaging to multiple target audiences, including Sr. business & IT leadership, technology, and business teams

BONUS POINTS FOR:

Certifications such as Google Professional Cloud DevOps Engineer or AWS Certified DevOps Engineer 




  • Charlotte, United States Brightspeed Full time

    Job Description: Company Description: At Brightspeed, we are reimagining how people live, work, play and connect by providing fast, reliable internet connections and an awesome customer experience in twenty states throughout the Midwest and South. Backed by funds managed by Apollo Global Management, our vision is to accelerate the upgrade of...


  • Charlotte, United States JobRialto Full time

    Job Description: Looking for a forward-thinking, energetic Site Reliability Engineering Manager to join our team. PDL serves the ecommerce needs of leading and growing grocery retailers with millions of shoppers located throughout the East Coast and Midwest. PDL strives to enable our retailers to be number one in all markets they operate in by: Leading IT...


  • Charlotte, United States KTek Resourcing Full time

    Role: Site Reliability Engineer with Splunk. Location: Charlotte, NC (Onsite-Hybrid). Duration: Contract/Full-time. Job Description: Candidates should have expertise in creating Splunk dashboards, along with Grafana and AppDynamics experience. Preferably based in Charlotte (CIC building). Willing to work during non-normal hours for deployments and any Prod...


  • Charlotte, United States Ryan Consulting Group Full time

    Job Description: The Site Reliability Engineer is a key role which focuses on building and maintaining the tooling and infrastructure used to automate the release, deployment, and upgrade processes for workloads. This individual will work on developing the automated pipelines for cloud environments as well as providing consulting services to...


  • Charlotte, United States Sumitomo Mitsui Banking Corp Full time

    JOB SUMMARY: You will work closely with our software engineering and data teams to implement and maintain robust data pipelines and infrastructure. Your expertise in Google Cloud Platform (GCP) or Azure, container technologies like Kubernetes, or Docker, and Apache Airflow processes will be crucial in driving our success. PRINCIPAL DUTIES &...


  • Charlotte, North Carolina, United States SERC Reliability Corporation Full time

    SERC OVERVIEW: SERC Reliability Corporation (SERC) is a nonprofit regulatory authority and is one of the six Regional Entities across North America and is responsible for administering the bulk power system (BPS) reliability in all or part of the sixteen southeastern states under the Federal Energy Regulatory Commission (FERC) approved delegation agreement...


  • Charlotte, United States Syntricate Technologies Full time

    Platform/Site Reliability Engineer. 6 Months Contract to Hire. Charlotte, NC. JOB DESCRIPTION: We're looking for a Senior Platform Engineer to come help us automate everything, enable our developer teammates, and create and support world-class platforms. As a Senior Platform Engineer, you will be an integral member of the Platform Engineering team, helping the...


  • Charlotte, United States Saxon Global Full time

    Site Reliability Engineer JOB SUMMARY This position is responsible for design, development and implementation of cloud based technologies. Provide technical expertise on complex projects and advanced troubleshooting of existing Cloud technology for use by department. Such as guidance and support in the development of progress at all system layers, including...


  • Charlotte, United States SERC Reliability Corporation Full time

    Job Description: SERC OVERVIEW: The electric grid is vital to our everyday lives. It is fundamental for the health, safety, and well-being of our communities, and provides the platform for our economy and our societal and technological advances. SERC's mission is to reduce risks to the reliability and security of the electric grid (also known...


  • Charlotte, United States SERC Reliability Corporation Full time

    SERC OVERVIEW: The electric grid is vital to our everyday lives. It is fundamental for the health, safety, and well-being of our communities, and provides the platform for our economy and our societal and technological advances. SERC's mission is to reduce risks to the reliability and security of the electric grid (also known as the bulk power system), not...

  • Digital One

    1 month ago


    Charlotte, United States Jobs for Humanity Full time

    Job Description: Position Type: Full time. Type of Hire: Experienced (relevant combo of work and education). Education Desired: Bachelor of Computer Science. Travel Percentage: 5 - 10%. Job Description: As the world works and lives faster, FIS is leading the way. Our fintech solutions touch nearly every market, company and person on the planet. Our...


  • Charlotte, United States Credit Karma Full time

    Intuit Credit Karma is a mission-driven company, focused on championing financial progress for our more than 130 million members globally. While we're best known for pioneering free credit scores, our members turn to us for everything related to their financial goals, including identity monitoring, applying for credit cards, shopping for insurance and...

  • Reliability Engineer

    3 weeks ago


    Charlotte, North Carolina, United States JLL Full time

    JLL supports the Whole You, personally and professionally. Our people at JLL are shaping the future of real estate for a better world by combining world class services, advisory and technology to our clients. We are committed to hiring the best, most talented people in our industry; and we support them through professional growth, flexibility, and...


  • Charlotte, United States Eliassen Group Full time

    Our client, a leading gas and electric company, has an excellent opportunity for a Principal BI/Data Engineer to work on a 12+month contract opportunity. Work will be a hybrid on-site/remote schedule in Charlotte, NC. The Principal BI/Data Engineer will work with Data Engineers, Data Warehouse Developers, and Data Architects to create Business Intelligence...


  • Charlotte, North Carolina, United States CBRE Full time

    Reliability Engineer - AMS Enterprise. Job ID: 172528. Posted: 28-Jun-2024. Service line: GWS Segment. Role type: Full-time. Areas of Interest: Digital & Technology/Information Technology, Engineering/Maintenance. Location(s): Atlanta - Georgia - United States of America, Charlotte - North Carolina - United States of America, Chicago - Illinois - United States of...