Principal Site Reliability Engineer

4 weeks ago


Charlotte, North Carolina, United States Brightspeed Full time
Job Title: Principal Site Reliability Engineer

At Brightspeed, we are reimagining how people live, work, play, and connect by providing fast, reliable internet connections and an exceptional customer experience in twenty states throughout the Midwest and South.

We are seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will play a critical role in designing and building a world-class fiber network and creating a customer experience second to none.

Key Responsibilities:
  • Implement and maintain monitoring systems that track the performance and availability of business-critical systems and infrastructure, using metrics to identify trends and potential issues.
  • Work closely with development teams, operations, and other stakeholders to ensure that new services and features are reliable and scalable.
  • Develop scripts and tools to automate repetitive tasks, such as deployment, scaling, and monitoring.
  • Respond to system outages and performance issues, performing root cause analysis to prevent recurrence.
  • Define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure services meet required performance and availability targets.
  • Conduct postmortems after incidents to identify what went wrong and what can be improved.
  • Lead the onshore and offshore team of site reliability engineers and mentor them in the support activities required for system reliability.
Requirements:
  • Master's degree in computer science, telecommunications, or a similar field, with a minimum of 10 years of software engineering experience, including at least 5 years as a site reliability engineer.
  • Proven track record of managing mission-critical customer-facing applications for reliability.
  • 5+ years of experience supporting operations and maintenance for cloud-native applications in production that are fault-tolerant, self-healing, scalable, and highly available.
  • Excellent troubleshooting and problem-solving skills, with keen attention to detail, to identify and resolve complex production issues.
  • Deep understanding of cloud computing platforms (GCP) and containerization technologies (e.g., Docker, Kubernetes).
  • Solid experience with core Kubernetes concepts such as Pods, Workloads, Services, Ingress/Egress, Deployments, ConfigMaps, HPA, liveness probes, and Secrets.
  • Strong knowledge of infrastructure as code tools (e.g., Terraform, Ansible, ArgoCD) and CI/CD pipelines.
  • Strong experience integrating code quality tools (e.g., SonarQube or Checkmarx) into CI/CD pipelines.
  • Strong experience with monitoring, logging, and observability tools such as Splunk, GCP logging, and Dynatrace.
  • Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.
  • Proven written and verbal communication skills, including delivering presentations with tools like PowerPoint.
  • Ability to tailor messaging to multiple audiences, including senior business and IT leadership, technology teams, and business teams.
What We Offer:
  • Competitive compensation and comprehensive benefits.
  • Opportunity to work with a cutting-edge technology stack.
  • Collaborative and dynamic work environment.
  • Professional development and growth opportunities.
Why Join Us?

We aspire to contemporary ways of working. Recognized as a Top Workplace by the Charlotte Observer, Brightspeed HQ is located on the 7th floor of the new Vantage South End - East Tower in Charlotte, NC. We prioritize hiring talent in the Charlotte area, whenever possible, to make it a truly vibrant destination for our hybrid workforce.

We also invest in technology that enables our entire team to stay connected. Why? Because Brightspeed recognizes the value of finding the best talent for the job, wherever they may be.


