Site Reliability Engineer

2 weeks ago


Raleigh, United States Bandwidth Recruitment Full time

Site Reliability Engineer

(Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems that detect issues with Datadog Monitors and send alerts through OpsGenie. Configure Datadog and Grafana alerts and Application Health Monitors to notify the team when anomalies or problems occur. Work closely with other Site Reliability Engineers, DevOps Engineers, and System Administrators to achieve common goals. Analyze system performance data using Snowflake to plan for capacity upgrades or optimizations. Ensure the system can handle expected growth in traffic and data by using these tools to track application lag and behavior. Manage Kubernetes clusters and OpenShift environments for deploying and scaling containerized applications. Implement and manage infrastructure using Ansible, and maintain version-controlled infrastructure code in GitLab for consistency and repeatability. Use Terraform and Ansible scripts to define and provision infrastructure resources in a repeatable and automated manner. Create and maintain Ansible playbooks to automate routine tasks, configurations, and deployments. Use GitHub Actions for CI/CD to continuously build and deploy code, and implement CI/CD pipelines to streamline application updates. Build and maintain deployment pipelines using Ansible playbooks, ensure smooth and reliable deployments and rollback procedures, and create production releases, using ServiceNow to track records. Maintain detailed documentation on system architecture, configurations, and processes in Confluence, and share knowledge and best practices with team members. Plan resource allocation in Red Hat OpenShift, including servers, storage, and network capacity, following the Kubernetes architecture so that the system is equipped to handle traffic spikes and growth.
Develop and test disaster recovery plans to ensure data and service availability in case of major failures or disasters, building the supporting tools in Go. Work closely with development teams to promote a DevOps culture and ensure reliability is built into software from the start by following best practices. Collaborate weekly with other Site Reliability Engineers to share knowledge, solve complex problems, and touch base on all points. Monitor and manage cloud resource costs in AWS to optimize spending while maintaining performance.

Required: Master’s degree or foreign equivalent in Computer Science, Electrical Engineering, or related field of study plus 2 years of experience in the job offered or related position. Must have 2 years of experience with: Infrastructure and networking concepts including virtualization, load balancing, and DNS. At least one of the following cloud infrastructure technologies: AWS, Google Cloud, Azure. REST APIs using one or more of the following (JSON, XML, YAML). Designing, building, and operating large-scale production systems. Continuous Integration and Continuous Deployment (CI/CD) concepts and technologies using one or more of the following (Jenkins, GHA, Circle). Containerization technologies (Docker, Docker Compose, Docker Swarm, Kubernetes). Configuration and management techniques in large distributed environments. Monitoring and observability techniques with one or more of the following tools: Datadog, Sensu, New Relic, Nagios. General use of open-source databases: MySQL, Postgres, Redis, Cassandra. Unix/Linux administration, troubleshooting, and shell scripting. One or more of the following programming languages: Python, Java, Go, Rust, or similar. Source control (Git, GitHub) and feature branching strategies. Automating infrastructure, testing, and deployment using tools such as Ansible, Chef, or Terraform. Infrastructure as Code paradigm.

Or, in the alternative, will accept a Bachelor’s degree or foreign equivalent in Computer Science, Electrical Engineering, or related field of study plus 5 years of experience in the job offered or related position. Must have 2 years of experience with the same skills listed above for the Master’s degree option.

Submit resumes to: Bandwidth, Inc, 2230 Bandmate Way, Raleigh, NC 27607, Attn: Kellie Sigmon, Sr. Manager People Services, or apply at www.bandwidth.com/careers/openings/. Must reference “Site Reliability Engineer” when applying.





