Site Reliability Engineer
2 weeks ago
Come work at a place where innovation and teamwork come together to support the most exciting missions in the world
Site Reliability Engineer, Cloud Platform
** The successful applicant will be performing work in FedRAMP environments, and therefore, must be a U.S. Person (i.e. U.S. citizen, U.S. national, lawful permanent resident, asylee, or refugee). This position may also perform work that the U.S. government has specified can only be performed by a U.S. citizen on U.S. soil. **
Description
Co-develop and participate in the full lifecycle of cloud platform services, from inception and design through deployment, operation and improvement, by applying scientific principles.
Increase the effectiveness, reliability and performance of cloud platform technologies by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.
Support the cloud platform team before technologies are pushed for production release through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.
Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.
Advise the cloud platform team on improving the reliability of the systems in production and scaling them based on need.
Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the cloud platform technologies.
Develop tools and automate the process for achieving large scale provisioning and deployment of cloud platform technologies
Participate in the on-call rotation for cloud platform technologies. During incidents, lead incident response and help write detailed postmortem analysis reports that are brutally honest yet blameless.
Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis
Requirements
4+ years of relevant experience in running distributed systems at scale in production.
Expertise in at least one of the following programming languages: Java, Python or Go.
Proficient in writing bash scripts
Good understanding of SQL and NoSQL systems
Good understanding of systems programming (network stack, file system, OS services)
Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs etc.
Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, etc.
Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
Proven record of handling production issues, planning escalation procedures, and conducting postmortems, impact analyses, risk assessments and other related procedures.
Able to drive results and set priorities independently
BS/MS degree in Computer Science, Applied Math or related field
Bonus Points if you have:
Experience with managing large scale deployments of search engines like Elasticsearch
Experience with managing large scale deployments of message-oriented middleware such as Kafka
Experience with managing large scale deployments of RDBMS systems such as Oracle
Experience with managing large scale deployments of NoSQL databases such as Cassandra
Experience with managing large scale deployments of In-memory caching using Redis, Memcached, etc.
Experience with container and orchestration technologies such as Docker, Kubernetes, etc.
Experience with monitoring tools such as Graphite, Grafana and Prometheus
Experience with HashiCorp technologies such as Consul, Vault, Terraform and Vagrant
Experience with configuration management tools such as Chef, Puppet or Ansible
In-depth experience with continuous integration and continuous deployment pipelines
Exposure to Maven, Ant or Gradle for builds
Qualys is an Equal Opportunity Employer. Please see our EEO policy.
-
Site Reliability Engineer
6 days ago
Raleigh, United States Bandwidth Full time
Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie....
-
Site Reliability Engineer
2 weeks ago
Raleigh, United States Bandwidth Full time
Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie....
-
Site Reliability Engineer
2 weeks ago
Raleigh, United States Bandwidth Recruitment Full time
Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie. Configure Datadog and Grafana alerts and Application Health...
-
Site Reliability Engineer
5 days ago
Raleigh, United States Bandwidth Inc. Full time
Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie. Configure Datadog and Grafana alerts and Application Health...
-
Site Reliability Engineer
5 days ago
Raleigh, North Carolina, United States Bandwidth Inc. Full time
Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie. Configure Datadog and Grafana alerts and Application Health...
-
Site Reliability Engineer
2 weeks ago
Raleigh, United States Bandwidth Full time
Job Description: Site Reliability Engineer (Raleigh, NC) Duties: Work closely with leadership and internal partners to ensure that software meets security, SLA, performance, and capacity requirements. Set up and maintain monitoring tools and systems to detect issues using Datadog Monitors and alert using OpsGenie. Configure Datadog and Grafana...
-
Senior Site Reliability Engineer
4 weeks ago
Raleigh, United States Envestnet Asset Management, Inc Full time
Job requisition ID: Req 20.357. It's fun to work in a company where people truly BELIEVE in what they're doing! We're committed to bringing passion and customer focus to the business. Job Description...
-
Site Reliability Engineer
5 days ago
Raleigh, United States Raleigh Founded Full time
The Role: Kaleido is growing rapidly and seeking a Site Reliability Engineer with a passion for enterprise blockchain software. The successful candidate will be a highly skilled DevOps Engineer who is also passionate about systems stability, security and operational efficiency. If this describes you and you are able to thrive in a fast-paced, high-growth, and...
-
Sr Site Reliability Engineer
7 days ago
Raleigh, United States Allscripts Full time
Welcome to Veradigm, where our Mission is transforming health, insightfully. Join the Veradigm team and help solve many of today’s healthcare challenges being addressed by biopharma, health plans, healthcare providers, health technology partners, and the patients they serve. At Veradigm, our primary focus is on harnessing the power of research, analytics,...
-
Sr Site Reliability Engineer
7 days ago
Raleigh, United States Veradigm Full time
Welcome to Veradigm, where our Mission is transforming health, insightfully. Join the Veradigm team and help solve many of today's healthcare challenges being addressed by biopharma, health plans, healthcare providers, health technology partners, and the patients they serve. At Veradigm, our primary focus is on harnessing the power of research, analytics,...
-
Expert Site Reliability Engineer
6 days ago
Raleigh, United States Veradigm® Full time
Welcome to Veradigm! Our Mission is to be the most trusted provider of innovative solutions that empower all stakeholders across the healthcare continuum to deliver world-class outcomes. Our Vision is a Connected Community of Health that spans continents and borders. With the largest community of clients in healthcare, Veradigm is able to deliver an...
-
Reliability Engineer
1 week ago
Raleigh, United States DSJ Global Full time
Job Title: Reliability Engineer. Industry: Chemicals/Food & Beverage. Location: North Carolina. DSJ Global is currently partnered with a Fortune 500 manufacturing company based out of North Carolina that is looking for its next Reliability Engineer. As the Reliability Engineer, you will be responsible for the development, direction, supervision, and day to...
-
Sr. Reliability Engineer I
5 days ago
Raleigh, United States Biogen Full time
The Sr. Reliability Engineer I applies Reliability Engineering methodologies to optimize design requirements and performance of critical assets across the site. Originates and develops analysis methods for determining reliability of components, equipment and processes. Acquires and analyzes data. Prepares and communicates information to define...
-
Site Reliability Engineer
5 days ago
Raleigh, United States Cisco Full time
Location: RTP, North Carolina, US. Area of Interest Job Type Professional Cloud and Data Center, Software Development. Job Id 1421649. Who We Are: Today’s business environment is more than that – it’s a period of disruption between the pandemic, global business change and internal process complexity. For us to focus on simplicity and the best customer...
-
Senior Site Reliability Engineer
2 weeks ago
Raleigh, United States Cisco Full time
Who We Are: Today's business environment is more than that - it's a period of disruption between the pandemic, global business change and internal process complexity. For us to focus on simplicity and the best customer experience, we need great talent and the right skills to be successful. This is now a mantra for our Cisco leadership team and for us. The...
-
Software Engineer
6 days ago
Raleigh, United States Celonis GmbH Full time
The Team: Site Reliability Engineering. The Role: You will be part of a highly technical, collaborative and creative team, with a focus on SRE & Software Engineering. We design, write and deliver software, and improve the availability, scalability and efficiency of our product. We constantly improve our monitoring, metrics and KPIs as well as define and implement...
-
Senior Site Reliability Engineer
2 weeks ago
Raleigh, United States Cisco Full time
Who We Are: Today’s results-oriented business environment is more than that – it’s a period of disruption between the pandemic, global business change and internal process complexity. For us to focus on simplicity and the best customer experience, we need great talent and the right skillsets to be successful. This is now a mantra for our Cisco...
-
Technical Leader
1 week ago
Raleigh, United States Cisco Full time
*** The successful applicant will be performing work on US Government classified environments, and therefore, must be a U.S. Person (i.e., U.S. citizen, U.S. national, lawful permanent resident, asylee, or refugee). This position may also perform work that the U.S. government has specified can only be performed by a U.S. citizen on U.S. soil. ***
Who We Are...
-
Senior Site Reliability Developer
6 days ago
Raleigh, United States Oracle Full time
Customers rely on Oracle Cloud Infrastructure (OCI) to power their business as they tackle some of the world’s biggest challenges. We’re looking for Senior Site Reliability Developers/Engineers who would be responsible for Advanced Operations (AO) and critical issues of...
-
Senior Site Reliability Engineer
7 days ago
Raleigh, United States Cisco Full time
Location: RTP, North Carolina, US. Area of Interest Job Type Professional Cloud and Data Center, Software Development. Job Id 1421650. Who We Are: Today’s business environment is more than that – it’s a period of disruption between the pandemic, global business change and internal process complexity. For us to focus on simplicity and the best customer...