Systems Reliability Engineer
3 days ago
At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine's Top Company Cultures list and ranked among the World's Most Innovative Companies by Fast Company.
We realize people do not fit into neat boxes. We are looking for curious and empathetic individuals who are committed to developing themselves and learning new skills, and we are ready to help you do that. We cannot complete our mission without building a diverse and inclusive team. We hire the best people based on an evaluation of their potential and support them throughout their time at Cloudflare. Come join us!
Available Locations: London, UK
About the Role
We are looking for talented Systems Reliability Engineers to build and operate our Edge platform running in more than 320 cities in over 120 countries. Our SREs come from diverse technical backgrounds and have built up their knowledge working in different environments, but common factors across all of our reliability-focused engineers include a passion for automation, scalability, and operational excellence. We support our services in a "follow the sun" model with offices in East Asia, Europe and North America.
This is a superb opportunity to join a high-performing team and scale our high-growth network as Cloudflare's business grows. We live at the boundary between systems, network, and software, and love improving the glue that holds them together. Working with us, you will build tools to constantly improve service availability, performance, and operational velocity. You will nurture a passion for an "automate everything" approach that makes systems failure-resistant and ready to scale.
SREs focus on the immediate state and functionality of the Cloudflare platform around the world, leveraging an array of monitoring, alerting and diagnostics tools while developing and enhancing the Cloudflare platform and its capabilities. We own a wide portfolio of applications and services, running a tight feedback loop of developer and operator patterns. The ideal SRE candidate has a passionate curiosity about how the Internet fundamentally works and has a strong knowledge of networking, Linux and TLS along with coding ability in Go or Python.
Requisite Skills
- Aptitude for identifying problems, owning them and working with others to solve them
- Linux systems experience
- 3 years of experience in an SRE role or a role with similar functions
- Software development skills in some programming language such as Go or Python
- Understanding of distributed software systems and large-scale system design tradeoffs
- Intermediate experience of common network protocols like DNS and HTTP
- Understanding of routing protocols and concepts such as BGP and IP anycast
Examples of desirable skills, knowledge and experience
- Experience with the Linux kernel and Linux software packaging
- Performance analysis and debugging
- Configuration management systems such as Saltstack, Chef, Puppet or Ansible
- Load balancing and reverse proxies such as Nginx, Varnish, HAProxy, Squid or Apache
- SQL databases
- Time series databases and monitoring tools such as OpenTSDB, Graphite, Prometheus or Grafana
- Key/Value stores
Bonus Points
- Experience with continuous / rapid release engineering
- Strong tooling and automation development experience
- Experience working in a 24/7/365 service environment
- Experience working with large scale production distributed systems
- A history of contributing to Open Source Software
Some tools that we use
- Nginx
- PostgreSQL
- Docker
- Prometheus
- Grafana
- Consul
- Nomad
- Salt
What Makes Cloudflare Special?
We're not just a highly ambitious, large-scale technology company. We're a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.
Project Galileo: We equip politically and artistically important organizations and journalists with powerful tools to defend themselves against attacks that would otherwise censor their work, technology already used by Cloudflare's enterprise customers--at no cost.
Athenian Project: We created Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration.
1.1.1.1: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver. This is available publicly for everyone to use - it is the first consumer-focused service Cloudflare has ever released. Here's the deal - we never, ever store client IP addresses. We will continue to abide by our privacy commitment and ensure that no user data is sold to advertisers or used to target consumers.
Sound like something you'd like to be a part of? We'd love to hear from you.
This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.
Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.
Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at hr@cloudflare.com or via mail at 101 Townsend St. San Francisco, CA 94107.
-
Site Reliability Engineer
4 weeks ago
San Francisco, United States Focal Systems Full time Location: San Francisco - hybrid (1-2 days per week) Salary: $165-175k + stock Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar...
-
Senior Site Reliability Engineer
3 weeks ago
San Francisco, United States Focal Systems Full time Location: San Francisco - hybrid (1-2 days per week) Salary: $170-190k + stock Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar retail...
-
San Francisco, California, United States Focal Systems Full time About Us: Focal Systems is a leading retail AI solutions company based in Silicon Valley, dedicated to automating and optimizing brick-and-mortar retail using deep learning computer vision. We are a rapidly growing startup that has more than doubled in size every year since inception. Salary and Benefits: This role comes with an estimated salary of...
-
San Francisco, California, United States Gridware Full time About Gridware: Gridware is a pioneering company that develops cutting-edge technologies to enhance and protect the electrical grid, which forms the backbone of our modern society. Our mission is to ensure the reliability and safety of this critical infrastructure. We are headquartered in the Bay Area, California, and backed by top climate-tech and Silicon...
-
San Francisco, California, United States WEX, Inc. Full time About WEX, Inc.: WEX, Inc. is a leading provider of business and personal payment processing solutions. Our company has a strong commitment to innovation, customer service, and operational excellence. Job Summary: We are seeking an entry-level Software Development Engineer for System Reliability to join our team. As a member of our Benefits Reliability...
-
Site Reliability Engineer
2 weeks ago
San Francisco, United States Unreal Gigs Full time Are you passionate about building and maintaining resilient systems that ensure high availability and performance? Do you excel at automating processes, troubleshooting complex issues, and creating systems that scale smoothly? If you're ready to take on the challenge of ensuring reliable, efficient, and secure system operations, our client has the perfect...
-
Reliable Systems Architect
7 days ago
San Francisco, California, United States OpenAI Full time We are seeking an experienced Reliability Systems Architect to join our team at OpenAI in San Francisco. This role involves designing and implementing scalable infrastructure solutions that meet the rapidly increasing demands of our users. As a key member of our engineering team, you will collaborate with cross-functional teams to ensure the reliability,...
-
Senior Site Reliability Engineer, FedRAMP
2 days ago
San Francisco, United States Cisco Systems, Inc. Full time Senior Site Reliability Engineer, FedRAMP Location: Area of Interest Job Type Professional Networking, Security Job Id 1427201 Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud,...
-
Reliability Systems Engineer
1 month ago
San Diego, United States Booz Allen Full time Reliability Systems Engineer The Opportunity: Are you looking for an opportunity to combine your technical skills with big picture thinking to make an impact in national security? You understand your customer’s environment and how to develop the right systems for their mission. Your ability to translate real-world needs into technical specifications makes...
-
Reliability Systems Engineer
1 month ago
San Diego, United States Booz Allen Full time $84,600 - $193,000 Reliability Systems Engineer All the relevant skills, qualifications and experience that a successful applicant will need are listed in the following description. The Opportunity: Are you looking for an opportunity to combine your technical skills with big picture thinking to make an impact in national security? You understand your customer’s environment and...
-
Reliability Systems Engineer
1 month ago
San Diego, United States Booz Allen Full time Reliability Systems Engineer The Opportunity: Are you looking for an opportunity to combine your technical skills with big picture thinking to make an impact in national security? You understand your customer's environment and how to develop the right systems for their mission. Your ability to translate real-world needs into technical specifications makes you...
-
Site Reliability Engineer
3 weeks ago
San Francisco, United States WEX Full time The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...
-
Systems Reliability Engineer, Edge
4 weeks ago
San Francisco, United States Cloudflare, Inc. Full time About Us: At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without...
-
Reliability Systems Engineer
2 weeks ago
San Diego, United States Booz Allen Hamilton Full time Job Number: R0208761 Reliability Systems Engineer The Opportunity: Are you looking for an opportunity to combine your technical skills with big picture thinking to make an impact in national security? You understand your customer's environment and how to develop the right systems for their mission. Your ability to translate real-world needs into technical...
-
Software Engineer, Reliability
3 weeks ago
San Francisco, United States OpenAI Full time Join the engineering teams that bring OpenAI’s ideas safely to the world! The Applied Engineering team works across research, engineering, product, and design to bring OpenAI’s technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely....
-
Cloudflare Systems Reliability Engineer
4 days ago
San Francisco, California, United States Cloudflare, Inc. Full time We are Cloudflare, a highly ambitious and large-scale technology company with a soul. Our mission is to help build a better Internet by protecting the free and open Internet. As a key member of our team, you will play a crucial role in building and operating our Edge platform running in over 320 cities across more than 120 countries. This is an exceptional...
-
Security Systems Reliability Engineer
4 weeks ago
San Francisco, United States Cloudflare, Inc. Full time About Us: At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world's largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without...
-
Senior Site Reliability Engineer, FedRAMP
3 weeks ago
San Francisco, United States Cisco Systems Full time Who We Are: Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and...
-
Site Reliability Engineer
2 days ago
San Francisco, United States Bun Full time Bun is an open-source JavaScript tooling company focused on making programming simpler. We've raised $26 million from top investors in Silicon Valley, are among the top GitHub repositories and have a growing community of 33,000 Discord members. We're hiring an experienced Site Reliability Engineer to scale and maintain the infrastructure that builds and tests...
-
Site Reliability Engineer
4 weeks ago
San Francisco, United States Ellation, Inc. Full time Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...