Staff Site Reliability Engineer
4 weeks ago
Responsibilities:
- Support critical applications and ensure the stability of the applications by performing proactive maintenance activities.
- Engage in automation activities.
- Support applications and infrastructure built on newer technologies such as Kubernetes containers, Kafka, Grafana, Prometheus, and Elastic.
- Perform root cause analysis and remediation.
- Good knowledge of cloud and VMware infrastructure.
- Good knowledge of F5 load balancers and TCP layer architecture.
- Good experience with Kubernetes and Docker (preferably OpenShift or MKE vendor products).
- Basic knowledge of Ansible and YAML scripting.
- Working knowledge of production support processes such as incident/change/problem management, call triaging, and escalation procedures.
- Ability to write and maintain scripts to monitor system activity, including application smoke tests during pre- and post-production implementations.
- Monitor application performance (e.g. memory, logging, latency).
- Write SQL queries for data analytics.
- Release code into Test and Production environments using industry-standard deployment tools.
- Support application deployment using Chef/Jenkins.
- Support client-escalated issues specific to applications (e.g. increased latency, transactional issues, features not working as expected).
- Implement and maintain performance monitoring dashboards using industry-standard tools (Splunk, ThousandEyes, Keynote, Runscope, Ghost Inspector, Evolven, Graphite, etc.).
Experience:
- 6 or more years of work experience with a Bachelor's Degree or 4 or more years of relevant experience with an Advanced Degree (e.g. Masters, MBA, JD, MD) or up to 3 years of relevant experience with a PhD.
- Experience in an application support organization working in 24/7 environments.
- Experience working with relational databases, NoSQL databases, MySQL DML/DDL, and Oracle.
- Exceptional analytical and problem-solving skills, and strong oral and written communication skills.
- Basic knowledge of active/active application setups.
- Experience in Production support working in a globally distributed team.
- Working experience with Java, J2EE, and Python technologies.
- Experience with ServiceNow and ticketing workflows is preferred.
- Working experience with Splunk or other monitoring tools and processes will be advantageous.
- Prior working experience with Card and transaction domains will be advantageous.
- Should have a technical and business mindset.
- ISO 9000 and ITIL experience will be advantageous.
- Understanding of core networking concepts such as routing, protocols, subnets, DNS, certificates, load balancers, and firewalls.
- Demonstrated proficiency in troubleshooting, root cause analysis, application design, and implementing major components for large projects.
Offer:
- Annual bonus.
- Pension plan.
- Life Assurance.
- Lunch Allowance.
- Medical Insurance.
- Health and fitness financial bonus.
- Eye care reimbursement.
- Stable employment conditions based on an employment contract.
- A wide training package (soft and technical training offer, access to the e-learning platform, possibility of co-financing courses and certification).
- and more.
-
Senior Staff Site Reliability Engineer
4 weeks ago
Chicago, IL, United States WEX Inc. Full time
Senior Staff Site Reliability Engineer. Apply to locations: Chicago, IL; Bay Area, CA; San Francisco, CA. About the Role: The WEX Site Reliability Engineering (SRE) team is seeking a Senior Staff SRE who is passionate about developing software and solutions focused on observability, incident response, reliability and performance, operational excellence, and...
-
Site Reliability Engineer
4 weeks ago
Columbia, MD, United States Geon Technologies, LLC Full time
Geon Technologies is a rapidly growing small business that provides signal processing and sensor system integration services to the United States Government (USG) and the industry base that supports them. Geon seeks to be known for “signals, sensors, and systems”. Geon has expertise in the science and development of signal processing techniques and...
-
Staff/Senior Staff Site Reliability Engineer
4 weeks ago
Foster City, CA, United States Zoox Full time
Zoox is looking for a site reliability engineer who will be responsible for measuring and maintaining the uptime of the many services critical to the development process for autonomous vehicles. In this role, you will be heavily involved in all phases of rolling out a service from...
-
Site Reliability Engineer
4 weeks ago
Chicago, IL, United States WEX, Inc. Full time
The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...
-
Site Reliability Engineer
4 weeks ago
Sunnyvale, CA, United States Natcast, Inc. Full time
Natcast (short for The National Center for the Advancement of Semiconductor Technology) is a new, purpose-built, non-profit entity created to operate the National Semiconductor Technology Center (NSTC) consortium, established by the CHIPS Act of the U.S. government. Working at Natcast represents an opportunity to help extend America’s leadership in...
-
Staff Site Reliability Engineer
4 weeks ago
San Francisco, CA, United States Ellation, Inc. Full time
Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...
-
Site Reliability Engineer
4 weeks ago
Miami, FL, United States Royal Caribbean Group Full time
Site Reliability Engineer. Journey with us! Combine your career goals and sense of adventure by joining our incredible team of employees at Royal Caribbean Group. We are proud to offer a competitive compensation and benefits package, and excellent career development opportunities, each offering unique ways to explore the world. We are proud to be the...
-
Redwood City, CA, United States C3 AI Full time
We are looking for an Associate Site Reliability Engineer / Site Reliability Engineer to join our team at our HQ in Redwood City, CA. Responsibilities: Maximize system uptime and availability, ensuring functional and performance SLAs. Establish end-to-end monitoring and alerting on all critical aspects. Solve complex problems for critical services...
-
Site Reliability Engineer
4 weeks ago
Washington, DC, United States Alldus International Consulting Ltd Full time
Our client is a Series A startup within the Generative AI space and they are hiring a Site Reliability Engineer to join the team. Backed by one of the leading venture capital firms in the industry, this is an exciting opportunity to join a SaaS company that is revolutionizing their industry. Responsibilities: As the Site Reliability Engineer, you will...
-
Site Reliability Engineer
4 weeks ago
Aiea, HI, United States Smxtech Full time
SMX is seeking a Site Reliability Engineer to support the USINDOPACOM J6 portfolio of programs. This position is a hybrid between Camp H.M. Smith Marine Corps Base and Joint Base Pearl Harbor-Hickam in Hawaii. This position requires a DoD TS/SCI security clearance which requires US citizenship for work on DoD contracts. Responsibilities: Independently manage...
-
Site Reliability Engineer
4 weeks ago
Sunnyvale, CA, United States Apple Inc. Full time
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The people here at Apple don’t just create products —...
-
Site Reliability Engineer
1 month ago
Indianapolis, IN, United States BCforward Full time
Site Reliability Engineer. BCforward is currently seeking a highly motivated Site Reliability Engineer for a remote opportunity! Position Title: Site Reliability Engineer. Location: Remote. Anticipated Start Date: 12/10/2024. Please note this is the target date and is subject to change. BCforward will send official notice ahead of a confirmed start date. Expected...
-
Principal Site Reliability Engineer
4 weeks ago
Sunnyvale, CA, United States Microsoft Full time
There has never been a more exciting time to be working in healthcare at Microsoft. Our Health & Life Sciences Solutions organization is an interdisciplinary team of product managers, designers, engineers, and clinicians who are designing, developing and deploying next-generation healthcare solutions powered by the Microsoft Cloud for healthcare...
-
Site Reliability Engineer II
4 weeks ago
Redmond, WA, United States Microsoft Full time
Overview: Security represents the most critical priorities for our customers in a world awash in digital threats, regulatory scrutiny, and estate complexity. Microsoft Security aspires to make the world a safer place for all. We want to reshape security and empower every user, customer, and developer with a security cloud that protects them with end to end,...
-
Site Reliability Engineer II
4 weeks ago
San Francisco, CA, United States Earnest Full time
The Site Reliability Engineer II position will report to the Lead Cloud Engineer. As an SRE II Engineer, you will: Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems. Develop and manage IaC to ensure reliable,...
-
Staff Site Reliability Engineer
4 weeks ago
Herndon, VA, United States Fortinet Full time
We are seeking a self-motivated and experienced Senior Site Reliability Engineer to spearhead the development and expansion of our FortiSASE OpenStack infrastructure. This role demands deep expertise in both Networking and SRE practices, with a heavy focus on automation and infrastructure as code (Ansible/Terraform). If you're a seasoned professional who...
-
Site Reliability Engineer
4 weeks ago
Chicago, IL, United States Nextpoint Full time
Join the team designing and developing innovative software solutions to meet client needs while providing expert technical support. Who we are and what we offer at Nextpoint: Nextpoint delivers transformative software and services for all law-kind. Our award-winning team is 100% focused on making it simple, fluid, and affordable for law firms of all...
-
Site Reliability Engineer
4 days ago
Phoenix, AZ, United States TEKsystems Full time
One of our Fortune 20 financial clients is looking to train up individuals for their Site Reliability Engineering division. This would consist of a 13-week boot camp starting in February 2025 and transition into a contract to hire in May 2025. Individuals must live in/near either Phoenix, AZ or Pittsburgh, PA. Training will start 100% remote, but then...
-
Senior Site Reliability Engineer
4 weeks ago
Seattle, WA, United States Apple Full time
Senior Site Reliability Engineer - ASE. Seattle, Washington, United States. Software and Services. Imagine what you could do here. At Apple, great ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Join Apple’s Cloud Service...
-
Site Reliability Engineer
4 weeks ago
Los Angeles, CA, United States CV Library Full time
Position Title: Site Reliability Engineer (SRE for Datacenter). Location: REMOTE. Pay Rate: $100/hr (+benefits). Assignment Length: 3-month W2 Contract. Industry: Technology. The ideal candidate will have experience with system operations and running large-scale, massively distributed infrastructure. Responsibilities: Data monitoring and alerting, data...