Current jobs related to Senior Lead Site Reliability Engineer - Palo Alto - JPMorgan Chase & Co.


  • Palo Alto, United States Acryl Data Full time

    DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises, including Apple, CVS Health, Netflix, and Visa. Innovated jointly with a thriving open-source community of 13,000+ members, DataHub's metadata graph provides in-depth context of AI and data assets with best-in-class scalability and extensibility. The company's enterprise SaaS...


  • Palo Alto, United States Iopa Solutions Full time

    Overview: Do you thrive at the intersection of engineering, reliability, and leadership? Want to shape the reliability strategy of a high-growth SaaS company that operates at true global scale? We’re searching for a Head of Site Reliability Engineering to take ownership of our reliability vision, build and mentor a high-performing SRE organisation, and...


  • Palo Alto, United States Assured Full time

    Staff Site Reliability Engineer. Base pay range: $180,000.00/yr - $210,000.00/yr. This range is provided by Assured; your actual pay will be based on your skills and experience, so talk with your recruiter to learn more. Assured is on a mission to modernize insurance. Claims processing (i.e. should we pay this claim?),...


  • Palo Alto, United States Tesla Full time

    Technical Lead, Site Reliability Engineer, Fleetnet (posted 1 week ago). What To Expect: We are a small team of...


  • Palo Alto, United States Tesla Full time

    Site Reliability Engineer, HPC Infrastructure. What To Expect: Tesla's Supercomputing/AI infrastructure team works directly with the high-performance computing and machine learning infrastructure on which our ML algorithms run; this includes virtual simulations, Autopilot hardware...


  • Palo Alto, United States FLUIX Full time

    FLUIX is building the AI operating system that plans, designs, and optimizes AI infrastructure. We are based in Silicon Valley. We specialize in providing AI-driven solutions for data centers and power providers, leveraging cutting-edge Machine Learning (ML) and Artificial Intelligence (AI) technologies. Our mission is to double America’s compute capacity...


  • Palo Alto, United States Theklicker Full time

    Company Description: theklicker is an online platform specializing in electronic product price comparison, enabling users to browse prices across multiple booking sites effortlessly. We are dedicated to being a one-stop solution for purchasing electronic products. With a focus on delivering the best user experience, theklicker empowers users to make informed...


  • Palo Alto, United States x.ai Full time

    Site Reliability Engineer - Kubernetes Platform. About xAI: xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on...


  • Palo Alto, United States Pantera Capital Full time

    About xAI: xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational...


  • Palo Alto, United States Archetype AI Full time

    About Archetype AI: Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming...

Senior Lead Site Reliability Engineer

3 hours ago


Palo Alto, United States JPMorgan Chase & Co. Full time

Overview

Elevate your engineering prowess to unprecedented levels by joining a team of exceptionally gifted professionals and position yourself among the top echelon in site reliability. As a Senior Site Reliability Engineer at JPMorgan Chase within the (insert LOB or sub LOB), you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for the services in your application and product lines. You will ensure that those NFRs are accounted for in your products’ design and test phases, that your service level indicators (SLIs) effectively measure customer experience, and that service level objectives (SLOs) are defined with stakeholders and implemented in production.

Job responsibilities

  • Creates high-quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
  • Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues
  • Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team
  • Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
  • Works toward becoming an expert on the applications and platforms in your remit while understanding their interdependencies and limitations
  • Evolves and debugs critical components of applications and platforms
  • Provides comprehensive and ongoing guidance, tools, and solutions to support the firm’s growth
  • Makes significant contributions to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts with 5+ years of applied experience
  • Advanced knowledge of site reliability culture and principles, with demonstrated ability to implement site reliability within an application or platform
  • Advanced knowledge of and experience in observability, such as white-box and black-box monitoring, service level objectives, alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Advanced knowledge of software applications and technical processes, with considerable depth in one or more technical disciplines
  • Ability to communicate data-based solutions using complex reporting and visualization methods
  • Recognized as an active contributor to the engineering community
  • Continues to expand network and leads evaluation sessions with vendors to see how their offerings can fit into the firm’s strategy

Preferred qualifications, capabilities, and skills

  • Ability to anticipate, identify, and troubleshoot defects found during testing
  • Strong communication skills, with the ability to mentor and educate others on site reliability principles and practices