Site Reliability Engineer

1 week ago


San Leandro, United States United Software Group Full time
Site Reliability Engineer/Production Support

San Leandro, CA or Charlotte, NC - Day 1 Onsite - Hybrid

12 Months

Description:
  • 5-10 years of experience in production support/SRE teams with a continued focus on improving platform health.
  • Experience working in a microservice architecture.
  • Hands-on Java coding experience and the ability to analyze and troubleshoot production issues by reading stack traces and exceptions.
  • Familiar with Agile or other rapid application development practices.
  • Hands-on expertise in building monitoring dashboards and setting up alerts using Splunk.
  • Hands-on experience in writing Oracle SQL queries and MongoDB queries.
  • Experience with distributed (multi-tiered) systems, algorithms, and relational databases.
  • Must have working knowledge of APM tools such as Splunk, ELK, Grafana, Prometheus, etc.
  • Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ or Kafka is a plus.
  • Working knowledge of CI/CD is a plus: source control such as Git/Bitbucket, continuous integration with Jenkins, UCD release, etc.
  • Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency.
  • Shell scripting and DevOps tools like Ansible, with good knowledge of YAML for writing playbooks.
  • Experience with distributed storage technologies like NFS, as well as dynamic resource management frameworks such as PCF and Kubernetes/OpenShift.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Expectations:
  • You will be a core member of an SRE support team, using the latest tools to write code and test cases, work with API specs, and automate in order to maintain the resiliency, performance, and availability of Digital Sales & Marketing platforms.
  • Strong, relevant experience supporting Web/API platforms built on a Java/JavaScript stack (Spring/Spring Boot, JavaScript with Angular/React).
  • Proficiency in dealing with legacy infrastructure along with cloud infrastructure (on-prem and third-party) such as PCF or Azure.
  • Identifying opportunities to adopt new technologies while removing toil and continuing to drive efficiency and optimization.
  • Proactive monitoring of app performance through Splunk, app dashboards, AppDynamics, Dynatrace, etc.
  • Represent Platform engineering teams during production outages and collaborate with engineering teams to resolve them. Collaborate with stakeholders across engineering functions to own/derive the RCA and work toward permanent resolution.
  • Plan, support, execute and comply with governance programs/processes in support of a strong control environment in your functional area. Leverage process documentation to improve operational controls and identify and remediate process deficiencies.
  • Proactively identify, communicate, mitigate, and escalate risk originating from non-compliance with processes, operational errors, and data integrity issues in all applicable processes.
  • Ability to influence SRE practices within and outside teams to enable a strong DevOps culture within the organization.
  • Responsible for working with engineering teams to maintain SLAs and SLOs, constantly looking for opportunities to improve platform metrics and communicating them to stakeholders.


Tech Stack: Java/J2EE (Spring, Spring Boot, Python, shell scripting).

Exposure to and proficiency in different API styles such as SOAP, REST, microservices, etc.

  • San Francisco, United States Apollo Solutions Full time

    Site Reliability Engineer. Apollo Solutions has partnered with a groundbreaking artificial intelligence business that is making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Francisco, United States WEX Full time

    The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...


  • San Francisco, United States Bun Full time

    Bun is an open-source JavaScript tooling company focused on making programming simpler. We've raised $26 million from top investors in Silicon Valley, are among the top GitHub repositories and have a growing community of 33,000 Discord members. We're hiring an experienced Site Reliability Engineer to scale and maintain the infrastructure that builds and tests...


  • San Jose, United States EVONA Full time

    Site Reliability Engineer (SRE) Location: San Francisco Bay Area. Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...


  • San Francisco, United States Ellation, Inc. Full time

    Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Francisco, United States Unreal Gigs Full time

    Are you passionate about building and maintaining resilient systems that ensure high availability and performance? Do you excel at automating processes, troubleshooting complex issues, and creating systems that scale smoothly? If you're ready to take on the challenge of ensuring reliable, efficient, and secure system operations, our client has the perfect...


  • San Francisco, United States New York Technology Partners Full time

    Must-haves, in order of preference: typical Java/J2EE experience between 6 and 10 years; application production support (SRE - Site Reliability Engineering) with 3+ years, preferably in the e-commerce domain; hands-on experience in any of the UI frameworks (AngularJS, VueJS, etc.) - 1+ years


  • San Francisco, California, United States WEX Inc Full time

    The WEX Site Reliability Engineering team is looking for a motivated Site Reliability Engineer to join our Benefits Reliability organization. As a member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Learning and Development: Participate in training and mentorship programs to...


  • San Francisco, United States Focal Systems Full time

    Location: San Francisco - hybrid (1-2 days per week). Salary: $165-175k + stock. Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar...


  • San Jose, United States VDart Full time

    Job Title: Lead Site Reliability Engineer. Location: San Jose, CA (2 Days Hybrid). Term: Contract. Job Description: Responsibilities: Please look for 14 years of hands-on Coding/scripting (Ansible), Python, and Cloud Computing. About the Role • We seek a highly skilled and dynamic Site Reliability Engineer - Consultant. In this role you will • Maintain and...


  • San Francisco, United States Perplexity AI Full time

    Perplexity is seeking a Site Reliability Engineer (SRE) to join our small team in revolutionizing the way people search and interact with the internet. You will be responsible for leading the design, implementation, and scaling of the infrastructure and systems that support our web and mobile products. The ideal candidate should have experience in designing...


  • San Antonio, United States Dunhill Professional Search & Government Solutions Full time

    Site Reliability Engineer. U.S. CITIZENSHIP REQUIRED. Profile/Role Description: Looking for a highly motivated Site Reliability Engineer (SRE). Resource will provide integration and operational support for production and lower environment cloud and web based applications with JBoss, MQ, Apache, VMWare, F5 Load balancer and other network and cloud components....


  • San Jose, United States NInfo Systems, Inc. Full time

    Company Description: NInfo Systems Inc. is a Certified minority-owned national IT Recruiting and Solutions provider with two decades of experience. It works with Fortune 500 corporations, mid-sized companies, Boutique Consulting companies, startups, SME-level organizations, Federal/State agencies, and tier-one vendors. Role: Senior Reliability Engineer, Hybrid...


  • San Francisco, CA, United States Earnest Current Job Openings Full time

    The Site Reliability Engineer II position will report to the Lead Cloud Engineer. As an SRE II Engineer, you will: Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems. Develop and manage IaC to ensure reliable,...


  • San Ramon, United States Litmus7 Full time

    Location - San Ramon, CA - ONSITE - NO REMOTE. A Site Reliability Engineer is a professional who acts as a warrior to monitor and protect customer applications, taking charge of operational tasks to ensure the efficient functioning of a system. They are responsible for monitoring, automating, and improving the reliability, performance, and availability of any...