Site Reliability Engineer

3 weeks ago


San Leandro, United States RIT Solutions, Inc. Full time

5-10 years of experience in production support/SRE teams, with a continued focus on improving platform health.
Experience working in a microservice architecture.
Hands-on Java coding experience and the ability to analyze and troubleshoot production issues by reading stack traces and exceptions.
Familiarity with Agile or other rapid application development practices.
Hands-on expertise in building monitoring dashboards and setting up alerts using Splunk.
Hands-on experience writing Oracle SQL and MongoDB queries.
Experience with distributed (multi-tiered) systems, algorithms, and relational databases.
Must have working knowledge of APM tools such as Splunk, ELK, Grafana, Prometheus, etc.
Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ and Kafka is a plus.
Working knowledge of CI/CD is a plus: source control such as Git/Bitbucket, continuous integration with Jenkins, UCD release, etc.
Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency.
Shell scripting and DevOps tools such as Ansible, with good knowledge of YAML for writing playbooks.
Experience with distributed storage technologies such as NFS, as well as dynamic resource management frameworks such as PCF and Kubernetes/OpenShift.
A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.

Expectations:
You will be a core member of an SRE support team, using the latest technology tools to write code and test cases, work with API specs, and automate in order to maintain the resiliency, performance, and availability of Digital Sales & Marketing platforms.
Strong, relevant experience supporting web/API platforms built on a Java/JavaScript stack (Spring/Spring Boot, JavaScript with Angular/React).
Proficiency in dealing with legacy infrastructure along with cloud infrastructure (on-prem and third-party) such as PCF or Azure.
Identify opportunities to adopt new technologies, remove toil, and continue to drive efficiency and optimization.
Proactive monitoring of app performance through Splunk, app dashboards, AppDynamics, Dynatrace, etc.
Represent Platform Engineering teams during production outages and collaborate with engineering teams to resolve them. Collaborate with stakeholders across the engineering function to own/derive the RCA and work towards a permanent resolution.
Plan, support, execute and comply with governance programs/processes in support of a strong control environment in your functional area. Leverage process documentation to improve operational controls and identify and remediate process deficiencies.
Proactively identify, communicate, mitigate, and escalate risk originating from non-compliance with processes, operational errors, and data integrity issues in all applicable processes.
Ability to influence SRE practices within and outside teams to enable a strong DevOps culture within the organization.
Responsible for working with engineering teams to maintain SLAs and SLOs. Constantly look for opportunities to improve platform metrics and communicate them to stakeholders.
Tech stack: Java/J2EE (Spring, Spring Boot), Python, shell scripting.
Exposure to and proficiency in different API styles such as SOAP, REST, microservices, etc.



  • San Diego, United States ObjectWin Technology Full time

    Job Title: Site Reliability Engineer Location: San Diego, CA or Remote in CA Duration: 6 Months Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team strives to make PlayStation highly reliable,...


  • San Francisco, United States Vertisystem Full time

    Duration: 6 months contract Pay rate: $90/hr on W2 Job Summary: It is an exciting time to be part of the organization’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team strives to make the organization highly reliable, scalable, operable and...


  • San Francisco, United States Vertisystem Full time

    Duration: 6 months contract Pay rate: $90/hr on W2 Job Summary: It is an exciting time to be part of the organization’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team strives to make the organization highly reliable, scalable, operable and...


  • San Francisco, United States Vertisystem Full time

    Duration: 6 months contract Pay rate: $90/hr on W2 Job Summary: It is an exciting time to be part of the organization’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team strives to make the organization highly reliable, scalable, operable and...


  • San Diego, United States ACL Digital Full time

    W2 Contract/ Local candidates only Job Title: Site Reliability Engineer Location: San Diego, CA (Open to other locations in California) Job Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE...


  • San Diego, United States ACL Digital Full time

    W2 Contract/ Local candidates only Job Title: Site Reliability Engineer Location: San Diego, CA (Open to other locations in California) Job Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team...


  • San Diego, United States ACL Digital Full time

    W2 Contract/ Local candidates only Job Title: Site Reliability Engineer Location: San Diego, CA (Open to other locations in California) Job Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team...


  • San Diego, United States ACL Digital Full time

    W2 Contract/ Local candidates only Job Title: Site Reliability Engineer Location: San Diego, CA (Open to other locations in California) Is this the role you are looking for? If so, read on for more details, and make sure to apply today. Job Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs...


  • San Diego, United States Talent Software Services Full time

    Site Reliability Engineer - Senior (NE) Job Summary: Talent Software Services is in search of a Site Reliability Engineer - Senior (NE) for a contract position in San Diego, CA. The opportunity will be one year with a strong chance for a long-term extension. Po...


  • San Diego, United States PEAK Technical Staffing USA Full time

    Hiring Senior Site Reliability Engineer; primary responsibilities will include contributing to the implementation and delivery of the end-to-end automation platform, to support continuous integration and continuous delivery (CI/CD), with a focus on developer self-service capabilities. NOTE: Must have build out experience with Kubernetes. This position...


  • San Diego, United States PEAK Technical Staffing USA Full time

    Hiring Senior Site Reliability Engineer; primary responsibilities will include contributing to the implementation and delivery of the end-to-end automation platform, to support continuous integration and continuous delivery (CI/CD), with a focus on developer self-service capabilities. NOTE: Must have build out experience with Kubernetes. This position...


  • San Diego, United States Talent Software Services Full time

    Site Reliability Engineer - Senior (NE) Job Summary: Talent Software Services is in search of a Site Reliability Engineer - Senior (NE) for a contract position in San Diego, CA. The opportunity will be one year with a strong chance for a long-term extension. Position Summary: As a member of the CICD and Cloud Reliability team you'll work at the heart of the...


  • San Diego, United States Talent Software Services Full time

    Site Reliability Engineer - Senior (NE) Job Summary: Talent Software Services is in search of a Site Reliability Engineer - Senior (NE) for a contract position in San Diego, CA. The opportunity will be one year with a strong chance for a long-term extension. Position Summary: As a member of the CICD and Cloud Reliability team you'll work at the heart of...


  • San Francisco, California, United States Observable Full time

    Observable is seeking a full-time infrastructure and site reliability engineer to help improve, administrate, and grow Observable systems as we scale to meet our customer's needs. What you will do: Perform site reliability and ops work for Observable production and staging environments (manage servers, tweak WAF rules, optimize SQL queries, and more). Design and...


  • San Francisco, United States hims & hers Full time

    About the Role: We are seeking a Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage, and enables us to better serve our users. We also know that the faster we move, the more likely we are to break things. You Will: Design and implement SRE practices ensuring availability,...


  • San Diego, CA, United States Talent Software Services Full time

    Site Reliability Engineer - Senior (NE) Job Summary: Talent Software Services is in search of a Site Reliability Engineer - Senior (NE) for a contract position in San Diego, CA. The opportunity will be one year with a strong chance for a long-term extension. Position Summary: As a member of the CICD and Cloud Reliability team you'll work at the heart of...


  • San Ramon, United States The LaSalle Group Full time

    LaSalle Network has partnered with a well-established software provider that's based in San Ramon, CA, who's in need of a well-rounded, Site Reliability Engineer (SRE) - Grafana Observability - with a strong background in Grafana and related tools such as Prometheus and Telegraf. The ideal candidate will play a crucial role in accelerating the transition of...