Site Reliability Engineer, SRE Team
Job Description
Join our team as a Site Reliability Engineer and help design robust, scalable system architectures. You'll collaborate with development teams to keep our systems reliable and efficient.
Responsibilities include:
- Designing and implementing scalable system architectures.
- Establishing and refining SLOs with stakeholders.
- Coding in Python/Go.
- Inducing application failures and implementing recovery strategies.
- Debugging applications using metrics, and adding traces/metrics where they are missing.
- Participating in on-call duties for continuous support.
- Leading changes in engineering practices.
- Working night shifts when the on-call rotation requires it.
Qualifications
We are looking for a Site Reliability Engineer with:
1. 3+ years of SRE experience.
2. Experience with Kubernetes, Helm, and cloud providers.
3. Proficiency in Python/Go coding.
4. A strong understanding of application failure handling.
5. The ability to debug applications using metrics.
6. Familiarity with implementing traces.
7. Willingness to be on-call with flexible hours.
8. A collaborative, team-oriented approach and good communication skills.
9. GCP knowledge (a plus).
Benefits
We offer:
- Unlimited PTO
- Hobby and team-building budget allowance
- Employee Support Program
- Financial aid following the loss of a family member
- Employee Resource Groups
