SITE RELIABILITY ENGINEER, Remote

Employer: SalesConsulting
Domain: IT Software
Job type: full-time
Job level: over 5 years of experience
Location: Bucharest
Updated at: 01.12.2021

    Short company description

    Sales Consulting has been active on the HR market since 1998, with national coverage in several areas of expertise: recruitment and selection, assessment centers, HR market mapping/due diligence projects, personnel leasing, and payroll.

    Sales Consulting has two fully operational branches: Cluj-Napoca (also the head office) and Bucharest.
    We run recruitment, training, and consultancy projects across a wide range of industries:
    - AUTOMOTIVE/PRODUCTION/ENGINEERING (Specialists and Middle & Top Management positions)
    - OUTSOURCING (BPO/SSC/CC)
    - IT&C (C++, C#, .NET, Java, Linux, etc.)
    - FMCG (sales & purchasing positions; all levels)
    - PHARMA (all levels)

    Requirements

    − Proven work experience as an SRE;
    − Hands-on experience with at least one cloud infrastructure provider, preferably GCP;
    − Hands-on experience with DevOps tools and processes;
    − Hands-on experience with Kubernetes and an understanding of its architecture;
    − Hands-on experience with Linux and Docker;
    − Scripting and coding skills;
    − Basic knowledge of APIs, development processes, and agile methodology;
    − Knowledge of Git;
    − Knowledge of CI/CD;
    − Strong interest in large-scale distributed systems;
    − Knowledge of Chaos Engineering;
    − Good spoken and written English is required;
    − A Bachelor’s degree in Computer Science or a related field is a plus;
    − Strong analytical skills;
    − Good communication skills;
    − Solution-oriented and proactive in reaching out to stakeholders;
    − Continuously willing to learn and improve.

    Responsibilities

    − Analyze and fix application issues and performance incidents in production;
    − Build deep knowledge of the running services and work proactively with stakeholders to resolve issues;
    − Apply DevOps practices and automate repetitive tasks to drive down operational overhead;
    − Define reliability metrics (SLIs/SLOs), measure performance against SLOs, and apply error budgets to release decisions;
    − Analyze, assess, and keep track of infrastructure capacity;
    − Create and define incident response templates, act as a point of contact for stakeholders, and perform root cause analysis;
    − Create and define standard operating procedures;
    − Work with the security SME to ensure the security of our services;
    − Implement logging and tracing, and create the required dashboards;
    − Implement best practices to ensure software releases are consistent and repeatable to shorten time-to-market;
    − Work with the SRE lead to implement SRE best practices;
    − Take overall responsibility for keeping CI/CD functioning as expected;
    − Work closely with development teams during the development process offering education and guidance on SRE practices;
    − Participate in reviews for new features, products, and infrastructure components;
    − Share the responsibility of being on-call;
    − Proactively test the robustness of a system by adopting Chaos Engineering principles.

    Other info

    Only eligible candidates will be contacted.