SRE (Site Reliability Engineer)

Employer: K2view
Domain:
  • Engineering
  • IT Hardware
  • IT Software
Job type: full-time
Job level: 1-5 years of experience
Location:
  • Bucharest
Updated at: 28.11.2021


    K2view is hiring for the following SRE (Site Reliability Engineer) position:

    K2view is looking for a Site Reliability Engineer (SRE) to join our SRE team and help maintain the superb performance of the K2view mPaaS and PaaS offerings, focusing on service availability, reliability, visibility, and performance. The K2view service runs on one of the major IaaS providers: AWS, Azure, or GCP.

    As a member of the SRE team, you’ll be working closely with K2view DevOps, Tier 1, R&D, the COE (Center of Excellence), and K2view customers. You will be part of the group responsible for the availability, reliability, and performance of K2view mPaaS/PaaS, designing and building cutting-edge monitoring and alerting services to achieve 99.9% availability.

    Job Summary

    At K2view, we’re passionate about big data systems and data management platforms. We count on our site reliability engineers (SREs) to empower our customers with rich infrastructure and monitoring tools that maintain the high availability, reliability, and stellar performance they need to pursue their objectives. As we expand our customer deployments, we are seeking an SRE to deliver insights from massive-scale data in real time. Our SREs are responsible for creating, configuring, and maintaining monitoring environments and tools. They are experts in analyzing production system metrics, identifying the root causes of system performance issues, and taking reactive and proactive actions to keep systems in a healthy state.

    Responsibilities

    • Provide 24/7 monitoring of customers' production systems
    • Create monitoring dashboards and set thresholds for tracking overall system health
    • Provide SLA infrastructure and dashboards for service availability
    • Generate periodic system health reports
    • Identify system trends and prevent production failures
    • Become part of the team building the monitoring infrastructure
    • Measure and optimize production system performance
    • For critical production issues, run the initial triage and open an escalation bridge
    • Investigate, record, and analyze production errors
    • Run daily production processes
    • Where applicable, restore the system to an operational state
    • Support Tier 1 group investigations if required
    • Support installation of the K2view mPaaS (managed PaaS) offering for new customers
    • Manage deployment of ongoing change requests
    • Run production investigations across components such as Cassandra, GoldenGate, and Kafka

    Requirements:
    • Bachelor’s degree in computer science or another highly technical or scientific discipline
    • Some experience with Linux and Windows operating systems
    • Ability to program (structured and object-oriented) in one or more high-level languages (Java is an advantage)
    • Ability to analyze and debug large, complex systems
    • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
    • Some experience with SQL/NoSQL databases such as PostgreSQL, SQLite, etc.
    • MUST: Fluent spoken and written English
    • Availability for at least 5 shifts a week (including night, Friday, and Saturday shifts)
    • Huge advantage: previous experience as an SRE team member for a SaaS/PaaS offering running on one of the major clouds: AWS, Azure, or GCP