Site Reliability Engineer

This job is no longer active!

Employer: Booking Holdings
Categories: Engineering, IT Hardware, IT Software
Job type: full-time
Job level: over 5 years of experience
Locations: Brasov, Iasi, nationwide
Updated at: 10.05.2023
Job remote: Remote (from home)

    Short company description

    Booking Holdings Romania is a Center of Excellence based in Bucharest, Romania and was created to support the increasing business demands of the Booking Holdings Brands. The Center of Excellence provides access to specialized and highly skilled talent, leading industry best practices, and collaboration opportunities across all of our Brands.

    As part of our Booking Holdings Romania team, you will have the opportunity to be a part of the world’s leading provider of online travel, with a mission of making it easier for everyone to experience the world through six primary consumer-facing brands: Booking.com, Priceline, Agoda, KAYAK, OpenTable and Rentalcars.com.


    The core premise of SRE lies in treating operations as a software problem, where operations are concerned with addressing availability, scalability, latency and efficiency for Booking.com’s systems & services. At its core, the SRE is tasked with engineering efforts to solve complex problems, requiring a strong aptitude for developing software systems that minimize human labor (e.g. through automation) and increase system & service reliability. A Booking Reliability Engineering team has full vertical ownership of a system, from the server configuration up to the application interfaces. This gives the team full control over a service and avoids situations where different teams own different areas of a system and some parts fall between the cracks.

    An SRE can wear several hats: at times an SRE is part of the product development team, and at other times acts as a consultant who supports a product development team in implementing Booking Reliability Engineering best practices. As systems & services grow in size and complexity, so too does the operational overhead. It is a fundamental principle of SRE to break this relationship between operational toil and system size and complexity. This requires the team to limit operations work in order to protect the engineering development effort that is at the heart of Booking Reliability Engineering.

    Ultimately, fundamental software engineering skills coupled with strong systems and networking knowledge will guide the SRE to create more reliable systems & services that are highly available, scale with growth, and are efficient and latency-sensitive.


    Solid experience in at least one programming language. We use Java, Python, Go, Ruby, Perl;
    Experience with building, operating and maintaining scalable distributed systems, and with operations automation; 
    Experience with Infrastructure as Code technologies;
    Knowledge of cloud computing fundamentals;
    Solid foundation in Linux administration and troubleshooting;
    Understanding of service level agreements (SLAs) and service level objectives (SLOs);
    Additional experience in OpenStack, Kubernetes, Networking, Security or Storage is desirable;
    Experience with monitoring and observability technologies such as Prometheus, Graphite, Grafana, Kibana and Elasticsearch is a plus;
    Good interpersonal skills;
    Proficient command of the English language, both written and spoken.



    Building software applications
    Is responsible to build software applications by using relevant development languages and applying knowledge of systems, services and tools appropriate for the business area
    Is responsible to write readable and reusable code by applying standard patterns and using standard libraries
    Is responsible to refactor and simplify code by introducing design patterns when necessary
    Is responsible to ensure the quality of the application by following standard testing techniques and methods that adhere to the test strategy
    Is responsible to maintain data security, integrity and quality by effectively following company standards and best practices

    End to End System Ownership
    Is responsible to own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics, and acting accordingly when their thresholds are violated
    Is responsible to reduce business continuity risks and bus factor by applying state-of-the-art practices and tools, and writing the appropriate documentation such as runbooks and OpDocs
    Is responsible to reduce risk and obtain customer feedback by using continuous delivery and experimentation frameworks
    Is responsible to independently manage an application or service by working through deployment and operations in production

    Software Systems Design
    Has sufficient knowledge to evaluate possible architecture solutions by taking into account cost, business requirements and technology requirements
    Has sufficient knowledge to describe the implications of changing an existing system or adding a new system to a specific area, by having a broad, high-level understanding of the infrastructure and architecture of our systems
    Has sufficient knowledge to help grow the business and/or accelerate software development by applying engineering techniques (e.g. prototyping, spiking and vendor evaluation) and standards
    Has sufficient knowledge to meet business needs by designing solutions that meet current requirements and are adaptable for future enhancements 

    Technical Incident Management
    Is responsible to address and resolve live production issues by mitigating the customer impact within SLA
    Is responsible to improve the overall reliability of systems by producing long term solutions through root cause analysis
    Is responsible to keep track of incidents by contributing to postmortem processes and logging live issues

    Automation and toil reduction
    Is responsible to ensure that infrastructure stays current by reducing technical debt, searching for bottlenecks and preparing for scaling
    Is responsible to reduce the cost of operations and maintenance by leveraging new technologies and automation, and by partnering with vendors to ensure we stay current
    Is responsible to reduce human labour by writing small software features that address availability, scalability, latency and efficiency

    Monitoring and Alerting improvements
    Is responsible to review and verify the performance of production systems and network infrastructure by continuously monitoring appropriate observability metrics and business KPIs, and by carrying out capacity planning
    Is responsible to improve application reliability by partnering with development teams to advise on setting appropriate observability metrics

    Architectural Guidance
    Has basic knowledge to advise product teams towards a technical solution that meets the functional, nonfunctional & architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape
    Has basic knowledge to set a clear direction for a technical capability by evaluating and aligning the target architecture improvements, and by reframing architectural designs and decisions for varied stakeholders

    Critical Thinking
    Is responsible to systematically identify patterns and underlying issues in complex situations, and to find solutions by applying logical and analytical thinking.
    Is responsible to constructively evaluate and develop ideas, plans and solutions by reviewing them, objectively taking into account external knowledge, initiating 'SMART' improvements and articulating their rationale.

    Continuous Quality and Process Improvement
    Is responsible to identify opportunities for process, system and structural improvements (e.g. performance gains) by examining and evaluating current process flows, methods and standards.
    Is responsible to design and implement relevant improvements by defining adapted/new process flows, standards, and practices that enable business performance.

    Effective Communication
    Is responsible to deliver clear, well-structured, and meaningful information to a target audience by using suitable communication mediums and language tailored to the audience
    Is responsible to achieve mutually agreeable solutions by staying adaptable, communicating ideas in clear coherent language and practicing active listening
    Is responsible to ask relevant (follow-up) questions in order to engage properly with the speaker and fully understand what they are saying, by applying listening and reflection techniques

    Other info


    Contributing to a high scale, complex, world renowned product and seeing real-time impact of your work on millions of travelers worldwide
    Working in a fast-paced and performance driven culture
    Technical, behavioral and interpersonal competence advancement via on-the-job opportunities, experimental projects, hackathons, conferences and active community participation
    Competitive compensation and benefits package 
    Vast amounts of data to validate your ideas and the opportunity to experiment with real users.

    Booking.com is proud to be an equal opportunity workplace and is an affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We strive to move well beyond traditional equal opportunity and work to create an environment that allows everyone to thrive.