Spark/Scala/Python Data Engineer

Employer: Luxoft Romania
Field: IT Software
Job type: full-time
Job level: 1 - 5 years of experience
Cities: BUCURESTI
Updated: 25.03.2019

    Short company description

    Luxoft is a global IT service provider with more than 12,800 skilled software engineers who create high-end business solutions for the world’s largest brands. Luxoft’s global client base consists primarily of large multinational corporations.
    We serve over 150 clients from our delivery centers in North America, Mexico, Western and Eastern Europe, Asia Pacific, and South Africa. Luxoft is listed on the New York Stock Exchange (NYSE:LXFT). With deep domain expertise in finance, technology, automotive, telecom, travel & aviation, and energy, the company consistently goes beyond its clients’ expectations by bringing together technology, talent, innovation, and the highest quality standards.

    Reasons to join us
    • Attractive salary and benefits package
    • We invest in your professional training, including business domain knowledge, and help you grow your professional career.
    • We encourage creative thinking in an open-minded work environment. The relaxation rooms are frequently where the most ambitious ideas are born.
    • We are not just professional teams; we are also friends who have fun working together.
    • If you are an active person motivated by creating and developing software solutions, this is the place to be. You will not get bored.



    Requirements

    Experience implementing end-to-end data processing chains and big data architectures (Hadoop clusters, NoSQL databases, Elasticsearch), with mastery of languages and frameworks for distributed data processing (Spark / Scala).
    • Basic knowledge and interest in the development of ML algorithms
    • Knowledge of data ingestion frameworks
    • Knowledge of Spark and its different modules
    • Mastery of Scala and / or Python
    • Knowledge of the AWS or GCP ecosystem
    • Knowledge of the NoSQL database ecosystem
    • Knowledge of building APIs for data products
    • Knowledge of data visualization tools and libraries
    • Ability to explain complex systems in accessible terms
    • Expertise in data testing strategies
    • Strong problem-solving skills, intelligence, initiative, and the ability to work under pressure
    • Excellent interpersonal and communication skills (ability to go into detail)

    Responsibilities

    During project definition
    • Design of data ingestion chains
    • Design of data preparation chains
    • Basic ML algorithm design
    • Data product design
    • Design of NoSQL data models
    • Design of data visualizations
    • Participation in the selection of services / solutions according to the use cases
    • Participation in the development of a data toolbox

    During the iterative implementation phase
    • Implementation of data ingestion chains
    • Implementation of data preparation chains
    • Implementation of basic ML algorithms
    • Implementation of data visualizations
    • Use of ML frameworks
    • Implementation of data products
    • Exposing data products
    • Setting up NoSQL databases
    • Implementation of distributed processing jobs
    • Use of functional languages
    • Debugging distributed processes and algorithms
    • Identification and cataloging of reusable elements
    • Contribution to solving data processing problems

    During integration and deployment
    • Expertise in the implementation of end-to-end data processing chains
    • Mastery of distributed development

    Other information

    Contribute to the business value of data-oriented products built on a data lake in on-premises or cloud environments by implementing end-to-end data processing chains, from ingestion to exposure APIs and data visualization.
    You will be responsible for maintaining the high quality of data transformed in the data lake, keeping the data processing chains running smoothly, and optimizing the use of on-premises or cloud cluster resources.