
Data Engineer

Experience: 5-7 years
Posted: 01 Feb 2022
Location: United Arab Emirates
ROLE PURPOSE: 


The Data Engineer is responsible for the development and maintenance of data pipelines. The Data Engineer builds the required pipelines in close collaboration with the Data Analyst and Data Modeler roles to ensure that the required data is loaded to the Data Lake / Data Warehouses. This role is also responsible for day-to-day operations and support, ensuring that data is loaded completely and on time. The role helps the data engineering team plan, evaluate, and select the right data engineering capabilities for the Bank, and serves as an advisor to the data engineering team in developing and delivering enterprise-wide solutions for complex data analytics challenges.


Key Accountabilities of the role


  • Build and optimize ‘big data’ pipeline architectures and data sets, both batch-oriented and real-time.
  • Work hands-on with tools such as PySpark, Scala, Python, and Hive SQL for high-volume distributed data processing.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Spark and ‘big data’ technologies.
  • Design and recommend the best approach for data movement from different sources to HDFS using Kafka.
  • Provide expertise and hands-on experience with custom connectors built on Kafka core concepts and APIs.
  • Create stubs for producers, consumers, and consumer groups to help onboard applications from different languages/platforms (a minimal stub sketch follows this list).
  • Leverage Hadoop ecosystem knowledge to design and develop capabilities that deliver our solutions using Spark, Scala, Python, Hive, Kafka, and other tools in the Hadoop ecosystem.
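
As a minimal sketch of the producer/consumer stubs mentioned above, the example below uses the confluent-kafka Python client; the broker address, topic name, and consumer group are illustrative placeholders, not the Bank's actual configuration.

# Minimal Kafka producer/consumer stubs (assumes the confluent-kafka Python client;
# broker, topic, and group id below are illustrative placeholders).
from confluent_kafka import Producer, Consumer

BROKERS = "localhost:9092"     # placeholder bootstrap servers
TOPIC = "customer-events"      # hypothetical topic name

def produce_sample():
    """Publish a single sample record to the topic."""
    producer = Producer({"bootstrap.servers": BROKERS})
    producer.produce(TOPIC, key="cust-001", value='{"event": "signup"}')
    producer.flush()           # block until delivery is confirmed

def consume_sample():
    """Read a few records as part of a named consumer group."""
    consumer = Consumer({
        "bootstrap.servers": BROKERS,
        "group.id": "onboarding-demo",     # consumer group for the onboarding app
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([TOPIC])
    try:
        for _ in range(10):
            msg = consumer.poll(timeout=1.0)
            if msg is None or msg.error():
                continue
            print(msg.key(), msg.value())
    finally:
        consumer.close()

if __name__ == "__main__":
    produce_sample()
    consume_sample()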


Specialist Skills / Technical Knowledge, Technical Competencies Required for this role:


  • Extensive experience working with Big Data tools and building data solutions for advanced analytics; experience with Cloudera is a plus.
  • Practical knowledge of data ingestion using ETL tools (Informatica BDM) as well as more recent big data tools (NiFi, Talend, etc.).
  • Expertise in NoSQL databases such as Kudu, HBase, MongoDB, etc.
  • Hands-on experience with major components of the Hadoop ecosystem such as HDFS, Hive, Impala, Oozie, Sqoop, Spark, and YARN.
  • Extensive experience in stream processing and analytics using Kafka, Spark Streaming, Flume, or Striim (a minimal streaming sketch follows this list).
  • Experience with programming languages and tools (Java, Python, Scala, and shell scripting).
  • Experience with automation and provisioning tools such as Jenkins.
  • Good knowledge of big data cloud offerings and data visualization tools (Power BI, Tableau, etc.).
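
As a minimal illustration of the stream-processing stack listed above, the sketch below reads from a Kafka topic with PySpark Structured Streaming and lands the data in HDFS as Parquet; the broker, topic, and paths are illustrative placeholders, and the Spark Kafka connector package is assumed to be on the classpath.

# Minimal PySpark Structured Streaming job: Kafka -> HDFS (Parquet).
# Broker, topic, and output/checkpoint paths are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs-demo").getOrCreate()

# Read a continuous stream of records from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   # placeholder brokers
    .option("subscribe", "customer-events")                 # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
)

# Append records to HDFS as Parquet, tracking progress in a checkpoint directory.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/landing/customer_events")             # placeholder path
    .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
    .outputMode("append")
    .start()
)

query.awaitTermination()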


Previous experience:


  • At least 5 years of progressively responsible, relevant experience in data engineering, including creating partnerships, implementing data engineering solutions, and understanding the underlying technologies needed to enable data governance across a midsized or large organization.
  • Experience in data ingestion using ETL tools (Informatica BDM) as well as more recent big data tools (NiFi, Talend, etc.).
  • Experience in computer programming, query languages, and internet-based data visualization platforms such as Power BI is also preferred.
  • Demonstrated experience implementing and supporting data engineering solutions, policies, and procedures, and successfully executing programs that meet or exceed expectations in a dynamic environment; experience creating tools and capabilities that assist with data discovery and collaboration, ensure data quality, and load, clean, enrich, manage, and share data and metadata from a variety of sources.
  • Familiarity with big data technologies (e.g., Cloudera), data engineering tools (e.g., Informatica), data streaming (Kafka), CDC, etc.



Required Skills

Skill                                                      Years   Months
Data engineering                                           7       0
Big Data tools                                             5       0
Python                                                     5       0
ETL Informatica                                            5       0
Data Warehouse/MDM concepts                                5       0
NoSQL                                                      5       0
Hadoop Ecosystem (Hive, Impala, HDFS, YARN, Pig, Oozie)    5       0
Cloudera                                                   3       0