Professional profile: Big Data Developer

Name/Surname: s. d.
Age: 29
Mobile/Phone: Confidential
E-mail: Confidential
CV attachment: Confidential
CV category: Developer / Web dev. / Mobile dev.
Preferred location: Italy




Summary

Big Data Developer

Experience

I am a Big Data developer with 3 years of experience and working skills in Hadoop, YARN, Hive, Spark (Scala), Sqoop, Oozie, the Maestro scheduler, Linux and MapReduce.

Career Synopsis

    • A total of 2.10 years in Big Data application development using Spark, Scala, Sqoop and Hadoop, and the Hadoop ecosystem: HDFS, Hive, Pig, Maestro and Java.
    • Hands-on experience in Scala programming and Spark components such as Spark Core and Spark SQL.
    • Worked on creating RDDs and DataFrames from the required input data and performed data transformations using Spark Core (a brief sketch follows this list).
    • Hands-on experience ingesting data from different sources into HDFS.
    • Hands-on experience loading different types of data into DataFrames.
    • Hands-on experience writing Hive scripts using HQL.
    • Hands-on experience writing Pig scripts in Pig Latin to process data.
    • Working knowledge of MapReduce.
    • Working knowledge of Sqoop.
    • Working knowledge of the Maestro scheduler.
    • Good knowledge of shell scripting.
    • Basic knowledge of NiFi.
    • Basic knowledge of Splunk.
    • Excellent problem-solving skills with a strong technical background; a result-oriented team player with excellent communication and interpersonal skills.
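
A minimal sketch of the RDD/DataFrame work referenced in the list above; the input path and column names are illustrative placeholders only, not taken from any client project.

    import org.apache.spark.sql.SparkSession

    object RddDfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("rdd-df-sketch").getOrCreate()
        import spark.implicits._

        // RDD created from a text file and transformed with Spark Core operations
        val lines = spark.sparkContext.textFile("/tmp/input.txt")   // hypothetical input path
        val wordCounts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // The same data exposed as a DataFrame and queried through Spark SQL
        val df = wordCounts.toDF("word", "cnt")
        df.createOrReplaceTempView("word_counts")
        spark.sql("SELECT word, cnt FROM word_counts ORDER BY cnt DESC LIMIT 10").show()

        spark.stop()
      }
    }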
Professional Experience

    • Infosys, from February 2017 to the present.

Technical Skills

    • Primary skills: Spark, Hadoop (HDFS, MapReduce), Hive, Pig, Sqoop
    • Languages: Scala, Java, Pig Latin
    • Databases: Oracle, BigSQL, MySQL, SQL
    • IDE: Eclipse
    • Scripting language: Shell script
    • Scheduler: Maestro
    • Visualization tools: Advanced Query Tool, SQL Developer
    • Operating systems: Windows XP/7/8, Linux

Educational Qualification

COURSE               INSTITUTION                                BOARD/UNIVERSITY                        PERIOD      AGGREGATE
B. Tech              Aditya Engineering College, Surampalem     JNTUK                                   2012-2016   77.67%
Intermediate (MPC)   Narayana Junior College, Kakinada          Board of Intermediate Education, A.P.   2010-2012   92.27%
SSC                  Akshara School, Kakinada                   CBSE                                    2009-2010   9.4 CGPA

Projects
#1.
    Client: MetLife
    Project Title: EBS
    Team: 2
    Environment: Spark, Spark SQL, Scala, SQL Server, shell scripting.

Description:
        EBS is a project in which Informatica pulls data from SFDC and sends it to the Big Data platform at RDZ. The Big Data process kicks off when the trigger file, control file and data files are received, and all file-check validations are performed. After all transformations are done, the data is stored in Hive tables pointing to HDFS locations. The data is synced to BigSQL, and downstream processing is handled by the QlikView team.
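
A simplified sketch of the kind of Spark step used here, reading validated data files and storing them in a Hive table backed by an HDFS location; the paths, table name and column names are placeholders, not the actual project objects.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EbsLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ebs-load")
          .enableHiveSupport()        // required to write to Hive tables
          .getOrCreate()

        // Read the delimited data files delivered alongside the trigger/control files
        val raw = spark.read
          .option("header", "true")
          .option("delimiter", "|")
          .csv("/data/rdz/ebs/incoming/")    // hypothetical RDZ landing path

        // Example transformation: trim the key column and drop records without a key
        val cleaned = raw
          .withColumn("policy_id", trim(col("policy_id")))   // hypothetical column
          .filter(col("policy_id").isNotNull)

        // Store in a Hive table pointing to an HDFS location (external table assumed)
        cleaned.write
          .mode("overwrite")
          .option("path", "/data/rdz/ebs/curated/")   // hypothetical HDFS path
          .saveAsTable("ebs_db.ebs_curated")           // hypothetical table name

        spark.stop()
      }
    }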

#2.
    Client: MetLife
    Project Title: Internal Audit
    Team: 2
    Environment: Spark, Spark SQL, Scala, HUE, Hive, shell scripting.

Description:
        A monthly file (nearly 50 GB in size) is sent by the source team to the Big Data location. Every month a full load (append) is done on the Big Data side to load the data into an HDFS location on which a Hive table is created. Using the ORC file format, the data is compressed and stored in the HDFS IDZ location. On top of the IDZ location a Hive table is created, from which the business fetches the final data; the table is synced to BigSQL.
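
A simplified sketch of the monthly full-load (append) step described above; the landing path and table name are placeholders assumed for illustration.

    import org.apache.spark.sql.SparkSession

    object MonthlyAppendSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("internal-audit-monthly-load")
          .enableHiveSupport()
          .getOrCreate()

        // Read the monthly delimited file delivered by the source team
        val monthly = spark.read
          .option("header", "true")
          .csv("/data/landing/audit/current_month/")   // hypothetical landing path

        // Append to an ORC-backed Hive table stored at the IDZ HDFS location
        monthly.write
          .mode("append")
          .format("orc")                              // compressed columnar storage
          .option("path", "/data/idz/audit/")         // hypothetical IDZ path
          .saveAsTable("audit_db.internal_audit")     // hypothetical table name

        spark.stop()
      }
    }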

#3.
    Client: MetLife
    Project Title: Compliance App
    Team: 2
    Environment: Spark, Spark SQL, Scala, shell scripting, Pig, Hive, BigSQL

Description:
        Compliance App is a group of nine admin systems (Annuity Host, ERL, MRPS, CDI, Smart App, WMA, VRPS, PMACS, SBR). The process is to load the data files for the nine admin systems into HDFS based on the trigger files received. Three types of load take place:
1. Base load
2. Full load
3. Delta load
The work is to build the workflow that loads the data files into HDFS locations and into Hive tables: we create Hive tables in an optimized, compressed format and load the data into them. We write Hive scripts for the full load and shell scripts to build the workflow. We use Pig/Spark for the delta loads and shell scripts to invoke Hive for the full load and history processing, then schedule the jobs in Maestro for the daily run.
Initially the delta load used Pig scripts; we have since converted the Pig scripts to Spark scripts as an enhancement.
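
A simplified sketch of the Spark version of such a delta load, merging a delta against the base data and keeping the latest record per key; the table names, key column and audit column are placeholders assumed for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    object DeltaLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("compliance-delta-load")
          .enableHiveSupport()
          .getOrCreate()

        // Base data and the newly arrived delta, both staged as Hive tables
        val base  = spark.table("compliance_db.annuity_host")       // hypothetical base table
        val delta = spark.table("compliance_db.annuity_host_stg")   // hypothetical delta staging table

        // Keep the most recent record per business key across base + delta
        val merged = base.unionByName(delta)
          .withColumn("rn", row_number().over(
            Window.partitionBy("contract_id")              // hypothetical key column
                  .orderBy(col("load_date").desc)))        // hypothetical audit column
          .filter(col("rn") === 1)
          .drop("rn")

        // Write the merged result to a compressed ORC Hive table
        merged.write
          .mode("overwrite")
          .format("orc")
          .saveAsTable("compliance_db.annuity_host_merged")   // hypothetical target table

        spark.stop()
      }
    }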

#4.
    Client: MetLife
    Project Title: Argentina
    Team: 4
    Environment: Spark, Spark SQL, Scala, SQL Server, shell scripting.

Description:
        In this project a third party (Soft Tec) drops the data into the Hadoop environment. It covers three admin systems. The process is to load the data files into Linux based on the trigger files received from the business. Two types of load take place:
    • Base load
    • Delta load
We build the workflow that loads the data files into HDFS locations and into Hive tables. Before loading, we validate the data and send the files to the ETL team for data quality checks. Once we receive the files back from ETL, our Spark scripts apply the dedup logic on top of them. Once deduplication is done, we insert the data into Hive tables in an optimized, compressed format. We use Spark scripts for both the delta and base loads, then schedule the jobs in Maestro for the daily run.
This project is still in the new CR stage with the business.
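
A simplified sketch of the dedup step applied after the files come back from ETL; the path, key columns and table name are placeholders assumed for illustration.

    import org.apache.spark.sql.SparkSession

    object DedupSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("argentina-dedup")
          .enableHiveSupport()
          .getOrCreate()

        // Files returned by the ETL/DQ step (hypothetical path and layout)
        val validated = spark.read
          .option("header", "true")
          .option("delimiter", "|")
          .csv("/data/landing/argentina/validated/")

        // Dedup logic: keep one row per business key (hypothetical key columns)
        val deduped = validated.dropDuplicates(Seq("policy_id", "effective_date"))

        // Insert into an ORC-backed Hive table in compressed format
        deduped.write
          .mode("append")
          .format("orc")
          .saveAsTable("argentina_db.policies")   // hypothetical table name

        spark.stop()
      }
    }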
#5.
    Client: MetLife
    Project Title: EV FIRST
    Team: 2
    Environment: Shell scripting.

Description:
        In this project the source countries (Japan, Korea, Mexico, Poland, Chile) send the data through MFT to the Big Data location. At the Big Data end, file validations and archival take place. Once the validations are successful, the file is sent to the DQ team and then to ETL, where the final data is stored in IDS (Oracle database) tables. Big Data syncs the table to BigSQL and ETL fetches the table from there.
#6.
    Client: MetLife
    Project Title: Monitoring Space Utilization on HDFS
    Team: 1
    Environment: Shell scripting.

Description:
        The main objective of the project is to monitor space utilization on HDFS. The scripts are scheduled in Maestro and run every 6 hours each day. If HDFS usage exceeds a limit (70%), an email is sent to the team to check which project has used a large amount of space in HDFS. The script is also used by the billing team to generate a quarterly report showing who is using the most space in the cluster, and billing is done accordingly.
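
The check in this project is done with shell scripts; purely as an illustration of the same 70% threshold logic, here is a sketch using the Hadoop FileSystem API in Scala, with the alerting reduced to a printed message (the actual job sends an email).

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object HdfsSpaceCheckSketch {
      def main(args: Array[String]): Unit = {
        // Picks up cluster settings from core-site.xml / hdfs-site.xml on the classpath
        val fs = FileSystem.get(new Configuration())
        val status = fs.getStatus                     // overall HDFS capacity and usage
        val usedPct = status.getUsed.toDouble / status.getCapacity * 100

        val threshold = 70.0                          // alert limit from the project description
        if (usedPct > threshold) {
          println(f"ALERT: HDFS usage at $usedPct%.2f%% exceeds $threshold%.0f%%")
        } else {
          println(f"HDFS usage at $usedPct%.2f%% is within limits")
        }
      }
    }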
#7.
    Client: MetLife
    Project Title: BluePrism
    Team: 2
    Environment: Shell scripting, Sqoop, Scala, Hive.

Description:
        BluePrism is a source application with SQL Server as its database. Big Data extracts the data from the BluePrism environments (QA & PreProd or PreProd & Prod combinations), merges the two sources into one and loads the data into the Hive database. Big Data also archives the data into corresponding history tables either monthly or on an ad hoc basis, based on a trigger file received from BluePrism. This is a weekly extract from SQL Server using Sqoop, after which the data is loaded through Scala. The jobs are scheduled in Maestro.
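
A simplified sketch of the Scala load step that merges the two Sqoop-extracted environments into one Hive table; the staging and target table names are placeholders assumed for illustration.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object BluePrismMergeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("blueprism-merge")
          .enableHiveSupport()
          .getOrCreate()

        // Weekly Sqoop extracts from the two BluePrism environments,
        // landed as Hive staging tables (names are assumptions)
        val preProd = spark.table("blueprism_stg.sessions_preprod")
          .withColumn("source_env", lit("PreProd"))
        val prod = spark.table("blueprism_stg.sessions_prod")
          .withColumn("source_env", lit("Prod"))

        // Merge the two sources into one and load into the Hive database
        preProd.unionByName(prod)
          .write
          .mode("overwrite")
          .format("orc")
          .saveAsTable("blueprism_db.sessions")   // hypothetical target table

        spark.stop()
      }
    }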
             
Languages Known: English, Hindi, Telugu and Bengali

 

 
