shubham kumar

Mentor
4.5
(2 reviews)
US$12.00
For every 15 mins
2
Sessions/Jobs
ABOUT ME
Senior Data Engineer with 4+ years of experience

I am a star data engineer and have worked with many data engineering technologies during my career.
For the client ValueFirst, I processed 6 terabytes of raw SMS data to extract attributes such as gender, income, and location.
For the client Goals101, I built a rule-based recommendation engine for RBL Bank.
The rest of my experience is in building company products.
My skill set includes: Big Data, SQL, NoSQL, Cloud (AWS, Azure), Python, Scala, Java.

Kolkata (+05:30)
Joined December 2019
EXPERTISE

REVIEWS FROM CLIENTS

shubham's profile has been carefully vetted and approved as a Codementor. Connect with shubham now, and leave a review for them once you're done!
SOCIAL PRESENCE
GitHub
Logstofinal
MapReduce code for generating log data from SMS data
Java
0
0
Sms-analysis-with-mapreduce
Text mining of SMS log data using MapReduce, then applying machine learning to the remaining attributes that text mining does not identify
Java
0
1
EMPLOYMENTS
Data Engineer
Expedia Group
2019-10-01-Present
Building a data pipeline for a booking trends application
Java
Apache Kafka
AWS Lambda
Senior Data Engineer
Noodle.ai
2018-08-01-2019-10-01
- Built a data transporter tool to migrate data between any 2 sources using Kafka Connect in batch or stream mode.
- Worked on a centralised alert monitoring system to monitor data alerts.
- Built a data deploy tool to deploy stage schemas to production.
Python
Scala
PostgreSQL
Apache Spark
Apache Kafka
Big Data Engineer
VideoTap
2015-12-01-2018-07-01
Worked on the VideoTap product:
- Set up a Hadoop cluster of 4 DataNodes and 1 master node on an Azure server.
- Coded a Spark job in Scala to compute analytics (user-wise, location-wise) from 13 GB of data.
- Created processors in Apache NiFi for dataflow between Cassandra and HDFS.
- Worked with the Azure Video Indexer API to identify context from news.

I also worked on 2 priority services projects here:

1) ValueFirst: extracted attributes such as gender, income, and location (17 attributes in total) from raw SMS data.
- Text-mined 6 TB of SMS data on a 6-node Hadoop cluster to evaluate gender, age, location, and income (17 attributes in total) for 5 crore (50 million) unique mobile numbers.
- Set up a Hadoop and Spark cluster with 5 DataNodes and one NameNode.
- Set up an HBase cluster with 1 master node and 4 RegionServers.
- Coded the text-mining rules in Spark and used Hive to analyze the data.
- Built machine learning models on top of Spark in Scala: logistic regression for gender, and a neural network for income and age.
- Integrated Apache Phoenix with HBase for fast querying.
- Exposure: Spark (MLlib, SQL), Scala, Hive, MapReduce, HBase, Apache Phoenix, Machine Learning (Logistic Regression, Neural Network).

2) Goals101: helped build a rule-based recommendation engine product that RBL Bank could use to increase the number and amount of its customers' transactions by sending offers at the right time.
- Used Cassandra to store large volumes of the bank's customer, transaction, and statement data.
- Coded all Spark jobs in Scala on top of the Hadoop file system and stored the processed output in Cassandra.
- Set up a Hadoop, Spark, and Cassandra cluster on 3 EC2 instances and deployed our code there.
- Coded 2 event-driven Python REST APIs in Flask for real-time transactions.
Scala
MapReduce
Machine Learning
Cassandra
HBase
Apache Spark
Apache Hadoop
Apache Hive