Mateus Picanco

5.0

(3 reviews)

US$15.00

For every 15 mins

Sessions/Jobs

First 15 mins free for your first session

ABOUT ME

Full-stack Data Scientist with solid experience in Machine Learning and Natural Language Processing, and extensive training in mentorship

Hello everyone!

I'm Mateus, a full-stack Data Scientist from Brazil with a background in Digital Signal Processing. I graduated from Brown University in 2018 with a Bachelor of Science in Electrical Engineering and have been working in the Data domain ever since. I have experience with the full scope of the Data Science process, from building data pipelines and developing models to evaluating A/B tests for Data Products.

I'm open to any mentorship opportunities related to data, especially when it comes to Machine Learning, Natural Language Processing, Python, and Elasticsearch.

I am currently a Data Scientist at Microsoft and have previously worked at the largest investment bank in Latin America and at Telefonica.

Throughout my career, I developed skills and projects in various segments, including Customer Segmentation, Next Best Offer models, and detection of rare events.

I have over 4 years of experience in programming with Python, Machine Learning (especially when applied to CRM and Product Analytics), and Natural Language Processing. I also have solid experience implementing data orchestration and Elasticsearch-based analytics.

Finally, I have been a mentor for both for-profit and non-profit organizations since I was 16 years old. I'm extensively trained in mentorship and tutoring and have mentored all kinds of people in many topics, from essay writing to machine learning.

Portuguese, English

Atlantic Time (Canada) (-04:00)

Joined May 2020

EXPERTISE

Python

4 years experience

- Object-oriented programming
- API development and deployment: FastAPI, Flask
- Development of packages for PyPI and offline serving
- Data visualization and dashboarding: Plotly, Dash, Streamlit
- Unit and functional testing: pytest
- Data analysis toolkit: NumPy, Pandas, Seaborn, Matplotlib
- Statistics and simulation: Statsmodels, SciPy
- Machine Learning: Scikit-learn, XGBoost, CatBoost
- Deep Learning: PyTorch
- Digital Signal Processing: Librosa

NumPy Pandas Matplotlib Statsmodels SciPy XGBoost CatBoost Amazon SageMaker

Machine learning

4 years experience

- Fake-name detection and lead qualification for lead collection campaigns
- Anomaly detection in log files
- Sentiment analysis in financial news and topic modeling in customer complaints
- Incident prevention models for IT infrastructure and services
- Unsupervised customer segmentation based on spending behavior

Python XGBoost CatBoost spaCy Gensim NLTK PyTorch Amazon SageMaker

Elasticsearch

2 years experience

- Implementing data collection and ingestion using the entire Elastic Stack (Logstash, Beats, and Elasticsearch)
- Modelling data in Elasticsearch for analytics and search
- Data visualization in Kibana (including custom Vega-Lite visualizations)
- Anomaly detection using the Machine Learning plugin

Kibana Logstash Beats Bash

Apache Spark

2 years experience

- Implementing data processing pipelines, including data cleaning and aggregation
- Building tables in Redshift and Athena from raw data
- Automating and scheduling Spark jobs through Apache Airflow on AWS

Apache Airflow Amazon S3 Amazon EMR Apache NiFi Apache Hadoop

SQL

3 years experience

- Data analysis and reporting with SQL in many dialects, especially PostgreSQL, Redshift, and Oracle
- Advanced data aggregation, such as window functions, CTEs, and complex joins

PostgreSQL Amazon Redshift Amazon Athena Oracle

AWS

1 year experience

- Developing APIs with AWS Lambda and API Gateway
- Building data and machine learning pipelines with Elastic MapReduce (EMR) and SageMaker
- Building data lake structures with the AWS Big Data stack (Glue, EMR, S3, Redshift, and Lake Formation)

AWS DynamoDB AWS API Gateway Amazon EC2 Amazon S3 AWS Lambda Amazon Redshift Amazon Athena Amazon EMR

REVIEWS FROM CLIENTS

5.0

(3 reviews)

RR100

November 2021

Great data science tutor. Explains concepts clearly and sticks with you until you solve your project. I highly recommend his expertise.

Hank H'ng

March 2021

Great session again :)

Hank H'ng

February 2021

Super knowledgeable and helped me set up my python env and explained everything clearly :) Would recommend.

EMPLOYMENTS

Data Scientist

Microsoft

2021-09-01 to Present

Data Scientist working on Analytics and experimentation for the Microsoft Stream product. My work focuses on Scorecard definition and evaluation for new flights and product iteration, as well as in-depth analyses of user behavior to drive product changes.

Python

Azure

Machine learning

Product design

Data analytics

Data Scientist

BTG Pactual

2020-09-01 to 2021-09-01

- Developed segmentation and scoring aimed at identifying the clients with the highest propensity for acquiring credit products in the BTG+ user base.
- Developed an unsupervised model for customer segmentation based on credit card spending preferences in different merchant categories (MCCs). Results drive targeted campaigns.
- Designed and built numerous ETL pipelines using Apache NiFi, Airflow, and Spark on AWS. Pipelines populate the BTG+ Data Lake for various purposes, including reporting, dashboards, and modeling.
- Designed and implemented the architecture and pipelines for a data quality assessment framework based on Apache Spark, Airflow, and AWS Glue.
- Designed, trained, and deployed a BERT-based Sentiment Analysis model for classifying news related to the Stock Market in Portuguese. The model is part of the BTG Index.

Python

Amazon S3

Machine learning

DynamoDB

Apache Spark

Amazon Redshift

AWS Lambda

Apache Airflow

Apache NiFi

Data Scientist

Telefonica

2019-02-01 to 2020-08-01

- Built and maintained data pipelines based on the Elastic.co stack (Elasticsearch, Logstash, Kibana). Pipelines process around 4TB of data per month.
- Designed and implemented a data self-service platform based on Elasticsearch for IT Operators. Data stores and reporting tools are used by approximately 130 people and provide access to 80 different curated data sources.
- Developed an ML model for predicting the occurrence of IT incidents in Telefonica's Online Charging System. During the two-month period in which it was in production, it reported an F1 score of around 84% and prevented 3 critical incidents or possible outages in the system.
- Developed a model for identifying spammy and fake names in lead collection campaigns for the Marketing department. Deployed the model as a REST API using FastAPI.
- Built a custom parser using PySpark for handling event and configuration data stored in XML files following the 3GPP industry specification.
- Wrote and executed procurement processes for software acquisition and services using RFPs (Requests for Proposal).
- Wrote material for internal, team-led workshops with other teams across the company.

Python

MySQL

Flask

Elasticsearch

Kibana

Logstash

Apache Spark

PROJECTS

Data Pages

N/A

2020

This project is a proof of concept for exploration of enterprise data by non-technical users using full-text search engine capabilities. It aims to illustrate a way to foster data-driven decision-making without the intervention of technical teams, a concept known as Data Self-Service. It was inspired by Looqbox, a Brazilian startup. Data Pages consists of guided access to analyses previously made available by technical teams as specifications in a data directory. The search engine capabilities provide a cleaner interface for finding relevant data about the company, as an alternative to list-based directories and generic dashboards.

Python

Heroku

Pandas

Elasticsearch

Streamlit

Project Atlas - São Paulo

2021

A feature store project aimed at developing geospatially referenced features regarding the city of São Paulo, including features related to crime, real estate, income, shopping activity, and much more. The project has been released on Kaggle and contains over 200 features at different levels of interest for use.

Geospatial Technology

Apache Spark

PostGIS

Apache Sedona