Ahmed Shabab Noor

Data Engineer II @ Pathao

Building scalable data products & backend systems

Dhaka, Bangladesh

Data Engineer specializing in scalable data platforms, real-time backend services, and advanced analytics. Skilled in building robust pipelines, ML solutions, and data-driven applications to deliver actionable insights. Love exploring new technologies and learning new things. Apart from work, I enjoy photography, food, and puzzles.

Experience

Pathao

Oct 2022 - Present

On-demand digital platform and mobility company

Data Engineer II

Jul 2024 - Present · 7 mos

Full-time

  • Designed and developed a dynamic funnel analytics system for user activity across verticals.
    • Data aggregator merges data from various sources including Firebase, Pub/Sub, and transactional DBs (Postgres/MySQL) with deduplication at user-second granularity.
    • A dynamic set of queries generates funnels with different visualizations; 500M+ data points processed (and growing).
    • Smart queries can validate user input on their own when running without dedicated backend or front-end validation.
    • Saved ~$400K USD annually in the first year (vs. market alternatives) and 120+ man-hours/month.
  • Completely redesigned and rebuilt the transliteration system as a standalone Go library.
    • Created a microservice for text transliteration with a real-time testing interface.
    • Supports bidirectional Bangla-English transliteration.
    • Integrated into Food Search (15.3× more Bangla search coverage, 15.1× more order confirmations) and Ride Location Search (Bangla coverage improved from 78.8% to 96.2%, a gain of 17.4 percentage points).
  • Developed a personalization microservice in Go that gives users personalized experiences across verticals.
    • Developed a personalization algorithm for food home-screen restaurant collections, showing users restaurants that better match their preferences. As a result, users took 32.53% less time to navigate to the checkout page.
    • Developed data pipelines with ETL to generate preference data and user profiles.
    • Enabled smart caching for faster data retrieval and lowered latency.
  • Developed data pipelines for generating restaurant, dish, and cuisine features for the food search auto-suggestion system.
  • Built a POC food search auto-suggestion microservice demonstrating feasibility and ROI.
  • Built a food search performance and benchmark dashboard for KPI tracking and system performance monitoring.
  • Applied a genetic algorithm to find optimal weights for the internal string matching algorithms used in the address parsing service, increasing address parser area-level coverage by 2%.
  • Enhanced the missing place finder tool with new features.
    • Added multi-token support for finding missing complex place/road names.
    • Developed a Google Sheets utility package for generating beautified address data reports.
    • Implemented rich email notifications for pipeline failures and successes.
  • Upgraded the address history pipeline to integrate partial orders data, adding ~2.7M new users' addresses and ~20M total addresses. Added custom date range support to the address history ingestor.
  • Developed driver location ping analysis tools with map visualizations.
  • Created monitoring dashboards for ETL pipelines and ETL system health checks.
  • Assisted internal fraud analytics and investigations.
  • Regularly review and merge pull requests from team members on the company's self-hosted GitLab.
Go Python SQL dbt BigQuery Redis PostgreSQL Firebase NLP ETL Docker Kubernetes Data Modeling Dashboarding

Data Engineer I

Mar 2023 - Jun 2024 · 1 yr 4 mos

Full-time

  • Designed and implemented machine learning models for accurate ETA prediction in Dhaka and Chittagong.
    • Created prediction models for both pickup ETA and drop-off ETA.
    • Improved Chittagong’s within-threshold rides from 20% to 60%.
    • Built specialized ETA models for holidays (Eid, Ramadan).
    • Built comprehensive dashboards for monitoring ETA performance and debugging ETA-related issues.
  • Contributed as a team member in developing a robust address parsing service for courier operations.
    • Implemented an algorithm that uses customers' historical data to match and accurately parse addresses, improving hub-level coverage by 10% and accuracy by 1.5%.
    • Improved overall accuracy by 7.5%, which led to cost savings of ~102K BDT per month.
    • Built data pipelines for generating courier customer profiles and address history.
    • Built a Dockerized ETL pipeline for transferring data from BigQuery to Redis with automated email notifications, capable of handling millions of rows concurrently.
    • Optimized data storage for Redis, reducing memory consumption by 72%, saving $120 USD per month per VM.
    • Created CLI helper tools for data validation, testing, and debugging, and scripts for automation.
    • Improved logger usability substantially.
  • Developed a Python-based Bangla-to-English transliteration algorithm for address parsing. After integration with the address parsing service, it improved hub-level coverage by 8% and district-level coverage by 1% (measured overall, across addresses written in Bangla and English).
  • Developed missing place finder, a Dockerized tool that applies natural language processing to user-provided address data to find places missing from our system; it found 784 new landmarks in 2 major cities.
  • Managed fraud detection systems for the food vertical.
    • Developed and maintained tools to help identify fraudulent activities in the food vertical.
    • Maintained the device connectivity system and assisted the operations team in finding fraudulent entity groups.
    • Created a dashboard for checking fraudulent users' status and the actions taken against them.
    • Created comprehensive documentation for legacy fraud detection systems that were handed over.
  • Conducted suspicious activity analysis to find possible DDoS (Distributed Denial of Service) attacks.
    • Analyzed traffic patterns and user behavior to identify anomalies indicative of DDoS attacks.
    • Applied machine learning algorithms to cluster users for identifying potentially suspicious user groups.
    • Developed tools for generating 2D and 3D visualizations that show user clusters and key insights for easier understanding and decision-making.
  • Redesigned and implemented the data pipeline for managing user behavioral data originating from Firebase.
    • Developed dbt data models for cleaning and transforming Firebase data, making it easier to analyze.
    • Developed a session generator that can fill missing Google Analytics session identifiers.
    • Extended the pipeline to process data generated from different sources, i.e., Pathao user app, driver app, courier agent app, courier merchant app/web-panel.
    • Processes roughly 50-60 million rows per day (and growing).
  • Conducted A/B test analysis to help stakeholders make data-driven decisions.
    • Performed A/B test analysis of user engagement metrics for third-party service providers, helping stakeholders make decisions that save $1,000 USD per month.
    • Performed A/B test analysis of Google Maps vs. internal ETA, identifying ~750K BDT/month (6K USD) in potential savings.
  • Developed modular and reusable data models using dbt, while focusing on data quality, maintainability, and scalability. Additionally, optimized queries to reduce data scanning for cost savings.
  • Developed dashboards for location search performance monitoring.
  • Developed machine learning solutions for address data deduplication for in-house map service.
  • Implemented data protection measures by hashing sensitive data.
Python SQL dbt GCP BigQuery Redis Docker Machine Learning Data Modeling Dashboarding

Data Engineering Intern

Oct 2022 - Feb 2023 · 5 mos

Internship

  • Built command-line tools for data dictionary workflows and log management.
  • Developed the first-ever machine learning solution for ride pickup ETA prediction for Dhaka city.
  • Researched and evaluated solutions for data governance (column-level access, advanced search and tagging).
  • Contributed to enhancements of the in-house map system.
  • Worked on fraud detection using machine learning.
Python SQL Machine Learning Data Governance

Technologies & Tools

Programming Languages

Python Go SQL JavaScript TypeScript Java C PHP Bash

Cloud & Infrastructure

Google Cloud Platform Docker Kubernetes Helm GitLab CI GitHub Actions

Data Engineering & Analytics

dbt BigQuery Redash Looker Studio Tableau

Databases & Data Warehousing

BigQuery PostgreSQL MySQL Redis Neo4j Firebase

Data Science & Machine Learning

Pandas NumPy Scikit-learn Jupyter Matplotlib Plotly Dash

Developer Tools & Platforms

Git VS Code JetBrains IDEs Postman Linux macOS Notion Jira

Publications

Attention-Based Scene Graph Generation: A Review

IEEE - 2022 14th International Conference on Software, Knowledge, Information Management and Applications (SKIMA)
Feb 6, 2023

A novel method for imbalanced data classification based on label reassignment

IEEE Region 10 Conference (TENCON)
Dec 20, 2022

Diabetes Mellitus prediction using Transfer Learning

International Conference on Machine Intelligence and Emerging Technologies (MIET 2022)
Jun 11, 2023

The impact of data locality on the performance of cluster-based under-sampling

International Conference on Machine Intelligence and Emerging Technologies (MIET 2022)
Jun 11, 2023

Education

United International University

Bachelor of Science in Computer Science and Engineering

Sep 2019 - Jan 2023 · 3 yrs 4 mos

CGPA 3.99 out of 4.00, Summa Cum Laude

  • Graduated with highest honors, completing a 4-year (12-trimester) degree in just 10 trimesters (3 years 4 months), with a CGPA of 3.99.
  • Selected as an Undergraduate Teaching Assistant in the Department of Computer Science and Engineering for outstanding academic performance. Assisted in:
    • Database Management Systems Lab (Fall 2021)
    • Artificial Intelligence Lab (Fall 2021)
    • Algorithms Lab (Fall 2021)
    • Data Structures Lab (Spring & Summer 2022)
    • Advanced Object Oriented Programming Lab (Spring 2022)
    • Final Year Design Project (Summer 2022)
    • System Analysis and Design Lab (Fall 2022)
  • Awarded 100% scholarship for academic excellence in 9 trimesters and 50% scholarship in 1 trimester.
  • Specialized in Data Science, Machine Learning, and Software Engineering.
  • Participated in multiple research projects resulting in 4 peer-reviewed publications.
  • Served as the 4th Vice President of the Robotics Club (2021-2022 session).
Computer Science Machine Learning Data Science Software Engineering Research Leadership Public Speaking Communication