E-Commerce Data Curation: Electronics Products

Project type

Data Analytics

Date

May 2025

Location

Boston

Project Type

Data Curation

Location

Remote

At Civic Bloom Enterprise, we specialize in turning raw, unstructured data into clean, analytics-ready assets for AI-driven companies. Our latest project shows how we combine precision, automation, and scalability to deliver data tailored for real-world AI models.

For this initiative, we curated a high-integrity subset of electronics products from the Brazilian E-Commerce Public Dataset by Olist. Using Python ETL pipelines and pandas-based data workflows, we automated the merging, filtering, and cleaning of two large CSV files: olist_products_dataset.csv and olist_order_items_dataset.csv.
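The merge, filter, and clean steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the function name curate_electronics, the category label "eletronicos", the choice to keep the first order line per product, and the tiny in-memory tables are all assumptions made for the example. It also returns only a subset of the extracted features.

```python
import pandas as pd

# Assumed category label for electronics in the Olist data (Portuguese).
ELECTRONICS_CATEGORY = "eletronicos"


def curate_electronics(products: pd.DataFrame, order_items: pd.DataFrame) -> pd.DataFrame:
    """Merge, filter, and de-duplicate down to one row per electronics product."""
    # Join each order line to its product record on the shared key.
    merged = order_items.merge(products, on="product_id", how="inner")
    # Keep only rows in the assumed electronics category.
    electronics = merged[merged["product_category_name"] == ELECTRONICS_CATEGORY]
    # Drop rows missing the fields the curated output needs.
    electronics = electronics.dropna(subset=["product_id", "price", "freight_value"])
    # Keep the first order line per product so each product appears once
    # (an assumed rule; the real pipeline may aggregate differently).
    curated = electronics.drop_duplicates(subset="product_id", keep="first")
    columns = ["product_id", "product_category_name", "price", "freight_value"]
    return curated[columns].reset_index(drop=True)


# Tiny in-memory stand-ins for the two CSV files (made-up values).
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3"],
    "product_category_name": ["eletronicos", "moveis_decoracao", "eletronicos"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "product_id": ["p1", "p1", "p2", "p3"],
    "price": [100.0, 100.0, 50.0, None],
    "freight_value": [10.0, 12.0, 5.0, 8.0],
})

curated = curate_electronics(products, order_items)
print(curated)
```

With the real files, the two frames would instead come from pd.read_csv("olist_products_dataset.csv") and pd.read_csv("olist_order_items_dataset.csv"); the rest of the function would stay the same under these assumptions.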

Key highlights:

✅ 152 unique, de-duplicated electronics products curated from over 100,000 raw entries

✅ 69.6% of duplicate or irrelevant rows removed using logic-driven filtering

✅ 0% error rate, verified through validation scripts and manual QA sampling

✅ Features extracted: product_id, product_category_name, product_name, price, freight_value

✅ Output ready for use in pricing models, product recommendation engines, and AI classification tasks

All transformations were handled programmatically to ensure reproducibility, consistency, and scalability, making the curated dataset a solid starting point for AI startups building lean prototypes or robust production systems.
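As an illustration of the kind of validation script mentioned in the highlights, here is a hedged sketch of a few integrity checks on a curated table. The function name validate_curated, the specific checks, and the sample rows are hypothetical; the actual validation rules used for the project are not stated on this page.

```python
import pandas as pd


def validate_curated(df: pd.DataFrame, category: str = "eletronicos") -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Each product should appear exactly once.
    if df["product_id"].duplicated().any():
        problems.append("duplicate product_id values")
    # Required fields must be populated.
    if df[["product_id", "price", "freight_value"]].isna().any().any():
        problems.append("missing values in required columns")
    # Every row should belong to the expected (assumed) category label.
    if (df["product_category_name"] != category).any():
        problems.append("rows outside the expected category")
    # Prices should be positive and freight should not be negative.
    if (df["price"] <= 0).any() or (df["freight_value"] < 0).any():
        problems.append("non-positive price or negative freight_value")
    return problems


# Made-up sample rows: one clean table and one with deliberate faults.
clean = pd.DataFrame({
    "product_id": ["p1", "p2"],
    "product_category_name": ["eletronicos", "eletronicos"],
    "price": [100.0, 59.9],
    "freight_value": [10.0, 0.0],
})
dirty = pd.DataFrame({
    "product_id": ["p1", "p1"],
    "product_category_name": ["eletronicos", "eletronicos"],
    "price": [100.0, 59.9],
    "freight_value": [10.0, -1.0],
})

print(validate_curated(clean))
print(validate_curated(dirty))
```

Returning a list of messages rather than raising on the first failure lets a single run report every issue, which pairs naturally with manual QA sampling.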

You can explore the datasets below:

📥 Raw Dataset: https://docs.google.com/spreadsheets/d/1NwTObHZM-oXH7OxNXz-PNJgvTeZUtELe0Izob12bfMI/edit?gid=1512980922#gid=1512980922
📊 Final Curated Dataset: https://docs.google.com/spreadsheets/d/1Wq-E4INLA5oUxd4a7gQx-8FW8asqJl3fCVrA5JxF-xw/edit?gid=889330701#gid=889330701
