E-Commerce Data Curation: Electronics Products
Project type: Data Analytics
Date: May 2025
Location: Boston
Project type: Data Curation
Location: Remote
At Civic Bloom Enterprise, we turn raw, unstructured data into clean, analytics-ready datasets for AI-driven companies. Our latest project shows how we combine precision, automation, and scalability to deliver data tailored for real-world AI models.
For this initiative, we curated a high-integrity subset of electronics products from the Brazilian E-Commerce Public Dataset by Olist. Using Python ETL pipelines built on pandas, we automated the merging, filtering, and cleaning of two large CSV files: olist_products_dataset.csv and olist_order_items_dataset.csv.
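To make the merge, filter, and clean steps concrete, here is a minimal pandas sketch of how such a pipeline could be structured. It is an illustration under stated assumptions, not the production code: the function name `curate_electronics`, the category label `eletronicos`, the keep-first de-duplication rule, and the tiny in-memory sample tables are all hypothetical stand-ins for the two Olist CSV files.

```python
import pandas as pd

# Assumption: the label used to identify electronics. The source does not
# state which category value(s) were selected.
ELECTRONICS_CATEGORY = "eletronicos"
KEEP_COLUMNS = ["product_id", "product_category_name", "price", "freight_value"]


def curate_electronics(products: pd.DataFrame,
                       order_items: pd.DataFrame,
                       category: str = ELECTRONICS_CATEGORY) -> pd.DataFrame:
    """Merge the two tables, keep one clean row per electronics product."""
    # Merge: attach product attributes to each order line via product_id.
    merged = order_items.merge(products, on="product_id", how="inner")
    # Filter: keep only rows in the chosen category.
    filtered = merged[merged["product_category_name"] == category]
    # Clean: drop rows that lack a price or freight value.
    cleaned = filtered.dropna(subset=["price", "freight_value"])
    # De-duplicate: keep the first remaining row for each product
    # (an assumed rule; other rules such as median price are possible).
    curated = cleaned.drop_duplicates(subset="product_id", keep="first")
    return curated[KEEP_COLUMNS].reset_index(drop=True)


# Made-up sample data standing in for olist_products_dataset.csv
# and olist_order_items_dataset.csv.
products = pd.DataFrame({
    "product_id": ["p1", "p2", "p3", "p4"],
    "product_category_name": ["eletronicos", "moveis_decoracao",
                              "eletronicos", "eletronicos"],
})
order_items = pd.DataFrame({
    "product_id": ["p1", "p1", "p2", "p3", "p3", "p5"],
    "price": [10.0, 10.0, 50.0, None, 20.0, 5.0],
    "freight_value": [2.0, 2.0, 5.0, 3.0, 4.0, 1.0],
})

curated = curate_electronics(products, order_items)
print(curated)
```

With the real files, the two sample tables would be replaced by `pd.read_csv(...)` calls on the CSVs named above; the rest of the function would stay the same as long as the column names match.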
Key highlights:
✅ 152 unique, de-duplicated electronics products curated from over 100,000 raw entries
✅ 69.6% of duplicate or irrelevant rows removed using logic-driven filtering
✅ 0% error rate, verified through validation scripts and manual QA sampling
✅ Features extracted: product_id, product_category_name, product_name, price, freight_value
✅ Output ready for use in pricing models, product recommendation engines, and AI classification tasks
All transformations were handled programmatically to ensure reproducibility, consistency, and scalability, which makes the dataset a solid starting point for AI startups building lean prototypes or robust production systems.
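The validation scripts and manual QA sampling mentioned in the highlights could take a form like the sketch below. It is a hedged example only: the specific checks, the helper names `validate_curated` and `qa_sample`, the fixed seed, and the sample tables are assumptions chosen for illustration, not a record of the checks actually run.

```python
import pandas as pd

# Assumed set of columns every curated row must carry.
REQUIRED_COLUMNS = ["product_id", "product_category_name", "price", "freight_value"]


def validate_curated(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df["product_id"].duplicated().any():
        problems.append("duplicate product_id values")
    if df[REQUIRED_COLUMNS].isna().any().any():
        problems.append("missing values in required columns")
    if (df["price"] < 0).any() or (df["freight_value"] < 0).any():
        problems.append("negative price or freight_value")
    return problems


def qa_sample(df: pd.DataFrame, n: int = 5, seed: int = 42) -> pd.DataFrame:
    """Draw a reproducible random sample of rows for manual review."""
    return df.sample(n=min(n, len(df)), random_state=seed)


# Made-up tables: one clean, one with a duplicate ID and a negative price.
clean = pd.DataFrame({
    "product_id": ["p1", "p3"],
    "product_category_name": ["eletronicos", "eletronicos"],
    "price": [10.0, 20.0],
    "freight_value": [2.0, 4.0],
})
dirty = pd.DataFrame({
    "product_id": ["p1", "p1"],
    "product_category_name": ["eletronicos", "eletronicos"],
    "price": [10.0, -1.0],
    "freight_value": [2.0, 4.0],
})

clean_problems = validate_curated(clean)
dirty_problems = validate_curated(dirty)
review_rows = qa_sample(clean, n=5)
print(clean_problems, dirty_problems, len(review_rows))
```

Fixing the random seed means the same rows are drawn for manual review on every run, which keeps the QA step as reproducible as the transformations themselves.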
You can explore the datasets below:
📥 Raw Dataset: https://docs.google.com/spreadsheets/d/1NwTObHZM-oXH7OxNXz-PNJgvTeZUtELe0Izob12bfMI/edit?gid=1512980922#gid=1512980922
📊 Final Curated Dataset: https://docs.google.com/spreadsheets/d/1Wq-E4INLA5oUxd4a7gQx-8FW8asqJl3fCVrA5JxF-xw/edit?gid=889330701#gid=889330701