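```python
# MISM 258 standard ETL snippet
import pandas as pd  # df is assumed to be a pandas DataFrame loaded earlier in the pipeline
from geopy.distance import geodesic


def calculate_distance(row):
    """Geodesic distance in km between a row's origin and destination points."""
    origin = (row['origin_lat'], row['origin_lon'])  # (latitude, longitude) in decimal degrees
    dest = (row['dest_lat'], row['dest_lon'])
    return geodesic(origin, dest).km


# Assumes df already holds origin_lat, origin_lon, dest_lat and dest_lon columns.
df['distance_km'] = df.apply(calculate_distance, axis=1)
```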
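The snippet above relies on geopy's `geodesic`, which measures distance on the WGS-84 ellipsoid by default. To show what the `distance_km` feature represents without the geopy dependency, here is a minimal sketch of the haversine formula, which treats the Earth as a sphere. This is a swapped-in approximation, not the project's method: its results typically differ from `geodesic` by well under 1%, which is usually negligible for a delay-prediction feature. The function name `haversine_km` and the radius constant are illustrative assumptions.

```python
import math

# Mean Earth radius in km; the spherical model is an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees,
    treating the Earth as a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1                      # latitude difference in radians
    dlam = math.radians(lon2 - lon1)        # longitude difference in radians
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km on this sphere.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # → 111.2
```

It takes the same four coordinates as `calculate_distance`, so it can be applied row-wise in the same way when an exact ellipsoidal distance is not required.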

Subject: Final Project Analysis: Optimizing Operational Efficiency through Predictive Modeling
Date: [Current Date]
Prepared for: Professor [Name], Heinz College
Prepared by: [Your Name/Team Name]

1. Executive Summary

This report synthesizes the core methodologies and outcomes of the MISM 258 (Data Analytics & Business Intelligence) capstone project. Using a real-world dataset from a mid-sized e-commerce logistics firm, we applied two predictive models, Logistic Regression and Random Forest, to forecast shipment delays. The key finding is that the proposed Random Forest model can reduce misclassification costs by 22% relative to the company’s current heuristic model. We recommend integrating real-time weather data into the feature set and retraining the model bi-weekly.

2. Introduction

2.1. Course Context

MISM 258 focuses on the end-to-end business intelligence process, from data warehousing and ETL (Extract, Transform, Load) to advanced analytics and dashboard visualization. Its core tenet is transforming raw data into actionable strategic assets.
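The 22% figure in the executive summary compares misclassification costs between two models. A common way to compute such a cost is to weight each false negative (a late shipment predicted on time) and each false positive (an on-time shipment flagged as late) by a business cost, then compare totals. The report does not state its cost weights, so the sketch below uses hypothetical counts and costs purely to illustrate the arithmetic; `ConfusionCounts`, `misclassification_cost`, and `relative_reduction` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # late shipments correctly flagged as late
    fp: int  # on-time shipments wrongly flagged as late
    fn: int  # late shipments missed (predicted on time)
    tn: int  # on-time shipments correctly predicted on time


def misclassification_cost(c: ConfusionCounts, cost_fn: float, cost_fp: float) -> float:
    """Total cost of errors; correct predictions are assumed to cost nothing."""
    return c.fn * cost_fn + c.fp * cost_fp


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional cost reduction of candidate vs. baseline (0.22 means 22%)."""
    return (baseline - candidate) / baseline


# Hypothetical counts on 1,000 shipments and hypothetical per-error costs.
heuristic = ConfusionCounts(tp=70, fp=100, fn=30, tn=800)
model = ConfusionCounts(tp=80, fp=90, fn=20, tn=810)
COST_FN, COST_FP = 10.0, 1.0  # assumed: a missed delay costs 10x a false alarm

baseline_cost = misclassification_cost(heuristic, COST_FN, COST_FP)  # 30*10 + 100*1 = 400
model_cost = misclassification_cost(model, COST_FN, COST_FP)         # 20*10 + 90*1 = 290
print(f"{relative_reduction(baseline_cost, model_cost):.1%}")  # → 27.5%
```

Because the result depends directly on `COST_FN` and `COST_FP`, any reported cost reduction is only as meaningful as the cost weights behind it.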

The Random Forest model substantially improves recall, identifying 78% of actual late shipments versus 65% for logistic regression. This matters for proactive customer notification: every late shipment the model misses is one the customer cannot be warned about in advance.
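Recall is the share of actually late shipments that the model flags, TP / (TP + FN). The minimal sketch below computes it alongside precision, TP / (TP + FP), because higher recall often comes at the cost of more false alarms. The labels are hypothetical (1 = late, 0 = on time) and `recall_precision` is an illustrative helper; scikit-learn's `sklearn.metrics.recall_score` computes the same recall quantity.

```python
from typing import Sequence, Tuple


def recall_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Recall and precision for the positive class (1 = late shipment)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of late shipments caught
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flags that were truly late
    return recall, precision


# Hypothetical labels: 4 truly late shipments; the model catches 3 of them
# and also flags 1 on-time shipment.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(recall_precision(y_true, y_pred))  # → (0.75, 0.75)
```

Reporting precision next to recall shows how many of the extra notifications a higher-recall model would send for shipments that actually arrive on time.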