Overhauling a Data Swamp and Streamlining a Retail Data Landscape for Growth

Client

One of Europe's Top 10 Retailers

Year

2024

Stack

Databricks, GitHub, Terraform, CDKTF, Apache Spark, Azure Data Lake Storage, Delta Lake, Python

⸻ Business Impact

Data as a Competitive Edge in the Retail Business

Data is vital to driving retail operations and capturing profit in today’s digital landscape. However, a tangle of legacy platforms often produces a data swamp that, lacking integrated and trustworthy data, only adds costs instead of delivering insights.

We worked with one of Germany’s leading retail companies to revamp their data management. By delivering a sustainable data management strategy, a new data platform, and an upgraded compliance mechanism, we enabled them to harness their data more effectively, reduce IT costs, and future-proof their operations with seamless cloud integration.

Key Results

A future-proof platform

A sustainable data lake that allows our client to implement future projects independently.

A cost-effective compliance mechanism

An effective GDPR compliance mechanism that significantly reduces operational complexity and cost.

New visibility into KPIs and data tracking

A data platform capable of enriching transactional data with master data through high-performance batch processes, offering KPI tracking and actionable insights; a simplified sketch of this enrichment step follows these results.

A 90% saving per daily data processing run

The new implementation reduced the end-to-end cost of a daily data processing run, covering the full pipeline from extraction to storing data in the final report tables, by 90%.
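As a simplified illustration of the enrichment step mentioned above, the sketch below joins a day’s transactions against product master data and aggregates them into a daily KPI table. The table paths, column names, and aggregations are illustrative assumptions, not the client’s actual schema.

```python
# Minimal sketch of the batch enrichment step; schema and paths are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily-enrichment").getOrCreate()

# Hypothetical Delta tables in the lake.
transactions = spark.read.format("delta").load("/lake/silver/transactions")
product_master = spark.read.format("delta").load("/lake/silver/product_master")

# Enrich transactions with master data attributes via a broadcast join,
# then aggregate into a daily KPI table for reporting.
daily_kpis = (
    transactions
    .join(F.broadcast(product_master), on="product_id", how="left")
    .groupBy("store_id", "category", F.to_date("sold_at").alias("day"))
    .agg(
        F.sum("revenue").alias("total_revenue"),
        F.countDistinct("transaction_id").alias("transactions"),
    )
)

daily_kpis.write.format("delta").mode("overwrite").save("/lake/gold/daily_store_kpis")
```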

⸻ Starting Point

Strategic Challenge Identification and Thoughtful Planning: The Foundation for Delivering Lasting Value

Background

Our client generated vast amounts of data from a large number of retail stores. However, much of the data collection was not automated and required manual entry, which exposed business operations to human error.

Moreover, data was fragmented across different business units and regions, each using varying formats. This created a data swamp that produced untrustworthy insights and no actionable guidance for operational managers.

The existing legacy IT infrastructure was disjointed and carried high compliance costs.

Market Context

In a competitive retail environment, every percentage point of profit margin counts. Real-time visibility into store performance is vital to optimizing operations and avoiding potential risks. Rapid, accurate adjustments and strict compliance are essential for maintaining a competitive edge and ensuring financial stability.

Product/Service Analysis

To address these challenges with a customer at this scale, we began by identifying the right entry point, one that tackled the most immediate pain point, rather than attempting a wholesale overhaul at once.

⸻ Milestones

Creating a Credible Reporting System

To build a trustworthy reporting system, we first needed to clean up the data swamp and turn it into a sustainable data lake. That meant establishing a unified data standard, automating data collection to eliminate exposure to human error, and staying agile for operational adjustments.

In this process, we established and meticulously documented a new data engineering framework that streamlines, automates, and standardizes data processing and is ready to scale. This laid the organizational groundwork for scaling to a new data platform.
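To make the idea concrete, here is a minimal sketch of the kind of convention a standardized framework can enforce: every pipeline step reads from a Delta source, enforces basic quality rules, and writes to a Delta target. The function name, checks, and paths are illustrative assumptions, not the framework we delivered.

```python
# Illustrative sketch of one standardized pipeline step.
from pyspark.sql import SparkSession, DataFrame, functions as F

spark = SparkSession.builder.appName("standard-pipeline-step").getOrCreate()

def run_step(source_path: str, target_path: str, required_columns: list[str]) -> None:
    """Read a Delta source, enforce basic quality rules, write to a Delta target."""
    df: DataFrame = spark.read.format("delta").load(source_path)

    # Fail fast if the data contract is broken instead of letting
    # bad data flow downstream into reports.
    missing = set(required_columns) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Standard cleanup applied by every step: drop exact duplicates
    # and stamp the processing time for lineage.
    cleaned = df.dropDuplicates().withColumn("processed_at", F.current_timestamp())

    # Delta enforces the target schema on write, rejecting mismatched data.
    cleaned.write.format("delta").mode("append").save(target_path)
```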

GDPR: Reducing the High Operational Cost of Compliance

Our client faced high GDPR compliance costs because certain transaction types contain personally identifiable data, which is subject to deletion upon request.

With transactions occurring at this volume, identifying the affected records consumed time and resources, making manual deletion expensive. We designed an architecture that uses encryption and key management to avoid these costs, retain the non-identifiable transactional data, and stay GDPR compliant.
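A common pattern for this kind of architecture is crypto-shredding: personal fields are encrypted with a per-customer key, and a deletion request is honored by destroying that key, which renders the retained rows anonymous. The sketch below shows the core idea with Python’s cryptography library; the in-memory key store and field names are simplified assumptions, not the exact architecture we built.

```python
# Crypto-shredding sketch: delete the key, not the transactions.
# A real system would hold keys in a managed key store, not a dict.
from cryptography.fernet import Fernet

key_store: dict[str, bytes] = {}  # one encryption key per customer

def encrypt_pii(customer_id: str, value: str) -> bytes:
    """Encrypt a personal field with the customer's own key."""
    key = key_store.setdefault(customer_id, Fernet.generate_key())
    return Fernet(key).encrypt(value.encode())

def forget_customer(customer_id: str) -> None:
    """Honor a GDPR deletion request: destroying the key makes every
    encrypted field for this customer unreadable, so the transaction
    rows with their non-personal columns can be kept."""
    key_store.pop(customer_id, None)

token = encrypt_pii("cust-42", "jane.doe@example.com")
forget_customer("cust-42")  # the email behind `token` is now irrecoverable
```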

Implementing Best Practices for Significant Cost Saving

Since our client generates a large volume of data daily, it was important to optimize how data is stored and processed. We implemented a set of best practices in data processing, including optimized data models and incremental data processing, in contrast to the previous implementation. This led to a 90 percent reduction in the cost per data run.
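To illustrate the incremental approach, the sketch below merges only the latest day’s records into a Delta target instead of reprocessing the full history, so cost scales with the size of the daily increment rather than the whole table. The paths, partition filter, and join key are assumptions for the example.

```python
# Incremental upsert sketch with a Delta Lake MERGE; paths and keys are illustrative.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental-merge").getOrCreate()

# Read only today's partition instead of the full history.
updates = (
    spark.read.format("delta")
    .load("/lake/silver/transactions")
    .where("ingest_date = current_date()")
)

# MERGE rewrites only the files containing matching keys.
target = DeltaTable.forPath(spark, "/lake/gold/enriched_transactions")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.transaction_id = s.transaction_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```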

⸻ Conclusion

A New Foundation for Data-driven Growth

We developed and implemented a streamlined data platform featuring:

➞  Data Integration

We replaced the data swamp with a data lake, instilled a unified data engineering standard, and developed a single, coherent data platform ready for new data projects.

➞  Scalable Architecture

The platform’s scalable design accommodates future growth and increasing data volumes, maintaining high performance as the client’s needs evolve.

➞  Future-proofed Solution

We built a future-proofed solution, optimizing operations not only for present needs but also for long-term growth. By aligning the infrastructure with the overall data strategy, we ensured flexibility and scalability, allowing for easy adaptation to future advancements.

➞  Compliance

A robust data governance framework was established to uphold data regulations and standards, mitigate compliance risks, and safeguard the integrity of information.

➞  Centralized Data Management

We created a centralized Data Core, improving data accessibility and management and breaking down the barriers created by fragmented legacy systems.