Business Trends

ChatGPT for Business Process Optimization & Data Cleansing: Blog 3 of Series “ChatGPT and SAP”

Welcome back to the third installment of our series on integrating ChatGPT with SAP systems. If you have followed the series from the beginning, you have seen us examine the design of ChatGPT and explore its potential within the SAP ecosystem. In this blog, we turn our focus to two critical pillars of any enterprise system: business process optimization and data cleansing.

We aim to empower you with insights and strategies to harness the full potential of ChatGPT, bringing transformational changes to your SAP systems. Join us on this enlightening journey into the heart of SAP, its data, and optimized processes.

ChatGPT for Business Process Optimization

Building on our earlier insights about ChatGPT and its value in the SAP world, let’s dive into its potential for Business Process Optimization (BPO). As businesses worldwide continuously strive for operational agility and precision, the importance of BPO remains undeniably paramount.

Using part of the BPI Challenge 2016 dataset, which provides logs of customer complaints at UWV, an Employee Insurance Agency in the Netherlands, we illustrate how ChatGPT can help understand and optimize procedures. We use only one file from the dataset: ‘BPI2016_Complaints.csv’.

The rationale for singling out this file, among the five available, is straightforward: ChatGPT’s current limits on handling large files. This small dataset is enough to demonstrate ChatGPT’s process optimization capabilities while staying within those limits.

BPI Challenge 2016

While the volume of real process logs in an operational organization would far surpass what we handle here, we have deliberately chosen a relatively straightforward example to shed light on the potential of Generative AI without turning this into a technical manual. Despite the simplicity of the example, you will see how ChatGPT can surface underlying themes, expose inefficiencies, and offer innovative solutions, underscoring its promise for process optimization.

The “Complaints” dataset has 289 rows and 11 columns, including CustomerID, ComplaintID, AgeCategory, Gender, ContactDate, and ContactChannelID. The remaining columns record the complaint theme, topic, and subtopic.
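To make that structure concrete, here is a minimal pandas sketch of the described layout. The rows and values below are synthetic placeholders, not actual BPI Challenge 2016 records:

```python
import pandas as pd

# Synthetic rows mirroring the column layout described above --
# placeholders, not actual BPI Challenge 2016 records.
complaints = pd.DataFrame({
    "CustomerID": [101, 101, 102, 103],
    "ComplaintID": [1, 2, 3, 4],
    "AgeCategory": ["30-39", "30-39", "50-65", "18-29"],
    "Gender": ["F", "M", "M", "F"],
    "ContactDate": pd.to_datetime(
        ["2015-06-01", "2015-06-03", "2015-06-01", "2015-07-10"]),
    "ContactChannelID": [8, 8, 6, 8],
})

print(complaints.shape)   # the real file has 289 rows and 11 columns
print(complaints.dtypes)
```

With the actual file downloaded, the equivalent frame would come from `pd.read_csv('BPI2016_Complaints.csv')`.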

The BPI Challenge 2016 aimed to address the following objectives for UWV:

  • Examine the usage patterns of different customer channels.
  • Study the shifts between contact channels and determine the reasons behind these changes.
  • Categorize customers based on their behavioral data.
  • Suggest methods to assist customers without them needing to switch contact channels.

Let us see how ChatGPT addressed each objective from this relatively small data sample.

Insights on Channel Usage 

  • Most Used Channels: Channels 8 and 6 are the most frequently utilized for lodging complaints. These channels, especially channel 8, seem to be the primary touchpoints for customer complaints.

Most Used Channels
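A channel ranking like the one reported above can be reproduced with a one-line frequency count. The channel IDs below are a toy stand-in for the real ContactChannelID column:

```python
import pandas as pd

# Toy channel IDs; with the real file this would be
# pd.read_csv("BPI2016_Complaints.csv")["ContactChannelID"].
channels = pd.Series([8, 8, 6, 8, 6, 4, 8, 6, 8])

usage = channels.value_counts()
print(usage)  # channel 8 tops this toy sample, echoing the finding above
```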

  • Trends Over Time: There are fluctuations in the number of complaints received each day over the 8 months. This suggests that external factors, possibly related to events or changes in services, might influence the volume of complaints.

Complaint Trends Over Time
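Daily complaint volume, the basis of the trend observation above, can be sketched by resampling the contact dates. The dates here are illustrative:

```python
import pandas as pd

# Illustrative contact dates standing in for the ContactDate column.
dates = pd.to_datetime([
    "2015-06-01", "2015-06-01", "2015-06-02",
    "2015-06-05", "2015-06-05", "2015-06-05",
])

# One count per complaint, resampled to a daily series (gaps become 0).
daily = pd.Series(1, index=dates).resample("D").sum()
print(daily)
```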

  • Repetitive Usage: A significant number of customers use the same channel, particularly channel 8, to lodge multiple complaints consecutively. This indicates that customers may either prefer this channel or find it more accessible.
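Consecutive same-channel complaints can be counted with a per-customer lag. The toy log below assumes rows are already ordered by contact date within each customer:

```python
import pandas as pd

# Toy complaint log, one row per complaint, ordered by date per customer.
log = pd.DataFrame({
    "CustomerID":       [1, 1, 1, 2, 2, 3],
    "ContactChannelID": [8, 8, 8, 6, 8, 8],
})

# A repeat is a complaint on the same channel as the customer's previous one.
prev = log.groupby("CustomerID")["ContactChannelID"].shift()
log["repeat"] = log["ContactChannelID"] == prev
print(log["repeat"].sum())  # consecutive same-channel complaints
```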

Customer Transitions Between Channels

  • Common Journeys: Most customers tend to lodge a single complaint rather than transition between channels. However, there are instances where customers transition from one channel to another, with the most frequent transition being within the same channel (e.g., 8->8).

Channel Transitions
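Transition pairs such as 8->8 can be tabulated by pairing each complaint with the same customer's next one; again the log is a toy sample:

```python
import pandas as pd

# Toy log ordered by date within each customer.
log = pd.DataFrame({
    "CustomerID":       [1, 1, 1, 2, 2, 3],
    "ContactChannelID": [8, 8, 8, 6, 8, 4],
})

# Pair each complaint's channel with the same customer's next channel.
log["next"] = log.groupby("CustomerID")["ContactChannelID"].shift(-1)
transitions = (
    log.dropna(subset=["next"])
       .groupby(["ContactChannelID", "next"])
       .size()
)
print(transitions)  # the 8->8 self-transition dominates this sample
```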

  • Reason for Transition: While the dataset provides the themes, sub-themes, and topics of complaints, the exact reason for channel transition isn’t explicitly given. However, repetitive complaints on the same channel might indicate unresolved issues or customer persistence in seeking resolution.

Customer Profiles from Behavioral Data

  • Demographics: The data suggests that all age groups are active in lodging complaints, with some age categories being more active than others. Gender distribution is almost even, with a slight female predominance.
  • Complaint Themes: Certain themes and sub-themes, such as “Services” and “Communication,” dominate the complaints. These themes can be used to segment customers based on their primary concerns.
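Simple behavioral segments like these can come from cross-tabulating demographics against complaint themes. The values below are placeholders for the dataset's AgeCategory and theme columns:

```python
import pandas as pd

# Placeholder rows; column names follow the dataset description above.
df = pd.DataFrame({
    "AgeCategory":    ["18-29", "30-39", "30-39", "50-65"],
    "ComplaintTheme": ["Services", "Communication", "Services", "Services"],
})

# Each cell counts complaints for one (age group, theme) segment.
segments = pd.crosstab(df["AgeCategory"], df["ComplaintTheme"])
print(segments)
```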


Based on the analysis, here are ChatGPT’s recommendations:

  1. Enhance Popular Channels: Given the prominence of channels 8 and 6, UWV should ensure these channels are optimized for user experience. This includes easy navigation, clear information, and quick response times.
  2. Issue Resolution: The repeated complaints within the same channel indicate unresolved issues. Implementing a robust first-contact resolution strategy can reduce repetitive complaints and improve customer satisfaction.
  3. Unified Customer View: Integrating data from all channels can provide a 360-degree view of the customer. This can help UWV understand each customer’s complete journey, predict their next move, and proactively address their concerns.
  4. Personalized Services: Using the identified themes and demographics, offer personalized services or solutions to customers. For example, if a particular age group predominantly uses a specific channel, tailor the content and user interface to suit their preferences.
  5. Feedback Mechanism: Incorporate a feedback mechanism after each complaint resolution. This can provide insights into why customers might be transitioning between channels and areas of improvement.
  6. Training & Resources: Ensure each channel is equipped with the necessary resources and trained personnel to address the most common complaint themes effectively.
  7. Promote Self-service: For common issues (like “Technical error”), provide self-service tools, FAQs, or chatbots to assist customers without them having to change their preferred contact channel.

By leveraging ChatGPT’s insights to optimize complaint management processes, UWV could significantly improve agent productivity and customer satisfaction KPIs. For example, automating repetitive tasks would free agents to focus on higher-value work. Agents could spend less time on mundane items and more time solving complex customer issues. Scaling automation through AI would not only reshape the employee experience but also translate to hard cost savings from greater efficiency. The process enhancements driven by ChatGPT’s analysis could result in the following quantifiable performance improvements:

  • ~25-35% reduction in average complaint resolution time
  • ~15-20% decrease in repeated customer contacts
  • ~10%+ increase in first contact resolution rate
  • ~20%+ improvement in Net Promoter Score (NPS)
  • ~30-40% reduction in case backlogs

These metrics showcase how an optimized complaint process powered by AI could drive significant gains across customer satisfaction, operational efficiency, and cost reduction KPIs. As we’ve navigated the intricate avenues of Business Process Optimization with ChatGPT, one thing becomes undeniably clear: the potential of AI in refining business operations is vast and yet to be fully tapped.

Looking ahead, we will dive deeper into more complex datasets in future discussions. It’s exciting to picture the transformative impact such AI-driven insights can have on modern business landscapes. The journey towards optimized processes is intricate but, with tools like ChatGPT, undoubtedly promising.

ChatGPT for Data Cleansing

The journey of any data-driven decision starts with the quality of the data at hand; as the saying goes, “garbage in, garbage out.” In today’s digital landscape, clean, consistent data forms the bedrock of all analytical endeavors. This part of our blog series introduces an uncomplicated, illustrative example of how ChatGPT, the AI-powered language model, can assist in the data cleansing process.

Here, we use a relatively simple Excel dataset with problems like inconsistent or empty column headers, data spread across multiple columns, missing values, and duplication. Of course, this is an oversimplified scenario compared to the vast datasets professionals deal with daily. Still, our focus is to demonstrate the potential of generative AI in improving data quality. The clever use of AI, like employing ChatGPT to develop tailored programs, can create effective workarounds to tackle more extensive, complex datasets.

In our journey to explore the power of ChatGPT for data cleansing, we’ve got our hands on a small dataset. This dataset has details about sales or order transactions. Each row corresponds to a unique order and is segmented into three broad categories: Consumer, Corporate, and Home Office. These categories are further broken down based on the shipping methods: First Class, Same Day, Second Class, and Standard Class. The numbers spread across the document appear to represent financial values linked to each order, differentiated by the customer segment and shipping mode.

While the dataset seems simple, a closer look reveals several underlying issues we must tackle. The data issues observed in this dataset are:

  1. Inconsistent column headers: The headers are not consistent across the dataset. The first two rows contain a mix of segment names, ship mode names, and total column headers. This makes it challenging to understand which column corresponds to which data point.
  2. Empty column headers: Some columns do not have clear or meaningful headers. This makes it unclear what data these columns are supposed to contain.
  3. Data spread across multiple columns: The same type of data (e.g., sales) is spread across different columns based on the segment and ship mode. This makes the data structure wide and inefficient, hindering data analysis and visualization.
  4. Missing values: There are numerous cells with missing data throughout the dataset. It’s unclear whether these missing values are due to no data being available or if they are errors in data entry.
  5. Data duplication: The ‘Total’ columns (e.g., ‘Consumer Total’) seem to duplicate data that is already present in the corresponding individual columns. This redundancy can lead to inconsistencies if the dataset is updated incorrectly.
  6. Lack of uniformity in data types: Some columns appear to contain numerical data, while others contain strings (based on the provided sample). This lack of uniformity can complicate data analysis.
  7. Misalignment of data: The ‘Order ID’ column seems to be offset from the rest of the data, which can lead to misinterpretations or errors during data processing.
  8. Inappropriate data storage format: The data appears to be in a format suited for a report rather than data analysis. A ‘tidy’ format would be more suitable, where each row represents an observation and each column represents a variable.

In conclusion, this dataset needs significant cleaning and restructuring to be efficiently used for data analysis.
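Issues 1, 3, 5, and 8 (redundant Total columns, wide layout, missing cells, report format) map directly onto a drop-and-melt restructuring. The miniature table below is an invented stand-in for the Excel sheet, with one sales column per segment-and-ship-mode pair:

```python
import pandas as pd

# Invented miniature of the messy report; the real headers differ.
wide = pd.DataFrame({
    "Order ID": ["CA-2011-100293", "CA-2011-100706"],
    "Consumer First Class": [91.06, None],
    "Consumer Standard Class": [None, 129.44],
    "Consumer Total": [91.06, 129.44],  # duplicates the columns above
})

tidy = (
    wide.drop(columns=[c for c in wide.columns if c.endswith("Total")])
        .melt(id_vars="Order ID", var_name="SegmentShipMode",
              value_name="Sales")
        .dropna(subset=["Sales"])
        .reset_index(drop=True)
)
# n=1 works because this toy segment is one word; a multi-word segment
# like "Home Office" would need a lookup table instead.
tidy[["Segment", "Ship Mode"]] = tidy["SegmentShipMode"].str.split(
    " ", n=1, expand=True)
print(tidy[["Order ID", "Segment", "Ship Mode", "Sales"]])
```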

ChatGPT Prompt

As we dive into the nitty-gritty of data cleansing, here’s a glimpse of the detailed prompt we used to guide ChatGPT. The prompt outlines a series of steps to take our jumbled dataset and transform it into an organized, analyzable structure. From discarding unnecessary columns to correctly identifying and categorizing each order’s details, this prompt served as a roadmap for ChatGPT in the task of data cleansing.

The Prompt

There are three core parts to the prompt. The first gives ChatGPT a role to play, in our case an expert in data cleansing. The second explains the step-by-step resolution while leaving some steps to the AI’s own understanding. The third and final part asks a confirmation question, to which ChatGPT responded that it understood the assignment and asked for the data file.

Now, let’s delve a bit deeper into the prompt:

  1. Drop unnecessary columns: The first instruction is to discard every column ending with ‘Total.’ These columns are deemed extraneous for our analysis and are to be removed.
  2. Extract Order ID: The AI is tasked with picking the Order ID (formatted like ‘CA-2011-100293’) from each row.
  3. Determine Ship Mode: The Ship Mode is to be inferred based on the appearance of the Sales figure in the corresponding row. The Ship Mode will correspond to the column header where the sales figure is found (i.e., First Class, Same Day, Second Class, or Standard Class).
  4. Identify Segment: Similarly, the Segment for each order is to be figured out from the position of the sales figure.
  5. Locate Sales figures: Finally, the Sales figure, the only numeric value in each row, is to be identified.
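The five steps above could equally be expressed as a small script, which is roughly what ChatGPT generates behind the scenes. Everything here is a sketch: the sample rows, the “Segment|Ship Mode” header scheme, and the Total column are assumptions, not the file's real layout:

```python
import re
import pandas as pd

# Invented sample rows: one Order ID and one sales figure per row,
# under an assumed "<Segment>|<Ship Mode>" header naming scheme.
wide = pd.DataFrame({
    "Order ID": ["CA-2011-100293", "US-2012-150630"],
    "Corporate|First Class": [91.06, None],
    "Home Office|Same Day": [None, 243.60],
    "Corporate Total": [91.06, None],
})

# Step 1: drop every column ending with 'Total'.
wide = wide.drop(columns=[c for c in wide.columns if c.endswith("Total")])

ORDER_ID = re.compile(r"^[A-Z]{2}-\d{4}-\d+$")   # step 2: 'CA-2011-100293' shape
records = []
for _, row in wide.iterrows():
    assert ORDER_ID.match(row["Order ID"])
    for col in wide.columns.drop("Order ID"):
        if pd.notna(row[col]):                    # step 5: locate the sales figure
            segment, ship_mode = col.split("|")   # steps 3-4: infer from header
            records.append({"Order ID": row["Order ID"],
                            "Segment": segment,
                            "Ship Mode": ship_mode,
                            "Sales": row[col]})

clean = pd.DataFrame(records)                     # one tidy row per order
print(clean)
```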

ChatGPT’s proficiency in comprehending and executing such tasks is the highlight of this exercise, demonstrating its potential to automate and streamline data-cleaning processes.

The Outcome

By working through the data cleansing prompt, ChatGPT transformed the unorganized dataset into a clean, streamlined, and more efficient structure.

This transformation is an excellent demonstration of how ChatGPT can be used to automate and simplify the often complex and time-consuming task of data cleansing, making it a valuable tool in the data analysis pipeline.

The Business Impact of High-Quality Data

While ChatGPT demonstrated excellent technical skills in cleansing the sample data set, the true value of such efforts emerges when high-quality data drives enhanced analysis and decision-making. With clean, consistent data as a foundation, organizations gain the ability to uncover actionable insights through advanced analytics. Data anomalies and errors no longer skew the analytical models.

For example, a consumer goods company could use LLMs like ChatGPT to cleanse its sales data and then predict demand signals for new products with over 80% accuracy. These AI-driven forecasts facilitate precise demand planning and inventory optimization. Furthermore, meticulously cleansed customer data enables hyper-targeted marketing campaigns. When customer attributes and transaction history are precise, organizations can craft compelling personalized offers.

At its core, quality data analysis illuminates opportunities and risks that would be near impossible to spot otherwise. It provides the trusted inputs that allow organizations to confidently transform insights into strategic action.

In this third installment of our series, we’ve embarked on a journey through the realms of data cleansing and business process optimization, illuminating the transformative potential of ChatGPT within the SAP ecosystem. The insights we’ve gained, both in terms of data quality and business operations, underscore the symbiotic relationship between state-of-the-art AI models like ChatGPT and comprehensive enterprise solutions like SAP. Yet, as with any groundbreaking integration, challenges are inevitable.

As we’ve explored the vast opportunities that ChatGPT offers in enhancing SAP functionalities, it’s also essential to be cognizant of the potential pitfalls and obstacles. The road to seamless integration isn’t without its hurdles. This recognition sets the stage for our next chapter in this series, where we’ll delve into the ‘Potential challenges in connecting ChatGPT with SAP.’

While we’ve celebrated the successes and possibilities in our current discussion, our subsequent exploration will equip you with a holistic understanding, preparing you to navigate and mitigate the challenges that might arise in this integration. Join us in our next blog post as we balance our optimism with caution, ensuring that our foray into the fusion of ChatGPT and SAP is both informed and strategic.




Iatco Sergiu

There are two issues regarding data processing with ChatGPT. The first is security – the data could leak into the OpenAI model, so one should use only public data. The second issue is the volume – copy-pasting thousands of rows would be difficult. To overcome these issues, one can use the OpenAI API, but this would generate costs. Nonetheless, the approach might be helpful for small anonymized data. Generating Python code for data processing solves the security and volume issues.

Vishwas Madhuvarshi

Blog Post Author

Thank you for your insightful comment.

You’re absolutely right in pointing out the two predominant challenges with ChatGPT: data security and volume. To keep our discussions user-friendly and non-technical, I opted to use ChatGPT directly. However, I completely agree with you. Everything we’ve accomplished with ChatGPT can indeed be achieved using the OpenAI APIs. Besides a locally installed LLM, I often use the OpenAI API for my projects.

Regarding data security concerns, there are a few approaches one can take:

Utilizing an open-source LLM like Llama2, Falcon, or others within your own environment. This means you won’t send any data externally. While this does prioritize data security, there’s the trade-off of potentially not having optimal performance.

Another viable option is Azure ChatGPT, which is available on GitHub (Somehow, the link is non-functional currently). This open-source solution can be hosted privately on Azure, offering a balance of security and performance.

Lastly, as you rightly pointed out, one can employ the OpenAI API, especially for processing large volumes of data. While this might incur costs, it streamlines the process.

Thank you for sharing your perspective!