Introduction

With a strong interest in data science and a background in marketing, I took on this Kaggle project both to practice data analysis and to develop my Python skills. The data set simulates the results of A/B testing for two online marketing campaigns: a Control Campaign and a Test Campaign. My objective was to determine which campaign was more effective across several performance metrics, using Python for data manipulation and visualization.

For a more detailed walkthrough, you can visit my Kaggle notebook here.

Data Preparation

Importing Libraries and Reading Data

I started by importing the necessary Python libraries and reading in the data sets for both campaigns. The data was initially separated by semicolons, so I adjusted the delimiter parameter to correctly read the data.

Python code:

import pandas as pd
import matplotlib.pyplot as plt

cg = pd.read_csv("/kaggle/input/ab-testing-dataset/control_group.csv", delimiter=";")
tg = pd.read_csv("/kaggle/input/ab-testing-dataset/test_group.csv", delimiter=";")
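To illustrate what the delimiter setting changes, here is a minimal sketch that reads a small semicolon-separated sample from memory instead of the Kaggle files. The rows and values are invented for illustration; only the column names follow the data set.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same semicolon-separated layout.
sample = (
    "Campaign Name;Date;Spend [USD];# of Purchase\n"
    "Control Campaign;1.08.2019;2280;618\n"
    "Control Campaign;2.08.2019;1757;511\n"
)

# With the default comma separator, each row lands in a single column.
wrong = pd.read_csv(io.StringIO(sample))

# Passing delimiter=";" splits each row into the four expected columns.
right = pd.read_csv(io.StringIO(sample), delimiter=";")

print(wrong.shape)
print(right.shape)
```

Comparing the two shapes is a quick way to confirm that a file was split on the right character before doing any further work.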

Data Cleaning

The ‘Date’ columns in both data sets were in string format. I converted them into datetime objects to facilitate further time-based analysis.

Python code:

cg['Date'] = pd.to_datetime(cg['Date'], format='%d.%m.%Y')
tg['Date'] = pd.to_datetime(tg['Date'], format='%d.%m.%Y')
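The conversion can be sketched on a few stand-alone strings. The dates below are made up and only assume the day.month.year layout described above. Spelling out the format removes any ambiguity about whether the first number is the day or the month.

```python
import pandas as pd

# Hypothetical date strings in day.month.year form.
dates = pd.Series(["1.08.2019", "2.08.2019", "30.08.2019"])

# An explicit format tells pandas the first number is the day.
parsed = pd.to_datetime(dates, format="%d.%m.%Y")

# Once parsed, time-based accessors such as .dt become available.
print(parsed.dt.month.tolist())
print(parsed.dt.day_name().tolist())
```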

During my data exploration, I found that the Control Campaign data set contained NaN values. Understanding that in the context of an online campaign, NaN values likely indicate a lack of results rather than missing data, I opted to replace these with zeros.

Python code:

columns_to_fill = ['# of Impressions', 'Reach', '# of Website Clicks', '# of Searches', '# of View Content', '# of Add to Cart', '# of Purchase']
cg[columns_to_fill] = cg[columns_to_fill].fillna(0)
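A quick way to check this step is to count missing values before and after filling. The sketch below uses an invented three-day sample with one empty day; the values are hypothetical and only two of the columns are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical three-day sample with one day missing its results.
cg_sample = pd.DataFrame({
    "# of Website Clicks": [7016.0, np.nan, 6508.0],
    "# of Purchase": [618.0, np.nan, 372.0],
})

# Count missing values per column before filling.
missing_before = cg_sample.isna().sum()

columns_to_fill = ["# of Website Clicks", "# of Purchase"]
cg_sample[columns_to_fill] = cg_sample[columns_to_fill].fillna(0)

# After filling, no missing values should remain in these columns.
missing_after = cg_sample.isna().sum()

print(missing_before)
print(missing_after)
```

Because pandas skips NaN values when averaging but counts zeros, filling with zero lowers the affected column means somewhat, which is worth keeping in mind when reading daily averages.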

Merging Data Sets

To easily compare both campaigns, I merged the two data sets into one using the Pandas concat method.

Python code:

df = pd.concat([tg, cg], ignore_index=True)
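As a small sketch of what the merge produces, the example below stacks two invented two-row frames. The frame names and values are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical two-row frames standing in for the test and control data.
tg_sample = pd.DataFrame({
    "Campaign Name": ["Test Campaign", "Test Campaign"],
    "# of Purchase": [255, 677],
})
cg_sample = pd.DataFrame({
    "Campaign Name": ["Control Campaign", "Control Campaign"],
    "# of Purchase": [618, 511],
})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1.
combined = pd.concat([tg_sample, cg_sample], ignore_index=True)

print(combined.index.tolist())
print(combined["Campaign Name"].value_counts())
```

The 'Campaign Name' column is what keeps the two groups distinguishable after stacking, so it is worth confirming that each campaign contributes the expected number of rows.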

Data Analysis

Summary Statistics

I began by generating summary statistics for both campaigns. I used the groupby and describe methods from Pandas, which provided an overview of mean, standard deviation, and other statistical measures for each campaign.

Python code:

grouped_df = df.groupby('Campaign Name')
transposed_stats = grouped_df.describe().T
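The transposed result is indexed by metric and statistic, with one column per campaign. Here is a minimal sketch of how a single statistic can be pulled out of it, using an invented four-row sample with hypothetical values.

```python
import pandas as pd

# Hypothetical sample: two days per campaign, one metric.
sample = pd.DataFrame({
    "Campaign Name": [
        "Control Campaign", "Control Campaign",
        "Test Campaign", "Test Campaign",
    ],
    "# of Purchase": [600, 400, 300, 800],
})

stats = sample.groupby("Campaign Name").describe().T

# Each row of the transposed table is a (metric, statistic) pair.
mean_purchases = stats.loc[("# of Purchase", "mean")]
print(mean_purchases)
```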

Comparing Averages

I also created a pivot table to compare the means of various metrics between the two campaigns.

Python code:

pivot_means = pd.pivot_table(
    df,
    index=['Campaign Name'],
    values=['Spend [USD]', '# of Impressions', 'Reach', '# of Website Clicks', '# of Searches', '# of View Content', '# of Add to Cart', '# of Purchase'],
    aggfunc='mean',
)

Data Visualizations

I used Matplotlib to create bar charts that visually represent the differences in performance metrics between the two campaigns. These charts further emphasized the insights derived from the summary statistics and pivot table.
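A minimal sketch of one way to draw such bar charts follows. It is not the exact plotting code from the notebook: the daily averages are invented, and the layout (one subplot per metric) is an assumption made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily averages; real values would come from the pivot table.
means = pd.DataFrame(
    {
        "# of Website Clicks": [5000.0, 6000.0],
        "# of Add to Cart": [1300.0, 900.0],
    },
    index=["Control Campaign", "Test Campaign"],
)

# One subplot per metric, so metrics on different scales stay readable.
fig, axes = plt.subplots(1, len(means.columns), figsize=(8, 3))
for ax, metric in zip(axes, means.columns):
    ax.bar(means.index, means[metric], color=["tab:blue", "tab:orange"])
    ax.set_title(metric)
    ax.set_ylabel("Daily average")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell. In a script, a call to plt.show() or fig.savefig() would be needed to see it.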

Even though the Test Campaign drew more clicks, the Control Campaign generated more items added to the cart.

Although the Test Campaign had more purchases, the difference in the number of purchases between the Control Campaign and the Test Campaign was small.

The following chart gave a quick view of the differences between the campaigns across all of the important KPIs.

As a result, the Test Campaign:

  • Had 418 fewer items added to cart on an average day
  • Had 34,974 fewer impressions on an average day
  • Had 1.56 fewer purchases on an average day
  • Had 197 more searches on an average day
  • Had 85 fewer content views on an average day
  • Had 711 more website clicks on an average day
  • Reached 35,353 fewer people on an average day
  • Cost $274.63 more on an average day
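Daily differences like the ones listed above can be computed by subtracting one campaign's column means from the other's. The sketch below shows the idea on an invented four-row sample; the values are hypothetical and do not reproduce the figures in the list.

```python
import pandas as pd

# Hypothetical daily rows; values are invented for illustration.
sample = pd.DataFrame({
    "Campaign Name": [
        "Control Campaign", "Control Campaign",
        "Test Campaign", "Test Campaign",
    ],
    "Spend [USD]": [2200.0, 2400.0, 2500.0, 2700.0],
    "# of Purchase": [520.0, 530.0, 510.0, 530.0],
})

# Average each numeric metric per campaign.
means = sample.groupby("Campaign Name").mean(numeric_only=True)

# Positive values mean the Test Campaign was higher on an average day.
diff = means.loc["Test Campaign"] - means.loc["Control Campaign"]
print(diff.round(2))
```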

Conclusion

After a thorough analysis, I concluded that while the Test Campaign generated more searches and website clicks on an average day, it did so at a higher average daily cost than the Control Campaign. Despite the Test Campaign's additional searches and website clicks, the Control Campaign produced more items added to cart, more impressions, more content views, and more reach. Most importantly, it produced 1.56 more purchases on an average day at a lower cost.

This project serves as part of my portfolio in Data Analysis, where I aim to combine my background in Marketing with my growing skills in Python and Data Science. For those interested in a more detailed explanation and code walkthrough, feel free to visit my Kaggle notebook.

Mario Ortiz Marketing