﻿ Maths and Stats behind Web Analytics - Beginners Guide

Last Updated: August 20, 2022

The following article is an excerpt from my best selling book: Maths and Stats for Web Analytics and Conversion Optimization which I am sharing here for the first time, to benefit the wider audience:

A long time ago in a galaxy far, far away a person was questioned for his unreasonable use of stats techniques in website analysis by an authority figure. Unfortunately, that person was me.

Maths is not really a problem for me until it turns into sets, probability, regression, standard deviation, etc., or in other words, the more specialized field of statistics.

So before I move forward, here is a small warning. I am just a student of statistics and not a full-blown statistician. Consequently, my methods could be prone to statistical errors.

However, feel free to jump on me if you can catch an error. I dare you ;)

Why you need to know the maths and stats behind web analytics

It is not just about what you should know; it is more about what you are expected to know, especially if you deal with businesses day in, day out.

Here are a few questions for your consideration:

Q1. When your website conversion rate jumps from 10% to 12%, is it a 2% rise in conversion rate or a 20% rise in conversion rate?

Q2. Can you double your sales by simply doubling your marketing budget?

Q3. Should you focus on a large number of low-value customers instead of a few high-value customers to maximize profit?

If you are not sure about the answers to my questions then it is perfectly normal.

However, if I happen to be your client or boss then it is not normal. Sorry. You have disappointed me and I doubt your reports now.

Marketer: Boss, Conversion Rate has improved by 2% in the last 4 months. 4 months ago it was 10%. Now it is 12%.

Boss: You have made a complete mess of the figures. Sales are at an all-time low. Off you go. Grrrr.

The grumpy boss may not know there has been a negative correlation between conversion rate and revenue and there is no 2% rise in conversion rate. But he sure knows that his sales are going down and that he is less profitable. And now he has this moral obligation to do the dreaded ‘cost-cutting’.

Guess whose job is on the line next?

The corporate world is not very forgiving of mistakes made by employees/consultants/agencies.

If we report that the jump in conversion rate from 10% to 12% means there is a 2% rise in conversion rate, our entire marketing report becomes questionable. We instantly cast a shadow over the rest of our analysis. The thought that will instantly pop up in the mind of the recipient of our report will be “what else has he done wrong?”

Learning maths and stats is an excellent way to develop your logical and critical thinking. It makes you a better marketer and of course a better analyst. No one can then easily question your reasoning skills and you become a true NINJA.

Following is one example from everyday life:

Which is a better deal: ‘3 ads placement for \$40.12’ or ‘2 ads placement for \$30.65’?

If you calculate and compare the unit prices then you will find that ‘3 ads placement for \$40.12’ is a better deal.

So here we go.

Percentage of change

This metric is used to calculate the percentage of rise or fall in relation to the old value.

We use % change when there is an old value and a new value.

Percentage of change = [ (New Value – Old Value)/Old value ] * 100

For example:

• Percentage of change = [(350-200)/200]*100 = 75%
• Percentage of change = [(42-20)/20]*100 = 110%
• Percentage of change = [(12-10)/10]*100 = 20%
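The formula above can be sketched in a few lines of Python. This is only a minimal illustration; the function name `percentage_change` is hypothetical and not part of any analytics tool.

```python
def percentage_change(old_value, new_value):
    """Rise or fall from old_value to new_value, as a percentage of old_value."""
    if old_value == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return (new_value - old_value) * 100 / old_value


print(percentage_change(200, 350))  # 75.0
print(percentage_change(20, 42))    # 110.0
print(percentage_change(10, 12))    # 20.0
```

A negative result simply means the metric has fallen relative to the old value.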

Google Analytics calculates % change (for every type of metric) when you compare the data with past data.

If pages/session in July was 2 and in August was 4, then you won’t report that pages/session has improved by 2. You report that pages/session has improved by 100% (i.e. the percentage of change from the old value).

Similarly, we calculate % of change for avg. visit duration, %new sessions, bounce rate, etc.

Percentage of difference

This metric is used to calculate the difference between two values in percentage. Use this metric when neither value is more important than the other.

Percentage of difference = [|difference between two values|/average of the two values] *100

For example:

Percentage of difference = [|200-350|/((200+350)/2)] * 100 = (150/275)*100 = 54.55%

Percentage of difference = [|10-12|/((10+12)/2)] * 100 = (2/11)*100 = 18.18%

Note: Ignore the minus sign if the result is negative.
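As a rough sketch, the same calculation in Python could look like this. The function name `percentage_difference` is just an illustrative choice.

```python
def percentage_difference(a, b):
    """Absolute difference between a and b, as a percentage of their average."""
    average = (a + b) / 2
    if average == 0:
        raise ValueError("percentage difference is undefined when the average is 0")
    # abs() is what 'ignore the minus sign' means in practice.
    return abs(a - b) * 100 / average


print(round(percentage_difference(200, 350), 2))  # 54.55
print(round(percentage_difference(10, 12), 2))    # 18.18
```

Because neither value is treated as the 'old' one, swapping the two arguments gives the same answer.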

Many people make the mistake of assuming that percentage of change and percentage of difference are the same thing. They are not, as I have just shown you: a move from 10 to 12 is a 20% change but an 18.18% difference.

We use % change when there is an old value and a new value and we need to know the percentage of rise or fall in relation to the old value. Most of the time we use percentage of change in reporting.

Percentage of error

This metric is used to calculate the size of an error as a percentage of the exact value, when you compare an approximate (estimated) value with an exact (actual) value.

Percentage of error = [|approximate value – exact value|/|exact value|] * 100

For example:

• I estimated 200 conversions in July but got 110 conversions.
So my percentage of error is [|200-110|/110] * 100 = 81.82%
• I estimated 300 conversions in August but got 200 conversions.
So my percentage of error is [|300-200|/200] * 100 = 50%

A practical way you would use percentage of error is when you are running an experiment. You want percentage of error to be as low as possible.

Note: Ignore the minus sign if the result is negative.
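Here is a minimal Python sketch of the percentage of error formula, using the two conversion estimates above. The function name `percentage_error` is hypothetical.

```python
def percentage_error(approximate_value, exact_value):
    """Size of the estimation error, as a percentage of the exact value."""
    if exact_value == 0:
        raise ValueError("percentage error is undefined when the exact value is 0")
    return abs(approximate_value - exact_value) * 100 / abs(exact_value)


print(round(percentage_error(200, 110), 2))  # 81.82 (July estimate)
print(round(percentage_error(300, 200), 2))  # 50.0  (August estimate)
```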

Percentage points

We use percentage points when we subtract one percentage from another, to make clear that the change is absolute and not relative.

For example: conversion rate jumps from 10% to 12%.

So is it a 20% rise in conversion rate or a 2% rise in conversion rate?

It is actually a 20% rise in conversion rate, or a rise of 2 percentage points in conversion rate.

It can’t be a 2% rise in conversion rate: a 2% rise from 10% would take the conversion rate to only 10.2%.
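The distinction can be sketched in Python by computing both figures side by side. Both function names are illustrative assumptions, and the rates are passed in as percentages (10 means 10%).

```python
def percentage_point_change(old_rate, new_rate):
    """Absolute change between two percentages, in percentage points."""
    return new_rate - old_rate


def relative_change(old_rate, new_rate):
    """Change between two percentages, relative to the old one (in %)."""
    return (new_rate - old_rate) * 100 / old_rate


old_rate, new_rate = 10, 12  # conversion rates in percent
print(percentage_point_change(old_rate, new_rate))  # 2 (percentage points)
print(relative_change(old_rate, new_rate))          # 20.0 (percent)
```

Reporting both numbers, each with its correct unit, removes any ambiguity for the reader of the report.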

Mean

It is also known as the arithmetic mean. It is simply the sum of the numbers divided by the count of the numbers.

Mean is a type of average, whereas ‘average’ could refer to the mean, median or mode.

The mean of a whole population is denoted by the Greek letter µ (“mu”).

This metric is used a lot in Google Analytics by the name of ‘average’: ‘Average time on page’, ‘average time on site’, ‘site average’ and so on.

Mean = sum of numbers /count of numbers

For example, let us suppose a website has 5 web pages with bounce rates of 35%, 40%, 0%, 48% and 100% (pages 1 to 5, in that order).

Now the bounce rate of the site = (35+40+0+48+100)/5 = 223/5 = 44.6%

Is 44.6% a true bounce rate?

No.

Look at the distribution of bounce rate across all the web pages.

Two web pages, page 3 and page 5, have extreme values of 0% and 100%.

We call such values ‘outliers’ in statistics. Outliers have the sadistic ability to skew ‘averages’.

So if we take these two extreme values out of our calculations, we get a more representative bounce rate for the site:

(35+40+48)/3 = 123/3 = 41%

Similarly,

Now the average time on the site (in seconds) = (350+400+500+480+36000)/5 = 37730/5 = 7546 seconds, which is roughly 2 hrs 6 minutes

Again the outlier ‘36000’ is skewing our average metric.

So if we take it out and then re-calculate the average time on site, we get (350+400+500+480)/4 = 432.5 seconds, which is roughly 7 minutes 12 seconds

Therefore whenever we analyze ‘average’ metrics we always:

1. Look at the distribution
2. Identify the outliers (i.e. extreme values)
3. Discount outliers from the calculation of averages.

If you don’t do this then you will get muddy analytical insights from your average metrics, like an ‘average time on the site’ of 2 hrs 6 minutes.
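The three steps above can be sketched in Python with the bounce rate and time on site figures from this section. This is only a sketch: the helper names are hypothetical, and the outliers are flagged by hand after looking at the distribution, exactly as described above, rather than detected automatically.

```python
from statistics import mean


def mean_without_outliers(values, outliers):
    """Average the values after dropping the ones flagged as outliers."""
    kept = [v for v in values if v not in outliers]
    return mean(kept)


def format_seconds(seconds):
    """Turn a number of seconds into an 'h m s' string (fractions dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


bounce_rates = [35, 40, 0, 48, 100]          # percent, pages 1 to 5
times_on_site = [350, 400, 500, 480, 36000]  # seconds

print(mean(bounce_rates))                              # 44.6
print(mean_without_outliers(bounce_rates, {0, 100}))   # 41
print(format_seconds(mean(times_on_site)))             # 2h 5m 46s
print(format_seconds(mean_without_outliers(times_on_site, {36000})))  # 0h 7m 12s
```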

Unit price

It is equal to cost/quantity.

Which is a better deal: ‘3 ads placement for \$40.12’ or ‘2 ads placement for \$30.65’?

Calculating and comparing unit prices is a good way of finding the ‘best deal’.

So, we calculate the unit price in each case:

\$40.12/3 = \$13.37 per ad placement

\$30.65/2 = \$15.33 per ad placement

If we go for the ‘2 ads placement for \$30.65’ deal we will end up paying more.

Consequently, the best deal for us is ‘3 ads placement for \$40.12’.
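The same comparison can be sketched in Python. The function name `unit_price` and the `deals` dictionary are illustrative only.

```python
def unit_price(total_cost, quantity):
    """Cost per unit: total cost divided by quantity."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return total_cost / quantity


# Each deal maps to (total cost in dollars, number of ad placements).
deals = {
    "3 ads placement": (40.12, 3),
    "2 ads placement": (30.65, 2),
}

# The best deal is the one with the lowest price per ad placement.
best_deal = min(deals, key=lambda name: unit_price(*deals[name]))
print(best_deal)  # 3 ads placement
```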

Gross profit, operating profit, net profit, bottomline profit…

A lot of marketers make the mistake of reporting these metrics without understanding what these metrics really are and how they are calculated.

In simplest terms:

Profit = Sales Revenue – Cost

Revenue = price of the product(s) * quantity sold

Gross Profit = Sales Revenue – Direct Cost

Direct cost can be something like cost of manufacturing a product

Operating Profit = Sales Revenue – Operating Cost.

It is the profit before interest and taxes.

Operating cost is the ongoing cost of running a business, product or system.

It can include both direct and indirect costs.

Net Profit is also known as net income, net earnings or the bottom line. It is the profit after interest and taxes.

Net Profit = Sales Revenue – Total cost (this includes any direct and indirect cost + interest + taxes)

Profit margin

It is also known as net profit margin, net margin, net profit ratio

Profit Margin = (Net Profit/ Revenue) * 100

A low profit margin indicates a higher risk that a decline in sales will erase the profit and result in a net loss.
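To see how these profit metrics relate to each other, here is a small Python sketch. All the figures are made up purely for illustration, and the variable names are assumptions.

```python
# Hypothetical figures for one reporting period (illustration only).
price = 50                 # price of the product
quantity_sold = 1000
direct_cost = 20000        # e.g. cost of manufacturing the product
indirect_cost = 10000      # e.g. rent, salaries, software
interest_and_taxes = 5000

revenue = price * quantity_sold
gross_profit = revenue - direct_cost

# Operating cost here includes both direct and indirect costs.
operating_cost = direct_cost + indirect_cost
operating_profit = revenue - operating_cost   # profit before interest and taxes

total_cost = operating_cost + interest_and_taxes
net_profit = revenue - total_cost             # profit after interest and taxes

profit_margin = net_profit * 100 / revenue

print(revenue, gross_profit, operating_profit, net_profit)  # 50000 30000 20000 15000
print(profit_margin)                                        # 30.0
```

Note how the same revenue produces three very different 'profit' figures, which is why a report should always say which one it means.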

The law of diminishing returns and your marketing budget

According to the law of diminishing returns, if you keep adding more of one unit of production to a productive process while keeping all other units constant, you will at some point produce lower per-unit returns.

For example, if you keep pumping more money into a Facebook campaign without changing the present form of the campaign, at some point you will reach the point of diminishing returns and once you cross this point, your conversion rate will go down and cost per acquisition will go up.

So when you are thinking of increasing the budget of a campaign by a considerable amount, think of putting more ads and targeting more keywords.

This way, you will change multiple units of production and can stay away from the point of diminishing returns.

How to determine the point of diminishing returns

To determine the point of diminishing returns you need to gradually add more of one unit of production in the production process.

If you rapidly add units, you will never know when you crossed the point of diminishing returns and started losing money. So gradually increase your budget.

Understand that just doubling the budget of a high performing campaign may not result in a proportional increase in performance. You need to do a lot more than just increase the budget.

Consider running more ads, targeting more keywords or new markets to stay away from the point of diminishing returns.
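One way to spot the point of diminishing returns is to record conversions at each gradual budget step and watch the extra conversions gained per extra dollar. The Python sketch below does this on made-up numbers; the function names and the data are assumptions for illustration, not figures from a real campaign.

```python
def marginal_returns(budgets, conversions):
    """Extra conversions gained per extra unit of budget, for each budget step."""
    return [
        (conversions[i] - conversions[i - 1]) / (budgets[i] - budgets[i - 1])
        for i in range(1, len(budgets))
    ]


def point_of_diminishing_returns(budgets, conversions):
    """Budget level after which an extra unit of budget starts returning less."""
    gains = marginal_returns(budgets, conversions)
    for i in range(1, len(gains)):
        if gains[i] < gains[i - 1]:
            return budgets[i]
    return None  # returns have not started to diminish in this data


# Hypothetical monthly budgets (in dollars) and the conversions they produced.
budgets = [1000, 2000, 3000, 4000, 5000]
conversions = [50, 110, 180, 230, 260]

print(point_of_diminishing_returns(budgets, conversions))  # 3000
```

In this made-up data each extra \$1000 buys 60, then 70, then 50, then 30 conversions, so returns start to diminish once the budget goes past \$3000. Real campaign data is noisier, so treat a single dip with caution.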

The law of diminishing returns and multichannel marketing

Understand that no one campaign is solely responsible for conversions and sales if you are doing multi-channel marketing. Different marketing channels work together to create sales and conversions.

Some marketing channels help more in assisting conversions than in completing conversions. We call such channels ‘assisted marketing channels’.

Other marketing channels work more in completing conversions.

So if you over-invest in a particular marketing channel while overlooking the role of assisted marketing channels, you will reach the point of diminishing returns faster than you think.

This is because you are adding more of one unit of production (here budget) to one marketing channel while keeping the other units constant (i.e. not investing a proportional amount in assisted marketing channels).

Please see the related post for more details: How to allocate Budgets in Multi Channel Marketing

Law of diminishing returns and last click keywords optimization

Just like we have assisted marketing channels, we have got assisting keywords. These keywords help more in assisting conversions than completing conversions.

Similarly, we have got last click keywords. These are the keywords people searched for just before completing a conversion and are attributed conversions in a last click conversion model.

An average PPC marketer spends his lifetime optimizing for ‘last click keywords’ assuming that only these keywords make up the whole conversion funnel. He completely ignores the role played by assisting keywords.

So in the case of PPC, if you keep optimizing for last click keywords while ignoring first and middle click keywords (collectively known as assisting keywords), you will at some point produce lower per-unit returns.

This means your cost per acquisition at some point will start rising and your profit on sales will start declining.

Then the only way to remain within your CPA targets is by tweaking (add, pause, delete, change bids) last click keywords. But this is a sub-optimal way of optimizing a PPC campaign as you are optimizing only a small part of the conversion process.

So in order to strengthen your PPC campaigns, you also need to bid on keywords that initiate or assist conversions. This way you can stay away from the point of diminishing returns and remain within your CPA targets much longer.

Law of diminishing returns and last click CPA optimization

The CPA that you see in your Google Adwords report or Google Analytics report is not your actual cost per acquisition. Sorry to disappoint you. It is the cost per last click conversion.

So if you ignore first and middle click keywords and optimize PPC campaigns on the basis of cost per last click conversion, then you won’t get optimal results and may sometimes even lose money.

This is because a keyword that is not completing a sale may still be initiating or assisting a sale. If you stop bidding on it because its cost per last click conversion (the so-called CPA reported by Google Adwords) is too high, or because it is not completing any conversions, then you may lose money.

80/20 rule

According to the Pareto Principle (also known as the 80–20 rule), 80% of the effects come from 20% of the causes.

So what you need to do is determine that 20% of everything and work relentlessly on it.

You can’t sell each and every product of your client in each and every location of your country, so why spread your marketing efforts and resources too thin by trying to be visible everywhere for everything you sell?

Let us suppose that your target market is the US. So your average customer can be anywhere in the US.

Let us also suppose that after analyzing one year of data, you found out that people from New York City bought 2 times more than an average visitor to your website. They also tend to spend 30% more than average per order.

So now you know where your best customers live.

Your cost per acquisition will be high if you target the whole of the US through search marketing or any other ad campaigns.

So it is pretty obvious that your total spend is going to be higher for acquiring average clients.

By directing your marketing efforts in acquiring more profitable clients, you can increase your revenue and profit even without increasing traffic or spending more on content creation and marketing.

Now the big questions that come up are: why are people from New York City our best clients, what are they purchasing, and what can we do so that they buy more?

If you can get answers to these questions, you can increase your sales within a few weeks without increasing website traffic or spending more on content marketing. This should be our aim as marketers.
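Finding ‘that 20%’ can be sketched in Python as: sort your segments by revenue and keep adding them until you have covered 80% of the total. The city names and revenue figures below are entirely hypothetical, as is the function name `top_contributors`.

```python
def top_contributors(revenue_by_segment, share_percent=80):
    """Smallest set of top segments that covers share_percent of total revenue."""
    total = sum(revenue_by_segment.values())
    selected = []
    running_total = 0
    ranked = sorted(revenue_by_segment.items(), key=lambda item: item[1], reverse=True)
    for segment, revenue in ranked:
        if running_total * 100 >= share_percent * total:
            break
        selected.append(segment)
        running_total += revenue
    return selected


# Hypothetical yearly revenue by city (illustration only).
revenue_by_city = {"New York City": 4500, "Los Angeles": 3500}
revenue_by_city.update({f"City {n}": 250 for n in range(1, 9)})

best = top_contributors(revenue_by_city)
print(best)                               # ['New York City', 'Los Angeles']
print(len(best) * 100 / len(revenue_by_city))  # 20.0 (percent of all cities)
```

Real data will rarely split exactly 80/20, but the same ranking quickly shows where your best customers are concentrated.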

Please see the related post for more details: How to use Web Analytics 2.0 to improve your conversions

Statistical significance and marketing decisions

A statistically significant result is a result which is unlikely to have occurred by chance.

A statistically insignificant result is one that is likely to have occurred by chance.

When someone asks “is your result statistically significant?”, he is really asking “what is the likelihood that your result has not occurred by chance?”

Consider the following hypothetical scenario:

Do you think you should be investing more in campaign ‘B’ because its conversion rate is highest?

I would suggest, not.

The sample size in case of campaign ‘B’ (4 transactions out of 20 visits) is too small to be statistically significant.

Had campaign ‘B’ got 1 transaction out of 1 visit, its conversion rate would be 100%.

Will that make its performance even better?

No.

Do you think you should now be investing in campaign ‘A’ because it has higher conversion rate?

Are you really sure that the difference between the conversion rates of campaign ‘A’ and Campaign ‘C’ is statistically significant?

In order to determine whether the difference is statistically significant or not, you need to conduct a statistical test (like a Z-test) to calculate the ‘confidence’ that the difference in the conversion rates of the two campaigns is statistically significant.

I am not talking about everyday confidence but about statistical confidence: the confidence that the result has not occurred by random chance.

Statistical significance can be considered to be the confidence one has in a given result.

Confidence depends upon the signal to noise ratio and the sample size.

So confidence that the result has not occurred by a random chance is high if signal is large and/or sample size is large and/or noise is low.
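A Z-test for the difference between two conversion rates can be sketched in Python as follows. This is a minimal two-proportion z-test using only the standard library; the function name and the visit and transaction counts are hypothetical and are not the figures from the scenario above.

```python
import math


def two_proportion_z_test(conversions_a, visits_a, conversions_b, visits_b):
    """Return (z, two-sided p-value) for the difference in two conversion rates."""
    rate_a = conversions_a / visits_a
    rate_b = conversions_b / visits_b
    # Pooled rate under the assumption that both campaigns convert equally well.
    pooled = (conversions_a + conversions_b) / (visits_a + visits_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if standard_error == 0:
        raise ValueError("cannot test: no variation in the pooled data")
    z = (rate_a - rate_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 3% vs 2.5% conversion rate on 4000 visits each.
z, p_value = two_proportion_z_test(120, 4000, 100, 4000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level")
else:
    print("Not enough evidence yet: collect more data")
```

With these made-up numbers the p-value comes out well above 0.05, so the apparent lead of the first campaign could easily be noise. The 5% threshold is a common convention, not a rule; pick the level that matches the cost of a wrong decision.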

Let us assume that after conducting a statistical test we came to the conclusion that the difference in the conversion rates of the two campaigns can’t be proved to be statistically significant.

Under these circumstances, we cannot conclude that campaign ‘C’ is performing any worse than campaign ‘A’. So what can we do then? Well, we need to collect more data and then re-compute the statistical significance of the difference in the conversion rates of the two campaigns.

At this stage, investing more money in campaign ‘A’ may not produce the optimal results you think it will.

You can see yourself conducting more such statistical tests as your statistical knowledge increases.

Data segmentation and inference

A statistical conclusion is known by the technical name of ‘statistical inference’.

Statistical inference is the process of drawing conclusions from data that is subject to random variation. Observational errors are one example of such random variation.

You assumed that the conversion rate of campaign ‘B’ is the highest only on the basis of your observation. This is your statistical inference, and it is wrong.

Statistical inferences are often drawn from a random sample taken from a set of entities (values, potential measurements). This set of entities is known as statistical population.

The set of campaigns above is an example of a statistical population from which statistical inferences (like which is the highest performing campaign) are drawn.

The subset of the statistical population is called sub population. For example, if you consider a PPC campaign as statistical population then its ad groups can be considered as sub populations.

To understand the properties of a statistical population, statisticians first separate the population into distinct sub populations (provided they have distinct properties) and then try to understand the properties of individual sub-populations.

For the same reason, analytics experts recommend segmenting analytics data before you draw statistical inferences from it.

So if you want to understand the performance of a PPC campaign, then you should first try to understand the performance of its individual ad groups.

Similarly, if you want to understand the performance of an ad group you should first try to understand the performance of the keywords and ad copies in that ad group.

I hope it is clear now, why data segmentation is so important in web analytics.

Correlation and causation

Correlation measures the relationship between two variables.

Let us suppose ‘A’ and ‘B’ are two variables.

If, as ‘A’ goes up, ‘B’ goes up then ‘A’ and ‘B’ are positively correlated.

However if as ‘A’ goes up, ‘B’ goes down then ‘A’ and ‘B’ are negatively correlated.

For example:

Here pages/visit is increasing over time but the goal conversion rate is going down.

So here user engagement negatively correlates with conversions.

When user engagement negatively correlates with conversions, the engagement becomes a distraction.

I have talked more about this distraction in the post: How to separate User Engagement from Distraction

You need to look out for such negative correlations.

The guy above (with the grumpy boss) is busy focusing on the conversion rate. He probably didn’t realize that the conversion rate may be negatively correlated with revenue and so instead of focusing on conversion rate, he should be focusing on improving revenue.

Correlation coefficient

The correlation coefficient is used to measure the strength of the correlation.

Its value ranges from -1 to 1.

• -1 means a perfect negative correlation.
• 0 means no linear relationship exists between the two variables.
• 1 means a perfect positive correlation.
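The most commonly used correlation coefficient is Pearson's, and it can be sketched in plain Python as shown below. The monthly pages/visit and conversion rate figures are invented for illustration, and the function name is an assumption.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical monthly data: engagement rises while conversion rate falls.
pages_per_visit = [2.0, 2.5, 3.0, 3.5, 4.0]
conversion_rate = [5.0, 4.6, 4.1, 3.9, 3.2]  # percent

r = pearson_correlation(pages_per_visit, conversion_rate)
print(f"r = {r:.2f}")  # close to -1: a strong negative correlation
```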

What is causation

Causation is the theory that something happened as a result of something else. For example, a rise in temperature increases the sale of cold drinks.

Correlation doesn’t imply causation

You can always find some relationship between two variables/events if you really want to.

However, the mere presence of a relationship between two variables/events doesn’t imply that one causes the other. For example, in the pages/visit example above there seems to be a relationship between pages/visit and conversion rate.

But we can’t conclude that increase in user engagement has resulted in a decrease in conversion rate at this stage without more analysis.

ROI calculations for SEO

Types of SEO ROIs

1. Anticipated ROI
2. Actual ROI – Immediate
3. Actual ROI – long term

Anticipated SEO ROI
= [(Anticipated Revenue from SEO efforts – Proposed Cost of the SEO Project) / Proposed Cost of the SEO Project] * 100

The three things that you need to know in advance in order to calculate anticipated ROI:
1. Average monthly visits
2. Average order value
3. E-commerce conversion rate of the website

Actual SEO ROI (Immediate)

= [((Total E-Commerce Revenue through SEO + Total Goal Value through SEO) – Total cost of running the SEO campaign) / Total cost of running the SEO campaign] * 100

Here,

Total Goal Value = Assisting Conversion Value + Last Interaction Conversion Value

Actual SEO ROI (long term)
= Immediate ROI * 12 (measured in percentage)

Note: This assumes the client will continue to get SEO benefits for at least the next one year, even without any further SEO.

Related Article: SEO ROI Analysis – How to do ROI calculations for SEO

What do the following ROIs mean?

i. ROI of 0%

ii. ROI of 100%

iii. ROI of 1000%

iv. ROI of -100%

ROI of 0% => It means no profit, no loss. You spent ‘x’ and earned ‘x’ in revenue.

ROI of 100% => It means you spent ‘x’ and earned ‘2x’ in revenue.

ROI of 1000% => It means you spent ‘x’ and earned ‘11x’ in revenue.

ROI of -100% => It means you spent ‘x’ and earned 0 in revenue.
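These four cases can be checked with a short Python sketch of the generic ROI formula, (revenue – cost)/cost * 100. The function name `roi_percent` is illustrative, and the spend of 100 stands in for ‘x’.

```python
def roi_percent(revenue, cost):
    """Return on investment as a percentage of the amount spent."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) * 100 / cost


spend = 100  # stands in for 'x'
print(roi_percent(1 * spend, spend))   # 0.0    -> earned x
print(roi_percent(2 * spend, spend))   # 100.0  -> earned 2x
print(roi_percent(11 * spend, spend))  # 1000.0 -> earned 11x
print(roi_percent(0, spend))           # -100.0 -> earned nothing
```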

Proposed sale by SEO

It is the amount of sales you promise to generate on your client’s website: at least twice your total SEO fees.

If you generate sales that are less than your total SEO fees, then your client will get a negative return on his investment. If you generate sales equal to your total SEO fees, then your client will get a 0% return on his investment.

In order to generate a reasonably positive ROI (like 100% ROI or more), you must generate sales that are at least twice the amount of your total SEO fees.

Additional orders required to generate proposed sale

No. of orders required for proposed sale = proposed sale/average order value

For example, if proposed sale = \$20000 and average order value is \$100.

Then No. of orders required for proposed sale = \$20000/\$100 = 200

Additional traffic required to generate proposed sale

Additional traffic required to generate proposed sale = no. of orders required to generate proposed sale/ e-commerce conversion rate
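Both formulas can be sketched together in Python. The proposed sale and average order value come from the example above; the 2% e-commerce conversion rate is an assumption made up for this illustration, and the function names are hypothetical.

```python
def orders_required(proposed_sale, average_order_value):
    """Number of orders needed to reach the proposed sale."""
    return proposed_sale / average_order_value


def traffic_required(orders, conversion_rate_percent):
    """Visits needed to produce the given number of orders."""
    if conversion_rate_percent <= 0:
        raise ValueError("conversion rate must be positive")
    return orders * 100 / conversion_rate_percent


orders = orders_required(20000, 100)   # proposed sale and average order value
visits = traffic_required(orders, 2)   # assumed 2% e-commerce conversion rate

print(orders)  # 200.0
print(visits)  # 10000.0
```

So under the assumed 2% conversion rate, the SEO project would need to bring in about 10,000 additional visits to generate the proposed sale.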

Himanshu Sharma

• Founder, OptimizeSmart.com
• Over 15 years of experience in digital analytics and marketing
• Author of four best-selling books on digital analytics and conversion optimization
• Nominated for Digital Analytics Association Awards for Excellence
• Runs one of the most popular blogs in the world on digital analytics