Maths and Stats behind Web Analytics – Beginners Guide
A long time ago in a galaxy far, far away a person was questioned for his unreasonable use of stats techniques in website analysis by an authority figure.
Unfortunately that person was me. Maths is not really a problem for me until it turns into sets, probability, regression, standard deviation etc., or in other words the more specialized field of statistics.
So before I move forward, here is a small warning. I am just a student of statistics and not a full-blown statistician. Consequently my methods could be prone to statistical errors. However, feel free to jump on me if you can catch an error. I dare you.
Why you need to know the Maths and Stats behind Web Analytics?
It is not just about what you should know but is more about what you are expected to know esp. if you deal with businesses day in, day out.
Here are a few questions for your consideration:
Q1. When your website conversion rate jumps from 10% to 12%, is it a 2% rise in conversion rate or a 20% rise in conversion rate?
Q2. Can you double your sales by simply doubling your marketing budget?
Q3. Should you focus on large number of low value customers instead of few high value customers to maximize profit?
Q4. If campaign ‘A’ conversion rate is 10% and campaign ‘B’ conversion rate is 20%, does it mean campaign ‘B’ is performing better than campaign ‘A’?
Q5. Average time on your website is 5 minutes. Does that mean website visitors actually spend 5 minutes on an average?
I am sure many marketers/analysts will make mistakes while answering these questions. I am sure because I made these mistakes too.
It is not like I was born with statistical knowledge. It is something which I have acquired over time and am still acquiring. So if you are not sure about the answers to my questions then it is perfectly normal.
However if I happen to be your client or boss then it is not normal. Sorry. You have disappointed me and I doubt your reports now.
The grumpy boss may not know that there has been a negative correlation between conversion rate and revenue, or that the rise in conversion rate is not 2%.
But he sure knows that his sales are going down and that he is less profitable. And now he has this moral obligation to do the dreaded ‘cost cutting’. Guess whose job is on the line next?
The corporate world is not very forgiving to mistakes made by employees/consultants/agencies. So if we report that the jump in conversion rate from 10% to 12% means there is a 2% rise in conversion rate, our entire marketing report becomes questionable.
We instantly cast a shadow on the rest of our analysis. The thought that will instantly pop up in the mind of the recipient of our report will be “what else has he done wrong?”
Learning Maths and Stats is an excellent way to develop your logical and critical thinking. It makes you a better marketer and of course a better analyst. No one can then easily question your reasoning skills and you become a true NINJA.
Following is one example from everyday life:
Which is a better deal?
3 ads placement for $40.12 or 2 ads placement for $30.65
If you calculate and compare the unit prices then you will find that ‘3 ads placement for $40.12’ is a better deal.
So here we go.
% Change (i.e. the percentage of rise or fall)
This metric is used to calculate the percentage of rise or fall in relation to the old value. We use % change when there is an old value and a new value.
% change = [ (New Value – Old Value)/Old value ] * 100
For example:
Month | Visits | Conversions | Conversion Rate
July | 200 | 20 | 10%
Aug | 350 | 42 | 12%
% Change | 75% | 110% | 20%
% change in visits = [(350 – 200)/200] * 100 = 75%
% change in conversions = [(42 – 20)/20] * 100 = 110%
% change in conversion rate = [(12 – 10)/10] * 100 = 20%
Google Analytics calculates % change (for every type of metric) when you compare the data with past data.
So if pages/session in July was 2 and in August was 4, then you won’t report that pages/session has improved by 2. You report that pages/session has improved by 100% (i.e. the percentage of change from the old value).
Similarly we calculate % change for avg. visit duration, % new sessions, bounce rate etc.
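If you would rather check these numbers than trust my arithmetic, here is a minimal Python sketch of the % change formula. The function name percent_change is my own choice for illustration, not something from Google Analytics.

```python
def percent_change(old_value: float, new_value: float) -> float:
    """Percentage rise (positive) or fall (negative) relative to the old value."""
    if old_value == 0:
        raise ValueError("% change is undefined when the old value is 0")
    return (new_value - old_value) / old_value * 100


# July -> August figures from the table above
print(round(percent_change(200, 350), 2))  # visits
print(round(percent_change(20, 42), 2))    # conversions
print(round(percent_change(10, 12), 2))    # conversion rate
```

A negative result simply means the metric fell compared with the old value.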
% difference (i.e. percentage difference between two values)
This metric is used to calculate the difference between two values as a percentage. Use this metric when neither value is more important than the other.
% difference = [difference between two values/average of the two values] *100
For example:
Month | Visits | Conversion Rate
July | 200 | 10%
Aug | 350 | 12%
% Change | 75% | 20%
% Difference | 54.54% | 18.18%
% difference in visits = [|200 – 350| / ((200 + 350)/2)] * 100 = (150/275) * 100 = 54.54%
% difference in conversion rate = [|10 – 12| / ((10 + 12)/2)] * 100 = (2/11) * 100 = 18.18%
Note: Ignore the minus sign if the result is negative.
Many people make the mistake of assuming that % change and % difference are the same thing. They are not, as I have just shown you.
We use % change when there is an old value and a new value and we need to know the percentage of rise or fall in relation to the old value. Most of the time we use % change in reporting.
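To see that % difference really is a different calculation from % change, here is a small Python sketch. The function name percent_difference is hypothetical, and the absolute value takes care of the "ignore the minus sign" rule.

```python
def percent_difference(value_a: float, value_b: float) -> float:
    """Difference between two values as a percentage of their average."""
    average = (value_a + value_b) / 2
    if average == 0:
        raise ValueError("% difference is undefined when the average is 0")
    return abs(value_a - value_b) / average * 100


print(percent_difference(200, 350))  # visits, roughly 54.5
print(percent_difference(10, 12))    # conversion rate, roughly 18.2
```

Note that swapping the two arguments gives the same answer, which is exactly why this metric suits cases where neither value is the "old" one.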
% Error
This metric is used to calculate the size of the error, as a percentage, when comparing an approximate value to an exact value.
% error = [(approximate value – exact value)/exact value] * 100
For example:
Month | Approximate conversions | Actual conversions | % error
July | 200 | 110 | 81.82%
Aug | 300 | 200 | 50%
I estimated 200 conversions in July, but got 110 conversions. So my % error is [(200 – 110)/110] * 100 = 81.82%
I estimated 300 conversions in August, but got 200 conversions. So my % error is [(300 – 200)/200] * 100 = 50%
A practical way you would use % error is when you are running an experiment. You want % error to be as low as possible.
Note: Ignore the minus sign if the result is negative.
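The same calculation as a short Python sketch, in case you want to run it against your own forecasts. The function name percent_error is my own, and the figures are the ones from the table above.

```python
def percent_error(approximate: float, exact: float) -> float:
    """Size of the error as a percentage of the exact value (sign ignored)."""
    if exact == 0:
        raise ValueError("% error is undefined when the exact value is 0")
    return abs(approximate - exact) / abs(exact) * 100


print(round(percent_error(200, 110), 2))  # July estimate vs actual
print(round(percent_error(300, 200), 2))  # August estimate vs actual
```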
Percentage Points
We use percentage points when we subtract one percentage from another, to make clear that the change is absolute rather than relative. For example:
Conversion rate jumps from 10% to 12%. So is it a 20% rise in conversion rate or a 2% rise in conversion rate?
It is actually a 20% rise in conversion rate, or a rise of 2 percentage points. It can’t be a 2% rise in conversion rate.
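Here is a small Python sketch that reports both numbers side by side, so the two can’t be mixed up in a report. The function name describe_rate_change is hypothetical.

```python
def describe_rate_change(old_rate_pct: float, new_rate_pct: float) -> tuple[float, float]:
    """Return (change in percentage points, relative % change) for two rates in percent."""
    point_change = new_rate_pct - old_rate_pct
    relative_change = (new_rate_pct - old_rate_pct) / old_rate_pct * 100
    return point_change, relative_change


points, relative = describe_rate_change(10, 12)
print(f"{points:g} percentage points, {relative:g}% relative rise")
```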
Mean
It is also known as the arithmetic mean. It is simply the sum of the numbers divided by how many numbers there are. Mean is one type of average, whereas ‘average’ could refer to the mean, median or mode. The mean of a whole population is denoted by the Greek letter µ (“mu”). This metric is used a lot in Google Analytics under the name of ‘average’: ‘Average time on page’, ‘average time on site’, ‘site average’ and so on.
Mean = sum of numbers /count of numbers
For example let us suppose a website has got 5 web pages:
Page | Bounce Rate
Page 1 | 35%
Page 2 | 40%
Page 3 | 0%
Page 4 | 48%
Page 5 | 100%
Now bounce rate of the site = (35+40+0+48+100)/5 = 223/5 = 44.6%
Is 44.6% a true bounce rate?
No.
Look at the distribution of bounce rate across all the web pages. Two web pages, page 3 and page 5, have extreme values of 0% and 100%. We call such values ‘outliers’ in statistics. Outliers have the sadistic ability to skew ‘averages’. So if we take these two extreme values out of our calculations then we can get a more representative bounce rate for the site:
(35+40+48)/3 = 123/3 = 41%
Similarly,
Page | Average time on page (in seconds)
Page 1 | 350
Page 2 | 400
Page 3 | 500
Page 4 | 480
Page 5 | 36000
Now average time on the site = (350+400+500+480+36000)/5 = 37730/5 = 7546 seconds = about 2 hrs 6 minutes
Again the outlier ‘36000’ is skewing our average metric. So if we take it out and then recalculate the average time on site, we would get (350+400+500+480)/4 = 1730/4 = 432.5 seconds = about 7 minutes 12 seconds
Therefore whenever we analyze ‘average’ metrics we always:
 Look at the distribution
 Identify the outliers (i.e. extreme values)
 Discount outliers from the averages’ calculations.
If you don’t do this then you will get muddy analytical insight from your average metrics like ‘average time on the site’ to be 2hrs 6 minutes.
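The time on page example can be sketched in Python using the standard statistics module. The 30 minute cut-off below is purely my assumption for illustration; in practice you would pick the cut-off after looking at the distribution. I have also printed the median, which is another type of average that a single extreme value barely moves.

```python
from statistics import mean, median

# Average time on page in seconds, from the table above
time_on_page = [350, 400, 500, 480, 36000]

# Hypothetical cut-off: treat anything above 30 minutes as an outlier
OUTLIER_CAP_SECONDS = 1800

trimmed = [t for t in time_on_page if t <= OUTLIER_CAP_SECONDS]

print(mean(time_on_page))    # mean with the outlier included
print(mean(trimmed))         # mean after discounting the outlier
print(median(time_on_page))  # median of all five pages
```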
Unit Price
It is equal to cost/quantity
Which is a better deal?
3 ads placement for $40.12 or 2 ads placement for $30.65
Calculating and comparing unit prices is a good way of finding the ‘best deal’.
So, we calculate the unit price in each case:
$40.12/3 = $13.37 per ad placement
$30.65/2 = $15.32 per ad placement
So if we go for the ‘2 ads placement for $30.65’ deal we will end up paying more per placement. Consequently the best deal for us is ‘3 ads placement for $40.12’.
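With more than two deals on the table it is easier to let a few lines of Python do the comparison. This is only a sketch; the function name unit_price and the dictionary of deals are my own.

```python
def unit_price(total_cost: float, quantity: int) -> float:
    """Cost per single item."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return total_cost / quantity


deals = {
    "3 ad placements for $40.12": unit_price(40.12, 3),
    "2 ad placements for $30.65": unit_price(30.65, 2),
}

# The best deal is the one with the lowest cost per placement
best_deal = min(deals, key=deals.get)

for name, price in deals.items():
    print(f"{name}: ${price:.2f} per placement")
print("Best deal:", best_deal)
```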
Gross Profit, Operating Profit, Net Profit, Bottomline Profit…
A lot of marketers make the mistake of reporting these metrics without understanding what these metrics really are and how they are calculated. Check out this classic video which is on the misreporting of the profit metric:
In simplest terms,
Profit = Sales Revenue – Cost
Revenue = price of the product(s) * quantity sold
Gross Profit = Sales Revenue – Direct Cost
Direct cost can be something like cost of manufacturing a product
Operating Profit = Sales Revenue – Operating Cost.
It is the profit before interest and taxes. Operating cost is the ongoing cost of running a business, product or system. It can include both direct and indirect costs.
Net Profit – also known as net income, net earnings or bottomline. It is the profit after interest and taxes.
Net Profit = Sales Revenue – Total cost (this includes any direct and indirect cost + interest + taxes)
When we talk about business bottomline, we are actually talking about the ‘net profit’.
Profit Margin
It is also known as net profit margin, net margin, net profit ratio
Profit Margin = (Net Profit/ Revenue) * 100
A low profit margin indicates higher risk: a decline in sales can more easily erase the profit and result in a net loss.
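The profit formulas above can be sketched in Python as follows. All the figures are hypothetical and only there to show how the numbers relate to each other; the function names are my own.

```python
def gross_profit(sales_revenue: float, direct_cost: float) -> float:
    """Sales revenue minus direct cost (e.g. cost of manufacturing)."""
    return sales_revenue - direct_cost


def net_profit(sales_revenue: float, total_cost: float) -> float:
    """total_cost covers direct and indirect costs plus interest and taxes."""
    return sales_revenue - total_cost


def profit_margin(net: float, sales_revenue: float) -> float:
    """Net profit as a percentage of revenue."""
    return net / sales_revenue * 100


# Hypothetical figures, for illustration only
revenue = 50 * 1000   # price of the product * quantity sold
direct_cost = 30_000
total_cost = 45_000

net = net_profit(revenue, total_cost)
print(gross_profit(revenue, direct_cost))  # 20000
print(net)                                 # 5000
print(profit_margin(net, revenue))         # 10.0
```

With a margin of 10%, a 10% rise in total cost at the same revenue would almost wipe out the profit, which is the risk described above.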
The law of diminishing returns and your Marketing Budget
According to the law of diminishing returns,
if you keep adding more of one unit of production to a productive process while keeping all other units constant, you will at some point produce lower per unit returns.
So for example if you keep pumping more money into a Facebook campaign without changing the present form of the campaign, at some point you will reach the point of diminishing returns, and once you cross this point, your conversion rate will go down and cost per acquisition will go up.
So when you are thinking of increasing the budget of a campaign by a considerable amount, think of running more ads and targeting more keywords. In this way you will change multiple units of production and can stay away from the point of diminishing returns.
How to determine the point of diminishing returns
To determine the point of diminishing returns you need to gradually add more of one unit of production to the production process. If you add units rapidly, you will never know when you crossed the point of diminishing returns and started losing money. So increase your budget gradually.
Understand that just doubling the budget of a high performing campaign may not result in a proportional increase in performance. You need to do a lot more than just increase the budget. Consider running more ads, targeting more keywords or new markets to stay away from the point of diminishing returns. So now I guess I have answered my question: Can you double your sales by simply doubling your marketing budget?
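One rough way to spot the point of diminishing returns is to look at the extra conversions you get for each extra unit of spend as you raise the budget step by step. The Python sketch below does just that. The budget and conversion figures are entirely made up, and the function name first_diminishing_step is my own; real campaign data will be noisier, so treat this as an illustration rather than a method.

```python
def first_diminishing_step(budgets, conversions):
    """Return the first budget level whose marginal conversions per unit of
    spend fell below the previous step's, or None if that never happens."""
    previous_marginal = None
    for i in range(1, len(budgets)):
        extra_spend = budgets[i] - budgets[i - 1]
        extra_conversions = conversions[i] - conversions[i - 1]
        marginal = extra_conversions / extra_spend
        if previous_marginal is not None and marginal < previous_marginal:
            return budgets[i]
        previous_marginal = marginal
    return None


# Hypothetical data: budget raised gradually, conversions recorded at each level
budgets = [0, 1000, 2000, 3000, 4000]
conversions = [0, 50, 110, 150, 170]

print(first_diminishing_step(budgets, conversions))  # 3000
```

In this made-up data the third $1000 brings in only 40 extra conversions against 60 for the second, so returns start diminishing at the $3000 level.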
The law of diminishing returns and Multichannel Marketing
Understand that no one campaign is solely responsible for conversions and sales if you are doing multi channel marketing. Different marketing channels work together to create sales and conversions. Some marketing channels help more in assisting conversions than completing conversions. We call such channels ‘assisting marketing channels’. Other marketing channels work more in completing conversions.
So if you over invest in a particular marketing channel while overlooking the role of assisting marketing channels, you will reach the point of diminishing returns faster than you think. This is because you are adding more of one unit of production (here budget) to one marketing channel while keeping other units constant (i.e. not investing a proportional amount in assisting marketing channels).
Please see the related post for more details: Thinking of investing more in a marketing channel? Think Twice.
Law of diminishing returns and Last Click Keywords Optimization
Just like we have assisting marketing channels, we have got assisting keywords. These keywords help more in assisting conversions than completing conversions. Similarly we have got last click keywords. These are the keywords people searched for just before completing a conversion and are attributed conversions in a last click conversion model.
An average PPC marketer spends his lifetime optimizing for ‘last click keywords’, assuming that only these keywords make up the whole conversion funnel. He completely ignores the role played by assisting keywords. So in the case of PPC, if you keep optimizing for last click keywords while ignoring first and middle click keywords (collectively known as assisting keywords) you will at some point produce lower per unit returns.
This means your cost per acquisition at some point will start rising and your profit on sales will start declining. Then the only way to remain within your CPA targets is by tweaking (add, pause, delete, change bids) last click keywords. But this is a sub optimal way of optimizing a PPC campaign as you are optimizing only a small part of the conversion process.
So in order to strengthen your PPC campaigns you also need to bid on keywords that initiate or assist conversions. In this way you can stay away from the point of diminishing returns and remain within your CPA targets much longer.
Law of diminishing returns and Last Click CPA Optimization
The CPA that you see in your Google Adwords report or Google Analytics report is not your actual cost per acquisition. Sorry to disappoint you. It is the cost per last click conversion. So if you ignore first and middle click keywords and optimize PPC campaigns on the basis of cost per last click conversion then you won’t get optimal results and sometimes may even lose money.
This is because if a keyword is not completing a sale, it may be initiating a sale or assisting a sale and if you stop bidding on it because its cost per last click conversion (the so called CPA reported by Google Adwords) is too high or it is not completing any conversion then you may lose money.
Please see the related post for more details: Attribution Modeling Case Study – Introducing Effective Click Optimization
80/20 Rule
According to the Pareto Principle (also known as the 80–20 rule), roughly 80% of the effects come from 20% of the causes, which often means:
1. 80% of your sales come from 20% of your visitors.
2. 80% of your output comes from 20% of your input
3. 80% of your sales come from 20% of your products
4. 80% of your profit comes from 20% of your products
So what you need to do is determine that 20% of everything and work relentlessly on it. You can’t sell each and every product of your client in each and every location of your country, so why spread your marketing efforts and resources too thin by trying to be visible everywhere for everything you sell?
Let us suppose that your target market is the US, so your average customer can be anywhere in the US. Let us also suppose that after analyzing one year of data, you found that people from New York City bought twice as often as the average visitor to your website and spent 75% more than average per order ($70 versus $40). So now you know where your best customers live.
Metric | New York City Clients (Best Clients) | Average Clients (can be anywhere from US)
No. of transactions/year | 4000 | 2000
Average order value | $70 | $40
Total Revenue | $280000 | $80000
Total Spend | $36000 | $50000
Gross Profit | $244000 | $30000
Your cost per acquisition will be high if you target whole of the US through search marketing or any other ad campaigns. So it is pretty obvious that your total spend is going to be higher for acquiring average clients.
So by directing your marketing efforts towards acquiring more profitable clients, you can increase your revenue and profit even without increasing traffic or spending more on content creation and marketing. Now the big questions that come up are: why are people from New York City our best clients, what are they purchasing, and what can we do so that they buy more? If you can get answers to these questions, you can increase your sales within a few weeks without increasing website traffic or spending more on content marketing. And this should be our aim as marketers. I hope this answers my question: Why do best customers generate more profit than average or low value customers?
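The arithmetic behind the comparison table is easy to reproduce in Python. The sketch below rebuilds revenue and profit for each segment from the table’s own figures; the profit per dollar spent is an extra ratio I have added for illustration and is not in the table.

```python
segments = {
    "New York City clients": {"transactions": 4000, "avg_order_value": 70, "spend": 36_000},
    "Average clients": {"transactions": 2000, "avg_order_value": 40, "spend": 50_000},
}


def summarize(segment: dict) -> dict:
    revenue = segment["transactions"] * segment["avg_order_value"]
    profit = revenue - segment["spend"]  # the table labels this "Gross Profit"
    return {
        "revenue": revenue,
        "profit": profit,
        "profit_per_dollar_spent": profit / segment["spend"],
    }


for name, segment in segments.items():
    print(name, summarize(segment))
```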
Please see the related post for more details: How to use Web Analytics 2.0 to improve your conversions
Statistical Significance and Marketing Decisions
A statistically significant result is a result which is unlikely to have occurred by chance. A statistically insignificant result is one that could plausibly have occurred by chance. So when someone asks “is your result statistically significant?” he is really asking “how likely is it that your result has not occurred by chance?”
Consider the following hypothetical scenario:
Campaign | Visits | Transactions | ECommerce Conversion Rate
Campaign A | 1820 | 150 | 8.24%
Campaign B | 20 | 4 | 20%
Campaign C | 780 | 41 | 5.26%
Do you think you should be investing more in campaign ‘B’ because its conversion rate is highest?
I would suggest not. The sample size in the case of campaign ‘B’ (4 transactions out of 20 visits) is too small to support a reliable conclusion. Had campaign B got 1 transaction out of 1 visit, its conversion rate would be 100%. Would that make its performance even better? No.
Do you think you should now be investing in campaign ‘A’ because it has a higher conversion rate?
Are you really sure that the difference between the conversion rates of campaign ‘A’ and campaign ‘C’ is statistically significant?
In order to determine whether the difference is statistically significant or not, you need to conduct a statistical test (like a Z test) to calculate the ‘confidence’ that the difference in the conversion rates of the two campaigns is real. I am not talking about everyday confidence here, but statistical confidence.
It is the confidence that the result has not occurred by a random chance. Statistical significance can be considered to be the confidence one has in a given result. Confidence depends upon the signal to noise ratio and the sample size. So confidence that the result has not occurred by a random chance is high if signal is large and/or sample size is large and/or noise is low.
Let us assume that after conducting a statistical test we came to the conclusion that the difference in the conversion rates of the two campaigns can’t be proved to be statistically significant. Under these circumstances we cannot conclude that campaign ‘A’ is performing better than campaign ‘C’. So what can we do then? Well, we need to collect more data before we can judge the statistical significance of the difference in the conversion rates of the two campaigns. At this stage investing more money in campaign ‘A’ may not produce the results you think it will.
You can see yourself conducting more such statistical tests as your statistical knowledge increases.
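For the curious, here is a minimal Python sketch of a two-proportion Z test, which is one common form of the Z test mentioned above. It uses only the standard math module. The function name is my own, and the test relies on a normal approximation, so it is not suitable for tiny samples like campaign ‘B’. Run on the visits and transactions from the table, the outcome need not match the ‘not significant’ scenario assumed above, which was purely hypothetical.

```python
import math


def two_proportion_z_test(conv_a, visits_a, conv_b, visits_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z statistic, p-value). A small p-value (commonly below 0.05)
    suggests the difference is unlikely to be due to chance alone.
    """
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    # Pooled conversion rate under the assumption that both campaigns are equal
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_a - rate_b) / std_error
    # Two-sided tail area under the standard normal curve
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Campaign A vs campaign C, using the figures from the table above
z, p = two_proportion_z_test(150, 1820, 41, 780)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```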
Data Segmentation and Inference
A statistical conclusion is known by the technical name of ‘statistical inference’. Statistical inference is the process of drawing conclusions from data which is subject to random variation, such as observational errors. You assumed that the conversion rate of campaign ‘B’ is the highest only on the basis of your observation. That is your statistical inference, and it is wrong.
Statistical inferences are often drawn from a random sample taken from a set of entities (values, potential measurements). This set of entities is known as the statistical population. The set of campaigns above is an example of a statistical population from which statistical inferences (like which is the highest performing campaign) are drawn. A subset of a statistical population is called a sub population.
For example:
If you consider a PPC campaign as a statistical population then its ad groups can be considered sub populations. To understand the properties of a statistical population, statisticians first separate the population into distinct sub populations (provided they have distinct properties) and then try to understand the properties of the individual sub populations.
For the same reason, analytics experts recommend segmenting analytics data before you draw statistical inferences from it. So if you want to understand the performance of a PPC campaign, then you should first try to understand the performance of its individual ad groups.
Similarly if you want to understand the performance of an ad group you should first try to understand the performance of the keywords and ad copies in that ad group. I hope it is clear now, why data segmentation is so important in web analytics.
Please see the related posts for more details:
 Is your conversion Rate Statistically Significant?
 What Matters more: Conversion Volume or Conversion Rate – Geek Case Study
Correlation and Causation
Correlation measures relationship between two variables. Let us suppose ‘A’ and ‘B’ are two variables. If as ‘A’ goes up, ‘B’ goes up then ‘A’ and ‘B’ are positively correlated. However if as ‘A’ goes up, ‘B’ goes down then ‘A’ and ‘B’ are negatively correlated. For example:
Here Pages/visit is increasing over time but the Goal conversion rate is going down. So here user engagement negatively correlates with conversions. When user engagement negatively correlates with conversions, the engagement becomes a distraction. I have talked more about this distraction in the post: How to separate User Engagement from Distraction
You need to look out for such negative correlations. The guy above (with the grumpy boss) is busy focusing on the conversion rate. He probably didn’t realize that the conversion rate may be negatively correlated with revenue and so instead of focusing on conversion rate, he should be focusing on improving revenue.
Correlation Coefficient
The correlation coefficient is used to measure the strength of the correlation. Its value ranges from -1 to 1.
-1 means perfect negative correlation. 0 means no linear relationship exists between the two variables. 1 means perfect positive correlation.
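Here is a short Python sketch of the most common correlation coefficient, Pearson’s r, written out by hand so you can see what goes into it. The weekly pages/visit and goal conversion rate figures are made up for illustration, and the function name pearson_r is my own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical weekly data: engagement rises while conversion rate falls
pages_per_visit = [2.0, 2.5, 3.0, 3.5, 4.0]
goal_conversion_rate = [5.0, 4.6, 4.1, 3.9, 3.2]

r = pearson_r(pages_per_visit, goal_conversion_rate)
print(f"r = {r:.2f}")  # close to -1, i.e. a strong negative correlation
```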
What is Causation
Causation is the idea that one thing happened as a result of another. For example, a rise in temperature increases the sale of cold drinks.
Correlation doesn’t imply causation
You can always find some relationship between two variables/events if you really want to. However mere presence of relationship between two variables/events doesn’t imply that one causes the other.
For example in the graph above there seems to be a relationship between pages/visit and conversion rate. But we can’t conclude that increase in user engagement has resulted in decrease in conversion rate at this stage without more analysis.
ROI Calculations for SEO
Types of SEO ROIs
1. Anticipated ROI
2. Actual ROI – Immediate
3. Actual ROI – long term
Anticipated SEO ROI
= (Anticipated Revenue from SEO efforts – Proposed Cost of the SEO Project)/proposed cost of the SEO project (measured in percentage)
The three things that you need to know in advance in order to calculate Anticipated ROI:
1. Average monthly visits
2. Average order value
3. Ecommerce conversion rate of the website
Actual SEO ROI (Immediate)
= [(Total ECommerce Revenue through SEO + Total Goal Value through SEO) – Total cost of running the SEO campaign] / Total cost of running the SEO campaign (measured in percentage)
Here,
Total Goal Value = Assisting Conversion Value + Last Interaction Conversion Value
Actual SEO ROI (long term)
= Immediate ROI*12 (measured in percentage)
{client will continue to get seo benefits at least for the next one year even without any SEO}
What do the following ROIs mean?
i. ROI of 0%
ii. ROI of 100%
iii. ROI of 1000%
iv. ROI of -100%
ROI of 0% => It means no profit, no loss. You spent ‘x’ and earned ‘x’ in revenue.
ROI of 100% => It means you spent ‘x’ and earned ‘2x’ in revenue.
ROI of 1000% => It means you spent ‘x’ and earned ‘11x’ in revenue.
ROI of -100% => It means you spent ‘x’ and earned 0 in revenue.
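These four cases are easy to verify with a few lines of Python. The sketch below uses the same (revenue – cost)/cost formula as the ROI definitions above, with a hypothetical spend of 1000 standing in for ‘x’. Spending ‘x’ and earning nothing comes out as -100%.

```python
def roi_percent(revenue: float, cost: float) -> float:
    """Return on investment as a percentage of the cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost * 100


x = 1000  # hypothetical spend
print(roi_percent(x, x))       # 0.0     -> spent x, earned x
print(roi_percent(2 * x, x))   # 100.0   -> spent x, earned 2x
print(roi_percent(11 * x, x))  # 1000.0  -> spent x, earned 11x
print(roi_percent(0, x))       # -100.0  -> spent x, earned 0
```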
Proposed Sale by SEO
It is the amount of sales you promise to generate on your client’s website in return for your total SEO fees, typically twice the fees. If you generate sales which are less than your total SEO fees, then your client will get a negative return on his investment. If you generate as much in sales as your total SEO fees, then your client will get a 0% return on his investment. In order to generate a reasonably positive ROI (like 100% ROI or more), you must generate sales which are at least twice the amount of your total SEO fees.
Additional orders required to generate proposed Sale
No. of orders required for proposed sale = proposed sale/average order value
For example, if proposed sale = $20000 and average order value is $100.
Then No. of orders required for proposed sale = $20000/$100 = 200
Additional traffic required to generate proposed sale
Additional traffic required to generate proposed sale = no. of orders required to generate proposed sale/ ecommerce conversion rate
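Both formulas can be sketched in Python as below. The $20000 proposed sale and $100 average order value come from the example above, while the 2% ecommerce conversion rate is a figure I have assumed for illustration. Results are rounded up because you can’t get a fraction of an order or a visit.

```python
import math


def orders_required(proposed_sale: float, average_order_value: float) -> int:
    """Number of orders needed to reach the proposed sale."""
    return math.ceil(proposed_sale / average_order_value)


def traffic_required(orders: int, conversion_rate_pct: float) -> int:
    """Visits needed for the given orders; conversion rate is in percent, e.g. 2 for 2%."""
    return math.ceil(orders * 100 / conversion_rate_pct)


orders = orders_required(20_000, 100)
print(orders)                       # 200
print(traffic_required(orders, 2))  # 10000, assuming a hypothetical 2% conversion rate
```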
Now it is your turn. How do you use statistics in your analysis? Please share your thoughts and insights.