This post summarizes the programming logic behind our products Ripple and Wave: how we analyze and calculate AWS usage and cost.
Background
In 2018, we published Mobingi Ripple and Wave, two product offerings that calculate and analyze AWS cost. These two services use the same code logic to calculate account usage and cost, including properly applying RIs to instances, re-allocating RIs across linked accounts (under the same AWS organization), handling spot and on-demand instances, and calculating data transfer, DynamoDB, and every other product AWS offers.
The Challenge
The best way to gain insight into and collect data on an AWS account's usage is to enable the AWS Cost & Usage Report (CUR), which dumps a chunk of big data in CSV format into an S3 bucket under the payer AWS account. These data contain all of the usage details and service dimensions, such as AmazonEC2 running hours, instance type, Reserved Instance (RI) records, CPU utilization, storage I/O, etc.
For a normal AWS customer account, the data size is usually in the hundreds of MB, with one CSV file containing the usage of the whole month. Some heavy AWS customer accounts may generate up to several gigabytes of such data, separated into multiple CSV files, each around 1 GB in size. In the CSV file, each line is called a LineItem and represents the usage dimensions of a certain service for one clock-hour. For example:
| identity/lineitemid | identity/timeinterval | lineitem/blendedcost | lineitem/blendedrate | lineitem/currencycode | lineitem/lineitemdescription | lineitem/lineitemtype | lineitem/operation | lineitem/productcode |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mluxyrxfepz4mzyk3 | 2018-12-18T00:00:00Z/2018-12-19T00:00:00Z | 0.0030109446 | 0.1 | USD | $0.10 per GB-month of data storage | Usage | APN1-TimedStorage-ByteHrs | AmazonECR |
(Imagine millions of LineItems like the above in a single account's usage report.)
AWS suggests the Redshift data warehouse or QuickSight to analyze or visualize this big data report. However, the two biggest challenges appear to be:
- They require programming/engineering knowledge to implement, and are complicated to get started with and maintain
- They are inflexible when it comes to re-calculating services with RIs (AmazonEC2, AmazonRDS, etc.)
The Re-calculation of EC2
Consider this situation:
An MSP or an enterprise company needs to issue invoices to its customers or group companies.
The payer account A has 3 linked accounts: B, C, and D.
Account B owns some EC2 RIs but didn't consume them all.
Accounts C and D don't own any RIs, but consumed the RIs left unused by account B, and as a result were charged less than they should have been.
The payer A receives the billing invoice from AWS, and then wants to invoice B, C, and D for their correct usage:
B: Invoice of usage with RIs applied.
C: Invoice of usage without any RI.
D: Invoice of usage without any RI.
In the above case, the invoices received from AWS will not work. Whether we use the unblended cost or the blended cost, the total is always lower than what they should be charged. So we end up re-calculating EC2 costs ourselves.
In our company, we call it True Unblended cost calculation.
The concept is fairly simple:
EC2 instance running hours x instance hourly price = $cost
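As a minimal sketch of the formula in PHP (hypothetical numbers; we use bcmath strings for precision, as in the code later in this post):

<?php
// A t3.micro running a full 30-day month, at a hypothetical
// on-demand rate of $0.0136 per hour:
$runningHours = "720";    // 24 hours x 30 days
$hourlyPrice  = "0.0136"; // instance hourly price
// EC2 instance running hours x instance hourly price = $cost
echo bcmul($runningHours, $hourlyPrice, 10); // 9.7920000000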
But when it comes to applying the concept in an actual calculation, you’ll soon realize that there are dozens of factors that can affect the formula:
- Instance Type
- Size-flexibility (Only Linux/Unix RI with regional scope)
- Availability Zone
- Operation (different operation has different hourly pricing)
- UsageType (On-demand, spot, or disk usage..)
- and more..
We will need to consider all of these dimensions when applying the formula. And since all the data are defined in the original CUR CSV, it’s good to know what each column represents so we can better filter the LineItems we need:
lineItem/UsageStartDate <- The start date of daily calculation, eg: "2018-08-01T00:00:00Z"
lineItem/UsageEndDate
bill/BillingEntity <- who is the seller, eg: "AWS", "AWSMarketplace"
bill/PayerAccountId <- Payer ID
lineItem/UsageAccountId <- Linked account ID
lineItem/LineItemType <- eg: "Usage", "RIFee", "Credit", "Refund", "Fee", "Tax".
lineItem/ProductCode <- eg: "AmazonEC2"
product/ProductName <- eg: "Amazon Elastic Compute Cloud" in full name
lineItem/UsageType <- eg: "APN1-EBS:SnapshotUsage"
lineItem/Operation <- eg: "RunInstances:0001"
lineItem/UsageAmount <- The usage quantity for the specific time range
lineItem/UnblendedRate <- unblended unit price
lineItem/BlendedRate <- blended unit price
lineItem/UnblendedCost <- cost based on unblended unit price [*]
lineItem/BlendedCost <- cost based on blended unit price [*]
product/instanceType <- eg: "t2.micro", "db.m4.large"
product/region <- eg: "ap-northeast-1"
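To make this concrete, below is a minimal sketch of streaming a CUR CSV and filtering on those columns. It is an illustration only, not our production code: the file name is hypothetical, and the real reports come from the payer account's S3 bucket.

<?php
$fh = fopen('cur-report.csv', 'r');   // hypothetical local copy of a CUR CSV
$headers = fgetcsv($fh);              // the first row holds the column names
$idx = array_flip($headers);          // map column name -> position

while (($row = fgetcsv($fh)) !== false) {
    // Keep only EC2 usage rows sold by AWS itself.
    if ($row[$idx['bill/BillingEntity']] !== 'AWS') continue;
    if ($row[$idx['lineItem/ProductCode']] !== 'AmazonEC2') continue;
    if ($row[$idx['lineItem/LineItemType']] !== 'Usage') continue;

    $usageType = $row[$idx['lineItem/UsageType']]; // eg: "APN1-BoxUsage:t3.micro"
    $operation = $row[$idx['lineItem/Operation']]; // eg: "RunInstances:0001"
    // ...hand the row to the aggregation step shown below.
}
fclose($fh);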
Once we have the filtered data we need for re-calculating EC2, we are ready to separate and sum up the total EC2 running hours by instance type, usage type, and operation, for each region and availability zone. Below is sample PHP code used while looping over each LineItem and combining rows that share the same dimensions:
/**
 * Handle EC2 Compute Instances and Dedicated Hosts.
 * Put results into a separate array: $this->AmazonEC2['ec2']
 *
 * Note: with a Dedicated Host, you purchase an entire physical host from AWS,
 * and that host is billed to you on an hourly basis, just like EC2 instances.
 * Dedicated Hosts have an "InstanceType" named with the instance family prefix
 * only, eg: "t3", "m4" (there are no "t3.medium"-like names).
 * Dedicated Hosts always have a "Tenancy" of "N/A", and there is no size-flexibility.
 */
if (
    $ProductFamily == "Dedicated Host" ||
    (
        $ProductFamily == "Compute Instance" &&
        strpos($Operation, "RunInstances") !== false
    )
) {
    // Build the candidate entry for this LineItem; all money and usage
    // values are kept as bcmath strings with a scale of 10.
    $result = [
        "ProductFamily"                     => $ProductFamily,
        "UsageType"                         => $UsageType,
        "InstanceType"                      => $InstanceType,
        "Operation"                         => $Operation,
        "Tenancy"                           => $Tenancy,
        "AvailabilityZone"                  => $AvailabilityZone,
        "CostBeforeTax"                     => bcadd((string)$UnblendedCost, "0", 10),
        "ItemDescription"                   => $Description,
        "NormalizationFactor"               => $NormalizationFactor,
        "UsageQuantity"                     => bcadd((string)$UsageAmount, "0", 10),
        "NormalizedUsageQuantity"           => bcadd((string)$NormalizedUsageAmount, "0", 10),
        "NormalizedUsageQuantity_OverClock" => bcadd("0", "0", 10),
        "NormalizedUsageQuantity_Hrly"      => [
            $UsageStartDate . '|' . $UsageEndDate => bcadd((string)$NormalizedUsageAmount, "0", 10),
        ],
    ];
    if (empty($this->AmazonEC2['ec2'][$UsageRegion])) {
        // First entry for this region.
        $this->AmazonEC2['ec2'][$UsageRegion][] = $result;
    } else {
        // Try to merge into an existing entry that shares all dimensions.
        foreach ($this->AmazonEC2['ec2'][$UsageRegion] as $k => $v) {
            if (
                $v["UsageType"] == $UsageType &&
                $v["InstanceType"] == $InstanceType &&
                $v["Operation"] == $Operation &&
                $v["Tenancy"] == $Tenancy &&
                $v["AvailabilityZone"] == $AvailabilityZone &&
                $v["NormalizationFactor"] == $NormalizationFactor
            ) {
                $entry = &$this->AmazonEC2['ec2'][$UsageRegion][$k];
                $entry['CostBeforeTax']           = bcadd($entry['CostBeforeTax'], (string)$UnblendedCost, 10);
                $entry['UsageQuantity']           = bcadd($entry['UsageQuantity'], (string)$UsageAmount, 10);
                $entry['NormalizedUsageQuantity'] = bcadd($entry['NormalizedUsageQuantity'], (string)$NormalizedUsageAmount, 10);
                $hrlyKey = $UsageStartDate . '|' . $UsageEndDate;
                $entry['NormalizedUsageQuantity_Hrly'][$hrlyKey] = bcadd($entry['NormalizedUsageQuantity_Hrly'][$hrlyKey] ?? "0", (string)$NormalizedUsageAmount, 10);
                unset($entry);
                return; // merged into an existing entry; done with this LineItem
            }
        }
        // No matching entry in this region; append as a new one.
        $this->AmazonEC2['ec2'][$UsageRegion][] = $result;
    }
}
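A design note on the snippet above: all money and usage values are handled with PHP's bcmath functions (bcadd with a scale of 10) rather than native floats, because CUR rates carry many decimal places and floating-point rounding errors would accumulate across millions of LineItems.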
Running the above code on a sample AWS account's CUR CSV, we get:
Array
(
    [ap-northeast-1] => Array
        (
            [0] => Array
                (
                    [ProductFamily] => Compute Instance
                    [UsageType] => APN1-BoxUsage:t3.micro
                    [InstanceType] => t3.micro
                    [Operation] => RunInstances
                    [Tenancy] => Shared
                    [AvailabilityZone] => ap-northeast-1a
                    [CostBeforeTax] => 9.7920000000
                    [ItemDescription] => $0.0136 per On Demand Linux t3.micro Instance Hour
                    [NormalizationFactor] => 0.5
                    [UsageQuantity] => 720.0000000000
                    [NormalizedUsageQuantity] => 360.0000000000
                    [NormalizedUsageQuantity_OverClock] => 0.0000000000
                )

        )

    [us-west-2] => Array
        (
            [0] => Array
                (
                    [ProductFamily] => Compute Instance
                    [UsageType] => USW2-BoxUsage:t2.micro
                    [InstanceType] => t2.micro
                    [Operation] => RunInstances
                    [Tenancy] => Shared
                    [AvailabilityZone] => us-west-2a
                    [CostBeforeTax] => 8.3520000000
                    [ItemDescription] => Linux/UNIX (Amazon VPC), t2.micro reserved instance applied
                    [NormalizationFactor] => 0.5
                    [UsageQuantity] => 720.0000000000
                    [NormalizedUsageQuantity] => 360.0000000000
                    [NormalizedUsageQuantity_OverClock] => 360.0000000000
                )

            [1] => Array
                (
                    [ProductFamily] => Compute Instance
                    [UsageType] => USW2-BoxUsage:t2.micro
                    [InstanceType] => t2.micro
                    [Operation] => RunInstances
                    [Tenancy] => Shared
                    [AvailabilityZone] => us-west-2c
                    [CostBeforeTax] => 8.3520000000
                    [ItemDescription] => $0.0116 per On Demand Linux t2.micro Instance Hour
                    [NormalizationFactor] => 0.5
                    [UsageQuantity] => 0.0000000000
                    [NormalizedUsageQuantity] => 0.0000000000
                    [NormalizedUsageQuantity_OverClock] => 360.0000000000
                )

        )

)
From the combined usage result above, we get each instance's usage totals per region. Next, we will:
- Perform the hourly RI clock-hour limit check (for the RI clock-hour limitation: if multiple eligible instances are running concurrently, the Reserved Instance billing benefit is applied to all of the instances at the same time, up to a maximum of 3600 seconds per clock-hour; thereafter, on-demand rates apply)
- Apply RIs to matching instances (see the sketch after this list)
- Consider RI size-flexibility (for the regional/zonal RI differentiation: regional RIs have size-flexibility; zonal RIs do not)
- Consider zonal RIs vs. regional RIs
- Consider cross-account sharing
- Apply RIs to cover on-demand running hours properly
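The core of the size-flexible, regional RI application step can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the input shape, and the subset of normalization factors are made up for this example; the smallest-size-first ordering follows AWS's documented size-flexibility behavior.

<?php
// Normalization factors per instance size (a subset, from AWS's docs).
const NF = ['nano' => 0.25, 'micro' => 0.5, 'small' => 1, 'medium' => 2,
            'large' => 4, 'xlarge' => 8, '2xlarge' => 16];

/**
 * Spread a regional RI's normalized units over usages of one instance
 * family within one region (hypothetical helper; assumes sized types
 * such as "t2.micro", not Dedicated Host family names such as "t2").
 */
function applyRegionalRi(array $usages, string $riNormalizedUnits): array
{
    // AWS applies the size-flexible benefit to the smallest sizes first.
    usort($usages, function ($a, $b) {
        $sizeA = explode('.', $a['InstanceType'])[1];
        $sizeB = explode('.', $b['InstanceType'])[1];
        return NF[$sizeA] <=> NF[$sizeB];
    });

    $remaining = $riNormalizedUnits;
    foreach ($usages as &$u) {
        // Cover as much of this usage as the remaining RI units allow.
        $covered = bccomp($u['NormalizedUsageQuantity'], $remaining, 10) <= 0
            ? $u['NormalizedUsageQuantity']
            : $remaining;
        $u['CoveredNormalizedUnits'] = $covered;
        $remaining = bcsub($remaining, $covered, 10);
    }
    unset($u);
    return $usages;
}

On top of this, the clock-hour limit check caps the covered units within each clock-hour, which is why the aggregation code earlier keeps a NormalizedUsageQuantity_Hrly bucket per hour.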
The calculation code is quite long (written in a class), so I won’t paste it all here. But certain instance operations need to be considered, since each is priced differently; for illustration, the mapping is wrapped in a small helper function below:
/**
 * Map an EC2 operation code to its human-readable platform description.
 */
function operationDescription(string $operation): string
{
    switch ($operation) {
        case 'RunInstances':
            return 'Amazon EC2 running Linux/UNIX';
        case 'RunInstances:000g':
            return 'Amazon EC2 running SUSE Linux';
        case 'RunInstances:0010':
            return 'Amazon EC2 running Red Hat Enterprise Linux';
        case 'RunInstances:0002':
            return 'Amazon EC2 running Windows';
        case 'RunInstances:0006':
            return 'Amazon EC2 running Windows with SQL Server Standard';
        case 'RunInstances:0102':
            return 'Amazon EC2 running Windows with SQL Server Enterprise';
        case 'RunInstances:0202':
            return 'Amazon EC2 running Windows with SQL Server Web';
        case 'RunInstances:0800':
            return 'Amazon EC2 running Windows (Bring your own license)';
        default:
            return 'Amazon EC2 running (unmapped operation: ' . $operation . ')';
    }
}
Once the proper RIs are applied, we get the correct cost for that account, in JSON format:
{
    "ap-northeast-1": [
        {
            "ProductFamily": "Compute Instance",
            "UsageType": "APN1-BoxUsage:t3.micro",
            "InstanceType": "t3.micro",
            "Operation": "RunInstances",
            "Tenancy": "Shared",
            "AvailabilityZone": "ap-northeast-1a",
            "CostBeforeTax": "9.7920000000",
            "ItemDescription": "$0.0136 per on-demand t3.micro EC2 Linux\/UNIX instance hour (or partial hour)",
            "NormalizationFactor": 0.5,
            "UsageQuantity": "720.0000000000",
            "NormalizedUsageQuantity": "360.0000000000"
        }
    ],
    "us-west-2": [
        {
            "ProductFamily": "Compute Instance",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "Tenancy": "Shared",
            "AvailabilityZone": "us-west-2a",
            "CostBeforeTax": "0",
            "ItemDescription": "t2.micro Reserved Instance applied.",
            "NormalizationFactor": 0.5,
            "UsageQuantity": "0",
            "NormalizedUsageQuantity": "0"
        },
        {
            "ProductFamily": "Compute Instance",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "Tenancy": "Shared",
            "AvailabilityZone": "us-west-2c",
            "CostBeforeTax": "0",
            "ItemDescription": "t2.micro Reserved Instance applied.",
            "NormalizationFactor": 0.5,
            "UsageQuantity": "0",
            "NormalizedUsageQuantity": "0"
        },
        {
            "ProductFamily": "Compute Instance",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "Tenancy": "Shared",
            "AvailabilityZone": "us-west-2a",
            "CostBeforeTax": "33.4080000000",
            "ItemDescription": "$0.0116 per on-demand t2.micro EC2 Linux\/UNIX instance hour (or partial hour)",
            "NormalizationFactor": 0.5,
            "UsageQuantity": "2880.0000000000",
            "NormalizedUsageQuantity": "1440.0000000000",
            "OverClockHour": true
        },
        {
            "ProductFamily": "Compute Instance",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "Tenancy": "Shared",
            "AvailabilityZone": "us-west-2c",
            "CostBeforeTax": "8.3520000000",
            "ItemDescription": "$0.0116 per on-demand t2.micro EC2 Linux\/UNIX instance hour (or partial hour)",
            "NormalizationFactor": 0.5,
            "UsageQuantity": "720.0000000000",
            "NormalizedUsageQuantity": "360.0000000000",
            "OverClockHour": true
        },
        {
            "RIFee": "yes",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "UsageQuantity": "720.0000000000",
            "CostBeforeTax": "2.4480000000",
            "ItemDescription": "$0.0034 hourly fee per reserved Standard 1-Year Partial Upfront t2.micro instance"
        },
        {
            "RIFee": "yes",
            "UsageType": "USW2-BoxUsage:t2.micro",
            "InstanceType": "t2.micro",
            "Operation": "RunInstances",
            "UsageQuantity": "720.0000000000",
            "CostBeforeTax": "2.8080000000",
            "ItemDescription": "$0.0039 hourly fee per reserved Convertible 1-Year Partial Upfront t2.micro instance"
        }
    ]
}
Viewed in a GUI, the above result looks something like the image below (very similar to the AWS-native billing report dashboard):
Of course, there are many more factors that affect the re-calculation of EC2, RDS, ElastiCache, etc., such as “NormalizationFactor”, “Multi/Single-AZ Deployment”, and “Free Tier”, but the whole idea is as described in this post. It’s simple to understand but complex to achieve, and it certainly requires long reading of the AWS documentation and analysis of CUR CSVs over time.
Furthermore, the big data era is booming; if we can intelligently mine this valuable data and summarize it in a human-readable structure, every AWS customer gains much more valuable information to better understand, manage, and forecast their budget.
---
For any comments, questions, or feedback, please reach out to us @MobingiTech.