Note: Although I work for AWS, these are my own personal findings and thoughts. Please do not consider them as official benchmarks in any way!
AWS announced the latest M6g instances in December 2019. These feature Arm-powered Graviton2 processors, as well as fully encrypted DDR4 memory. Arm processors are everywhere in mobile devices, and in my favourite Raspberry Pi computers, but have not traditionally featured in the cloud. This is changing.
Arm-powered processors are not new to AWS: The first iteration (a1) was announced in November 2018.
Why would you care? In short, the announcements claim that you will see similar (or better!) performance to comparable Intel hardware, for a lower cost. What’s the catch? Your workloads have to be compatible with an Arm architecture. Chances are, if you are running standard Linux workloads on a popular distribution (e.g. Amazon Linux or Ubuntu), these are going to be a real option for you.
With M6g now Generally Available, I decided to take a closer look for myself. How easy would it be to get going? What performance would I find?
Getting up and running
For a head-to-head comparison, I decided to go with the following instances:
- m5.xlarge (Intel, 4 vCPUs, 16 GiB RAM, $0.214 per hour On Demand)
- m6g.xlarge (Graviton2, 4 vCPUs, 16 GiB RAM, $0.172 per hour On Demand)
The above prices were correct at the time of writing for the eu-west-1 (Ireland) region. At launch, the m6g instances are also available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Tokyo).
This article assumes that you are already familiar with spinning up Linux instances within AWS, including ssh access once they are available. If you need it, here’s the official Getting Started guide.
Also, these instances are not Free Tier eligible! You will be incurring costs by spinning them up. Please remember to tidy up after yourself when you are done!
You know you are in the right place when, after electing to launch a new instance from the EC2 console, you see this:
Yes, the choice of either x86 or Arm architectures!
For this exercise, I went with Ubuntu Server 18.04 LTS. As some of the benchmarks are storage heavy, I changed the root EBS volumes to be 30 gigabytes in size, but still the default gp2 SSD storage.
I was then able to ssh in to start the benchmarking. The process for each instance was identical. Ubuntu looks and operates just the same, whether you are using the x86 or Arm editions. In fact, every so often I would check /proc/cpuinfo as a sanity check that I was logged into the right machine (!)
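As a sketch of that sanity check, the snippet below prints the architecture the kernel reports. `uname -m` and /proc/cpuinfo are standard on Linux; the `arch_label` helper and its wording are purely my own illustration.

```shell
# arch_label is a hypothetical helper: it maps `uname -m` output to a friendly name.
arch_label() {
  case "$1" in
    x86_64)  echo "x86 (64-bit)" ;;
    aarch64) echo "Arm (64-bit)" ;;
    *)       echo "unknown ($1)" ;;
  esac
}

# Report what this machine says it is.
echo "This instance is: $(arch_label "$(uname -m)")"

# Show the first identifying CPU line, if /proc/cpuinfo is present.
# x86 kernels print "model name"; Arm kernels print "CPU implementer".
grep -m 1 -E 'model name|CPU implementer' /proc/cpuinfo 2>/dev/null || true
```

Running this on each instance before kicking off a benchmark is a cheap way to avoid mixing up results.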
1. Linux kernel compilation
Ah, that old favourite! Let’s grab the latest Linux kernel and see how each machine does at compiling it!
Here are the commands to invoke:
sudo apt update
sudo apt-get install -y git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache bison flex
wget https://git.kernel.org/torvalds/t/linux-5.7-rc5.tar.gz
tar xf linux-5.7-rc5.tar.gz
cd linux-5.7-rc5
make menuconfig
time make -j 4
Some notes on the above:
- After running make menuconfig, simply Exit and save the default settings.
- We use -j 4 for the make process to use all 4 available CPU cores for the building process.
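Rather than hard-coding the job count, one option is to ask the system how many cores it has. This is a minimal sketch, assuming `nproc` from GNU coreutils is installed (it is on Ubuntu); the `build_jobs` variable name is mine.

```shell
# Number of processing units available to this process
# (this should report 4 on both of the instances used here).
build_jobs="$(nproc)"

# Print the command rather than running it, so the snippet is safe to paste anywhere.
echo "time make -j ${build_jobs}"
```

On instance sizes with a different vCPU count, the same line then scales without editing.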
|Instance|Kernel compile time|
|---|---|
|m5.xlarge|33 minutes, 43 seconds (2023 seconds)|
|m6g.xlarge|34 minutes, 29 seconds (2069 seconds)|
In this test, the Arm system was about 2.3% slower (2069 seconds versus 2023). That’s pretty close. Especially when you consider it is 19.6% cheaper in On Demand costs ($0.172 versus $0.214 per hour)!
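The arithmetic behind those percentages can be sketched with a couple of small awk helpers. `pct_change` and `run_cost` are hypothetical names of my own, and the cost figure assumes simple pro-rata billing by the second at the On Demand prices quoted earlier.

```shell
# pct_change BASE CUR: percentage change from BASE to CUR, to two decimal places.
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", (cur - base) / base * 100 }'
}

# run_cost SECONDS HOURLY_PRICE: cost in dollars of a run of that length
# (assumption: billed pro rata by the second).
run_cost() {
  awk -v secs="$1" -v price="$2" 'BEGIN { printf "%.4f\n", secs / 3600 * price }'
}

echo "Build time change:         $(pct_change 2023 2069)%"
echo "Hourly price change:       $(pct_change 0.214 0.172)%"
echo "m5.xlarge cost per build:  \$$(run_cost 2023 0.214)"
echo "m6g.xlarge cost per build: \$$(run_cost 2069 0.172)"
```

Looking at cost per build rather than time alone is one way to put price and performance into a single number: roughly 12 cents on the m5.xlarge against roughly 10 cents on the m6g.xlarge.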
There’s a slight caveat with this test: The exact files compiled could vary between x86 and Arm architectures. It’s not exactly scientific. So let’s move on to some more…. traditional…. benchmarking.
2. MariaDB performance
For benchmarking exercise number 2, it was the turn of MariaDB. As a quick reminder:
MariaDB Server is one of the most popular open source relational databases. It’s made by the original developers of MySQL and guaranteed to stay open source. It is part of most cloud offerings and the default in most Linux distributions.
Here, I combine installing MariaDB as well as sysbench in order to load test it. A read / write test is performed. Here we go!
sudo apt-get install -y sysbench mariadb-server
sudo mysql -u root -e 'create database sbtest'
sudo sysbench /usr/share/sysbench/oltp_read_write.lua \
  --db-driver=mysql --threads=4 \
  --mysql-host=localhost --mysql-user=root --mysql-port=3306 \
  --tables=5 --table-size=10000000 \
  prepare
sudo sysbench /usr/share/sysbench/oltp_read_write.lua \
  --db-driver=mysql --threads=16 --events=0 --time=300 \
  --mysql-host=localhost --mysql-user=root \
  --tables=5 --delete_inserts=10 --index_updates=10 --non_index_updates=10 \
  --table-size=10000000 --db-ps-mode=disable --report-interval=1 \
  run
Some caveats here:
- This test is very hungry for disk space. Most of the 30 gigabytes of storage will be utilised.
- You will notice I don’t secure the MySQL instance, with the commands running as root. This is not Best Practice for general systems. However, in this case the instances are disposable, and they are being destroyed after the testing process.
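Given how much disk the prepare step consumes, a quick pre-flight check can save a failed run. This is a sketch only: `free_gib` is a hypothetical helper, and the 25 GiB threshold is a guess of mine rather than a measured requirement.

```shell
# free_gib PATH: whole GiB available on the filesystem holding PATH.
# Uses POSIX `df -Pk`, where column 4 of the data row is available KiB.
free_gib() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Hypothetical threshold: adjust to taste.
needed_gib=25
avail_gib="$(free_gib /)"

if [ "${avail_gib}" -lt "${needed_gib}" ]; then
  echo "Only ${avail_gib} GiB free on /; the sysbench prepare step may run out of space."
else
  echo "${avail_gib} GiB free on /; looks fine."
fi
```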
m5.xlarge:

SQL statistics:
    queries performed:
        read: 436520
        write: 1247200
        other: 62360
        total: 1746080
    transactions: 31180 (103.91 per sec.)
    queries: 1746080 (5819.10 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)
General statistics:
    total time: 300.0588s
    total number of events: 31180
Latency (ms):
    min: 6.92
    avg: 153.96
    max: 975.64
    95th percentile: 314.45
    sum: 4800464.88
Threads fairness:
    events (avg/stddev): 1948.7500/19.35
    execution time (avg/stddev): 300.0291/0.02

m6g.xlarge:

SQL statistics:
    queries performed:
        read: 451836
        write: 1290960
        other: 64548
        total: 1807344
    transactions: 32274 (107.56 per sec.)
    queries: 1807344 (6023.09 per sec.)
    ignored errors: 0 (0.00 per sec.)
    reconnects: 0 (0.00 per sec.)
General statistics:
    total time: 300.0680s
    total number of events: 32274
Latency (ms):
    min: 5.66
    avg: 148.74
    max: 833.04
    95th percentile: 303.33
    sum: 4800535.10
Threads fairness:
    events (avg/stddev): 2017.1250/22.81
    execution time (avg/stddev): 300.0334/0.02
If we pull out the key performance metrics:
|Instance|Metric|Result|
|---|---|---|
|m5.xlarge|Transactions|103.91 / second|
|m5.xlarge|Queries|5819.10 / second|
|m6g.xlarge|Transactions|107.56 / second|
|m6g.xlarge|Queries|6023.09 / second|
Here, the Arm-powered m6g instance was about 3.5% faster for both Transactions and Queries. Decent! And, again, still 19.6% cheaper!
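If you save the sysbench output to a file, the headline rates can be pulled out with a little awk instead of by eye. A minimal sketch, assuming the summary lines look like the ones in this article; `sysbench_rate` is a hypothetical helper, and the sample file simply reuses two lines from the m5.xlarge run.

```shell
# sysbench_rate LABEL FILE: print the per-second figure from a
# "LABEL: <count> (<rate> per sec.)" summary line.
sysbench_rate() {
  awk -v label="$1:" '$1 == label { sub(/^\(/, "", $3); print $3; exit }' "$2"
}

# Sample input copied from the m5.xlarge run in this article.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
    queries performed:
    transactions:                        31180  (103.91 per sec.)
    queries:                             1746080 (5819.10 per sec.)
EOF

echo "Transactions per second: $(sysbench_rate transactions "$sample")"
echo "Queries per second:      $(sysbench_rate queries "$sample")"
rm -f "$sample"
```

Matching on the first field including its colon is what keeps the "queries performed:" heading from being mistaken for the "queries:" summary line.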
3. Redis performance
Finally, it was the turn of Redis.
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.
I was particularly interested in this one: It’s in-memory. Would there be a performance impact with the full encryption of RAM involved with the Graviton2 processors?
This one is nice and easy to do:
sudo apt-get install -y redis-server
redis-benchmark -q
m5.xlarge:

PING_INLINE: 100000.00 requests per second
PING_BULK: 97370.98 requests per second
SET: 101522.84 requests per second
GET: 100908.17 requests per second
INCR: 101729.40 requests per second
LPUSH: 92678.41 requests per second
RPUSH: 101112.23 requests per second
LPOP: 94517.96 requests per second
RPOP: 92421.44 requests per second
SADD: 101010.10 requests per second
HSET: 95419.85 requests per second
SPOP: 84530.86 requests per second
LPUSH (needed to benchmark LRANGE): 85910.65 requests per second
LRANGE_100 (first 100 elements): 54614.96 requests per second
LRANGE_300 (first 300 elements): 23929.17 requests per second
LRANGE_500 (first 450 elements): 16672.22 requests per second
LRANGE_600 (first 600 elements): 11687.71 requests per second
MSET (10 keys): 99206.34 requests per second

m6g.xlarge:

PING_INLINE: 135135.14 requests per second
PING_BULK: 127388.53 requests per second
SET: 135318.00 requests per second
GET: 132100.39 requests per second
INCR: 136798.91 requests per second
LPUSH: 142653.36 requests per second
RPUSH: 136239.78 requests per second
LPOP: 141442.72 requests per second
RPOP: 136612.02 requests per second
SADD: 134228.19 requests per second
HSET: 142247.52 requests per second
SPOP: 132802.12 requests per second
LPUSH (needed to benchmark LRANGE): 141043.72 requests per second
LRANGE_100 (first 100 elements): 64850.84 requests per second
LRANGE_300 (first 300 elements): 21734.41 requests per second
LRANGE_500 (first 450 elements): 14100.39 requests per second
LRANGE_600 (first 600 elements): 10858.94 requests per second
MSET (10 keys): 107991.36 requests per second
There are a lot of stats here, so I’ll just pull out a few:
|Instance|Test|Result|
|---|---|---|
|m5.xlarge|SET|101,522 / second|
|m5.xlarge|GET|100,908 / second|
|m6g.xlarge|SET|135,318 / second|
|m6g.xlarge|GET|132,100 / second|
A big difference here! The Arm-equipped m6g.xlarge is 30.9% to 33.3% faster on these metrics. In fact, it’s faster on all of them, except the three larger LRANGE tests (LRANGE_300, LRANGE_500 and LRANGE_600). That could be worth looking deeper into.
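To compare every test rather than a hand-picked few, the two outputs can be saved to files and diffed with awk. This is a sketch under the assumption that each line has the `NAME: RATE requests per second` shape shown above; `redis_compare` and the file names are my own inventions, and the sample data is a small subset of the results in this article.

```shell
# redis_compare BASELINE CANDIDATE: for each test present in both files,
# print the test name and the percentage change from BASELINE to CANDIDATE.
redis_compare() {
  awk '
    /requests per second/ {
      rate = $(NF - 3)                       # the number before "requests per second"
      name = $0
      sub(/: [0-9.]+ requests per second.*$/, "", name)
      if (FNR == NR) { base[name] = rate; next }   # first file: remember the baseline
      if (name in base)
        printf "%s\t%+.1f%%\n", name, (rate - base[name]) / base[name] * 100
    }
  ' "$1" "$2"
}

# Sample data: three lines from each instance, copied from the results above.
work="$(mktemp -d)"
cat > "$work/m5.txt" <<'EOF'
SET: 101522.84 requests per second
GET: 100908.17 requests per second
LRANGE_300 (first 300 elements): 23929.17 requests per second
EOF
cat > "$work/m6g.txt" <<'EOF'
SET: 135318.00 requests per second
GET: 132100.39 requests per second
LRANGE_300 (first 300 elements): 21734.41 requests per second
EOF

redis_compare "$work/m5.txt" "$work/m6g.txt"
rm -rf "$work"
```

A positive figure means the second file (here the m6g.xlarge) was faster, so regressions such as the larger LRANGE tests stand out by their minus sign.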
From these tests, my conclusion is that the m6g.xlarge is punching very well in terms of performance, in comparison to the m5.xlarge. It was ever-so-slightly slower on the Linux kernel compilation, ever-so-slightly faster on the MariaDB tests, and notably faster on the Redis tests.
Then we factor in the price. The m6g.xlarge is 19.6% cheaper in terms of On Demand pricing!
I would definitely be looking to move any appropriate Linux workloads to the Graviton2 setup. The price / performance ratio is stellar here. I can see these processors becoming increasingly popular as ‘word gets out’ on just how good the ratio is!
I hope you have found this journey into some very simple benchmarks useful. I’d be very interested to hear about any workloads you are considering moving to Graviton – perhaps you’ve already moved them!