Note: Although I work for AWS, these are my own personal findings and thoughts. Please do not consider them as official benchmarks in any way!

AWS announced the latest M6g instances in December 2019. These feature Arm-powered Graviton2 processors, as well as fully encrypted DDR4 memory. Arm processors are everywhere in mobile devices (and in my favourite Raspberry Pi computers), but they have not traditionally featured in the cloud. This is changing.

Arm-powered processors are not new to AWS: the first iteration, the A1 instance family, was announced in November 2018.

Why would you care? In short, the announcements claim that you will see similar (or better!) performance to comparable Intel hardware, for a lower cost. What’s the catch? Your workloads have to be compatible with an Arm architecture. Chances are, if you are running standard Linux workloads on a popular distribution (e.g. Amazon Linux or Ubuntu), these are going to be a real option for you.

With M6g now Generally Available, I decided to take a closer look for myself. How easy would it be to get going? What performance would I find?

Getting up and running

For a head-to-head comparison, I decided to go with the following instances:

  • m5.xlarge (Intel, 4 vCPUs, 16 GiB RAM, $0.214 per hour On Demand)
  • m6g.xlarge (Graviton2, 4 vCPUs, 16 GiB RAM, $0.172 per hour On Demand)

The above prices were correct at the time of writing for the eu-west-1 (Ireland) region. At launch, the m6g instances are also available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Frankfurt), and Asia Pacific (Tokyo).
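The sketch below converts the hourly prices into monthly terms. It is a minimal example that assumes an instance runs around the clock for an average month of 730 hours (my assumption). It uses the eu-west-1 rates above. The helper name `monthly_cost` is purely illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an hourly On Demand rate (USD) into an
# approximate monthly cost, ASSUMING 730 hours of running per month.
HOURS_PER_MONTH=730

monthly_cost() {   # usage: monthly_cost <hourly_rate>
  awk -v rate="$1" -v hours="$HOURS_PER_MONTH" 'BEGIN { printf "%.2f\n", rate * hours }'
}

m5=$(monthly_cost 0.214)    # m5.xlarge, eu-west-1
m6g=$(monthly_cost 0.172)   # m6g.xlarge, eu-west-1
echo "m5.xlarge:  \$$m5 per month"
echo "m6g.xlarge: \$$m6g per month"
awk -v a="$m5" -v b="$m6g" 'BEGIN { printf "saving:     $%.2f per month\n", a - b }'
```

Under that assumption it prints $156.22 for the m5.xlarge and $125.56 for the m6g.xlarge, a saving of $30.66 per instance per month.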

This article assumes that you are already familiar with launching Linux instances in AWS, including connecting over ssh once they are available. If you need a refresher, see the official Getting Started Guide.

Also note that these instances are not Free Tier eligible! You will incur costs by spinning them up, so please remember to tidy up afterwards!

You know you are in the right place when, after electing to launch a new instance from the EC2 console, you are offered the choice of either x86 or Arm architectures for your AMI!

For this exercise, I went with Ubuntu Server 18.04 LTS. As some of the benchmarks are storage-heavy, I increased the root EBS volumes to 30 gigabytes, keeping the default gp2 SSD volume type.

I was then able to ssh in and start benchmarking. The process for each instance was identical: Ubuntu looks and operates just the same, whether you are using the x86 or the Arm edition. In fact, I checked /proc/cpuinfo every so often, as a sanity check that I was logged into the right machine (!)
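A quicker sanity check than reading /proc/cpuinfo is `uname -m`, which prints the machine hardware name. That is `x86_64` on the Intel m5 instance and `aarch64` on the Arm m6g instance. The sketch below wraps it in a hypothetical `arch_label` helper.

```shell
#!/usr/bin/env bash
# Quick check of which architecture this shell is running on.
# `uname -m` prints x86_64 on the Intel (m5) instance and aarch64 on the
# Arm (m6g) instance; the labels below are only for readability.
arch_label() {   # usage: arch_label <machine name>
  case "$1" in
    x86_64)  echo "x86" ;;
    aarch64) echo "Arm" ;;
    *)       echo "unknown ($1)" ;;
  esac
}

echo "Running on: $(arch_label "$(uname -m)")"

# For more detail, show the first identifying line of /proc/cpuinfo:
# x86 kernels print "model name", arm64 kernels print "CPU implementer".
grep -m1 -E '^(model name|CPU implementer)' /proc/cpuinfo 2>/dev/null || true
```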

1. Linux kernel compilation

Ah, that old favourite! Let’s grab the latest Linux kernel and see how each machine does at compiling it!

Here are the commands to invoke:

sudo apt update

sudo apt-get install -y git build-essential kernel-package fakeroot \
  libncurses5-dev libssl-dev ccache bison flex

# Download linux-5.7-rc5.tar.gz from kernel.org into this directory first
tar xf linux-5.7-rc5.tar.gz
cd linux-5.7-rc5
make menuconfig
time make -j 4

Some notes on the above:

  1. After running make menuconfig, simply choose Exit and save the default configuration.
  2. -j 4 tells make to run 4 jobs in parallel, so the build uses all 4 available vCPUs.
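One option is to derive the job count from the number of online CPUs instead of hard-coding 4. This lets the same command suit other instance sizes. It is my suggestion, not what was used for these results. On both xlarge instances it yields 4. The helper name `make_jobs` is hypothetical.

```shell
#!/usr/bin/env bash
# Choose the make job count from the number of online CPUs instead of
# hard-coding 4. Uses coreutils `nproc` when available and falls back to
# `getconf _NPROCESSORS_ONLN` otherwise.
make_jobs() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

echo "make will run $(make_jobs) parallel jobs"
# Inside the kernel source tree, the build step would then become:
#   time make -j "$(make_jobs)"
```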

The results?

| Instance | Time |
|---|---|
| m5.xlarge | 33 minutes, 43 seconds (2023 seconds) |
| m6g.xlarge | 34 minutes, 29 seconds (2069 seconds) |

In this test, the Arm system took 46 seconds (about 2.3%) longer. That's pretty close, especially when you consider that it is about 19.6% cheaper in On Demand costs!
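The comparisons here are relative changes, with the m5.xlarge as the baseline. The arithmetic can be checked with a small awk-based helper like the one below. This is a sketch only; the name `pct_change` is hypothetical.

```shell
#!/usr/bin/env bash
# Relative change of VALUE against BASELINE, in percent:
#   (value - baseline) / baseline * 100
# Positive: VALUE is larger than BASELINE; negative: smaller.
pct_change() {   # usage: pct_change <baseline> <value>
  awk -v base="$1" -v val="$2" 'BEGIN { printf "%.2f\n", (val - base) / base * 100 }'
}

# Kernel build time in seconds, with m5.xlarge as the baseline.
echo "m6g build time vs m5: $(pct_change 2023 2069)%"
# On Demand price per hour (eu-west-1), with m5.xlarge as the baseline.
echo "m6g price vs m5:      $(pct_change 0.214 0.172)%"
```

Run as-is, it prints 2.27% for the build time (the m6g build took longer). It prints -19.63% for the hourly price (the m6g is cheaper).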

There’s a slight caveat with this test: the exact files compiled can differ between the x86 and Arm architectures, since each builds its own architecture-specific code, so it’s not exactly scientific. So let’s move on to some more… traditional… benchmarking.

2. MariaDB performance

For benchmarking exercise number 2, it was the turn of MariaDB. As a quick reminder:

MariaDB Server is one of the most popular open source relational databases. It’s made by the original developers of MySQL and guaranteed to stay open source. It is part of most cloud offerings and the default in most Linux distributions.

Here, I install MariaDB along with sysbench in order to load test it, then run a read/write test. Here we go!

sudo apt-get install -y sysbench mariadb-server
sudo mysql -u root -e 'create database sbtest'

sudo sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --threads=4 --mysql-host=localhost --mysql-user=root --mysql-port=3306 --tables=5 --table-size=10000000 prepare

sudo sysbench /usr/share/sysbench/oltp_read_write.lua --db-driver=mysql --threads=16 --events=0 --time=300 --mysql-host=localhost --mysql-user=root --tables=5 --delete_inserts=10 --index_updates=10 --non_index_updates=10 --table-size=10000000 --db-ps-mode=disable --report-interval=1 run

Some caveats here:

  1. This test is very hungry for disk space: most of the 30 gigabytes of storage will be used.
  2. You will notice I don’t secure the MariaDB installation, and the commands connect as root. This is not best practice for general systems. However, in this case the instances are disposable, and they are destroyed after testing.

The results?

m5.xlarge:
SQL statistics:
    queries performed:
        read:                            436520
        write:                           1247200
        other:                           62360
        total:                           1746080
    transactions:                        31180  (103.91 per sec.)
    queries:                             1746080 (5819.10 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0588s
    total number of events:              31180

Latency (ms):
         min:                                  6.92
         avg:                                153.96
         max:                                975.64
         95th percentile:                    314.45
         sum:                            4800464.88

Threads fairness:
    events (avg/stddev):           1948.7500/19.35
    execution time (avg/stddev):   300.0291/0.02

m6g.xlarge:
SQL statistics:
    queries performed:
        read:                            451836
        write:                           1290960
        other:                           64548
        total:                           1807344
    transactions:                        32274  (107.56 per sec.)
    queries:                             1807344 (6023.09 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0680s
    total number of events:              32274

Latency (ms):
         min:                                  5.66
         avg:                                148.74
         max:                                833.04
         95th percentile:                    303.33
         sum:                            4800535.10

Threads fairness:
    events (avg/stddev):           2017.1250/22.81
    execution time (avg/stddev):   300.0334/0.02

If we pull out the key performance metrics:

| Instance | Metric | Result |
|---|---|---|
| m5.xlarge | Transactions | 103.91 / second |
| | Queries | 5819.10 / second |
| m6g.xlarge | Transactions | 107.56 / second |
| | Queries | 6023.09 / second |

Here, the Arm-powered m6g instance was about 3.5% faster for both transactions and queries. Decent! And, again, still about 19.6% cheaper!
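Instead of copying numbers out of the sysbench reports by hand, you could save each run's output to a file and extract the per-second rates with awk. This sketch assumes the summary line format shown in the output above. The function names and the log file names `m5.log` and `m6g.log` are hypothetical.

```shell
#!/usr/bin/env bash
# Extract a per-second rate from a saved sysbench "run" log.
# Assumes summary lines shaped like the output above, e.g.
#   transactions:                        31180  (103.91 per sec.)
#   queries:                             1746080 (5819.10 per sec.)
# The "queries performed:" header does not match, because its first
# field is "queries" without the trailing colon.
sysbench_rate() {   # usage: sysbench_rate <transactions|queries> <logfile>
  awk -v key="$1:" '$1 == key { gsub(/\(/, "", $3); print $3; exit }' "$2"
}

# Relative change of the second value against the first, in percent.
pct_change() {   # usage: pct_change <baseline> <value>
  awk -v base="$1" -v val="$2" 'BEGIN { printf "%.2f\n", (val - base) / base * 100 }'
}

# Compare a baseline log with a candidate log.
compare_sysbench() {   # usage: compare_sysbench m5.log m6g.log
  local metric base new
  for metric in transactions queries; do
    base=$(sysbench_rate "$metric" "$1")
    new=$(sysbench_rate "$metric" "$2")
    echo "$metric per sec: $base -> $new ($(pct_change "$base" "$new")%)"
  done
}
```

For the two runs above, both transactions and queries per second come out at +3.51% for the m6g.xlarge.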

3. Redis performance

Finally, it was the turn of Redis.

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.

I was particularly interested in this one because Redis is in-memory. Would the full encryption of RAM on the Graviton2 processors have a performance impact?

This one is nice and easy to do:

sudo apt-get install -y redis-server
redis-benchmark -q

The results:

m5.xlarge:
PING_INLINE: 100000.00 requests per second
PING_BULK: 97370.98 requests per second
SET: 101522.84 requests per second
GET: 100908.17 requests per second
INCR: 101729.40 requests per second
LPUSH: 92678.41 requests per second
RPUSH: 101112.23 requests per second
LPOP: 94517.96 requests per second
RPOP: 92421.44 requests per second
SADD: 101010.10 requests per second
HSET: 95419.85 requests per second
SPOP: 84530.86 requests per second
LPUSH (needed to benchmark LRANGE): 85910.65 requests per second
LRANGE_100 (first 100 elements): 54614.96 requests per second
LRANGE_300 (first 300 elements): 23929.17 requests per second
LRANGE_500 (first 450 elements): 16672.22 requests per second
LRANGE_600 (first 600 elements): 11687.71 requests per second
MSET (10 keys): 99206.34 requests per second

m6g.xlarge:
PING_INLINE: 135135.14 requests per second
PING_BULK: 127388.53 requests per second
SET: 135318.00 requests per second
GET: 132100.39 requests per second
INCR: 136798.91 requests per second
LPUSH: 142653.36 requests per second
RPUSH: 136239.78 requests per second
LPOP: 141442.72 requests per second
RPOP: 136612.02 requests per second
SADD: 134228.19 requests per second
HSET: 142247.52 requests per second
SPOP: 132802.12 requests per second
LPUSH (needed to benchmark LRANGE): 141043.72 requests per second
LRANGE_100 (first 100 elements): 64850.84 requests per second
LRANGE_300 (first 300 elements): 21734.41 requests per second
LRANGE_500 (first 450 elements): 14100.39 requests per second
LRANGE_600 (first 600 elements): 10858.94 requests per second
MSET (10 keys): 107991.36 requests per second

There are a lot of stats here, so just pulling out a few:

| Instance | Metric | Result |
|---|---|---|
| m5.xlarge | SET | 101,522 / second |
| | GET | 100,908 / second |
| m6g.xlarge | SET | 135,318 / second |
| | GET | 132,100 / second |

A big difference here! The Arm-equipped m6g.xlarge is roughly 31–33% faster on these metrics (GET +30.9%, SET +33.3%). In fact, it’s faster on all of them except the last three LRANGE tests (LRANGE_300, LRANGE_500 and LRANGE_600). That could be worth looking into more deeply.
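To check every redis-benchmark test rather than just SET and GET, a sketch like the following compares two saved `redis-benchmark -q` outputs. It works line by line and prints the percentage change for each test. It assumes the `NAME: RATE requests per second` result format shown above. It also defensively drops any progress text that ends in a carriage return, in case a terminal-style progress display ends up in the saved file. The function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Compare two saved `redis-benchmark -q` outputs test by test and print the
# percentage change of the second (candidate) against the first (baseline).
# Result lines look like "SET: 101522.84 requests per second"; splitting on
# ": " separates the test name from the rate, since names such as
# "LRANGE_100 (first 100 elements)" contain spaces but not ": ".
compare_redis() {   # usage: compare_redis <baseline.txt> <candidate.txt>
  awk -F': ' '
    { sub(/.*\r/, "") }                    # drop progress text ending in a carriage return
    !/requests per second/ { next }        # keep only final result lines
    NR == FNR { base[$1] = $2 + 0; next }  # first file: remember baseline rates
    ($1 in base) && base[$1] > 0 {
      printf "%s: %+.1f%%\n", $1, ($2 + 0 - base[$1]) / base[$1] * 100
    }' "$1" "$2"
}
# Usage: compare_redis m5-redis.txt m6g-redis.txt
```

Negative values mark tests where the candidate instance was slower than the baseline.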

Conclusion

From these tests, my conclusion is that the m6g.xlarge performs very well compared with the m5.xlarge. It was ever-so-slightly slower at compiling the Linux kernel, slightly faster in the MariaDB tests, and notably faster in most of the Redis tests.

Then we factor in the price: the m6g.xlarge is about 19.6% cheaper in terms of On Demand pricing!

I would definitely be looking to move any appropriate Linux workloads to Graviton2. The price/performance ratio is stellar here, and I can see these processors becoming increasingly popular as ‘word gets out’ on just how good it is!

I hope you have found this journey into some very simple benchmarks useful. I’d be very interested to hear about any workloads you are considering moving to Graviton; perhaps you’ve already moved them!