New – Parallel Query for Amazon Aurora

Amazon Aurora is a relational database that was designed to take full advantage of the abundance of networking, processing, and storage resources available in the cloud. While maintaining compatibility with MySQL and PostgreSQL on the user-visible side, Aurora makes use of a modern, purpose-built distributed storage system under the covers. Your data is striped across hundreds of storage nodes distributed over three distinct AWS Availability Zones, with two copies per zone, on fast SSD storage. Here’s what this looks like (extracted from Getting Started with Amazon Aurora):

New Parallel Query
When we launched Aurora we also hinted at our plans to apply the same scale-out design principle to other layers of the database stack. Today I would like to tell you about our next step along that path.

Each node in the storage layer pictured above also includes plenty of processing power. Aurora is now able to make great use of that processing power by taking your analytical queries (generally those that process all or a large part of a good-sized table) and running them in parallel across hundreds or thousands of storage nodes, with speed benefits approaching two orders of magnitude. Because this new model reduces network, CPU, and buffer pool contention, you can run a mix of analytical and transactional queries simultaneously on the same table while maintaining high throughput for both types of queries.

The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions
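The limits listed above can be captured in a small lookup table. This is only an illustrative sketch: the function name `max_parallel_query_sessions` is hypothetical, and it assumes that the wildcard in `db.r*` stands for any R-family generation while the 16xlarge size applies to `db.r4` only, as the list suggests.

```python
import re

# Concurrent parallel query sessions per instance size, copied from the list above.
SESSIONS_BY_SIZE = {
    "large": 1,
    "xlarge": 2,
    "2xlarge": 4,
    "4xlarge": 8,
    "8xlarge": 16,
    "16xlarge": 16,
}


def max_parallel_query_sessions(instance_class: str) -> int:
    """Return the session limit for an instance class such as 'db.r4.2xlarge'."""
    match = re.fullmatch(r"db\.(r\w*)\.(\w+)", instance_class)
    if match is None:
        raise ValueError(f"not an R-family instance class: {instance_class!r}")
    family, size = match.groups()
    # The list only mentions 16xlarge for the r4 family (assumption: no others).
    if size not in SESSIONS_BY_SIZE or (size == "16xlarge" and family != "r4"):
        raise ValueError(f"no parallel query limit listed for {instance_class!r}")
    return SESSIONS_BY_SIZE[size]


print(max_parallel_query_sessions("db.r4.2xlarge"))
```

A helper like this could be used to cap the number of analytical sessions that an application opens against a given cluster.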

You can use the aurora_pq parameter to enable and disable the use of parallel queries at the global and the session level.

Parallel queries enhance the performance of over 200 types of single-table predicates and hash joins. The Aurora query optimizer will automatically decide whether to use Parallel Query based on the size of the table and the amount of table data that is already in memory; you can also use the aurora_pq_force session variable to override the optimizer for testing purposes.

Parallel Query in Action
You will need to create a fresh cluster in order to make use of the Parallel Query feature. You can create one from scratch, or you can restore a snapshot.

To create a cluster that supports Parallel Query, I simply choose Provisioned with Aurora parallel query enabled as the Capacity type:

I used the CLI to restore a 100 GB snapshot for testing, and then explored one of the queries from the TPC-H benchmark. Here’s the basic query:

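SELECT
  l_orderkey,
  SUM(l_extendedprice * (1-l_discount)) AS revenue,
  o_orderdate,
  o_shippriority

FROM customer, orders, lineitem

WHERE
  c_mktsegment = '[SEGMENT]'  -- TPC-H Q3 substitution parameter
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < date '1995-03-13'
  AND l_shipdate > date '1995-03-13'

GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority

ORDER BY
  revenue DESC,
  o_orderdate LIMIT 15;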

The EXPLAIN command shows the query plan, including the use of Parallel Query:

| id | select_type | table    | type | possible_keys                 | key  | key_len | ref  | rows      | Extra                                                                                                                          |
|  1 | SIMPLE      | customer | ALL  | PRIMARY                       | NULL | NULL    | NULL |  14354602 | Using where; Using temporary; Using filesort                                                                                   |
|  1 | SIMPLE      | orders   | ALL  | PRIMARY,o_custkey,o_orderdate | NULL | NULL    | NULL | 154545408 | Using where; Using join buffer (Hash Join Outer table orders); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)   |
|  1 | SIMPLE      | lineitem | ALL  | PRIMARY,l_shipdate            | NULL | NULL    | NULL | 606119300 | Using where; Using join buffer (Hash Join Outer table lineitem); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra) |
3 rows in set (0.01 sec)

Here is the relevant part of the Extra column:

Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)
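If you want to check a plan programmatically, the note in the Extra column can be picked apart with a regular expression. The sketch below assumes that the note always follows the format of the single example shown above; the function name `parallel_query_details` is made up for illustration.

```python
import re

# Pattern inferred from the one example in the EXPLAIN output above (assumption).
PQ_PATTERN = re.compile(
    r"Using parallel query \((\d+) columns, (\d+) filters, (\d+) exprs; (\d+) extra\)"
)


def parallel_query_details(extra: str):
    """Return the counts from a 'Using parallel query (...)' note, or None."""
    match = PQ_PATTERN.search(extra)
    if match is None:
        return None
    columns, filters, exprs, extra_count = map(int, match.groups())
    return {"columns": columns, "filters": filters, "exprs": exprs, "extra": extra_count}


# Extra values copied from the orders and customer rows of the plan.
orders_extra = (
    "Using where; Using join buffer (Hash Join Outer table orders); "
    "Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)"
)
customer_extra = "Using where; Using temporary; Using filesort"

print(parallel_query_details(orders_extra))    # a dict of counts
print(parallel_query_details(customer_extra))  # None: no parallel query on this table
```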

The query runs in less than 2 minutes when Parallel Query is used:

| l_orderkey | revenue     | o_orderdate | o_shippriority |
|   92511430 | 514726.4896 | 1995-03-06  |              0 |
|  593851010 | 475390.6058 | 1994-12-21  |              0 |
|  188390981 | 458617.4703 | 1995-03-11  |              0 |
|  241099140 | 457910.6038 | 1995-03-12  |              0 |
|  520521156 | 457157.6905 | 1995-03-07  |              0 |
|  160196293 | 456996.1155 | 1995-02-13  |              0 |
|  324814597 | 456802.9011 | 1995-03-12  |              0 |
|   81011334 | 455300.0146 | 1995-03-07  |              0 |
|   88281862 | 454961.1142 | 1995-03-03  |              0 |
|   28840519 | 454748.2485 | 1995-03-08  |              0 |
|  113920609 | 453897.2223 | 1995-02-06  |              0 |
|  377389669 | 453438.2989 | 1995-03-07  |              0 |
|  367200517 | 453067.7130 | 1995-02-26  |              0 |
|  232404000 | 452010.6506 | 1995-03-08  |              0 |
|   16384100 | 450935.1906 | 1995-03-02  |              0 |
15 rows in set (1 min 53.36 sec)

I can disable Parallel Query for the session (I can use an RDS custom cluster parameter group for a longer-lasting effect):

set SESSION aurora_pq=OFF;

The query runs considerably slower without it:

| l_orderkey | o_orderdate | revenue     | o_shippriority |
|   92511430 | 1995-03-06  | 514726.4896 |              0 |
|   16384100 | 1995-03-02  | 450935.1906 |              0 |
15 rows in set (1 hour 25 min 51.89 sec)

This was on a db.r4.2xlarge instance; other instance sizes, data sets, access patterns, and queries will perform differently. I can also override the query optimizer and insist on the use of Parallel Query for testing purposes:

set SESSION aurora_pq_force=ON;
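To put the two timings reported above side by side, here is a small sketch that converts the elapsed times printed by the mysql client into seconds and computes the ratio. The helper name `mysql_elapsed_seconds` is hypothetical, and the result applies only to this one query on this one instance.

```python
import re


def mysql_elapsed_seconds(text: str) -> float:
    """Convert a timing such as '1 hour 25 min 51.89 sec' to seconds."""
    units = {"hour": 3600.0, "min": 60.0, "sec": 1.0}
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(hour|min|sec)", text):
        total += float(value) * units[unit]
    return total


with_pq = mysql_elapsed_seconds("1 min 53.36 sec")
without_pq = mysql_elapsed_seconds("1 hour 25 min 51.89 sec")
speedup = without_pq / with_pq

print(f"with Parallel Query:    {with_pq:.2f} s")
print(f"without Parallel Query: {without_pq:.2f} s")
print(f"speedup:                {speedup:.1f}x")
```

For this test the query ran roughly 45 times faster with Parallel Query enabled.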

Things to Know
Here are a couple of things to keep in mind when you start to explore Amazon Aurora Parallel Query:

Engine Support – We are launching with support for MySQL 5.6, and are working on support for MySQL 5.7 and PostgreSQL.

Table Formats – The table row format must be COMPACT; partitioned tables are not supported.

Data Types – The TEXT, BLOB, and GEOMETRY data types are not supported.

DDL – The table cannot have any pending fast online DDL operations.

Cost – You can make use of Parallel Query at no extra charge. However, because it makes direct access to storage, there is a possibility that your IO cost will increase.

Give it a Shot
This feature is available now and you can start using it today!



AWS Data Transfer Price Reductions – Up to 34% (Japan) and 28% (Australia)

I’ve got good news for AWS customers who make use of our Asia Pacific (Tokyo) and Asia Pacific (Sydney) Regions. Effective September 1, 2018 we are reducing prices for data transfer from Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Amazon CloudFront by up to 34% in Japan and 28% in Australia.

EC2 and S3 Data Transfer
Here are the new prices for data transfer from EC2 and S3 to the Internet:

EC2 & S3 Data Transfer Out to Internet (USD per GB)

|                             | Japan                        | Australia                    |
| Usage Tier                  | Old Rate | New Rate | Change | Old Rate | New Rate | Change |
| Up to 1 GB / Month          | $0.000   | $0.000   | 0%     | $0.000   | $0.000   | 0%     |
| Next 9.999 TB / Month       | $0.140   | $0.114   | -19%   | $0.140   | $0.114   | -19%   |
| Next 40 TB / Month          | $0.135   | $0.089   | -34%   | $0.135   | $0.098   | -27%   |
| Next 100 TB / Month         | $0.130   | $0.086   | -34%   | $0.130   | $0.094   | -28%   |
| Greater than 150 TB / Month | $0.120   | $0.084   | -30%   | $0.120   | $0.092   | -23%   |

You can consult the EC2 Pricing and S3 Pricing pages for more information.
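Because the rates are tiered, each slice of monthly usage is charged at the rate of the tier it falls into. The sketch below estimates a monthly bill from the new Japan rates in the table above. It assumes that the rates are per GB and that 1 TB equals 1024 GB; the function name `monthly_transfer_cost` is hypothetical, and the pricing pages remain the authoritative source.

```python
# New Japan rates from the table above, as (tier size in GB, USD per GB).
# Assumptions: rates are per GB, 1 TB = 1024 GB, and the last tier is unbounded.
TB = 1024.0
JAPAN_NEW_TIERS = [
    (1.0, 0.000),            # up to 1 GB / month
    (9.999 * TB, 0.114),     # next 9.999 TB
    (40 * TB, 0.089),        # next 40 TB
    (100 * TB, 0.086),       # next 100 TB
    (float("inf"), 0.084),   # greater than 150 TB
]


def monthly_transfer_cost(gb_out: float, tiers=JAPAN_NEW_TIERS) -> float:
    """Charge each slice of usage at the rate of the tier it falls into."""
    remaining = gb_out
    cost = 0.0
    for size_gb, rate in tiers:
        if remaining <= 0:
            break
        billed = min(remaining, size_gb)
        cost += billed * rate
        remaining -= billed
    return cost


for gb in (1, 500, 20 * TB):
    print(f"{gb:>10.0f} GB -> ${monthly_transfer_cost(gb):,.2f}")
```

Passing a different tier list (for example, the old rates or the Australia column) lets you compare bills before and after the reduction.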

CloudFront Data Transfer
Here are the new prices for data transfer from CloudFront edge nodes to the Internet:

CloudFront Data Transfer Out to Internet (USD per GB)

|                     | Japan                        | Australia                    |
| Usage Tier          | Old Rate | New Rate | Change | Old Rate | New Rate | Change |
| Up to 10 TB / Month | $0.140   | $0.114   | -19%   | $0.140   | $0.114   | -19%   |
| Next 40 TB / Month  | $0.135   | $0.089   | -34%   | $0.135   | $0.098   | -27%   |
| Next 100 TB / Month | $0.120   | $0.086   | -28%   | $0.120   | $0.094   | -22%   |
| Next 350 TB / Month | $0.100   | $0.084   | -16%   | $0.100   | $0.092   | -8%    |
| Next 524 TB / Month | $0.080   | $0.080   | 0%     | $0.095   | $0.090   | -5%    |
| Next 4 PB / Month   | $0.070   | $0.070   | 0%     | $0.090   | $0.085   | -6%    |
| Over 5 PB / Month   | $0.060   | $0.060   | 0%     | $0.085   | $0.080   | -6%    |

Visit the CloudFront Pricing page for more information.

We have also reduced the price of data transfer from CloudFront to your Origin. The price for CloudFront Data Transfer to Origin from edge locations in Australia has been reduced by 20% to $0.080 per GB. This represents content uploads via POST and PUT.

Things to Know
Here are a couple of interesting things that you should know about AWS and data transfer:

AWS Free Tier – You can use the AWS Free Tier to get started with, and to learn more about, EC2, S3, CloudFront, and many other AWS services. The AWS Getting Started page contains lots of resources to help you with your first project.

Data Transfer from AWS Origins to CloudFront – There is no charge for data transfers from an AWS origin (S3, EC2, Elastic Load Balancing, and so forth) to any CloudFront edge location.

CloudFront Reserved Capacity Pricing – If you routinely use CloudFront to deliver 10 TB or more content per month, you should investigate our Reserved Capacity pricing. You can receive a significant discount by committing to transfer 10 TB or more content from a single region, with additional discounts at higher levels of usage. To learn more or to sign up, simply Contact Us.



New – AWS Storage Gateway Hardware Appliance

AWS Storage Gateway connects your on-premises applications to AWS storage services such as Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), and Amazon Glacier. It runs in your existing virtualized environment and is visible to your applications and your client operating systems as a file share, a local block volume, or a virtual tape library. The resulting hybrid storage model gives our customers the ability to use their AWS Storage Gateways for backup, archiving, disaster recovery, cloud data processing, storage tiering, and migration.

New Hardware Appliance
Today we are making Storage Gateway available as a hardware appliance, adding to the existing support for VMware ESXi, Microsoft Hyper-V, and Amazon EC2. This means that you can now make use of Storage Gateway in situations where you do not have a virtualized environment, server-class hardware, or IT staff with the specialized skills that are needed to manage them. You can order appliances from Amazon.com for delivery to branch offices, warehouses, and “outpost” offices that lack dedicated IT resources. Setup (as you will see in a minute) is quick and easy, and gives you access to three storage solutions:

File Gateway – A file interface to Amazon S3, accessible via NFS or SMB. The files are stored as S3 objects, allowing you to make use of specialized S3 features such as lifecycle management and cross-region replication. You can trigger AWS Lambda functions, run Amazon Athena queries, and use Amazon Macie to discover and classify sensitive data.

Volume Gateway – Cloud-backed storage volumes, accessible as local iSCSI volumes. Gateways can be configured to cache frequently accessed data locally, or to store a full copy of all data locally. You can create EBS snapshots of the volumes and use them for disaster recovery or data migration.

Tape Gateway – A cloud-based virtual tape library (VTL), accessible via iSCSI, so you can replace your on-premises tape infrastructure, without changing your backup workflow.

To learn more about each of these solutions, read What is AWS Storage Gateway.

The AWS Storage Gateway Hardware Appliance is based on a specially configured Dell EMC PowerEdge R640 Rack Server that is pre-loaded with AWS Storage Gateway software. It has 2 Intel® Xeon® processors, 128 GB of memory, 6 TB of usable SSD storage for your locally cached data, and redundant power supplies. You can order one from Amazon.com.

If you have an Amazon Business account (they’re free) you can use a purchase order for the transaction. In addition to simplifying deployment, using this standardized configuration helps to assure consistent performance for your local applications.

Hardware Setup
As you know, I like to go hands-on with new AWS products. My colleagues shipped a pre-release appliance to me; I left it under the watchful eye of my CSO (Canine Security Officer) until I was ready to write this post:

I don’t have a server room or a rack, so I set it up on my hobby table for testing:

In addition to the appliance, I also scrounged up a VGA cable, a USB keyboard, a small monitor, and a power adapter (C13 to NEMA 5-15). The adapter is necessary because the cord included with the appliance is intended to plug into a power distribution jack commonly found in a data center. I connected it all up, turned it on and watched it boot up, then entered a new administrative password.

Following the directions in the documentation, I configured an IPv4 address, using DHCP for convenience:

I captured the IP address for use in the next step, selected Back (the UI is keyboard-driven) and then logged out. This is the only step that takes place on the local console.

Gateway Configuration
At this point I will switch from past to present, and walk you through the configuration process. As directed by the Getting Started Guide, I open the Storage Gateway Console on the same network as the appliance, select the region where I want to create my gateway, and click Get started:

I select File gateway and click Next to proceed:

I select Hardware Appliance as my host platform (I can click Buy on Amazon to purchase one if necessary), and click Next:

Then I enter the IP address of my appliance and click Connect:

I enter a name for my gateway (jbgw1), set the time zone, pick ZFS as my RAID Volume Manager, and click Activate to proceed:

My gateway is activated within a second or two and I can see it in the Hardware section of the console:

At this point I am free to use a console that is not on the same network, so I’ll switch back to my trusty WorkSpace!

Now that my hardware has been activated, I can launch the actual gateway service on it. I select the appliance, and choose Launch Gateway from the Actions menu:

I choose the desired gateway type, enter a name (fgw1) for it, and click Launch gateway:

The gateway will start off in the Offline status, and transition to Online within 3 to 5 minutes. The next step is to allocate local storage by clicking Edit local disks:

Since I am creating a file gateway, all of the local storage is used for caching:

Now I can create a file share on my appliance! I click Create file share, enter the name of an existing S3 bucket, and choose NFS or SMB, then click Next:

I configure a couple of S3 options, request creation of a new IAM role, and click Next:

I review all of my choices and click Create file share:

After I create the share I can see the commands that are used to mount it in each client environment:

I mount the share on my Ubuntu desktop (I had to install the nfs-client package first) and copy a bunch of files to it:
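As a rough illustration of the mount step, here is a sketch that assembles a Linux NFS mount command for a file share. The IP address, bucket name, and mount point are placeholders, the `nolock,hard` options are an assumption, and the helper name `nfs_mount_command` is made up; the command shown in the console for your own share is the one to trust.

```python
import shlex


def nfs_mount_command(gateway_ip: str, bucket: str, mount_point: str) -> str:
    """Build a Linux NFS mount command for a share exported as /<bucket>."""
    args = [
        "sudo", "mount", "-t", "nfs",
        "-o", "nolock,hard",          # assumed options; prefer what the console shows
        f"{gateway_ip}:/{bucket}",    # assumes the export path is the bucket name
        mount_point,
    ]
    return shlex.join(args)           # shlex.join requires Python 3.8 or later


# Placeholder values for illustration only.
print(nfs_mount_command("192.0.2.10", "example-bucket", "/mnt/fgw1"))
```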

Then I visit the S3 bucket and see that the gateway has already uploaded the files:

Finally, I have the option to change the configuration of my appliance. After making sure that all network clients have unmounted the file share, I remove the existing gateway:

And launch a new one:

And there you have it. I installed and configured the appliance, created a file share that was accessible from my on-premises systems, and then copied files to it for replication to the cloud.

Now Available
The Storage Gateway Hardware Appliance is available now and you can purchase one today. Start in the AWS Storage Gateway Console and follow the steps above!




New – AWS Systems Manager Session Manager for Shell Access to EC2 Instances

It is a very interesting time to be a corporate IT administrator. On the one hand, developers are talking about (and implementing) an idyllic future of infrastructure as code, in which servers and other resources are treated as cattle. On the other hand, legacy systems still must be treated as pets, set up and maintained by hand or with the aid of limited automation. Many of the customers that I speak with are making the transition to the future at a rapid pace, but need to work in the world that exists today. For example, they still need shell-level access to their servers on occasion. They might need to kill runaway processes, consult server logs, fine-tune configurations, or install temporary patches, all while maintaining a strong security profile. They want to avoid the hassle that comes with running bastion hosts and the risks that arise when opening up inbound SSH ports on the instances.

We’ve already addressed some of the need for shell-level access with the AWS Systems Manager Run Command. This AWS facility gives administrators secure access to EC2 instances. It allows them to create command documents and run them on any desired set of EC2 instances, with support for both Linux and Microsoft Windows. The commands are run asynchronously, with output captured for review.

New Session Manager
Today we are adding a new option for shell-level access. The new Session Manager makes the AWS Systems Manager even more powerful. You can now use a new browser-based interactive shell and a command-line interface (CLI) to manage your Windows and Linux instances. Here’s what you get:

Secure Access – You don’t have to manually set up user accounts, passwords, or SSH keys on the instances and you don’t have to open up any inbound ports. Session Manager communicates with the instances via the SSM Agent across an encrypted tunnel that originates on the instance, and does not require a bastion host.

Access Control – You use IAM policies and users to control access to your instances, and don’t need to distribute SSH keys. You can limit access to a desired time/maintenance window by using IAM’s Date Condition Operators.

Auditability – Commands and responses can be logged to Amazon CloudWatch and to an S3 bucket. You can arrange to receive an SNS notification when a new session is started.

Interactivity – Commands are executed synchronously in a full interactive bash (Linux) or PowerShell (Windows) environment.

Programming and Scripting – In addition to the console access that I will show you in a moment, you can also initiate sessions from the command line (aws ssm ...) or via the Session Manager APIs.

The SSM Agent running on the EC2 instances must be able to connect to Session Manager’s public endpoint. You can also set up a PrivateLink connection to allow instances running in private VPCs (without Internet access or a public IP address) to connect to Session Manager.
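To make the Access Control item above more concrete, here is a sketch of an IAM policy document that allows sessions to be started on a single instance only during a maintenance window, using the date condition operators with the `aws:CurrentTime` key. The instance ARN, account ID, and timestamps are hypothetical, and a working policy would likely need further statements (for example, to resume or terminate sessions), so treat this as a starting point only.

```python
import json


def session_window_policy(instance_arn: str, start_utc: str, end_utc: str) -> dict:
    """Allow ssm:StartSession on one instance only between two UTC timestamps."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": instance_arn,
                "Condition": {
                    # Both conditions must hold, which yields a closed time window.
                    "DateGreaterThan": {"aws:CurrentTime": start_utc},
                    "DateLessThan": {"aws:CurrentTime": end_utc},
                },
            }
        ],
    }


policy = session_window_policy(
    "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0",  # hypothetical
    "2018-09-15T02:00:00Z",
    "2018-09-15T04:00:00Z",
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or attached to a user or group with the tool of your choice.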

Session Manager in Action
In order to use Session Manager to access my EC2 instances, the instances must be running the latest version (2.3.12 or above) of the SSM Agent. The instance role for the instances must reference a policy that allows access to the appropriate services; you can create your own or use AmazonEC2RoleForSSM. Here are my EC2 instances (sk1 and sk2 are running Amazon Linux; sk3-win and sk4-win are running Microsoft Windows):

Before I run my first command, I open AWS Systems Manager and click Preferences. Since I want to log my commands, I enter the name of my S3 bucket and my CloudWatch log group. If I enter either or both values, the instance policy must also grant access to them:

I’m ready to roll! I click Sessions, see that I have no active sessions, and click Start session to move ahead:

I select a Linux instance (sk1), and click Start session again:

The session opens up immediately:

I can do the same for one of my Windows instances:

The log streams are visible in CloudWatch:

Each stream contains the content of a single session:

In the Works
As usual, we have some additional features in the works for Session Manager. Here’s a sneak peek:

SSH Client – You will be able to create SSH sessions atop Session Manager without opening up any inbound ports.

On-Premises Access – We plan to give you the ability to access your on-premises instances (which must be running the SSM Agent) via Session Manager.

Available Now
Session Manager is available in all AWS regions (including AWS GovCloud) at no extra charge.


AWS – Ready for the Next Storm

As I have shared in the past (AWS – Ready to Weather the Storm) we take extensive precautions to help ensure that AWS will remain operational in the face of hurricanes, storms, and other natural disasters. With Hurricane Florence heading for the east coast of the United States, I thought it would be a good time to review and update some of the most important points from that post. Here’s what I want you to know:

Availability Zones – We replicate critical components of AWS across multiple Availability Zones to ensure high availability. Common points of failure, such as generators, UPS units, and air conditioning, are not shared across Availability Zones. Electrical power systems are designed to be fully redundant and can be maintained without impacting operations. The AWS Well-Architected Framework provides guidance on the proper use of multiple Availability Zones to build applications that are reliable and resilient, as does the Building Fault-Tolerant Applications on AWS whitepaper.

Contingency Planning – We maintain contingency plans and regularly rehearse our responses. We maintain a series of incident response plans and update them regularly to incorporate lessons learned and to prepare for emerging threats. In the days leading up to a known event such as a hurricane, we increase fuel supplies, update staffing plans, and add provisions to ensure the health and safety of our support teams.

Data Transfer – With a storage capacity of 100 TB per device, AWS Snowball Edge appliances can be used to quickly move large amounts of data to the cloud.

Disaster Response – When call volumes spike before, during, or after a disaster, Amazon Connect can supplement your existing call center resources and allow you to provide a better response.

Support – You can contact AWS Support if you are in need of assistance with any of these issues.




Learn about AWS Services and Solutions – September AWS Online Tech Talks

AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. Join us this month to learn about AWS services and solutions. We’ll have experts online to help answer any questions you may have.

Featured this month is our first ever fireside chat discussion. Join Debanjan Saha, General Manager of Amazon Aurora and Amazon RDS, to learn how customers are using our relational database services and leveraging database innovations.

Register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:


September 24, 2018 | 09:00 AM – 09:45 AM PT – Accelerating Product Development with HPC on AWS – Learn how you can accelerate product development by harnessing the power of high performance computing on AWS.

September 26, 2018 | 09:00 AM – 10:00 AM PT – Introducing New Amazon EC2 T3 Instances – General Purpose Burstable Instances – Learn about new Amazon EC2 T3 instance types and how they can be used for various use cases to lower infrastructure costs.

September 27, 2018 | 09:00 AM – 09:45 AM PT – Hybrid Cloud Customer Use Cases on AWS: Part 2 – Learn about popular hybrid cloud customer use cases on AWS.


September 19, 2018 | 11:00 AM – 11:45 AM PT – How Talroo Used AWS Fargate to Improve their Application Scaling – Learn how Talroo, a data-driven solution for talent and jobs, migrated their applications to AWS Fargate so they can run their application without worrying about managing infrastructure.

Data Lakes & Analytics

September 17, 2018 | 11:00 AM – 11:45 AM PT – Secure Your Amazon Elasticsearch Service Domain – Learn about the multi-level security controls provided by Amazon Elasticsearch Service (Amazon ES) and how to set the security for your Amazon ES domain to prevent unauthorized data access.

September 20, 2018 | 11:00 AM – 12:00 PM PT – New Innovations from Amazon Kinesis for Real-Time Analytics – Learn about the new innovations from Amazon Kinesis for real-time analytics.


September 17, 2018 | 01:00 PM – 02:00 PM PT – Applied Live Migration to DynamoDB from Cassandra – Learn how to migrate a live Cassandra-based application to DynamoDB.

September 18, 2018 | 11:00 AM – 11:45 AM PT – Scaling Your Redis Workloads with Redis Cluster – Learn how Redis cluster with Amazon ElastiCache provides scalability and availability for enterprise workloads.

Featured: September 20, 2018 | 09:00 AM – 09:45 AM PT – Fireside Chat: Relational Database Innovation at AWS – Join our fireside chat with Debanjan Saha, GM, Amazon Aurora and Amazon RDS to learn how customers are using our relational database services and leveraging database innovations.


September 19, 2018 | 09:00 AM – 10:00 AM PT – Serverless Application Debugging and Delivery – Learn how to bring traditional best practices to serverless application debugging and delivery.

Enterprise & Hybrid

September 26, 2018 | 11:00 AM – 12:00 PM PT – Transforming Product Development with the Cloud – Learn how to transform your development practices with the cloud.

September 27, 2018 | 11:00 AM – 12:00 PM PT – Fueling High Performance Computing (HPC) on AWS with GPUs – Learn how you can accelerate time-to-results for your HPC applications by harnessing the power of GPU-based compute instances on AWS.


September 24, 2018 | 01:00 PM – 01:45 PM PT – Manage Security of Your IoT Devices with AWS IoT Device Defender – Learn how AWS IoT Device Defender can help you manage the security of IoT devices.

September 26, 2018 | 01:00 PM – 02:00 PM PT – Over-the-Air Updates with Amazon FreeRTOS – Learn how to execute over-the-air updates on connected microcontroller-based devices with Amazon FreeRTOS.

Machine Learning

September 17, 2018 | 09:00 AM – 09:45 AM PT – Build Intelligent Applications with Machine Learning on AWS – Learn how to accelerate development of AI applications using machine learning on AWS.

September 18, 2018 | 09:00 AM – 09:45 AM PT – How to Integrate Natural Language Processing and Elasticsearch for Better Analytics – Learn how to process, analyze and visualize data by pairing Amazon Comprehend with Amazon Elasticsearch.

September 20, 2018 | 01:00 PM – 01:45 PM PT – Build, Train and Deploy Machine Learning Models on AWS with Amazon SageMaker – Dive deep into building, training, & deploying machine learning models quickly and easily using Amazon SageMaker.

Management Tools

September 19, 2018 | 01:00 PM – 02:00 PM PT – Automated Windows and Linux Patching – Learn how AWS Systems Manager can help reduce data breach risks across your environment through automated patching.


September 12, 2018 | 08:00 AM – 08:30 AM PT – Episode 5: Deep Dive with Our Community Heroes and Jeff Barr – Get the insider secrets with top recommendations and tips for re:Invent 2018 from AWS community experts.

Security, Identity, & Compliance

September 24, 2018 | 11:00 AM – 12:00 PM PT – Enhanced Security Analytics Using AWS WAF Full Logging – Learn how to use AWS WAF security incident logs to detect threats.

September 27, 2018 | 01:00 PM – 02:00 PM PT – Threat Response Scenarios Using Amazon GuardDuty – Discover methods for operationalizing your threat detection using Amazon GuardDuty.


September 18, 2018 | 01:00 PM – 02:00 PM PT – Best Practices for Building Enterprise Grade APIs with Amazon API Gateway – Learn best practices for building and operating enterprise-grade APIs with Amazon API Gateway.


September 25, 2018 | 09:00 AM – 10:00 AM PT – Ditch Your NAS! Move to Amazon EFS – Learn how to move your on-premises file storage to Amazon EFS.

September 25, 2018 | 11:00 AM – 12:00 PM PT – Deep Dive on Amazon Elastic File System (EFS): Scalable, Reliable, and Elastic File Storage for the AWS Cloud – Get live demos and learn tips & tricks for optimizing your file storage on EFS.

September 25, 2018 | 01:00 PM – 01:45 PM PT – Integrating File Services to Power Your Media & Entertainment Workloads – Learn how AWS file services deliver high performance shared file storage for media & entertainment workflows.

Amazon AppStream 2.0 – New Application Settings Persistence and a Quick Launch Recap

Amazon AppStream 2.0 gives you access to Windows desktop applications through a web browser. Thousands of AWS customers, including SOLIDWORKS, Siemens, and MathWorks, are already using AppStream 2.0 to deliver applications to their customers.

Today I would like to bring you up to date on some recent additions to AppStream 2.0, wrapping up with a closer look at a brand new feature that will automatically save application customizations (preferences, bookmarks, toolbar settings, connection profiles, and the like) and Windows settings between your sessions.

The recent additions to AppStream 2.0 can be divided into four categories:

User Enhancements – Support for time zone, locale, and language input, better copy/paste, and the new application persistence feature.

Admin Improvements – The ability to configure default application settings, control access to some system resources, copy images across AWS regions, establish custom branding, and share images between AWS accounts.

Storage Integration – Support for Microsoft OneDrive for Business and Google Drive for G Suite.

Regional Expansion – AppStream 2.0 recently became available in three additional AWS regions in Europe and Asia.

Let’s take a look at each item and then at application settings persistence.

User Enhancements
In June we gave AppStream 2.0 users control over the time zone, locale, and input methods. Once set, the values apply to future sessions in the same AWS region. This feature (formally known as Regional Settings) must be enabled by the AppStream 2.0 administrator as detailed in Enable Regional Settings for Your AppStream 2.0 Users.

In July we added keyboard shortcuts for copy/paste between your local device and your AppStream 2.0 sessions when using Google Chrome.

Admin Improvements
In February we gave AppStream 2.0 administrators the ability to copy AppStream 2.0 images to other AWS regions, simplifying the process of creating and managing global application deployments (to learn more, visit Tag and Copy an Image):

In March we gave AppStream 2.0 administrators additional control over the user experience, including the ability to customize the logo, color, text, and help links in the application catalog page. Read Add Your Custom Branding to AppStream 2.0 to learn more.

In May we added administrative control over the data that moves to and from the AppStream 2.0 streaming sessions. AppStream 2.0 administrators can control access to file upload, file download, printing, and copy/paste to and from local applications. Read Create AppStream 2.0 Fleets and Stacks to learn more.

In June we gave AppStream 2.0 administrators the power to configure default application settings (connection profiles, browser settings, and plugins) on behalf of their users. Read Enabling Default OS and Application Settings for Your Users to learn more.

In July we gave AppStream 2.0 administrators the ability to share AppStream 2.0 images between AWS accounts for use in the same AWS Region. To learn more, take a look at the UpdateImagePermissions API and the update-image-permissions command.

Storage Integration
Both of these launches provide AppStream 2.0 users with additional storage options for the documents that they access, edit, and create:

Launched in June, the Google Drive for G Suite support allows users to access files on a Google Drive from inside of their applications. Read Google Drive for G Suite is now enabled on Amazon AppStream 2.0 to learn how to enable this feature for an AppStream application stack.

Similarly, the Microsoft OneDrive for Business support that was launched in July allows users to access files stored in OneDrive for Business accounts. Read Amazon AppStream 2.0 adds support for OneDrive for Business to learn how to set this up.


Regional Expansion
In January we made AppStream 2.0 available in the Asia Pacific (Singapore) and Asia Pacific (Sydney) Regions.

In March we made AppStream 2.0 available in the Europe (Frankfurt) Region.

See the AWS Region Table for the full list of regions where AppStream 2.0 is available.

Application Settings Persistence
With the past out of the way, let’s take a look at today’s new feature, Application Settings Persistence!

As you can see from the launch recap above, AppStream 2.0 already saves several important application and system settings between sessions. Today we are adding support for the elements that make up the Windows Roaming Profile. This includes:

Windows Profile – The contents of C:\users\user_name\appdata .

Windows Profile Folder – The contents of C:\users\user_name .

Windows Registry – The tree of registry entries rooted at HKEY_CURRENT_USER .

This feature must be enabled by the AppStream 2.0 administrator. The contents of the Windows Roaming Profile are stored in an S3 bucket in the administrator’s AWS account, with an initial storage allowance (easily increased) of up to 1 GB per user. The S3 bucket is configured for Server Side Encryption with keys managed by S3. Data moves between AppStream 2.0 and S3 across a connection that is protected by SSL. The administrator can choose to enable S3 versioning to allow recovery from a corrupted profile.

Application Settings Persistence can be enabled for an existing stack, as long as it is running the latest version of the AppStream 2.0 Agent. Here’s how it is enabled when creating a new stack:

Putting multiple stacks in the same settings group allows them to share a common set of user settings. The settings are applied when the user logs in, and then persisted back to S3 when they log out.

This feature is available now and AppStream 2.0 administrators can enable it today. The only cost is for the S3 storage consumed by the stored profiles, charged at the usual S3 prices.


PS – Follow the AWS Desktop and Application Streaming Blog to make sure that you know about new features as quickly as possible.


AWS X-Ray Now Supports Amazon API Gateway and New Sampling Rules API

My colleague Jeff first introduced us to AWS X-Ray almost 2 years ago in his post from AWS re:Invent. If you’re not already aware, AWS X-Ray helps developers analyze and debug everything from simple web apps to large and complex distributed microservices, both in production and in development. Since X-Ray became generally available in 2017, we’ve iterated rapidly on customer feedback and continued to make enhancements to the service like encryption with AWS Key Management Service (KMS), new SDKs and language support (Python!), open sourcing the daemon, and latency visualization tools. Today, we’re adding two new features:

    • Support for Amazon API Gateway, making it easier to trace and analyze requests as they travel through your APIs to the underlying services.
    • We also recently launched support for controlling sampling rules in the AWS X-Ray console and API.

Let me show you how to enable tracing for an API.

Enabling X-Ray Tracing

I’ll start with a simple API deployed to API Gateway. I’ll add two endpoints: one to push records into Amazon Kinesis Data Streams and one to invoke a simple AWS Lambda function. It looks something like this:

After deploying my API, I can go to the Stages sub console, and select a specific stage, like “dev” or “production”. From there, I can enable X-Ray tracing by navigating to the Logs/Tracing tab, selecting Enable X-Ray Tracing and clicking Save Changes.
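The same switch can be flipped from code. Here's a minimal sketch using boto3, assuming the stage already exists; the API ID and stage name in the usage note are hypothetical placeholders, and boto3 is imported inside the function so the patch builder can be used on its own.

```python
def tracing_patch(enabled=True):
    """Build the patch operation that turns X-Ray tracing on or off for a stage."""
    return [{
        "op": "replace",
        "path": "/tracingEnabled",
        "value": "true" if enabled else "false",
    }]


def set_stage_tracing(rest_api_id, stage_name, enabled=True):
    """Apply the patch to an existing API Gateway stage (needs AWS credentials)."""
    import boto3  # imported here so tracing_patch() works without boto3 installed

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=tracing_patch(enabled),
    )
```

Calling set_stage_tracing("abc123", "dev") would then be the scripted equivalent of ticking Enable X-Ray Tracing for the "dev" stage of a (hypothetical) API with the ID abc123.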

After tracing is enabled, I can hop over to the X-Ray console to look at my sampling rules in the new Sampling interface.

I can modify the rules in the console and, of course, with the CLI, SDKs, or API. Let’s take a brief interlude to talk about sampling rules.

Sampling Rules
The sampling rules allow me to customize, at a very granular level, the requests and traces I want to record. This allows me to control the amount of data that I record on-the-fly, across code running anywhere (AWS Lambda, Amazon ECS, Amazon Elastic Compute Cloud (EC2), or even on-prem) – all without having to rewrite any code or redeploy an application.

The default rule that is pictured above states that it will record the first request each second, and five percent of any additional requests. We talk about that one request each second as the reservoir, which ensures that at least one trace is recorded each second. The five percent of additional requests is what we refer to as the fixed rate. Both the reservoir and the fixed rate are configurable. If I set the reservoir size to 50 and the fixed rate to 10%, then if 100 requests per second match the rule, the total number of requests sampled is 55 requests per second.

Configuring my X-Ray recorders to read sampling rules from the X-Ray service allows the X-Ray service to maintain the sampling rate and reservoir across all of my distributed compute. If I want to enable this functionality, I just install the latest version of the X-Ray SDK and daemon on my instances. At the moment only the GA SDKs are supported with support for Ruby and Go on the way. With services like API Gateway and Lambda, I can configure everything right in the X-Ray console or API. There’s a lot more detail on this feature in the documentation, and I suggest taking the time to check it out.
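The reservoir and fixed-rate arithmetic described above is easy to check. This is only an illustrative sketch of the calculation, not the X-Ray service's implementation; the function name is made up for this example.

```python
def sampled_per_second(matching_requests, reservoir_size, fixed_rate):
    """Estimate how many requests per second one sampling rule records."""
    # The reservoir records the first N matching requests each second.
    from_reservoir = min(matching_requests, reservoir_size)
    # The fixed rate applies only to the requests beyond the reservoir.
    beyond_reservoir = matching_requests - from_reservoir
    return from_reservoir + beyond_reservoir * fixed_rate


# Reservoir of 50, fixed rate of 10%, 100 matching requests per second:
print(sampled_per_second(100, 50, 0.10))  # 55.0
```

The 50 reservoir requests are always recorded, and 10% of the remaining 50 adds another 5, which gives the 55 requests per second mentioned above.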

While I can, of course, use the sampling rules to control costs, the dynamic nature and the granularity of the rules is also extremely powerful for debugging production systems. If I know one particular URL or service is going to need extra monitoring I can specify that as part of the sampling rule. I can filter on individual stages of APIs, service types, service names, hosts, ARNs, HTTP methods, segment attributes, and more. This lets me quickly examine distributed microservices at 30,000 feet, identify issues, adjust some rules, and then dive deep into production requests. I can use this to develop insights about problems occurring in the 99th percentile of my traffic and deliver a better overall customer experience. I remember building and deploying a lot of ad-hoc instrumentation over the years, at various companies, to try to support something like this, and I don’t think I was ever particularly successful. Now that I can just deploy X-Ray and adjust sampling rules centrally, it feels like I have a debugging crystal ball. I really wish I’d had this tool 5 years ago.
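To make the "extra monitoring for one URL" idea concrete, here's a hedged sketch that builds a sampling rule and hands it to the X-Ray API through boto3. The rule name, URL path, and numbers are hypothetical; the wildcard values simply mean "match anything" for the fields I don't care about.

```python
def build_sampling_rule(name, url_path, reservoir_size, fixed_rate, priority=100):
    """Build a rule that samples one URL path more aggressively than the default."""
    return {
        "RuleName": name,
        "Priority": priority,          # lower numbers are evaluated first
        "ReservoirSize": reservoir_size,
        "FixedRate": fixed_rate,
        "URLPath": url_path,
        "HTTPMethod": "*",
        "Host": "*",
        "ServiceName": "*",
        "ServiceType": "*",
        "ResourceARN": "*",
        "Version": 1,
    }


def create_rule(rule):
    """Send the rule to X-Ray (needs AWS credentials)."""
    import boto3  # imported here so build_sampling_rule() works without boto3

    return boto3.client("xray").create_sampling_rule(SamplingRule=rule)


# Hypothetical rule: record 10 requests per second plus half of the rest
# for a checkout endpoint that needs a closer look.
checkout_rule = build_sampling_rule("checkout-debug", "/checkout/*", 10, 0.5, priority=10)
```

Passing checkout_rule to create_rule() would register it centrally, so recorders that read sampling rules from the X-Ray service pick it up without a redeploy.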

Ok, enough reminiscing, let’s hop back to the walkthrough.

I’ll stick with the default sampling rule for now. Since we’ve enabled tracing and I’ve got some requests running, after about 30 seconds I can refresh my service map and look at the results. I can click on any node to view the traces directly or drop into the Traces sub console to look at all of the traces.

From there, I can see the individual URLs being triggered, the source IPs, and various other useful metrics.

If I want to dive deeper, I can write some filtering rules in the search bar and find a particular trace. An API Gateway segment has a few useful annotations that I can use to filter and group like the API ID and stage. This is what a typical API Gateway trace might look like.

Adding API Gateway support to X-Ray gives us end-to-end production traceability in serverless environments, and sampling rules give us the ability to adjust our tracing in real time without redeploying any code. I had the pleasure of speaking with Ashley Sole from Skyscanner about how they use AWS X-Ray at the AWS Summit in London last year, and these were both features he asked me about earlier that day. I hope this release makes it easier for Ashley and other developers to debug and analyze their production applications.

Available Now

Support for both of these features is available, today, in all public regions that have both API Gateway and X-Ray. In fact, X-Ray launched their new console and API last week so you may have already seen it! You can start using it right now. As always, let us know what you think on Twitter or in the comments below.


Extending AWS CloudFormation with AWS Lambda Powered Macros

Today I’m really excited to show you a powerful new feature of AWS CloudFormation called Macros. CloudFormation Macros allow developers to extend the native syntax of CloudFormation templates by calling out to AWS Lambda powered transformations. This is the same technology that powers the popular Serverless Application Model functionality but the transforms run in your own accounts, on your own lambda functions, and they’re completely customizable. CloudFormation, if you’re new to AWS, is an absolutely essential tool for modeling and defining your infrastructure as code (YAML or JSON). It is a core building block for all of AWS and many of our services depend on it.

There are two major steps for using macros. First, we need to define a macro, which of course, we do with a CloudFormation template. Second, to use the created macro in our template we need to add it as a transform for the entire template or call it directly. Throughout this post, I use the term macro and transform somewhat interchangeably. Ready to see how this works?

Creating a CloudFormation Macro

Creating a macro has two components: a definition and an implementation. To create the definition of a macro we create a CloudFormation resource of type AWS::CloudFormation::Macro that outlines which Lambda function to use and what the macro should be called.

Type: "AWS::CloudFormation::Macro"
Properties:
  Description: String
  FunctionName: String
  LogGroupName: String
  LogRoleARN: String
  Name: String

The Name of the macro must be unique throughout the region and the Lambda function referenced by FunctionName must be in the same region the macro is being created in. When you execute the macro template, it will make that macro available for other templates to use. The implementation of the macro is fulfilled by a Lambda function. Macros can be in their own templates or grouped with others, but you won’t be able to use a macro in the same template you’re registering it in. The Lambda function receives a JSON payload that looks something like this:

{
    "region": "us-east-1",
    "accountId": "$ACCOUNT_ID",
    "fragment": { ... },
    "transformId": "$TRANSFORM_ID",
    "params": { ... },
    "requestId": "$REQUEST_ID",
    "templateParameterValues": { ... }
}

The fragment portion of the payload contains either the entire template or the relevant fragments of the template – depending on how the transform is invoked from the calling template. The fragment will always be in JSON, even if the template is in YAML.

The Lambda function is expected to return a simple JSON response:

{
    "requestId": "$REQUEST_ID",
    "status": "success",
    "fragment": { ... }
}

The requestId needs to be the same as the one received in the input payload, and if status contains any value other than success (case-insensitive) then the changeset will fail to create. Now, fragment must contain the valid CloudFormation JSON of the transformed template. Even if your function performed no action it would still need to return the fragment for it to be included in the final template.
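Here's a minimal sketch of a handler skeleton that follows this contract: it echoes the requestId, returns the transformed fragment on success, and reports a non-success status if the transformation raises. The make_handler helper and the "failure" status string are assumptions for illustration (any value other than success fails the changeset).

```python
def make_handler(transform):
    """Wrap a fragment -> fragment function in a macro-style Lambda handler."""

    def lambda_handler(event, context):
        response = {
            "requestId": event["requestId"],  # must match the incoming request
            "status": "success",
            "fragment": event["fragment"],    # default: return the input unchanged
        }
        try:
            response["fragment"] = transform(event["fragment"])
        except Exception as error:
            # Any status other than "success" makes the changeset fail to create.
            print("Macro transformation failed:", error)
            response["status"] = "failure"
        return response

    return lambda_handler


# A no-op macro: the fragment is passed through untouched.
lambda_handler = make_handler(lambda fragment: fragment)
```

Keeping the transformation as a plain function makes it easy to test locally with a hand-written event before wiring it up to CloudFormation.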

Using CloudFormation Macros

To use the macro we simply call out to Fn::Transform with the required parameters. If we want to have a macro parse the whole template we can include it in our list of transforms in the template the same way we would with SAM: Transform: [Echo]. When we go to execute this template the transforms will be collected into a changeset, by calling out to each macro’s specified function and returning the final template.

Let’s imagine we have a dummy Lambda function called EchoFunction that just logs the data passed into it and returns the fragments unchanged. We define the macro as a normal CloudFormation resource, like this:

EchoMacro:
  Type: "AWS::CloudFormation::Macro"
  Properties:
    FunctionName: arn:aws:lambda:us-east-1:1234567:function:EchoFunction
    Name: EchoMacro

The code for the lambda function could be as simple as this:

def lambda_handler(event, context):
    # Return the fragment untouched, echoing back the request ID.
    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": event["fragment"]
    }

Then, after deploying this function and executing the macro template, we can invoke the macro in a transform at the top level of any other template like this:

AWSTemplateFormatVersion: 2010-09-09
Transform: [EchoMacro, AWS::Serverless-2016-10-31]
Resources:
  FancyTable:
    Type: AWS::Serverless::SimpleTable

The CloudFormation service creates a changeset for the template by first calling the Echo macro we defined and then the AWS::Serverless transform. It will execute the macros listed in the transform in the order they’re listed.

We could also invoke the macro using the Fn::Transform intrinsic function which allows us to pass in additional parameters. For example:

AWSTemplateFormatVersion: 2010-09-09
Resources:
  MyS3Bucket:
    Type: 'AWS::S3::Bucket'
    Fn::Transform:
      Name: EchoMacro
      Parameters:
        Key: Value

The inline transform will have access to all of its sibling nodes and all of its children nodes. Transforms are processed from deepest to shallowest which means top-level transforms are executed last. Since I know most of you are going to ask: no you cannot include macros within macros – but nice try.
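As a sketch of what a macro might do with those extra parameters, the handler below turns each entry in params into a tag on the resource it was invoked from. It assumes the fragment is the body of the resource (its Type and any Properties) and that the resource type accepts a Tags list, as AWS::S3::Bucket does; this is an illustration, not one of the reference macros.

```python
import copy


def lambda_handler(event, context):
    """Add one tag per macro parameter to the resource fragment."""
    fragment = copy.deepcopy(event["fragment"])  # leave the input untouched
    params = event.get("params", {})

    # Create Properties and Tags if the template did not define them.
    properties = fragment.setdefault("Properties", {})
    tags = properties.setdefault("Tags", [])
    for key, value in sorted(params.items()):
        tags.append({"Key": key, "Value": str(value)})

    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": fragment,
    }
```

With the Parameters shown above (Key: Value), the bucket would end up with a single tag whose key is "Key" and whose value is "Value".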

When you go to execute the CloudFormation template it would simply ask you to create a changeset and you could preview the output before deploying.

Example Macros

We’re launching a number of reference macros to help developers get started and I expect many people will publish others. These four are the winners from a little internal hackathon we had prior to releasing this feature:

Name | Description | Author
PyPlate | Allows you to inline Python in your templates | Jay McConnell – Partner SA
ShortHand | Defines a short-hand syntax for common CloudFormation resources | Steve Engledow – Solutions Builder
StackMetrics | Adds CloudWatch metrics to stacks | Steve Engledow and Jason Gregson – Global SA
String Functions | Adds common string functions to your templates | Jay McConnell – Partner SA

Here are a few ideas I thought of that might be fun for someone to implement:

If you end up building something cool I’m more than happy to tweet it out!

Available Now

CloudFormation Macros are available today, in all AWS regions that have AWS Lambda. There is no additional CloudFormation charge for Macros meaning you are only billed normal AWS Lambda function charges. The documentation has more information that may be helpful.

This is one of my favorite new features for CloudFormation and I’m excited to see some of the amazing things our customers will build with it. The real power here is that you can extend your existing infrastructure as code with code. The possibilities enabled by this new functionality are virtually unlimited.


Chat with the Alexa Prize Finalists Today

The Alexa Prize is an annual competition designed to spur academic research and development in the field of conversational artificial intelligence. This year, students are working to build socialbots that can engage in a fun, high-quality conversation on popular societal topics for up to 20 minutes. In order to succeed at this task, the teams must innovate in a broad range of areas including knowledge acquisition, natural language understanding, natural language generation, context modeling, common-sense reasoning, and dialog planning. They use the Alexa Skills Kit (ASK) to construct their bot and to receive real-time feedback on its performance.

Last month the socialbots from Heriot-Watt University (Alana), Czech Technical University (Alquist), and UC Davis (Gunrock) were chosen as the finalists (watch the Twitch stream to learn more). The competition was tough, with points assigned for the potential scientific contribution to the field, the technical merit of the approach, the overall novelty of the idea, and the team’s ability to deliver on their vision.

Time to Chat
We’re now ready for the final round.

Step up to your nearest Alexa-powered device and say “Alexa, let’s chat!” You will be connected to one of the three socialbots (chosen at random) and can converse with it for as long as you would like. When you are through, say “Alexa stop,” and rate the socialbot when prompted. You can also provide additional feedback for the team. We’ll announce the winner at AWS re:Invent 2018 in Las Vegas.


PS – If you are ready to build your very own Alexa Skill, check out the Alexa Skills Kit Tutorials and subscribe to the Alexa Blogs.