Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part Two

Previously we learned all about the benefits of placing Ceph storage services directly on compute nodes in a co-located fashion. This time, we dive deep into the deployment templates to see how an actual deployment comes together and then test the results!

Enabling Co-Location

This article assumes the director is installed and configured with nodes already registered. The default Heat deployment templates ship an environment file for enabling Pure HCI. This environment file is:

/usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml

This file does two things:

  1. It redefines the composable service list for the Compute role to include both Compute and Ceph Storage services. The parameter that stores this list is ComputeServices.

  2. It enables a port on the Storage Management network for Compute nodes using the OS::TripleO::Compute::Ports::StorageMgmtPort resource. The default network isolation templates disable this port for standard Compute nodes. For our scenario we must enable this port and its network so the Ceph services can communicate. If you are not using network isolation, you can leave the resource set to None. A simplified sketch of this file appears below.
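To make this concrete, here is an abbreviated sketch of what this environment file contains. The exact service list varies between releases, so treat the file shipped with your director version as authoritative; the point is simply the CephOSD entry in ComputeServices and the StorageMgmtPort mapping:

resource_registry:
  OS::TripleO::Compute::Ports::StorageMgmtPort: ../network/ports/storage_mgmt.yaml

parameter_defaults:
  ComputeServices:
    - OS::TripleO::Services::CephOSD              # Ceph OSDs co-located on the Compute role
    - OS::TripleO::Services::NovaCompute          # plus the usual Compute role services
    - OS::TripleO::Services::ComputeNeutronOvsAgent
    # ... remainder of the standard Compute service list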

Updating Network Templates

As mentioned, the Compute nodes need to be attached to the Storage Management network so Red Hat Ceph Storage can access the OSDs on them. This is not usually required in a standard deployment. To ensure the Compute node receives an IP address on the Storage Management network, you need to modify the NIC templates for your Compute node to include it. As a basic example, the following snippet adds the Storage Management network to the Compute node via the OVS bridge supporting multiple VLANs:

    - type: ovs_bridge
      name: br-vlans
      use_dhcp: false
      members:
      - type: interface
        name: nic3
        primary: false
      - type: vlan
        vlan_id:
          get_param: InternalApiNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: InternalApiIpSubnet
      - type: vlan
        vlan_id:
          get_param: StorageNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: StorageIpSubnet
      - type: vlan
        vlan_id:
          get_param: StorageMgmtNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: StorageMgmtIpSubnet
      - type: vlan
        vlan_id:
          get_param: TenantNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: TenantIpSubnet

The StorageMgmtNetworkVlanID entry is the additional VLAN interface for the Storage Management network we discussed.

Isolating Resources

We calculate the amount of memory to reserve for the host and Red Hat Ceph Storage services using the formula found in “Reserve CPU and Memory Resources for Compute”. Note that we accommodate 2 OSDs so that we can potentially scale to an extra OSD on the node in the future.

Our total instances:
32GB / (2GB per instance + 0.5GB per instance for host overhead) = ~12 instances

Total host memory to reserve:
(12 instances * 0.5GB overhead) + (2 OSDs * 3GB per OSD) = 12GB or 12000MB

This means our reserved host memory is 12000MB.
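If your hardware differs, the same arithmetic is easy to rerun. A minimal shell sketch using the figures above (the variable names are purely illustrative):

$ INSTANCES=12       # expected instance count
$ OVERHEAD_MB=500    # per-instance host overhead
$ OSDS=2             # current OSD plus one for future scaling
$ MB_PER_OSD=3000    # memory reserved per Ceph OSD
$ echo $(( INSTANCES * OVERHEAD_MB + OSDS * MB_PER_OSD ))
12000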

We can also define how to isolate the CPU resources in two ways:

  • CPU Allocation Ratio – Estimate the CPU utilization of each instance and set the ratio of instances per CPU while taking into account Ceph service usage. This ensures a certain amount of CPU resources are available for the host and Ceph services. See the ”Reserve CPU and Memory Resources for Compute” documentation for more information on calculating this value.
  • CPU Pinning – Define which CPU cores are reserved for instances and use the remaining CPU cores for the host and Ceph services.

This example uses CPU pinning. We are reserving cores 1-7 and 9-15 of our Compute node for our instances. This leaves cores 0 and 8 (both on the same physical core) for the host and Ceph services. This provides one core for the current Ceph OSD and a second core in case we scale the OSDs. Note that we also need to isolate the host to these two cores. This is shown after deploying the overcloud. 


Using the configuration shown, we create an additional environment file that contains the resource isolation parameters defined above:

parameter_defaults:
  NovaReservedHostMemory: 12000
  NovaVcpuPinSet: ['1-7,9-15']

Our example does not use NUMA pinning because our test hardware does not support multiple NUMA nodes. However, if you want to pin the Ceph OSDs to a specific NUMA node, you can do so by following “Configure Ceph NUMA Pinning”.

Deploying the configuration …

This example uses the following environment files in the overcloud deployment:

  • /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml – Enables network isolation for the default roles, including the standard Compute role.
  • /home/stack/templates/network.yaml – Custom file defining network parameters (see Updating Network Templates). This file also sets the OS::TripleO::Compute::Net::SoftwareConfig resource to use our custom NIC template containing the additional Storage Management VLAN we added to the Compute nodes above.
  • /usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml – Redefines the service list for Compute nodes to include the Ceph OSD service. Also adds a Storage Management port for this role. This file is provided with the director’s Heat template collection.
  • /home/stack/templates/hci-resource-isolation.yaml – Custom file with specific settings for resource isolation features such as memory reservation and CPU pinning (see Isolating Resources).

The following command deploys an overcloud with one Controller node and one co-located Compute/Storage node:

$ openstack overcloud deploy \
    --templates /usr/share/openstack-tripleo-heat-templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/templates/network.yaml \
    -e /home/stack/templates/storage-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml \
    -e /home/stack/templates/hci-resource-isolation.yaml \
    --ntp-server pool.ntp.org

Configuring Host CPU Isolation

As a final step, this scenario requires isolating the host from using the CPU cores reserved for instances. To do this, log into the Compute node and run the following commands:

$ sudo grubby --update-kernel=ALL --args="isolcpus=1,2,3,4,5,6,7,9,10,11,12,13,14,15"
$ sudo grub2-install /dev/sda

This updates the kernel to use the isolcpus parameter, preventing the kernel scheduler from using the cores reserved for instances. The grub2-install command updates the boot record, which resides on /dev/sda in default installations. If you are using a custom disk layout for your overcloud nodes, this location might be different.

After setting this parameter, we reboot our Compute node:

$ sudo reboot
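When the node comes back up, a quick sanity check confirms the kernel booted with the isolation in place (standard commands only; the output reflects the cores chosen above):

$ cat /proc/cmdline | grep -o 'isolcpus=[^ ]*'
isolcpus=1,2,3,4,5,6,7,9,10,11,12,13,14,15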

Testing

After the Compute node reboots, we can view the hypervisor details to see the isolated resources from the undercloud:

$ source ~/overcloudrc
$ openstack hypervisor show overcloud-compute-0.localdomain -c vcpus
+-------+-------+
| Field | Value |
+-------+-------+
| vcpus | 14    |
+-------+-------+
$ openstack hypervisor show overcloud-compute-0.localdomain -c free_ram_mb
+-------------+-------+
| Field       | Value |
+-------------+-------+
| free_ram_mb | 20543 |
+-------------+-------+

2 of the 16 CPU cores are reserved for the host and Ceph services, and only 20GB out of the 32GB of RAM remains available for instances.

So, let’s see if this really worked. To find out, we will run some Browbeat tests against the overcloud. Browbeat is a performance and analysis tool specifically for OpenStack. It allows you to analyse, tune, and automate the entire process.

For our test we have run a set of Browbeat benchmark tests showing the CPU activity for different cores. The following graph displays the activity for a host/Ceph CPU core (Core 0) during one of the tests:

[Graph: CPU activity on Core 0 (host/Ceph) during the Browbeat test]

The green line indicates the system processes and the yellow line indicates the user processes. Notice that the CPU core activity peaks during the beginning and end of the test, which is when the disks for the instances were created and deleted respectively. Also notice the CPU core activity is fairly low as a percentage.

The other available host/Ceph CPU core (Core 8) follows a similar pattern:

[Graph: CPU activity on Core 8 (host/Ceph) during the same test]

The peak activity for this CPU core occurs during instance creation and during three periods of high instance activity (the Browbeat tests). Also notice the activity percentages are significantly higher than the activity on Core 0.

Finally, the following is an unused CPU core (Core 2) during the same test:

[Graph: CPU activity on Core 2 (unused instance core) during the same test]

As expected, the unused CPU core shows no activity during the test. However, if we create more instances and exceed the ratio of allowable instances on Core 1, then these instances would use another CPU core, such as Core 2.

These graphs indicate that our resource isolation configuration works: the Ceph services do not overlap with our Compute services, and vice versa.

Conclusion

Co-locating storage on compute nodes provides a simple method to consolidate storage and compute resources. This can help when you want to maximize the hardware of each node and consolidate your overcloud. By adding tuning and resource isolation you can allocate dedicated resources to both storage and compute services, preventing both from starving each other of CPU and memory. And by doing this via Red Hat OpenStack Platform director and Red Hat Ceph Storage, you have a solution that is easy to deploy and maintain!

Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part One

An exciting new feature in Red Hat OpenStack Platform 11 is full Red Hat OpenStack Platform director support for deploying Red Hat Ceph storage directly on your overcloud compute nodes. Often called hyperconverged, or HCI (for Hyperconverged Infrastructure), this deployment model places the Red Hat Ceph Storage Object Storage Daemons (OSDs) and storage pools directly on the compute nodes.

Co-locating Red Hat Ceph Storage in this way can significantly reduce both the physical and financial footprint of your deployment without requiring any compromise on storage.


Red Hat OpenStack Platform director is the deployment and lifecycle management tool for Red Hat OpenStack Platform. With director, operators can deploy and manage OpenStack from within the same convenient and powerful lifecycle tool.

There are two primary ways to deploy this type of storage, which we currently refer to as pure HCI and mixed HCI.

In this two-part blog series we are going to focus on the Pure HCI scenario demonstrating how to deploy an overcloud with all compute nodes supporting Ceph. We do this using the Red Hat OpenStack Platform director. In this example we also implement resource isolation so that the Compute and Ceph services have their own dedicated resources and do not conflict with each other. We then show the results in action with a set of Browbeat benchmark tests.

But first …

Before we get into the actual deployment, let’s take a look at some of the benefits around co-locating storage and compute resources.

  • Smaller deployment footprint: When you perform the initial deployment, you co-locate more services together on single nodes, which helps simplify the architecture on fewer physical servers.

  • Easier to plan, cheaper to start out: co-location provides a decent option when your resources are limited. For example, instead of using six nodes, three for Compute and three for Ceph Storage, you can just co-locate the storage and use only three nodes.

  • More efficient capacity usage: You can utilize the same hardware resources for both Compute and Ceph services. For example, the Ceph OSDs and the compute services can take advantage of the same CPU, RAM, and solid-state drive (SSD). Many commodity hardware options provide decent resources that can accommodate both services on the same node.

  • Resource isolation: Red Hat addresses the noisy neighbor effect through resource isolation, which you orchestrate through Red Hat OpenStack Platform director.

However, while co-location realizes many benefits there are some considerations to be aware of with this deployment model. Co-location does not necessarily offer reduced latency in storage I/O. This is due to the distributed nature of Ceph storage: storage data is spread across different OSDs, and OSDs will be spread across several hyper-converged nodes. An instance on one node might need to access storage data from OSDs spread across several other nodes.

The Lab

Now that we fully understand the benefits and considerations for using co-located storage, let’s take a look at a deployment scenario to see it in action. 


We have developed a scenario using Red Hat OpenStack Platform 11 that deploys and demonstrates a simple “Pure HCI” environment. Here are the details.

We are using three nodes for simplicity:

  • 1 director node
  • 1 Controller node
  • 1 Compute node (Compute + Ceph)

Each of these nodes has the same specifications:

  • Dell PowerEdge R530
  • Intel Xeon CPU E5-2630 v3 @ 2.40GHz – This contains 8 cores, each with hyper-threading, providing us with a total of 16 logical cores.
  • 32 GB RAM
  • 278 GB SSD

Of course for production installs you would need a much more detailed architecture; this scenario simply allows us to quickly and easily demonstrate the advantages of co-located storage. 

This scenario follows these resource isolation guidelines:

  • Reserve enough resources for 1 Ceph OSD on the Compute node
  • Reserve enough resources to potentially scale an extra OSD on the same Compute node
  • Plan for instances to use 2GB on average but reserve 0.5GB per instance on the Compute node for overhead.

This scenario uses network isolation using VLANs:

  • The default Compute node NIC templates shipped with the tripleo-heat-templates do not attach Compute nodes to the Storage Management network, so we need to change that. They require a simple modification to accommodate the Storage Management network, which is illustrated later.

Now that we have everything ready, we are set to deploy our hyperconverged solution! But you’ll have to wait for next time for that so check back soon to see the deployment in action in Part Two of the series!



Want to find out how Red Hat can help you plan, implement and run your OpenStack environment? Join Red Hat Architects Dave Costakos and Julio Villarreal Pelegrino in “Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud” today.

For full details on architecting your own Red Hat OpenStack Platform deployment check out the official Architecture Guide. And for details about Red Hat OpenStack Platform networking see the detailed Networking Guide.

OpenStack Summit Sydney preview: Red Hat to present at more than 40 sessions

The next OpenStack Summit will take place in Sydney, Australia, November 6-8. And despite the fact that the conference will only run three days instead of the usual four, there will be plenty of opportunities to learn about OpenStack from Red Hat’s thought leaders.

Red Hatters will be presenting or co-presenting at more than 40 breakout sessions, sponsored track sessions, lightning talks, demos, and panel discussions. Just about every OpenStack topic, from various services to NFV solutions to day-2 management to containers integration will be covered.

In addition, as a premier sponsor, we’ll have a large presence in the OpenStack Marketplace. Visit us at booth B1 where you can learn about our products and services, speak to RDO and other community leaders about upstream projects, watch Red Hat product demos from experts, and score some pretty cool swag. There will also be many Red Hat partners with booths throughout the exhibit hall, so you can speak with them about their OpenStack solutions with Red Hat.

Below is a schedule of sessions Red Hatters will be presenting at. Click on each title link to find more information about that session or to add it to your Summit schedule. We’ll add our sponsored track sessions (seven more) in the near future.

If you haven’t registered for OpenStack Summit yet, feel free to use our discount for 10% off of your registration price. Just use the code: REDHAT10.

Hope to see you there!

Monday, November 6

Title | Presenters | Time
Upstream bug triage: the hidden gem? | Sylvain Bauza and Stephen Finucane | 11:35-12:15
The road to virtualization: highlighting the unique challenges faced by telcos | Anita Tragler, Andrew Harris, and Greg Smith (Juniper Networks) | 11:35-12:15
Will the real public clouds, please SDK up. OpenStack in the native prog lang of your choice. | Monty Taylor, David Flanders (University of Melbourne), and Tobias Rydberg (City Network Hosting AB) | 11:35-12:15
OpenStack: zero to hero | Keith Tenzer | 1:30-1:40
Questions to make your storage vendor squirm | Gregory Farnum | 1:30-2:10
Panel: experiences scaling file storage with CephFS and OpenStack | Gregory Farnum, Sage Weil, Patrick Donnelly, and Arne Wiebalck (CERN) | 1:30-2:10
Keeping it real (time) | Stephen Finucane and Sylvain Bauza | 2:15-2:25
Multicloud requirements and implementations: from users, developers, service providers | Mark McLoughlin, Jay Pipes (Mirantis), Kurt Garloff (T-Systems International GmbH), Anni Lai (Huawei), and Tim Bell (CERN) | 2:20-3:00
How Zanata powers upstream collaboration with OpenStack internationalization | Alex Eng, Patrick Huang, and Ian Y. Choi (Fuse) | 2:20-3:00
CephFS: now fully awesome (what is the impact of CephFS on the OpenStack cloud?) | Andrew Hatfield, Ramana Raja, and Victoria Martinez de la Cruz | 2:30-2:40
Putting OpenStack on Kubernetes: what tools can we use? | Flavio Percoco | 4:20-5:00
Scalable and distributed applications in Python | Julien Danjou | 4:35-4:45
Achieving zen-like bliss with Glance | Erno Kuvaja, Brian Rosmaita (Verizon), and Abhishek Kekane (NTT Data) | 5:10-5:50
Migrating your job from Jenkins Job Builder to Ansible Playbooks, a Zuulv3 story | Paul Belanger | 5:10-5:50
The return of OpenStack Telemetry and the 10,000 instances | Julien Danjou and Alex Krzos | 5:20-5:30
Warp-speed Open vSwitch: turbo-charge VNFs to 100Gbps in next-gen SDN/NFV datacenter | Anita Tragler, Ash Bhalgat, and Mark Iskra (Nokia) | 5:50-6:00

 

Tuesday, November 7

Title | Presenters | Time
ETSI NFV specs’ requirements vs. OpenStack reality | Frank Zdarsky and Gergely Csatari (Nokia) | 9:00-9:40
Monitoring performance of your OpenStack environment | Matthias Runge | 9:30-9:40
OpenStack compliance speed and agility: yes, it’s possible | Keith Basil and Shawn Wells | 9:50-10:30
Operational management: how is it really done, and what should OpenStack do about it | Anandeep Pannu | 10:20-10:30
Creating NFV-ready containers with kuryr-kubernetes | Antoni Segura Puimedon and Kirill Zaitsev (Samsung) | 10:50-11:30
Encryption workshop: using encryption to secure your cloud | Ade Lee, Juan Osorio Robles, Kaitlin Farr (Johns Hopkins University), and Dave McCowan (Cisco) | 10:50-12:20
Neutron-based networking in Kubernetes using Kuryr – a hands-on lab | Sudhir Kethamakka, Geetika Batra, and Amol Chobe (JP Morgan Chase) | 10:50-12:20
A Telco story of OpenStack success | Krzysztof Janiszewski, Darin Sorrentino, and Dimitar Ivanov (TELUS) | 1:50-2:30
Turbo-charging OpenStack for NFV workloads | Ajay Simha, Vinay Rao, and Ian Wells (Cisco) | 3:20-3:30
Windmill 101: Ansible-based deployments for Zuul / Nodepool | Paul Belanger and Ricardo Carrillo Cruz | 3:20-4:50
Simple encrypted volume management with Tang | Nathaniel McCallum and Ade Lee | 3:50-4:00
Deploying multi-container applications with Ansible service broker | Eric Dube and Todd Sanders | 5:00-5:40

 

Wednesday, November 8

Title | Presenters | Time
OpenStack: the perfect virtual infrastructure manager (VIM) for a virtual evolved packet core (vEPC) | Julio Villarreal Pelegrino and Rimma Iontel | 9:00-9:40
Bringing worlds together: designing and deploying Kubernetes on an OpenStack multi-site environment | Roger Lopez and Julio Villarreal Pelegrino | 10:20-10:30
DMA (distributed monitoring and analysis): monitoring practice and lifecycle management for Telecom | Tomofumi Hayashi, Yuki Kasuya (KDDI), and Toshiaki Takahashi (NEC) | 1:50-2:00
Standing up and operating a container service on top of OpenStack using OpenShift | Dan McPherson, Ata Turk (MOC), and Robert Baron (Boston University) | 1:50-2:30
Why are you not a mentor in the OpenStack community yet? | Rodrigo Duarte Sousa, Raildo Mascena, and Telles Nobrega | 1:50-2:30
What the heck are DHSS driver modes in OpenStack Manila? | Tom Barron, Rodrigo Barbieri, and Goutham Pacha Ravi (NetApp) | 1:50-2:30
SD-WAN – the open source way | Azhar Sayeed and Jaffer Derwish | 2:40-3:20
Adding Cellsv2 to your existing Nova deployment | Dan Smith | 3:30-4:10
What’s your workflow? | Daniel Mellado and David Paterson (Dell) | 3:30-4:10
Glance image import is here…now it’s time to start using it! | Erno Kuvaja and Brian Rosmaita (Verizon) | 4:30-5:10

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 3

In Part 1 of this series, Federico Iezzi, EMEA Cloud Architect with Red Hat, covered the architecture and planning requirements to begin the journey into achieving zero packet loss in Red Hat OpenStack Platform 10 for NFV deployments. In Part 2 he went into the details around the specific tuning and parameters required. Now, in Part 3, Federico concludes the series with an example of how all this planning and tuning comes together!


Putting it all together …

So, what happens when you use the CPU tuning features?

Well, it depends on the hardware choice of course. But we can use Linux perf events to see what is going on. Let’s look at two examples.

Virtual Machine

On a KVM VM, you will have the ideal results because you don’t have all of the interrupts from the real hardware:

$ perf record -g -C 1 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 100  of event 'cpu-clock'
# Event count (approx.): 25000000
#
# Children      Self  Command  Shared Object      Symbol               
# ........  ........  .......  .................  .....................
#
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] default_idle
            |
            ---default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
            |
            ---cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%   100.00%  swapper  [kernel.kallsyms]  [k] native_safe_halt
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt

Physical Machine

On physical hardware, it’s quite different. The best results involved blacklisting a bunch of IPMI and watchdog kernel modules:

$ modprobe -r iTCO_wdt iTCO_vendor_support
$ modprobe -r i2c_i801
$ modprobe -r ipmi_si ipmi_ssif ipmi_msghandler

Note: If you have a different watchdog than the example above (iTCO is for Supermicro motherboards), check out the kernel modules folder where you can find the whole list: /lib/modules/*/kernel/drivers/watchdog/

Here’s the perf command and output for physical:

$ perf record -F 99 -g -C 2 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4  of event 'cycles:ppp'
# Event count (approx.): 255373
#
# Children      Self       Samples  Command  Shared Object      Symbol                                        
# ........  ........  ............  .......  .................  ..............................................
#
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] generic_smp_call_function_single_interrupt
            |
            ---generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] smp_call_function_single_interrupt
            |
            ---smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] call_function_single_interrupt
            |
            ---call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] cpuidle_idle_call
            |
            ---cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore

Using mpstat, and excluding the hardware interrupts, the results are as follows:

Please note: one CPU core per socket has been excluded – in this case using two Xeon E5-2640 V4.

$ mpstat -P 1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,29,31,32,32,34,35,36,37,38,39 3600
Linux 3.10.0-514.16.1.el7.x86_64 (ws1.localdomain)      04/20/2017      _x86_64_     (40 CPU)

03:05:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      13    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      15    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      24    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      25    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      27    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      32    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      34    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      35    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      36    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      37    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      38    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Cool, right? Want to know more about Linux perf events? The upstream perf wiki (perf.wiki.kernel.org) is a good place to start.

Zero Packet Loss, achieved …

As you can see, tuned and the cpu-partitioning profile are exceptional in that they expose a lot of deep Linux tuning that usually only a few people know about.


And with a combination of tuning, service settings, and plenty of interrupt isolation (over 50% of the total settings are about interrupt isolation!) things really start to fly.

Finally, once you make sure the PMD threads and VNF vCPUs do not get interrupted by other threads, allowing for proper CPU core allocation, the zero packet loss game is won.

Of course, there are other considerations such as the hardware chosen, the VNF quality, and the number of PMD threads, but, generally speaking, those are the main requirements.

Further Reading …

Red Hat Enterprise Linux Performance Tuning Guide
Network Functions Virtualization Configuration Guide (Red Hat OpenStack Platform 10)


Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 2

Ready for more Fast Packets?!

In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next instalment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues his series by diving deep into the details behind the tuning.

Buckle in and join the fast lane of packet processing!


Getting into the specifics

It’s important to understand the components we’ll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).

The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.

So, let’s do it …

SystemD CPUAffinity

The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted. Firstly, kernel threads have to be managed in a different way, and secondly, all user-executed processes must be handled very carefully as they might interrupt either the PMD threads or the VNFs. So, CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but it can often be counter-productive unless you are doing real-time and shouldn’t be used.

[Photo: CC0-licensed (Fritzchens Fritz)]

So, what happened to isolcpus?
Isolcpus was the way, until a few years ago, to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, load balancing between the isolated CPU cores was disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if that core is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info check out this article on the Red Hat Customer Portal (registration required).
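As a minimal sketch of how CPUAffinity is set – assuming, for illustration, that cores 0 and 8 are the housekeeping cores reserved for the host – the setting lives in the [Manager] section of /etc/systemd/system.conf:

# /etc/systemd/system.conf
[Manager]
CPUAffinity=0 8

A reboot is the simplest way to ensure the new affinity applies to every already-running service.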

IRQBALANCE_BANNED_CPUS

The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing the IRQs. CPU cores which have their corresponding bits set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double-checked at /proc/interrupts).
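A short sketch, assuming cores 1-7 and 9-15 are the isolated cores to be skipped (bits 1-7 and 9-15 set gives the hex mask 0xfefe):

# /etc/sysconfig/irqbalance
IRQBALANCE_BANNED_CPUS=0000fefe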

Tickless Kernel

Setting the kernel boot parameter nohz prevents frequent timer interrupts. In this case it is common practice to refer to a system as “tickless.” The tickless kernel feature enables “on-demand” timer interrupts: if there is no timer to expire for, say, 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. The result is fewer interrupts per second instead of scheduler interrupts occurring every 1ms.

Adaptive-Ticks CPUs

Setting the kernel boot parameter nohz_full to the specific isolated CPU core values ensures the kernel doesn’t send scheduling-clock interrupts to CPUs with a single, runnable task. Such CPUs are said to be “adaptive-ticks CPUs.” This is important for applications with aggressive real-time response constraints because it allows them to improve their worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive short-iteration workloads: if any CPU is delayed during a given iteration, all the other CPUs will be forced to wait in idle while the delayed CPU finishes. Thus, the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.

[Photo: CC0-licensed. (Roland Tanglao)]

RCU Callbacks Offload

The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes the offloaded CPUs to never queue RCU callbacks and therefore RCU never prevents offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.

Fixed CPU frequency scaling

The kernel boot parameter intel_pstate, when set to disable, disables CPU frequency scaling, setting the CPU frequency to the maximum allowed by the CPU. Having adaptive, and therefore varying, CPU frequency results in unstable performance.

nosoftlockup

[Photo: CC0-licensed. (Alan Levine)]

The kernel boot parameter nosoftlockup disables logging of backtraces when a process executes on a CPU for longer than the softlockup threshold (default is 120 seconds). Typical low-latency programming and tuning techniques might involve spinning on a core or modifying scheduler priorities/policies, which can lead to a task reaching this threshold. If a task has not relinquished the CPU for 120 seconds, the kernel will print a backtrace for diagnostic purposes.
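Taken together, the boot-time parameters from the last few sections land on the kernel command line. A hedged sketch using grubby, again assuming cores 1-7 and 9-15 are isolated (in practice the cpu-partitioning tuned profile computes and applies most of this for you):

$ sudo grubby --update-kernel=ALL --args="nohz=on nohz_full=1-7,9-15 rcu_nocbs=1-7,9-15 intel_pstate=disable nosoftlockup"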

Dirty pages affinity

Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the specific CPU cores that are not isolated creates an affinity so that the kernel writeback threads, which write out dirty pages, run only on those cores.

Execute workqueue requests

Setting the /sys/devices/virtual/workqueue/cpumask value to the CPU cores that are not isolated defines which kworker threads receive kernel tasks such as interrupts, timers, I/O, etc.
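A brief sketch of both workqueue settings, assuming cores 0 and 8 are the non-isolated housekeeping cores (hex bitmask 0x101):

$ echo 101 > /sys/bus/workqueue/devices/writeback/cpumask
$ echo 101 > /sys/devices/virtual/workqueue/cpumask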

Disable Machine Check Exception

Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1 disables checking for corrected machine check errors. An MCE (machine check exception) is a type of computer hardware error that occurs when a computer’s central processing unit detects a hardware problem.
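A one-liner sketch that applies the setting across every CPU:

$ for f in /sys/devices/system/machinecheck/machinecheck*/ignore_ce; do echo 1 > $f; done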

KVM Low Latency

Both the standard KVM module and the Intel KVM module support a number of options to reduce latency by removing unwanted VM exits and interrupts.

[Photo: CC0-licensed. (Marsel Minga)]

Pause Loop Exiting

In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of “Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3C: System Programming Guide, Part 3 (pdf)”

Periodic Kvmclock Sync

In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
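As a sketch, both module options can be made persistent with a modprobe.d snippet (the file name is arbitrary):

# /etc/modprobe.d/kvm-low-latency.conf
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0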

SYSCTL Parameters

Some sysctl parameters are inherited because the cpu-partitioning tuned profile is a child of the network-latency and latency-performance profiles. Below are the essential parameters for achieving zero packet loss:

Parameter | Value | Details
kernel.hung_task_timeout_secs | 600 | Increases the hung task timeout; however, no error will be reported given the nosoftlockup kernel boot parameter. From the cpu-partitioning profile.
kernel.nmi_watchdog | 0 | Disables the Non-Maskable Interrupt (a type of IRQ which gets force-executed). From the cpu-partitioning profile.
vm.stat_interval | 10 | Sets the refresh rate of the virtual memory statistics update. The default value is 1 second. From the cpu-partitioning profile.
kernel.timer_migration | 1 | In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across all of the available CPU cores by the irqbalance daemon, but timers remain stuck on the CPU core which created them. Enabling the timer_migration option (in the latest Linux kernels – https://bugzilla.redhat.com/show_bug.cgi?id=1408308) will always try to migrate the timers away from the nohz_full CPU cores. From the cpu-partitioning profile.
kernel.numa_balancing | 0 | Disables the automatic NUMA process balancing across the NUMA nodes. From the network-latency profile.
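For reference, here is the same set of values expressed as a sysctl drop-in sketch. In practice the tuned profiles apply these for you, so this is illustrative only:

# /etc/sysctl.d/99-zero-packet-loss.conf
kernel.hung_task_timeout_secs = 600
kernel.nmi_watchdog = 0
vm.stat_interval = 10
kernel.timer_migration = 1
kernel.numa_balancing = 0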

Disable Transparent Hugepages

Setting the option transparent_hugepages to “never” disables transparent hugepages, a mechanism that merges smaller memory pages (4K) into bigger memory pages (usually 2M).
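At runtime the setting can be checked or applied through sysfs:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
$ echo never > /sys/kernel/mm/transparent_hugepage/enabled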

Tuned parameters

The following tuned parameters should be configured to provide low latency and disable power-saving mechanisms. Setting the CPU governor to “performance” runs the CPU at the maximum frequency.

  • force_latency = 1
  • governor = performance
  • energy_perf_bias = performance
  • min_perf_pct = 100
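These correspond to the [cpu] section of a tuned profile; a sketch of what a custom profile (or the latency-performance parent it inherits from) contains:

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100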

Speeding to a conclusion

As you can see, there is a lot of preparation and tuning that goes into achieving zero packet loss. This blogpost detailed many parameters that require attention and tuning to make this happen.

Next time the series finishes with an example of how this all comes together!

Love all this deep tech? Want to ensure you keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 1

For Telcos considering OpenStack, one of the major areas of focus can be around network performance. While the performance discussion may often begin with talk of throughput numbers expressed in Million-packets-per-second (Mpps) values across Gigabit-per-second (Gbps) hardware, it really is only the tip of the performance iceberg. The most common requirement is to have absolutely stable and deterministic network performance (Mpps and latency) rather than the absolutely fastest possible throughput. With that in mind, many applications in the Telco space require low latency and can tolerate nothing less than zero packet loss.

In this “Operationalizing OpenStack” blogpost Federico Iezzi, EMEA Cloud Architect with Red Hat, discusses some of the real-world deep tuning and process required to make zero packet loss a reality!


Packet loss is bad for business …

Packet loss can be defined as occurring “when one or more packets of data travelling across a computer network fail to reach their destination [1].” Packet loss results in protocol latency, as losing a TCP packet requires retransmission, which takes time. What’s worse, protocol latency manifests itself externally as “application delay.” And, of course, “application delay” is nothing more than a fancy term for something that all Telcos want to avoid: a fault. So, as network performance degrades and packets drop, retransmission occurs at higher and higher rates. The more retransmission, the more latency experienced and the slower the system gets. With the increased packet volume due to this retransmission we also see increased congestion, slowing the system even further.


Tune in now for better performance …

So how do we prepare OpenStack for Telco? 

[Photo CC0-licensed (Alan Levine)]

It’s easy! Tuning!

Red Hat OpenStack Platform is supported by a detailed Network Functions Virtualization (NFV) Reference Architecture which offers a lot of deep tuning across multiple technologies ranging from Red Hat Enterprise Linux to the Data Plane Development Kit (DPDK) from Intel.  A great place to start is with the Red Hat Network Functions Virtualization (NFV) Product Guide. It covers tuning for the following components:

  • Red Hat Enterprise Linux version 7.3
  • Red Hat OpenStack Platform version 10 or greater
  • Data plane tuning
    • Open vSwitch with DPDK at least version 2.6
    • SR-IOV VF or PF
  • System Partitioning through Tuned using profile cpu-partitioning at least version 2.8
  • Non-uniform memory access (NUMA) and virtual non-uniform memory access (vNUMA)
  • General OpenStack configuration

Hardware notes and prep …

It’s worth mentioning that the hardware to be used in achieving zero packet loss often needs to be latest generation. Hardware decisions around network interface cards and vendors can often affect packet loss and tuning success. For hardware, be sure to consult your vendor’s documentation prior to purchase to ensure the best possible outcomes. Ultimately, regardless of hardware, some setup should be done in the hardware BIOS/UEFI for stable CPU frequency while removing power saving features.

[Photo CC0-licensed (PC Gehäuse)]

Setting | Value
MLC Streamer | Enabled
MLC Spatial Prefetcher | Enabled
Memory RAS and Performance Config | Maximum Performance
NUMA optimized | Enabled
DCU Data Prefetcher | Enabled
DCA | Enabled
CPU Power and Performance | Performance
C6 Power State | Disabled
C3 Power State | Disabled
CPU C-State | Disabled
C1E Autopromote | Disabled
Cluster-on-Die | Disabled
Patrol Scrub | Disabled
Demand Scrub | Disabled
Correctable Error | 10
Intel(R) Hyper-Threading | Disabled or Enabled
Active Processor Cores | All
Execute Disable Bit | Enabled
Intel(R) Virtualization Technology | Enabled
Intel(R) TXT | Disabled
Enhanced Error Containment Mode | Disabled
USB Controller | Enabled
USB 3.0 Controller | Auto
Legacy USB Support | Disabled
Port 60/64 Emulation | Disabled


Divide and Conquer …

Properly enforcing resource partitioning is essential in achieving zero packet loss performance and to do this you need to partition the resources between the host and the guest correctly. System partitioning ensures that software resources running on the host are always given access to dedicated hardware. However, partitioning goes further than just access to hardware as it can be used to ensure that resources utilize the closest possible memory addresses across all the processors. When a CPU retrieves data from a memory address it first looks at the local cache on the local processor core itself. Proper partitioning, via tuning, ensures that requests are answered from the closest cache (L1, L2 or L3 cache) as well as from the local memory, minimizing transaction times and the usage of a point-to-point processor interconnection bus such as the QPI (Intel QuickPath Interconnect). This way of accessing and dividing the memory is defined as NUMA (non-uniform memory access) design.

Tuned in …

System partitioning involves a lot of complex, low-level tuning. So how does one do this easily?

You’ll need to use the tuned daemon along with the accompanying cpu-partitioning profile. Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. Tuned is distributed with a number of predefined profiles for common use cases. For all this to work, you’ll need the newest tuned features. This requires the latest version of tuned (i.e. 2.8 or later) as well as the latest tuned cpu-partitioning profile (i.e. 2.8 or later). Both are available publicly via the Red Hat Enterprise Linux 7.4 beta release or you can grab the daemon and profiles directly from their upstream projects.
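As a rough sketch of what activating the profile looks like on RHEL 7.4 (the isolated_cores value here is purely illustrative – use your own partitioning plan):

$ yum install -y tuned-profiles-cpu-partitioning
$ echo "isolated_cores=2-19,22-39" >> /etc/tuned/cpu-partitioning-variables.conf
$ tuned-adm profile cpu-partitioning
$ reboot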

Interested in the latest generation of Red Hat Enterprise Linux? Be the first to know when it is released by following the official Red Hat Enterprise Linux Blog!


However, before any tuning can begin, you must first decide how the system should be partitioned.

[Diagram: an example CPU partitioning layout for a compute node]

Based on Red Hat experience with customer deployments, we usually find it necessary to define how the system should be partitioned for every specific compute model. In the example pictured above, the total number of PMD cores – one CPU core is two CPU threads – had to be carefully calculated by knowing the overall required Mpps as well as the total number of DPDK interfaces, both physical and vPort. An unbalanced number of PMD threads versus DPDK ports will result in lower performance and interrupts, which will generate packet loss. The rest of the tuning was for the VNF threads, excluding at least one core per NUMA node for the operating system.


Looking for more great ways to ensure your Red Hat OpenStack Platform deployment is rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 



Looking at the upstream templates as well as the tuned cpu-partitioning profile, there is a lot to understand about the specific settings that are executed on each core per NUMA node.

So, just what needs to be tuned? Find out more in Part 2 where you’ll get a thorough and detailed breakdown of many specific tuning parameters to help achieve zero packet loss!


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

OpenStack Down Under – OpenStack Days Australia 2017

As OpenStack continues to grow and thrive around the world the OpenStack Foundation continues to bring OpenStack events to all corners of the globe. From community run meetups to more high-profile events like the larger Summits there is probably an OpenStack event going on somewhere near you.

One of the increasingly popular events is the OpenStack Days series. OpenStack Days are regionally focussed events sponsored by local user groups and businesses in the OpenStack universe. They are intended to be formal events with a detailed structure, keynotes and sponsorship.

This year’s OpenStack Days – Australia was held June 1st in Melbourne, Australia and Red Hat was proud to be a sponsor with speakers in multiple tracks!



Keynotes

Despite being a chilly Melbourne winter’s day there was an excellent turnout. Keynotes featured OpenStack Foundation Executive Director Jonathan Bryce presenting a recap of the recent Boston Summit and overall state of OpenStack. He was joined by Heidi Joy Tretheway, Senior Marketing Manager for the OpenStack Foundation, who presented OpenStack User Survey results (you can even mine the data for yourself). Local community leaders also presented, sharing their excitement for the upcoming Sydney Summit, introducing a pre-summit Hackathon and sharing ideas to help get OpenStack into college curriculum.



Just in: The latest OpenStack User Survey has just begun! Get your OpenStack Login ready and contribute today!

Red Hatters Down Under

Red Hatters in Australia and New Zealand (ANZ) are a busy bunch, but we were lucky enough to have had two exceptional Red Hat delivered presentations on the day.

Andrew Hatfield, Practice Lead – Cloud Storage and Big Data


The first, from Practice Lead Andrew Hatfield, kicked off the day’s Technical Track to an overflowing room. Andrew’s session, entitled “Hyperconverged Cloud – not just a toy anymore”, introduced the technical audience to the idea of OpenStack infrastructure being co-located with Ceph storage on the same nodes. He discussed how combining services, such as placing Ceph OSDs onto Nova computes, is fully ready for production workloads in Red Hat OpenStack Platform Version 11. Andrew pointed out how hyperconvergence, when matched properly to workload, can result in both intelligent resource utilisation and cost savings. For industries such as Telco this is an important step forward for both NFV deployments and Edge computing, where a small footprint and highly tuned workloads are required.

Peter Jung, Business Development Executive | Cloud Transformation


In our second session of the day, Red Hat Business Development Executive Peter Jung presented “OpenStack and Red Hat: How we learned to adapt with our customers in a maturing market.” Peter discussed how Red Hat’s OpenStack journey has evolved over the last few years. With the release of Red Hat OpenStack Platform 7 a few short years ago and its introduction of the deployment lifecycle tool, Red Hat OpenStack Platform director, Red Hat has worked hard to solve real-world user issues around deployment and lifecycle management. Peter’s session covered a range of topics and customer examples, further demonstrating how Red Hat continues to evolve with the maturing OpenStack market.

Community led!

Other sessions on the day covered a wide range of topics across the three tracks providing something for everyone. From an amazing talk in the “Innovation Track” by David Perry of The University of Melbourne around running containers on HPC (with live demo!) to a cheeky “Technical Track” talk by Alexander Tsirel of Hivetec on creating a billing system running on OpenStack, the range, and depth, of the content was second to none. ANZ Stackers really are a dynamic and exciting bunch! The day ended with an interesting panel covering topics ranging from the complexities of upgrading to the importance of diversity.

Keep an eye on OpenStack Australia’s video channel for session videos! And check out previous event videos from Canberra and Sydney for more OpenStack Down Under!

Of course, I’d be remiss not to mention one of the most talked about non-OpenStack items of the day: The Donut Wall. You just have to see it to believe it.

A big success Down Under

The day was a resounding success and the principal organisers, Aptira and the OpenStack Foundation, continue to do an exceptional job to keep the OpenStack presence strong in the ANZ region. We at Red Hat are proud and excited to continue to work with the businesses and community that make OpenStack in Australia and New Zealand amazing and are extremely excited to be the host region for the next OpenStack Summit, being held in Sydney November 6-8, 2017. Can’t wait to see you there!



Want to find out how Red Hat can help you plan, implement and run your OpenStack environment? Join Red Hat Architects Dave Costakos and Julio Villarreal Pelegrino in “Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud” today.

Use OpenStack every day? Want to know more about how the experts run their clouds? Check out the Operationalising OpenStack series for real-world tips, tricks and stories from OpenStack experts just like you!

Back to Boston! A recap of the 2017 OpenStack Summit

This year the OpenStack® Summit returned to Boston, Massachusetts. The Summit was held the week after the annual Red Hat® Summit, which was also held in Boston. The combination of the two events, back to back, made for an intense, exciting and extremely busy few weeks.

More than 5,000 attendees representing over 1,000 companies were at OpenStack Summit. Visitors came from over 60 countries and could choose from more than 750 sessions.

And of course all sessions and keynotes are now easily accessible for online viewing at your own leisure.


The Summit proved to be a joyful information overload and I’d like to share with you some of my personal favorite moments.

Keynotes: “Costs Less, Does More.”

As in previous years, the Summit kicked off its first two days with a lengthy set of keynotes. The keynote sessions highlighted a variety of companies using OpenStack in many different ways, reinforcing the “Costs Less, Does More” theme. GE talked about using OpenStack in healthcare for compliance, and Verizon discussed their use of Red Hat OpenStack Platform for NFV and edge computing. AT&T and DirecTV showed how they are using OpenStack to deliver customers a highly interactive and customizable on-demand streaming service.
Throughout the keynotes it became quite clear to me that OpenStack is truly moving beyond its “newcomer status” and is now solving a wider range of industry use cases than in the past.

In his keynote, Red Hat’s Chief Technologist Chris Wright discussed Red Hat’s commitment and excitement in being part of the OpenStack community, and he shared some numbers from the recent user survey. Chris also shared an important collaboration between Red Hat, Boston University and the Children’s Hospital working to significantly decrease the time required to process and render 3D fetal imaging using OpenShift, GPUs and machine learning. Watch his keynote to learn more about this important research.

[Image courtesy of the OpenStack Foundation]

Another interesting keynote reinforcing the “Costs Less, Does More” theme was “The U.S. Army Cyber School: Saving Millions & Achieving Educational Freedom Through OpenStack” by Major Julianna Rodriguez and Captain Christopher W. Apsey. Starting just two short years ago, with almost no hardware, they now use OpenStack to enable their students to solve problems in a “warfare domain.” To do this they require instructors to react quickly to their students’ requirements and implement labs and solutions that reflect the ever-changing and evolving challenges they face in today’s global cyber domain. The school created an “everything as code for courseware” agile solution framework using OpenStack. Instructors can “go from idea, to code, to deployment, to classroom” in less than a day. And the school is able to do this with significant cost savings, avoiding the “legacy model of using a highly licensed and costly solution.” Both their keynote and their session talk detail a very interesting and unexpected OpenStack solution.


Superusers!

Finally, of particular note and point of pride for those of us in the Red Hat community, we were thrilled to see two of our customers using Red Hat OpenStack Platform share this year’s Superuser Award. Both Paddy Power Betfair and UKCloud transformed their businesses while also contributing back in significant ways to the OpenStack community. We at Red Hat are proud to be partnered with these great, community-minded and leading edge organizations! You can watch the announcement here.

Community Strong!

Another recurring theme was the continuation, strength, and importance of the community behind OpenStack. Red Hat’s CEO Jim Whitehurst touched on this in his fireside chat with OpenStack Foundation Executive Director Jonathan Bryce. Jim and Jonathan discussed how OpenStack has a strong architecture and participation from vendors, users, and enterprises. Jim pointed out that having a strong community, governance structure and culture forms a context for great things to happen, suggesting, “You don’t have to worry about the roadmap; the roadmap will take care of itself.”

Image courtesy of the OpenStack Foundation

As to the state of OpenStack today, and where it will be in, say, five years, Jim’s thoughts really do reflect the strength of the community and the positive future of OpenStack. He noted that the OpenStack journey is unpredictable but has reacted well to the demands of the marketplace, and he reminded us that “if you build … the right community the right things will happen.” I think it’s safe to say this community remains on the right track!

The Big Surprise Guest.


There was also a surprise guest, teased throughout the lead-up to the Summit and not revealed until many of us arrived at the venue in the morning: Edward Snowden. Snowden spoke with OpenStack Foundation COO Mark Collier in a wide-ranging and interesting conversation. Topics included Snowden’s views on the importance of ensuring the openness of underlying IaaS layers; he warned that it is “fundamentally disempowering to sink costs into an infrastructure that you do not fully control.” He also offered a poignant piece of advice to computer scientists, proclaiming “this is the atomic moment for computer scientists.”

I think a community that happily embraces keynotes from both the U.S. Army Cyber School and Edward Snowden in the same 24-hour period is an incredibly diverse, intelligent, and open one, and I’m proud to be a part of it!

So many great sessions!

As mentioned, with over 750 talks there was no way to attend all of them. Between the exciting Marketplace Hall filled with awesome vendor booths, networking and giveaways, and the many events around the convention center, choosing sessions was tough. Reviewing the full list of recorded sessions reveals just how spoiled for choice we were in Boston.

Even more exciting is that with over 60 talks, Red Hat saw its highest speaker participation level of any OpenStack Summit. Red Hatters covered topics across all areas of the technology and business spectrum. Red Hat speakers ranging from engineering all the way to senior management were out in force! Here’s a short sampling of some of the sessions.

Product management

Red Hat Principal Product Manager Steve Gordon’s “Kubernetes and OpenStack at scale” shared performance testing results from running a 2,000+ node OpenShift Container Platform cluster on a 300-node Red Hat OpenStack Platform cloud. He detailed ways to tune and run OpenShift, Kubernetes, and OpenStack based on the results of the testing.


Security and User Access

For anyone who has ever wrestled with Keystone access control, or who simply wants to better understand how it works and where it could be going, check out “Per API Role Based Access Control” by Adam Young, Senior Software Engineer at Red Hat, and Kristi Nikolla, Software Engineer with the Massachusetts Open Cloud team at Boston University. Adam and Kristi discuss the challenges and limitations of the current Keystone access control implementation and present their vision of its future in what they describe as “an overview of the mechanism, the method, and the madness of RBAC in OpenStack.” Watch to the end for an interesting Q&A session. For more information on Red Hat and the Massachusetts Open Cloud, check out the case study and press release.

Red Hat Services

Red Hat Services featured talks highlighting real-world Red Hat OpenStack Platform installations. In “Don’t Fail at Scale: How to Plan for, Build, and Operate a Successful OpenStack Cloud,” David Costakos, OpenStack Solutions Architect, and Julio Villarreal Pelegrino, Principal Architect, lightheartedly walked the audience through the real-world dos and don’ts of an OpenStack deployment.


And in “Red Hat – Best practices in the Deployment of a Network Function Virtualization Infrastructure” Julio and Stephane Lefrere, Cloud Infrastructure Practice Lead, discussed the intricacies and gotchas of one of the most complicated and sophisticated deployments in the OpenStack space: NFV. Don’t miss it!

Red Hat Technical Support

Red Hat Cloud Success Architect Sadique Puthen and Senior Technical Support Engineer Jaison Raju took a deep dive into networks in “Mastering in Troubleshooting NFV Issues.” Digging into the intricacies of a complex NFV-based deployment would scare even the best networking professionals, but Sadique and Jaison’s clear and detailed analysis, reflecting their real-world customer experiences, is exceptional. I’ve been lucky enough to work with these gentlemen from the field side of the business, and I can tell you the level of skill in the support organization within Red Hat is second to none. Watch the talk and see for yourself; you won’t be disappointed!

Red Hat Management

Red Hat’s Senior Director of Product Marketing, Margaret Dawson, presented “Red Hat – Cloud in the Era of Hybrid-Washing: What’s Real & What’s Not?” Margaret’s session digs into the real-world decision-making processes required to make the digital transformation journey a success. She highlights that “hybrid cloud” is not simply two clouds working together, but rather a detailed and complex understanding and execution of shared processes across multiple environments.

As you can see, there was no shortage of Red Hat talent speaking at this year’s Summit.

To learn more about how Red Hat can help you in your Digital Transformation journey check out the full “Don’t Fail at Scale” Webinar!

See you in six months in Sydney!

Each year the OpenStack Summit seems to get bigger and better. But this year I really felt it was the beginning of a significant change. The community is clearly ready to move to the next level of OpenStack to meet the increasingly detailed enterprise demands. And with strong initiatives from the Foundation around key areas such as addressing complexity, growing the community through leadership and mentoring programs, and ensuring a strong commitment to diversity, the future is bright.


I’m really excited to see this progress showcased at the next Summit, being held in beautiful Sydney, Australia, November 6-8, 2017! Hope to see you there.

As Jim Whitehurst pointed out in his keynote, having a strong community, governance structure and culture really is propelling OpenStack into the future!

The next few years are going to be really, really exciting!

Using Ansible Validations With Red Hat OpenStack Platform – Part 3

In the previous two blog posts (Part 1 and Part 2) we demonstrated how to create a dynamic Ansible inventory file for a running OpenStack cloud. We then used that inventory to run Ansible-based validations with the ansible-playbook command from the CLI.

In the final part of our series, we demonstrate how to run those same validations using two new methods: the OpenStack workflow service, Mistral, and the Red Hat OpenStack Platform director UI.


Method 2: Mistral

Validations can be executed through Mistral using the OpenStack unified CLI. Mistral is the workflow service on the director and can be used for everything from calling local scripts, as we do here, to launching instances.
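
As a quick sanity check, you can list the validation-related workflows Mistral provides. A small example, assuming the stack user’s stackrc credentials are sourced:

$ openstack workflow list | grep tripleo.validations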

You can easily find the available validations through Mistral using the openstack unified CLI. The command returns all the validations loaded on the director, which can be a long list. Below we have run the command but omitted all but the ceilometerdb-size check:

[stack@undercloud ansible]$ openstack action execution run tripleo.validations.list_validations | jq '.result[]'
...
{
  "name": "Ceilometer Database Size Check",
  "groups": [
    "pre-deployment"
  ],
  "id": "ceilometerdb-size",
  "metadata": {},
  "description": "The undercloud's ceilometer database can grow to a substantial size if metering_time_to_live and event_time_to_live is set to a negative value (infinite limit). This validation checks each setting and fails if variables are set to a negative value or if they have no custom setting (their value is -1 by default).n"
}
...
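
If you already know which validation you are after, jq can do the filtering for you. A small sketch, selecting just the ceilometerdb-size entry by its id field:

[stack@undercloud ansible]$ openstack action execution run tripleo.validations.list_validations | jq '.result[] | select(.id == "ceilometerdb-size")'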

The next step is to execute the validation workflow using the “id” value found in the Mistral output:

$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "ceilometerdb-size"}'

The example below shows what this looks like when run on the director, and it contains the final piece of information needed to retrieve our check’s result:

[Output of the workflow execution create command, including the Workflow ID]

Look for the “Workflow ID”, and once more run a Mistral command using it:

$ openstack workflow execution output show 4003541b-c52e-4403-b634-4f9987a326e1

The output on the director is below:

[Workflow execution output showing the failed ceilometerdb-size check]

As expected, the negative value in metering_time_to_live has triggered the check, and the returned output indicates it clearly.
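
One practical note: workflow executions are asynchronous, so the output may not be available the instant the workflow is created. If needed, you can check the execution’s state first, reusing the same Workflow ID from our run:

$ openstack workflow execution show 4003541b-c52e-4403-b634-4f9987a326e1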

Method 3: The Director GUI

The last way we will run a validation is via the director UI. The validations visible from within the UI depend on what playbooks are present in the /usr/share/openstack-tripleo-validations/validations/ directory on the director. Validations can be added and removed dynamically.
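
For example, making a new validation visible to the UI can be as simple as copying a playbook into that directory. A minimal sketch, assuming a hypothetical custom playbook named my-validation.yaml in the stack user’s home directory:

$ sudo cp ~/my-validation.yaml /usr/share/openstack-tripleo-validations/validations/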

Here is a short (60-second) video which demonstrates adding the ceilometerdb-size validation to the director via the CLI and then running it from the UI:

Pretty cool, right?

Where to from here?

As you write your own validations you can submit them upstream and help grow the community. To learn more about the upstream validations, check out the project repository on GitHub.

And don’t forget, contributing an approved commit to an OpenStack project can gain you Active Technical Contributor (ATC) status for the release cycle. So, not only do you earn wicked OpenStack developer cred, but you may be eligible to attend a Project Teams Gathering (PTG) and receive discounted entry to the OpenStack Summit for that release.

With the availability of Ansible on Red Hat OpenStack Platform you can immediately access the power Ansible brings to IT automation and management. More than 20 TripleO validation playbooks are supplied with Red Hat OpenStack Platform 11 director, and there are many more upstream.

Ansible validations are ready now. Try them out. Join the community. Keep your Cloud happy.

Thanks!

That’s the end of our series on Ansible validations. Don’t forget to read Part 1 and Part 2 if you haven’t already.

Thanks for reading!

Further info about Red Hat OpenStack Platform

For more information about Red Hat OpenStack Platform please visit the technology overview page, product documentation, and release notes.

Ready to go deeper with Ansible? Check out the latest collection of Ansible eBooks, including free samples from every title!

And don’t forget you can evaluate Red Hat OpenStack Platform for free for 60 days!

The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.


Using Ansible Validations With Red Hat OpenStack Platform – Part 2

In Part 1 we demonstrated how to set up a Red Hat OpenStack Ansible environment by creating a dynamic Ansible inventory file (check it out if you’ve not read it yet!).

Next, in Part 2 we demonstrate how to use that dynamic inventory with included, pre-written Ansible validation playbooks from the command line.


Time to Validate!

The openstack-tripleo-validations RPM provides all the validations. You can find them in /usr/share/openstack-tripleo-validations/validations/ on the director host. Here’s a quick look, but check them out on your deployment as well.

[Listing of the /usr/share/openstack-tripleo-validations/validations/ directory]

With Red Hat OpenStack Platform we ship over 20 playbooks to try out, and there are many more upstream. Check the community often, as the list of validations is always changing. Unsupported validations can be downloaded and included in the validations directory as required.
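
To see exactly how many playbooks shipped with your deployment, a simple one-liner does the trick (the count will vary by version):

$ ls /usr/share/openstack-tripleo-validations/validations/*.yaml | wc -l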

A good first validation to try is the ceilometerdb-size validation. This playbook ensures that the ceilometer configuration on the undercloud doesn’t allow data to be retained indefinitely. It checks whether the metering_time_to_live and event_time_to_live parameters in /etc/ceilometer/ceilometer.conf are unset or set to a negative value (representing infinite retention). Unbounded Ceilometer data retention can decrease performance on the director node and degrade third-party tools that rely on this data.
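
We will deliberately leave one value broken below to trigger the validation, but for reference, correcting a setting like this is a one-liner. A sketch using crudini (commonly available on director hosts, though verify on yours), matching the three-day retention used for events below; a ceilometer service restart may be needed for the change to take effect:

$ sudo crudini --set /etc/ceilometer/ceilometer.conf database metering_time_to_live 259200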

Now, let’s run this validation using the command line in an environment where we have one of the values it checks set correctly and the other incorrectly. For example:

[stack@undercloud ansible]$ sudo awk '/^metering_time_to_live|^event_time_to_live/' /etc/ceilometer/ceilometer.conf
metering_time_to_live = -1
event_time_to_live=259200

Method 1: ansible-playbook

The easiest way is to run the validation using the standard ansible-playbook command:

$ ansible-playbook /usr/share/openstack-tripleo-validations/validations/ceilometerdb-size.yaml

[Colored ansible-playbook output: green “ok” for the setup and TTL-gathering tasks, a red “failed” result for metering_time_to_live, and a blue “skipping” result for event_time_to_live]

So, what happened?

Ansible output is colored to help read it more easily. The green “OK” lines for the “setup” and “Get TTL setting values from ceilometer.conf” tasks represent Ansible successfully finding the metering and event values, as per this task:

  - name: Get TTL setting values from ceilometer.conf
    become: true
    ini: path=/etc/ceilometer/ceilometer.conf section=database key={{ item }} ignore_missing_file=True
    register: config_result
    with_items:
      - "{{ metering_ttl_check }}"
      - "{{ event_ttl_check }}"

And the red and blue outputs come from this task:

  - name: Check values
    fail: msg="Value of {{ item.item }} is set to {{ item.value or "-1" }}."
    when: item.value|int < 0 or item.value == None
    with_items: "{{ config_result.results }}"

Here, Ansible issues a failed result (the red) when the “Check values” task meets the conditional test (less than 0 or non-existent). So, in our case, since metering_time_to_live was set to -1, it met the condition and the task ran, resulting in the only possible outcome: failed.

With the blue output, Ansible is telling us it skipped the task. In this case, that represents a good result. Consider that the event_time_to_live value is set to 259200. This value does not match the task’s conditional (item.value|int < 0 or item.value == None). Since the task only runs when the conditional is met, and the task’s only action is to produce a failed result, Ansible skips it. So a skip means we have passed for this value.
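
If you would rather see an explicit pass message than a skip, you could extend the playbook with a debug task using the inverse condition. A minimal sketch, not part of the shipped validation:

  - name: Report values that pass the check
    # Runs only when the value exists and is non-negative, the inverse
    # of the "Check values" conditional above.
    debug:
      msg: "Value of {{ item.item }} is set to {{ item.value }}. OK."
    when: item.value|int >= 0 and item.value != None
    with_items: "{{ config_result.results }}"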

For even more detail, you can run ansible-playbook in verbose mode by adding -vvv to the command:

$ ansible-playbook -vvv /usr/share/openstack-tripleo-validations/validations/ceilometerdb-size.yaml

A wealth of information is returned and is worth the time to review. Give it a try in your own environment. You may also want to learn more about Ansible playbooks by reviewing the full documentation.

Now that you’ve run your first validation, you can see how powerful they are. But the CLI is not the only way to run validations.

Ready to go deeper with Ansible? Check out the latest collection of Ansible eBooks, including free samples from every title!

In the final part of the series we introduce validations with both the OpenStack workflow service, Mistral, and the director web UI. Check back soon!

The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.