Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 3

In Part 1 of this series, Federico Iezzi, EMEA Cloud Architect with Red Hat, covered the architecture and planning requirements to begin the journey into achieving zero packet loss in Red Hat OpenStack Platform 10 for NFV deployments. In Part 2 he went into the details around the specific tuning and parameters required. Now, in Part 3, Federico concludes the series with an example of how all this planning and tuning comes together!


Putting it all together …

So, what happens when you use the CPU tuning features?

Well, it depends on the hardware choice, of course. To see what is actually going on we can use Linux perf events. Let’s look at two examples.

Virtual Machine

On a KVM VM, you will see near-ideal results because you don’t have all of the interrupts coming from real hardware:

$ perf record -g -C 1 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 100  of event 'cpu-clock'
# Event count (approx.): 25000000
#
# Children      Self  Command  Shared Object      Symbol               
# ........  ........  .......  .................  .....................
#
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] default_idle
            |
            ---default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
            |
            ---cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%   100.00%  swapper  [kernel.kallsyms]  [k] native_safe_halt
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt

Physical Machine

On physical hardware, it’s quite different. The best results involved blacklisting a number of IPMI and watchdog kernel modules:

$ modprobe -r iTCO_wdt iTCO_vendor_support
$ modprobe -r i2c_i801
$ modprobe -r ipmi_si ipmi_ssif ipmi_msghandler

Note: If you have a different watchdog than the example above (iTCO is common on Supermicro motherboards), check the kernel modules directory, where you can find the whole list: /lib/modules/*/kernel/drivers/watchdog/
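Removing the modules with modprobe -r only lasts until the next reboot. To keep them from loading again at boot, you can drop a blacklist file into /etc/modprobe.d/. Below is a minimal sketch; the file name is arbitrary and the module names simply follow the example above, so adjust them to your hardware. Keep in mind that blacklisting only prevents loading by alias, so a module can still be pulled in as a dependency of another one.

cat << EOF > /etc/modprobe.d/nfv-blacklist.conf
# keep the watchdog and IPMI modules from loading automatically at boot
blacklist iTCO_wdt
blacklist iTCO_vendor_support
blacklist i2c_i801
blacklist ipmi_si
blacklist ipmi_ssif
blacklist ipmi_msghandler
EOF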

Here’s the perf command and output for physical:

$ perf record -F 99 -g -C 2 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4  of event 'cycles:ppp'
# Event count (approx.): 255373
#
# Children      Self       Samples  Command  Shared Object      Symbol                                        
# ........  ........  ............  .......  .................  ..............................................
#
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] generic_smp_call_function_single_interrupt
            |
            ---generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] smp_call_function_single_interrupt
            |
            ---smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] call_function_single_interrupt
            |
            ---call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] cpuidle_idle_call
            |
            ---cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore

Using mpstat, and excluding the cores that handle the hardware interrupts, the results are as follows:

Please note: one CPU core per socket has been excluded – the host in this case has two Xeon E5-2640 v4 processors.

$ mpstat -P 1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,29,31,32,32,34,35,36,37,38,39 3600
Linux 3.10.0-514.16.1.el7.x86_64 (ws1.localdomain)      04/20/2017      _x86_64_     (40 CPU)

03:05:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      13    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      15    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      24    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      25    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      27    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      32    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      34    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      35    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      36    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      37    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      38    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Cool, right? Want to know more about Linux perf events? Check out the following links:

Zero Packet Loss, achieved …

As you can see, tuned and the cpu-partitioning profile are exceptional in that they expose a lot of deep Linux tuning which usually only a few people know about.


And with a combination of tuning, service settings, and plenty of interrupt isolation (over 50% of the total settings are about interrupt isolation!) things really start to fly.

Finally, once you make sure the PMD threads and VNF vCPUs never get interrupted by other threads, and that CPU cores are allocated properly, zero packet loss is achieved.

Of course, there are other considerations such as the hardware chosen, the VNF quality, and the number of PMD threads, but, generally speaking, those are the main requirements.

Further Reading …

Red Hat Enterprise Linux Performance Tuning Guide
Network Functions Virtualization Configuration Guide (Red Hat OpenStack Platform 10)


Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 2

Ready for more Fast Packets?!

In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next instalment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues his series by diving deep into the details behind the tuning.

Buckle in and join the fast lane of packet processing!


Getting into the specifics

It’s important to understand the components we’ll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).

The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.

So, let’s do it …

SystemD CPUAffinity

The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted: firstly, kernel threads have to be managed in a different way and, secondly, all user-executed processes must be handled very carefully as they might interrupt either the PMD threads or the VNFs. So, CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but it can often be counter-productive unless you are doing real-time work, and so shouldn’t be used.

Photo: CC0-licensed (Fritzchens Fritz)
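As a rough illustration, CPUAffinity lives in /etc/systemd/system.conf. The core numbers below are purely an assumption (two housekeeping cores, one per socket); use the non-isolated cores of your own host:

# /etc/systemd/system.conf
# reserve cores 0 and 20 for SystemD-spawned processes, leaving the
# isolated cores free for PMD threads and VNF vCPUs
CPUAffinity=0 20

SystemD has to be re-executed (systemctl daemon-reexec) or the host rebooted before the new affinity applies, and services that are already running keep their old affinity until they are restarted.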

So, what happened to isolcpus?
Isolcpus was the way, until a few years ago, to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, load balancing between the isolated CPU cores was disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if that core is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info check out this article on the Red Hat Customer Portal (registration required).

IRQBALANCE_BANNED_CPUS

The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing the IRQs. CPU cores which have their corresponding bits set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double-checked in /proc/interrupts).
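The value is a hexadecimal CPU mask kept in /etc/sysconfig/irqbalance. As a sketch, assuming an 8-thread host where core 0 is the housekeeping core and cores 1-7 are isolated, the mask would be fe:

# /etc/sysconfig/irqbalance
# ban the isolated cores 1-7 (bits 1-7 set, mask 0xfe) from IRQ rebalancing
IRQBALANCE_BANNED_CPUS=fe

After restarting irqbalance (systemctl restart irqbalance), the interrupt counters for the banned cores in /proc/interrupts should stay essentially flat.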

Tickless Kernel

Setting the kernel boot parameter nohz prevents frequent timer interrupts. In this case it is common practice to refer to the system as “tickless.” The tickless kernel feature enables “on-demand” timer interrupts: if there is no timer due to expire for, say, 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. The result is fewer interrupts per second instead of scheduler interrupts occurring every 1ms.

Adaptive-Ticks CPUs

Setting the kernel boot parameter nohz_full to the list of isolated CPU cores ensures the kernel doesn’t send scheduling-clock interrupts to CPUs running only a single runnable task. Such CPUs are said to be “adaptive-ticks CPUs.” This is important for applications with aggressive real-time response constraints because it allows them to improve their worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive short-iteration workloads: if any CPU is delayed during a given iteration, all the other CPUs will be forced to wait in idle while the delayed CPU finishes. Thus, the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.

Photo: CC0-licensed. (Roland Tanglao)

RCU Callbacks Offload

The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes the offloaded CPUs to never queue RCU callbacks and therefore RCU never prevents offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.

Fixed CPU frequency scaling

The kernel boot parameter intel_pstate, when set to disable, disables the CPU frequency scaling driver, keeping the CPU frequency at the maximum allowed by the CPU. An adaptive, and therefore varying, CPU frequency results in unstable performance.

nosoftlockup

Photo: CC0-licensed. (Alan Levine)

The kernel boot parameter nosoftlockup disables logging of backtraces when a process executes on a CPU for longer than the softlockup threshold (default is 120 seconds). Typical low-latency programming and tuning techniques might involve spinning on a core or modifying scheduler priorities/policies, which can lead to a task reaching this threshold. If a task has not relinquished the CPU for 120 seconds, the kernel will print a backtrace for diagnostic purposes.
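In a director-based NFV deployment, the boot parameters above (nohz, nohz_full, rcu_nocbs, intel_pstate=disable and nosoftlockup) are normally put in place by the tuned profile and the TripleO templates. Purely as a sketch, setting them by hand on a standalone host, again assuming cores 1-7 are the isolated ones, would look like this:

$ grubby --update-kernel=ALL --args="nohz=on nohz_full=1-7 rcu_nocbs=1-7 intel_pstate=disable nosoftlockup"
$ reboot
$ cat /proc/cmdline

The reboot is required because these are boot-time parameters; checking /proc/cmdline afterwards confirms they were applied.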

Dirty pages affinity

Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the specific CPU cores that are not isolated creates an affinity for the kernel writeback threads, which write out dirty pages, keeping them on the housekeeping cores.

Execute workqueue requests

Setting the /sys/devices/virtual/workqueue/cpumask value to the CPU cores that are not isolated restricts the kworker threads that handle deferred kernel work (work deferred from interrupts, timers, I/O, and so on) to those housekeeping cores.

Disable Machine Check Exception

Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1 disables checking for corrected machine check errors. A machine check exception (MCE) is a type of computer hardware error that occurs when the central processing unit detects a hardware problem; ignoring the corrected (non-fatal) errors avoids the periodic polling associated with them.
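These three knobs live in sysfs and therefore do not survive a reboot, which is another reason to let tuned manage them. A hand-applied sketch, assuming core 0 is the only housekeeping core (mask 01), would be:

$ echo 01 > /sys/bus/workqueue/devices/writeback/cpumask
$ echo 01 > /sys/devices/virtual/workqueue/cpumask
$ for mc in /sys/devices/system/machinecheck/machinecheck*; do echo 1 > $mc/ignore_ce; done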

KVM Low Latency

Both the standard KVM module and the Intel KVM module support a number of options to reduce latency by removing unwanted VM exits and interrupts.

Photo: CC0-licensed. (Marsel Minga)

Pause Loop Exiting

In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of “Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3C: System Programming Guide, Part 3 (pdf)”

Periodic Kvmclock Sync

In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
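A minimal sketch of making both module options persistent is a file under /etc/modprobe.d/ (the file name is arbitrary); the modules have to be reloaded, or the host rebooted, before the options take effect:

cat << EOF > /etc/modprobe.d/kvm-low-latency.conf
# disable pause-loop exiting and the periodic kvmclock sync
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0
EOF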

SYSCTL Parameters

Some sysctl parameters are inherited because the cpu-partitioning profile is a child of the network-latency and latency-performance tuned profiles. Below are the essential parameters for achieving zero packet loss (a quick way to verify the applied values follows the list):

  • kernel.hung_task_timeout_secs = 600 (from the cpu-partitioning profile). Increases the hung task timeout; no error will be reported anyway, given the nosoftlockup kernel boot parameter.
  • kernel.nmi_watchdog = 0 (from the cpu-partitioning profile). Disables the NMI watchdog (the Non-Maskable Interrupt is a type of IRQ which gets force-executed).
  • vm.stat_interval = 10 (from the cpu-partitioning profile). Sets the refresh rate of the virtual memory statistics update; the default value is 1 second.
  • kernel.timer_migration = 1 (from the cpu-partitioning profile). In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across all of the available CPU cores by the irqbalance daemon, but timers stay stuck on the CPU core which created them. With timer_migration enabled, recent Linux kernels (see https://bugzilla.redhat.com/show_bug.cgi?id=1408308) will always try to migrate timers away from the nohz_full CPU cores.
  • kernel.numa_balancing = 0 (from the network-latency profile). Disables the automatic NUMA process balancing across the NUMA nodes.
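Since these values arrive through the tuned profile chain rather than /etc/sysctl.conf, a quick way to confirm they have been applied is to query them directly:

$ sysctl kernel.hung_task_timeout_secs kernel.nmi_watchdog vm.stat_interval kernel.timer_migration kernel.numa_balancing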

Disable Transparent Hugepages

Setting the option “transparent_hugepages” to “never” disables transparent hugepages, the kernel mechanism that automatically merges smaller memory pages (4K) into bigger memory pages (usually 2M).
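The cpu-partitioning profile takes care of this, but the setting can be checked, and changed at runtime, through sysfs; the bracketed value is the active one. Outside of tuned, the transparent_hugepage=never kernel boot parameter is the usual way to make the change persistent:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ echo never > /sys/kernel/mm/transparent_hugepage/enabled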

Tuned parameters

The following tuned parameters should be configured to provide low latency and disable power-saving mechanisms; setting the CPU governor to “performance” runs the CPU at the maximum frequency. A short sketch of applying them via the cpu-partitioning profile follows the list.

  • force_latency = 1
  • governor = performance
  • energy_perf_bias = performance
  • min_perf_pct = 100
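All of the values above are carried by the cpu-partitioning profile and its parent profiles, so in practice they are applied simply by activating the profile. A minimal sketch, where the isolated core range 1-7 is only an assumption (in a real deployment it comes from your PMD and VNF core layout):

$ vi /etc/tuned/cpu-partitioning-variables.conf    # set: isolated_cores=1-7
$ tuned-adm profile cpu-partitioning
$ tuned-adm active
Current active profile: cpu-partitioning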

Speeding to a conclusion

As you can see, there is a lot of preparation and tuning that goes into achieving zero packet loss. This blogpost detailed many parameters that require attention and tuning to make this happen.

Next time the series finishes with an example of how this all comes together!

Love all this deep tech? Want to ensure you keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 1

For Telcos considering OpenStack, one of the major areas of focus can be around network performance. While the performance discussion may often begin with talk of throughput numbers expressed in Million-packets-per-second (Mpps) values across Gigabit-per-second (Gbps) hardware, that is really only the tip of the performance iceberg. The most common requirement is absolutely stable and deterministic network performance (Mpps and latency) rather than the fastest possible raw throughput. With that in mind, many applications in the Telco space require low latency and can tolerate nothing short of zero packet loss.

In this “Operationalizing OpenStack” blogpost Federico Iezzi, EMEA Cloud Architect with Red Hat, discusses some of the real-world deep tuning and process required to make zero packet loss a reality!


Packet loss is bad for business …

Packet loss can be defined as occurring “when one or more packets of data travelling across a computer network fail to reach their destination [1].” Packet loss results in protocol latency, as losing a TCP packet requires retransmission, which takes time. What’s worse, protocol latency manifests itself externally as “application delay.” And, of course, “application delay” is nothing more than a fancy term for something that all Telcos want to avoid: a fault. So, as network performance degrades and packets drop, retransmission occurs at higher and higher rates. The more retransmission, the more latency experienced and the slower the system gets. With the increased packets due to this retransmission we also see increased congestion, slowing the system even further.


Tune in now for better performance …

So how do we prepare OpenStack for Telco? 

Photo CC0-licensed (Alan Levine)

It’s easy! Tuning!

Red Hat OpenStack Platform is supported by a detailed Network Functions Virtualization (NFV) Reference Architecture which offers a lot of deep tuning across multiple technologies ranging from Red Hat Enterprise Linux to the Data Plane Development Kit (DPDK) from Intel.  A great place to start is with the Red Hat Network Functions Virtualization (NFV) Product Guide. It covers tuning for the following components:

  • Red Hat Enterprise Linux version 7.3
  • Red Hat OpenStack Platform version 10 or greater
  • Data plane tuning
    • Open vSwitch with DPDK at least version 2.6
    • SR-IOV VF or PF
  • System Partitioning through Tuned using profile cpu-partitioning at least version 2.8
  • Non-uniform memory access (NUMA) and virtual non-uniform memory access (vNUMA)
  • General OpenStack configuration

Hardware notes and prep …

It’s worth mentioning that the hardware used to achieve zero packet loss often needs to be latest generation. Hardware decisions around network interface cards and vendors can often affect packet loss and tuning success. Be sure to consult your vendor’s documentation prior to purchase to ensure the best possible outcomes. Ultimately, regardless of hardware, some setup should be done in the hardware BIOS/UEFI to stabilize the CPU frequency and remove power-saving features.

Photo CC0-licensed (PC Gehäuse)

MLC Streamer: Enabled
MLC Spatial Prefetcher: Enabled
Memory RAS and Performance Config: Maximum Performance
NUMA optimized: Enabled
DCU Data Prefetcher: Enabled
DCA: Enabled
CPU Power and Performance: Performance
C6 Power State: Disabled
C3 Power State: Disabled
CPU C-State: Disabled
C1E Autopromote: Disabled
Cluster-on-Die: Disabled
Patrol Scrub: Disabled
Demand Scrub: Disabled
Correctable Error: 10
Intel(R) Hyper-Threading: Disabled or Enabled
Active Processor Cores: All
Execute Disable Bit: Enabled
Intel(R) Virtualization Technology: Enabled
Intel(R) TXT: Disabled
Enhanced Error Containment Mode: Disabled
USB Controller: Enabled
USB 3.0 Controller: Auto
Legacy USB Support: Disabled
Port 60/64 Emulation: Disabled


BIOS Settings from:

Divide and Conquer …

Properly partitioning resources between the host and the guests is essential to achieving zero packet loss performance. System partitioning ensures that software resources running on the host are always given access to dedicated hardware. However, partitioning goes further than just access to hardware as it can be used to ensure that resources utilize the closest possible memory addresses across all the processors. When a CPU retrieves data from a memory address it first looks at the local cache on the local processor core itself. Proper partitioning, via tuning, ensures that requests are answered from the closest cache (L1, L2 or L3) as well as from the local memory, minimizing transaction times and the usage of a point-to-point processor interconnect such as Intel QPI (QuickPath Interconnect). This way of accessing and dividing the memory is known as NUMA (non-uniform memory access) design.

Tuned in …

System partitioning involves a lot of complex, low-level tuning. So how does one do this easily?

You’ll need to use the tuned daemon along with the accompanying cpu-partitioning profile. Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. It is distributed with a number of predefined profiles for common use cases. For all this to work, you’ll need the newest tuned features. This requires the latest version of tuned (i.e. 2.8 or later) as well as the latest tuned cpu-partitioning profile (i.e. 2.8 or later). Both are available publicly via the Red Hat Enterprise Linux 7.4 beta release, or you can grab the daemon and profiles directly from their upstream projects.
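A quick sanity check before going further, assuming the package names used by Red Hat Enterprise Linux, is to confirm the versions and that tuned-adm can actually see the profile:

$ rpm -q tuned tuned-profiles-cpu-partitioning
$ tuned-adm list | grep cpu-partitioning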

Interested in the latest generation of Red Hat Enterprise Linux? Be the first to know when it is released by following the official Red Hat Enterprise Linux Blog!


However, before any tuning can begin, you must first decide how the system should be partitioned.

[Figure: an example partitioning scheme for a compute node]

Based on Red Hat experience with customer deployments, we usually find it necessary to define how the system should be partitioned for every specific compute model. In the example pictured above, the total number of PMD cores – one CPU core is two CPU threads – had to be carefully calculated by knowing the overall required Mpps as well as the total number of DPDK interfaces, both physical and vPort. An unbalanced number of PMD threads versus DPDK ports will result in lower performance and in interrupts that generate packet loss. The rest of the tuning was for the VNF threads, reserving at least one core per NUMA node for the operating system.


Looking for more great ways to ensure your Red Hat OpenStack Platform deployment is rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 



Looking at the upstream templates as well as the tuned cpu-partitioning profile, there is a lot to understand about the specific settings applied to each core per NUMA node.

So, just what needs to be tuned? Find out more in Part 2 where you’ll get a thorough and detailed breakdown of many specific tuning parameters to help achieve zero packet loss!


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

OpenStack Down Under – OpenStack Days Australia 2017

As OpenStack continues to grow and thrive around the world the OpenStack Foundation continues to bring OpenStack events to all corners of the globe. From community run meetups to more high-profile events like the larger Summits there is probably an OpenStack event going on somewhere near you.

One of the increasingly popular events is the OpenStack Days series. OpenStack Days are regionally focussed events sponsored by local user groups and businesses in the OpenStack universe. They are intended to be formal events with a detailed structure, keynotes and sponsorship.

This year’s OpenStack Days – Australia was held June 1st in Melbourne, Australia and Red Hat was proud to be a sponsor with speakers in multiple tracks!



Keynotes

Despite being a chilly Melbourne winter’s day there was an excellent turnout. Keynotes featured OpenStack Foundation Executive Director Jonathan Bryce presenting a recap of the recent Boston Summit and overall state of OpenStack. He was joined by Heidi Joy Tretheway, Senior Marketing Manager for the OpenStack Foundation, who presented OpenStack User Survey results (you can even mine the data for yourself). Local community leaders also presented, sharing their excitement for the upcoming Sydney Summit, introducing a pre-summit Hackathon and sharing ideas to help get OpenStack into college curriculum.



Just in: The latest OpenStack User Survey has just begun! Get your OpenStack Login ready and contribute today!

Red Hatters Down Under

Red Hatters in Australia and New Zealand (ANZ) are a busy bunch, but we were lucky enough to have had two exceptional Red Hat delivered presentations on the day.

Andrew Hatfield, Practice Lead – Cloud Storage and Big Data


The first, from Practice Lead Andrew Hatfield, kicked off the day’s Technical Track to an overflowing room. Andrew’s session, entitled “Hyperconverged Cloud – not just a toy anymore”, introduced the technical audience to the idea of OpenStack infrastructure being co-located with Ceph storage on the same nodes. He discussed how combining services, such as Ceph OSDs onto Nova computes, is fully ready for production workloads in Red Hat OpenStack Platform version 11. Andrew pointed out how hyperconvergence, when matched properly to workload, can result in both intelligent resource utilisation and cost savings. For industries such as Telco this is an important step forward for both NFV deployments and Edge computing, where a small footprint and highly tuned workloads are required.

Peter Jung, Business Development Executive | Cloud Transformation


In our second session of the day, Red Hat Business Development Executive Peter Jung presented “OpenStack and Red Hat: How we learned to adapt with our customers in a maturing market.” Peter discussed how Red Hat’s OpenStack journey has evolved over the last few years. With the release of Red Hat OpenStack Platform 7 a few short years ago and its introduction of the deployment lifecycle tool, Red Hat OpenStack Platform director, Red Hat has worked hard to solve real-world user issues around deployment and lifecycle management. Peter’s session covered a range of topics and customer examples, further demonstrating how Red Hat continues to evolve with the maturing OpenStack market.

Community led!

Other sessions on the day covered a wide range of topics across the three tracks, providing something for everyone. From an amazing talk in the “Innovation Track” by David Perry of The University of Melbourne around running containers on HPC (with live demo!) to a cheeky “Technical Track” talk by Alexander Tsirel of Hivetec on creating a billing system running on OpenStack, the range and depth of the content was second to none. ANZ Stackers really are a dynamic and exciting bunch! The day ended with an interesting panel covering topics ranging from the complexities of upgrading to the importance of diversity.

Keep an eye on OpenStack Australia’s video channel for session videos! And check out previous event videos from Canberra and Sydney for more OpenStack Down Under!

Of course, I’d be remiss not to mention one of the most talked-about non-OpenStack items of the day: The Donut Wall. You just have to see it to believe it.

A big success Down Under

The day was a resounding success and the principal organisers, Aptira and the OpenStack Foundation, continue to do an exceptional job to keep the OpenStack presence strong in the ANZ region. We at Red Hat are proud and excited to continue to work with the businesses and community that make OpenStack in Australia and New Zealand amazing and are extremely excited to be the host region for the next OpenStack Summit, being held in Sydney November 6-8, 2017. Can’t wait to see you there!



Want to find out how Red Hat can help you plan, implement and run your OpenStack environment? Join Red Hat Architects Dave Costakos and Julio Villarreal Pelegrino in
“Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud” today.

Use OpenStack every day? Want to know more about how the experts run their clouds? Check out the Operationalising OpenStack series for real-world tips, tricks and stories from OpenStack experts just like you!

Back to Boston! A recap of the 2017 OpenStack Summit

This year the OpenStack® Summit returned to Boston, Massachusetts. The Summit was held the week after the annual Red Hat® Summit, which was also held in Boston. The combination of the two events, back to back, made for an intense, exciting and extremely busy few weeks.

More than 5,000 attendees and 1,000 companies were in attendance for OpenStack Summit. Visitors came from over 60 countries and could choose from more than 750 sessions.

And of course all sessions and keynotes are now easily accessible for online viewing at your own leisure.


The Summit proved to be a joyful information overload and I’d like to share with you some of my personal favorite moments.

Keynotes: “Costs Less, Does More.”

As in previous years, the Summit kicked off its first two days with a lengthy set of keynotes. The keynote sessions highlighted a variety of companies using OpenStack in many different ways, reflecting the “Costs Less, Does More” theme. GE talked about using OpenStack in healthcare for compliance, and Verizon discussed their use of Red Hat OpenStack Platform for NFV and edge computing. AT&T and DirecTV showed how they are using OpenStack to deliver customers a highly interactive and customizable on-demand streaming service.
Throughout the keynotes it became quite clear, to me, that OpenStack is truly moving beyond its “newcomer status” and is now solving a wider range of industry use cases than in the past.

In his keynote, Red Hat’s Chief Technologist Chris Wright discussed Red Hat’s commitment and excitement in being part of the OpenStack community, and he shared some numbers from the recent user survey. Chris also shared an important collaboration between Red Hat, Boston University and the Children’s Hospital working to significantly decrease the time required to process and render 3D fetal imaging using OpenShift, GPUs and machine learning. Watch his keynote to learn more about this important research.

Image courtesy of the OpenStack Foundation

Another interesting keynote reinforcing the “Costs Less, Does More” theme was “The U.S. Army Cyber School: Saving Millions & Achieving Educational Freedom Through OpenStack” by Major Julianna Rodriguez and Captain Christopher W. Apsey. Starting just two short years ago, with almost no hardware, they now use OpenStack to enable their students to solve problems in a “warfare domain.” To do this they require instructors to react quickly to their students’ requirements and implement labs and solutions that reflect the ever-changing and evolving challenges they face in today’s global cyber domain. The school created an “everything as code for courseware” agile solution framework using OpenStack. Instructors can “go from idea, to code, to deployment, to classroom” in less than a day. And the school is able to do this with significant cost savings, avoiding the “legacy model of using a highly licensed and costly solution.” Both their keynote and their session talk detail a very interesting and unexpected OpenStack solution.


Superusers!

Finally, of particular note and point of pride for those of us in the Red Hat community, we were thrilled to see two of our customers using Red Hat OpenStack Platform share this year’s Superuser Award. Both Paddy Power Betfair and UKCloud transformed their businesses while also contributing back in significant ways to the OpenStack community. We at Red Hat are proud to be partnered with these great, community-minded and leading edge organizations! You can watch the announcement here.

Community Strong!

Another recurring theme was the continuation, strength, and importance of the community behind OpenStack. Red Hat’s CEO Jim Whitehurst touched on this in his fireside chat with OpenStack Foundation Executive Director Jonathan Bryce. Jim and Jonathan discussed how OpenStack has a strong architecture and participation from vendors, users, and enterprises. Jim pointed out that having a strong community, governance structure and culture forms a context for great things to happen, suggesting, “You don’t have to worry about the roadmap; the roadmap will take care of itself.”

Image courtesy of the OpenStack Foundation

As to the state of OpenStack today, and where it is going to be in, say, five years, Jim’s thoughts really do reflect the strength of the community and the positive future of OpenStack. He noted that the OpenStack journey is unpredictable but has reacted well to the demands of the marketplace, and reminded us that “if you build … the right community the right things will happen.” I think it’s safe to say this community remains on the right track!

The Big Surprise Guest.


There was also a surprise guest that was teased throughout the lead up to the Summit and not revealed until many of us arrived at the venue in the morning: Edward Snowden. Snowden spoke with OpenStack Foundation COO Mark Collier in a wide-ranging and interesting conversation. Topics included Snowden’s view on the importance of ensuring the openness of underlying IaaS layers, warning that it is “fundamentally disempowering to sink costs into an infrastructure that you do not fully control.” He also issued a poignant piece of advice to computer scientists, proclaiming “this is the atomic moment for computer scientists.”

I think any community that happily embraces a keynote from both the U.S. Army Cyber School and Edward Snowden in the same 24 hour period is very indicative of an incredibly diverse, intelligent and open community and is one I’m proud to be a part of!

So many great sessions!

As mentioned, with over 750 talks there was no way to attend all of them. Between the exciting Marketplace Hall filled with awesome vendor booths, networking and giveaways, and the many events around the convention center, choosing sessions was tough. Reviewing the full list of recorded sessions reveals just how spoiled for choice we were in Boston.

Even more exciting is that with over 60 talks, Red Hat saw its highest speaker participation level of any OpenStack Summit. Red Hatters covered topics across all areas of the technology and business spectrum. Red Hat speakers ranging from engineering all the way to senior management were out in force! Here’s a short sampling of some of the sessions.

Product management

Red Hat Principal Product Manager Steve Gordon’s “Kubernetes and OpenStack at scale” shared performance testing results when running a 2000+ node OpenShift Container Platform cluster running on a 300-node Red Hat OpenStack Platform cloud. He detailed ways to tune and run OpenShift, Kubernetes and OpenStack based on the results of the testing.


Security and User Access

For anyone who has ever wrestled with Keystone access control, or who simply wants to better understand how it works and where it could be going, check out the session entitled Per API Role Based Access Control by Adam Young, Senior Software Engineer at Red Hat, and Kristi Nikolla, Software Engineer with the Massachusetts Open Cloud team at Boston University. Adam and Kristi discuss the challenges and limitations in the current Keystone implementation around access control and present their vision of its future in what they describe as “an overview of the mechanism, the method, and the madness of RBAC in OpenStack.” Watch to the end for an interesting Q&A session. For more information on Red Hat and the Massachusetts Open Cloud, check out the case study and press release.

Red Hat Services

Red Hat Services featured talks highlighting real-world Red Hat OpenStack Platform installations. In “Don’t Fail at Scale: How to Plan for, Build, and Operate a Successful OpenStack Cloud” David Costakos, OpenStack Solutions Architect, and Julio Villarreal Pelegrino, Principal Architect, lightheartedly brought the audience through the real-world dos and don’ts of an OpenStack deployment.


And in “Red Hat – Best practices in the Deployment of a Network Function Virtualization Infrastructure” Julio and Stephane Lefrere, Cloud Infrastructure Practice Lead, discussed the intricacies and gotchas of one of the most complicated and sophisticated deployments in the OpenStack space: NFV. Don’t miss it!

Red Hat Technical Support

Red Hat Cloud Success Architect Sadique Puthen and Senior Technical Support Engineer Jaison Raju took a deep dive into networks in Mastering in Troubleshooting NFV Issues. Digging into the intricacies and picking apart a complex NFV-based deployment would scare even the best networking professionals, but Sadique and Jaison’s clear and detailed analysis, reflecting their real-world customer experiences, is exceptional. I’ve been lucky enough to work with these gentlemen from the field side of the business and I can tell you the level of skill present in the support organization within Red Hat is second to none. Watch the talk and see for yourself, you won’t be disappointed!

Red Hat Management

Red Hat’s Senior Director of Product Marketing, Margaret Dawson, presented Red Hat – Cloud in the Era of Hybrid-Washing: What’s Real & What’s Not?. Margaret’s session digs into the real-world decision-making processes required to make the Digital Transformation journey a success. She highlights that “Hybrid Cloud” is not simply two clouds working together but rather a detailed and complex understanding and execution of shared processes across multiple environments.

As you can see, there was no shortage of Red Hat talent speaking at this year’s Summit.

To learn more about how Red Hat can help you in your Digital Transformation journey check out the full “Don’t Fail at Scale” Webinar!

See you in six months in Sydney!

Each year the OpenStack Summit seems to get bigger and better. But this year I really felt it was the beginning of a significant change. The community is clearly ready to move to the next level of OpenStack to meet the increasingly detailed enterprise demands. And with strong initiatives from the Foundation around key areas such as addressing complexity, growing the community through leadership and mentoring programs, and ensuring a strong commitment to diversity, the future is bright.


I’m really excited to see this progress showcased at the next Summit, being held in beautiful Sydney, Australia, November 6-8, 2017! Hope to see you there.

As Jim Whitehurst pointed out in his keynote, having a strong community, governance structure and culture really is propelling OpenStack into the future!

The next few years are going to be really, really exciting!

Using Ansible Validations With Red Hat OpenStack Platform – Part 3

In the previous two blogposts (Part 1 and Part 2) we demonstrated how to create a dynamic Ansible inventory file for a running OpenStack cloud. We then used that inventory to run Ansible-based validations with the ansible-playbook command from the CLI.

In the final part of our series, we demonstrate how to run those same validations using two new methods: the OpenStack workflow service, Mistral, and the Red Hat OpenStack director UI.


Method 2: Mistral

Validations can be executed using the OpenStack Mistral Unified CLI. Mistral is the workflow service on the director and can be used for everything from calling local scripts, as we are doing here, to launching instances.

You can easily find the available validations using Mistral from the openstack unified CLI. The command returns all the validations loaded on director, which can be a long list. Below we have run the command, but omitted all but the ceilometerdb-size check:

[stack@undercloud ansible]$  openstack action execution run tripleo.validations.list_validations | jq '.result[]'
...
{
  "name": "Ceilometer Database Size Check",
  "groups": [
    "pre-deployment"
  ],
  "id": "ceilometerdb-size",
  "metadata": {},
  "description": "The undercloud's ceilometer database can grow to a substantial size if metering_time_to_live and event_time_to_live is set to a negative value (infinite limit). This validation checks each setting and fails if variables are set to a negative value or if they have no custom setting (their value is -1 by default).n"
}
...

The next step is to execute this workflow using the “id” value found in the Mistral output:

$ openstack workflow execution create tripleo.validations.v1.run_validation '{"validation_name": "ceilometerdb-size"}'

The example below is what it looks like when run on the director and it contains the final piece of information needed to execute our check:

[Screenshot: output of the openstack workflow execution create command]

Look for the “Workflow ID”, and once more run a Mistral command using it:

$ openstack workflow execution output show 4003541b-c52e-4403-b634-4f9987a326e1

The output on the director is below:

[Screenshot: output of the openstack workflow execution output show command]

As expected, the negative value in metering_time_to_live has triggered the check and the returned output indicates it clearly.

Method 3: The Director GUI

The last way we will run a validation is via the director UI. The validations visible from within the UI depend on what playbooks are present in the /usr/share/openstack-tripleo-validations/validations/ directory on the director. Validations can be added and removed dynamically.

Here is a short (60-second) video which demonstrates adding the ceilometerdb-size validation to the director via the CLI and then running it from the UI:

Pretty cool, right?

Where to from here?

As you write your own validations you can submit them upstream and help grow the community. To learn more about the upstream validations check out their project repository on github.

And don’t forget, contributing an approved commit to an OpenStack project can gain you Active Technical Contributor (ATC) status for the release cycle. So, not only do you earn wicked OpenStack developer cred, but you may be eligible to attend a Project Teams Gathering (PTG) and receive discounted entry to the OpenStack Summit for that release.

With the availability of Ansible on Red Hat OpenStack Platform you can immediately access the power Ansible brings to IT automation and management. There are more than 20 TripleO validation playbooks supplied with Red Hat OpenStack Platform 11 director and many more upstream.

Ansible validations are ready now. Try them out. Join the community. Keep your Cloud happy.

Thanks!

That’s the end of our series on Ansible validations. Don’t forget to read Part 1 and Part 2 if you haven’t already.

Thanks for reading!

Further info about Red Hat OpenStack Platform

For more information about Red Hat OpenStack Platform please visit the technology overview page, product documentation, and release notes.

Ready to go deeper with Ansible? Check out the latest collection of Ansible eBooks, including a free samples from every title!

And don’t forget you can evaluate Red Hat OpenStack Platform for free for 60 days!

The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

 

Using Ansible Validations With Red Hat OpenStack Platform – Part 2

In Part 1 we demonstrated how to set up a Red Hat OpenStack Ansible environment by creating a dynamic Ansible inventory file (check it out if you’ve not read it yet!).

Next, in Part 2 we demonstrate how to use that dynamic inventory with included, pre-written Ansible validation playbooks from the command line.


Time to Validate!

The openstack-tripleo-validations RPM provides all the validations. You can find them in /usr/share/openstack-tripleo-validations/validations/ on the director host. Here’s a quick look, but check them out on your deployment as well.

[Screenshot: listing of the /usr/share/openstack-tripleo-validations/validations/ directory]

With Red Hat OpenStack Platform we ship over 20 playbooks to try out, and there are many more upstream.  Check the community often as the list of validations is always changing. Unsupported validations can be downloaded and included in the validations directory as required.
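As a rough sketch, pulling an extra (unsupported) validation from the upstream project and dropping it into the validations directory looks like the following; the repository location is an assumption, so check the upstream project for its current home, and replace <playbook> with the validation you want:

$ git clone https://github.com/openstack/tripleo-validations.git
$ sudo cp tripleo-validations/validations/<playbook>.yaml /usr/share/openstack-tripleo-validations/validations/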

A good first validation to try is the ceilometerdb-size validation. This playbook ensures that the ceilometer configuration on the Undercloud doesn’t allow data to be retained indefinitely. It checks the metering_time_to_live and event_time_to_live parameters in /etc/ceilometer/ceilometer.conf to see if they are either unset or set to a negative value (representing infinite retention). Unbounded Ceilometer data retention can lead to decreased performance on the director node and degraded capabilities for third-party tools which rely on this data.

Now, let’s run this validation using the command line in an environment where we have one of the values it checks set correctly and the other incorrectly. For example:

[stack@undercloud ansible]$ sudo awk '/^metering_time_to_live|^event_time_to_live/' /etc/ceilometer/ceilometer.conf

metering_time_to_live = -1

event_time_to_live=259200

Method 1: ansible-playbook

The easiest way is to run the validation using the standard ansible-playbook command:

$ ansible-playbook /usr/share/openstack-tripleo-validations/validations/ceilometerdb-size.yaml

[Screenshot: ansible-playbook output for the ceilometerdb-size validation]

So, what happened?

Ansible output is colored to help read it more easily. The green “OK” lines for the “setup” and “Get TTL setting values from ceilometer.conf” tasks represent Ansible successfully finding the metering and event values, as per this task:

  - name: Get TTL setting values from ceilometer.conf
    become: true
    ini: path=/etc/ceilometer/ceilometer.conf section=database key={{ item }} ignore_missing_file=True
    register: config_result
    with_items:
      - "{{ metering_ttl_check }}"
      - "{{ event_ttl_check }}"

And the red and blue outputs come from this task:

  - name: Check values
    fail: msg="Value of {{ item.item }} is set to {{ item.value or "-1" }}."
    when: item.value|int < 0 or item.value  == None
    with_items: "{{ config_result.results }}"

Here, Ansible will issue a failed result (the red) if the “Check Values” task meets the conditional test (less than 0 or non-existent). So, in our case, since metering_time_to_live was set to -1 it met the condition and the task was run, resulting in the only possible outcome: failed.

With the blue output, Ansible is telling us it skipped the task. In this case this represents a good result. Consider that the event_time_to_live value is set to 259200. This value does not match the conditional in the task (item.value|int < 0 or item.value  == None). And since the task only runs when the conditional is met, and the task’s only output is to produce a failed result, Ansible skips the task. So, a skip means we have passed for this value.

For even more detail you can run ansible-playbook in a verbose mode, by adding -vvv to the command:

$ ansible-playbook -vvv /usr/share/openstack-tripleo-validations/validations/ceilometerdb-size.yaml

A surprising amount of useful information is returned, and it is worth the time to review. Give it a try in your own environment. You may also want to learn more about Ansible playbooks by reviewing the full documentation.

Now that you’ve seen your first validation you can see how powerful they are. But the CLI is not the only way to run the validations.

Ready to go deeper with Ansible? Check out the latest collection of Ansible eBooks, including free samples from every title!

In the final part of the series we introduce validations with both the OpenStack workflow service, Mistral, and the director web UI. Check back soon!

The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

Using Ansible Validations With Red Hat OpenStack Platform – Part 1

Ansible is helping to change the way admins look after their infrastructure. It is flexible, simple to use, and powerful. Ansible uses a modular structure to deploy controlled pieces of code against infrastructure, utilizing thousands of available modules, providing everything from server management to network switch configuration.

With recent releases of Red Hat OpenStack Platform, access to Ansible is included directly within the Red Hat OpenStack Platform subscription and installed by default with Red Hat OpenStack Platform director.

In this three-part series you’ll learn ways to use Ansible to perform powerful pre- and post-deployment validations against your Red Hat OpenStack environment, utilizing the special validation scripts that ship with recent Red Hat OpenStack Platform releases.


Ansible, briefly …

Ansible modules are commonly grouped into concise, targeted actions called playbooks. Playbooks allow you to create complex orchestrations using simple syntax and execute them against a targeted set of hosts. Operations use SSH which removes the need for agents or complicated client installations. Ansible is easy to learn and allows you to replace most of your existing shell loops and one-off scripts with a structured language that is extensible and reusable.

Introducing … OpenStack TripleO Validations

Red Hat ships a collection of pre-written Ansible playbooks to make cloud validation easier. These playbooks come from the OpenStack TripleO Validations project (upstream, github). The project was created out of a desire to share a standard set of validations for TripleO-based OpenStack installs. Since most operators already have many of their own infrastructure tests, sharing them with the community in a uniform way was the next logical step.

On Red Hat OpenStack Platform director, the validations are provided by the openstack-tripleo-validations RPM installed during a director install. There are many different tests for all parts of a deployment: prep, pre-introspection, pre-deployment, post-deployment and so on. Validations can be run in three different ways: directly with ansible-playbook, via Mistral workflow executions, and through the director UI.

Let’s Get Started!

Red Hat OpenStack Platform ships with an Ansible dynamic inventory creation script called tripleo-ansible-inventory. With it you can dynamically include all Undercloud and Overcloud hosts in your Ansible inventory. Dynamic inventory of hosts makes it easier to do administrative and troubleshooting tasks against infrastructure in a repeatable way. This helps manage things like server restarts, log gathering and environment validation. Here’s an example script, run on the director node, to get Ansible’s dynamic inventory setup quickly.

#!/bin/bash

pushd /home/stack
# Create a directory for ansible
mkdir -p ansible/inventory
pushd ansible

# create ansible.cfg
cat << EOF > ansible.cfg
[defaults]
inventory = inventory
library = /usr/share/openstack-tripleo-validations/validations/library
EOF

# Create a dynamic inventory script
cat << EOF > inventory/hosts
#!/bin/bash
# Unset some things in case someone has a V3 environment loaded
unset OS_IDENTITY_API_VERSION
unset OS_PROJECT_ID
unset OS_PROJECT_NAME
unset OS_USER_DOMAIN_NAME
unset OS_IDENTITY_API_VERSION
source ~/stackrc
DEFPLAN=overcloud
PLAN_NAME=$(openstack stack list -f csv -c 'Stack Name' | tail -n 1 | sed -e 's/"//g')
export TRIPLEO_PLAN_NAME=${PLAN_NAME:-$DEFPLAN}
/usr/bin/tripleo-ansible-inventory $*
EOF

chmod 755 inventory/hosts
# run inventory/hosts --list for example output

cat << EOF >> ~/.ssh/config
Host *
 StrictHostKeyChecking no
EOF
chmod 600 ~/.ssh/config

This script sets up a working directory for your Ansible commands and creates an Ansible configuration file called ansible.cfg, which includes the openstack-tripleo-validations playbooks in the Ansible library. This helps with running the playbooks easily. Next, the script creates the dynamic inventory file (~/ansible/inventory/hosts) by using /usr/bin/tripleo-ansible-inventory executed against the Overcloud’s Heat stack name.

You can run the inventory file with the --list flag to see what has been discovered:

[stack@undercloud inventory]$ /home/stack/ansible/inventory/hosts --list | jq '.'
{
  "compute": [
    "192.168.0.25",
    "192.168.0.34",
    "192.168.0.39",
    "192.168.0.35"
  ],
  "undercloud": {
    "vars": {
      "ansible_connection": "local",
      "overcloud_admin_password": "AAABBBCCCXXXYYYZZZ",
      "overcloud_horizon_url": "http://10.12.48.100:80/dashboard"
    },
    "hosts": [
      "localhost"
    ]
  },
  "controller": [
    "192.168.0.23",
    "192.168.0.27",
    "192.168.0.33"
  ],
  "overcloud": {
    "vars": {
      "ansible_ssh_user": "heat-admin"
    },
    "children": [
      "controller",
      "compute"
    ]
  }
}

We now have a dynamically generated inventory as required, including groups, using the director’s standard controller and compute node deployment roles.

We’re now ready to run the validations! 

Ready to go deeper with Ansible? Check out the latest collection of Ansible eBooks, including free samples from every title!

This is the end of the first part of our series. Check back shortly for Part 2 to learn how you can use this dynamic inventory file with the included validations playbooks!

The “Operationalizing OpenStack” series features real-world tips, advice, and experiences from experts running and deploying OpenStack.

What’s new in Red Hat OpenStack Platform 11?

We are happy to announce that Red Hat OpenStack Platform 11 is now Generally Available (GA).

Version 11 is based on the upstream OpenStack release, Ocata, the 15th release of OpenStack. It brings a plethora of features, enhancements, bugfixes, documentation improvements and security updates. Red Hat OpenStack Platform 11 contains the additional usability, hardening and support that all Red Hat releases are known for. And with key enhancements to Red Hat OpenStack Platform’s deployment tool, Red Hat OpenStack Director, deploying and upgrading enterprise, production-ready private clouds has never been easier. 

So grab a nice cup of coffee or other tasty beverage and sit back as we introduce some of the most exciting new features in Red Hat OpenStack Platform 11!

Composable Upgrades

By far, the most exciting addition brought by Red Hat OpenStack Platform 11 is the extension of composable roles to now include composable upgrades.

But first, composable roles

As a refresher, a composable role is a collection of services that are grouped together to deploy the Overcloud’s main components. There are five default roles (Controller, Compute, BlockStorage, ObjectStorage, and CephStorage), allowing most common architectural scenarios to be achieved out of the box. Each service in a composable role is defined by an individual Heat template following a standardized approach that ensures services implement a basic set of input parameters and output values. With this approach these service templates can be more easily moved around, or composed, into a custom role. This creates greater flexibility around service placement and management.

And now, composable upgrades …

Before composable roles, upgrades were managed via a large set of complex code to ensure all steps were executed properly. By decomposing the services into smaller, standardized modules, the upgrade logic can be moved out of that monolithic, complex script and into the service templates directly. This is done by completely refactoring the upgrade procedure into modular snippets of Ansible code which can then be integrated and orchestrated by Heat. To do this, each service’s template has a collection of Ansible plays to handle the upgrade steps and actions. Each Ansible play is tagged with a value that allows Heat to step through the code and execute it in a precise and controlled order. This is the same methodology used by Puppet and the “step_config” parameter already found in the “outputs” section of each service template.

Heat iterates through the roles and services and joins the services’ upgrade plays together into a larger playbook. It then executes the plays, by tag, moving through the upgrade procedure.

For example, take a look at Pacemaker’s upgrade_tasks section (from tripleo-heat-templates/puppet/services/pacemaker.yaml):

      upgrade_tasks:
        - name: Check pacemaker cluster running before upgrade
          tags: step0,validation
          pacemaker_cluster: state=online check_and_fail=true
          async: 30
          poll: 4
        - name: Stop pacemaker cluster
          tags: step2
          pacemaker_cluster: state=offline
        - name: Start pacemaker cluster
          tags: step4
          pacemaker_cluster: state=online
        - name: Check pacemaker resource
          tags: step4
          pacemaker_is_active:
            resource: "{{ item }}"
            max_wait: 500
          with_items: {get_param: PacemakerResources}
        - name: Check pacemaker haproxy resource
          tags: step4
          pacemaker_is_active:
            resource: haproxy
            max_wait: 500
          when: {get_param: EnableLoadBalancer}

Heat executes the play for step0, then step1, then step2 and so on. This is just like running ansible-playbook with the -t or --tags option to only run plays tagged with these values.
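
To see the tag mechanism in isolation, here is a generic sketch. The file upgrade_playbook.yaml is a hypothetical stand-in for the playbook Heat assembles from the upgrade_tasks of every service; during a director-driven upgrade you do not run it by hand:

# Hypothetical stand-in for the playbook Heat assembles from upgrade_tasks
ansible-playbook -i ~/ansible/inventory upgrade_playbook.yaml --tags step0

# ...and then each subsequent step, in order
ansible-playbook -i ~/ansible/inventory upgrade_playbook.yaml --tags step2
ansible-playbook -i ~/ansible/inventory upgrade_playbook.yaml --tags step4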

Composable upgrades help to support trustworthy lifecycle management of deployments by providing a stable upgrade path between supported releases. They offer simplicity and reliability to the upgrade process and the ability to easily control, run and customize upgrade logic in a modular and straightforward way.

Increased “Day 0” HA (Pacemaker) Service placement flexibility

New in version 11, deployments can use composable roles for all services. This means the remaining Pacemaker-managed services, such as RabbitMQ and Galera, which traditionally had to be co-located within the Controller role, can now be deployed as custom roles to any node. This allows operators to move core service layers to dedicated nodes, increasing security, scale, and service design flexibility.

Please note: due to the complex nature of changing the Pacemaker-managed services in an already running Overcloud, we recommend consulting Red Hat support services before attempting to do so.

Improvements for NFV

Co-location of Ceph on Compute now supported in production (GA)

Co-locating Ceph with Nova compute is done by placing the Ceph Object Storage Daemons (OSDs) directly on the compute nodes. Co-location lowers many cost and complexity barriers for workloads that have minimal and/or predictable storage I/O requirements by reducing the total number of nodes required for an OpenStack deployment. Hardware previously dedicated to storage-specific requirements can now be utilized by the compute footprint for increased scale. With version 11, co-located storage is also now fully supported for deployment by director as a composable role. Operators can more easily perform detailed and targeted deployments of co-located storage, including technologies such as SR-IOV, all from a custom role. The process is fully supported with comprehensive documentation and tuning support (track this BZ for version 11 specifics).

For Telcos, support for co-locating storage can be helpful for optimizing workloads and deployment architectures on a varied range of hardware and networking technologies within a single OpenStack deployment.

VLAN-Aware VMs now supported in production (GA)

A VLAN-aware VM, or more specifically, “Neutron Trunkports,” is how an OpenStack instance can support VLAN tagged frames across a single vNIC. This allows an operator to use fewer vNICs to access many separate networks, significantly reducing complexity by removing the need for one vNIC per network. Neutron does this by allowing subports off the original parent port, effectively turning the parent port into a virtual trunk. These subports can have their own segmentation IDs assigned directly to them, allowing an operator to assign each subport its own VLAN.

(Image courtesy of https://wiki.openstack.org/wiki/Neutron/TrunkPort; used under Creative Commons)
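
From the tenant side, a trunk can be built with a few Neutron CLI calls. The sketch below assumes your python-neutronclient ships the trunk commands, and the network, image, and flavor names (tenant-net, other-net, rhel7, m1.small) are made up for illustration:

# Create the parent port and turn it into a trunk
openstack port create --network tenant-net parent-port
openstack network trunk create --parent-port parent-port my-trunk

# Attach a second network as a VLAN-tagged subport (VLAN 100 on the vNIC)
openstack port create --network other-net child-port
openstack network trunk set \
    --subport port=child-port,segmentation-type=vlan,segmentation-id=100 \
    my-trunk

# Boot the instance against the parent port; tagged frames for VLAN 100
# now arrive on the same vNIC
openstack server create --image rhel7 --flavor m1.small \
    --nic port-id=parent-port vlan-aware-vm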

Version bumps for key virtual networking technologies

DPDK now version 16.11

DPDK 16.11 brings non-uniform memory access (NUMA) awareness to openvswitch-dpdk deployments. Virtual host devices comprise multiple types of memory which should all be allocated on the same physical NUMA node. 16.11 uses NUMA awareness to achieve this in some of the following ways (a quick way to inspect the resulting placement is sketched after this list):

  • 16.11 removes the requirement for a single device-tracking node which often creates performance issues by splitting memory allocations when VMs are not on that node
  • NUMA IDs can now be dynamically derived and that information used by DPDK to correctly place all memory types on the same node
  • DPDK now sends NUMA node information for a guest directly to Open vSwitch (OVS) allowing OVS to allocate memory more easily on the correct node
  • 16.11 removes the requirement for poll mode driver (PMD) threads to be on cores of the same NUMA node. PMDs can now be on the same node as a device’s memory allocations
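
For a rough idea of what this looks like on a compute node running OVS-DPDK, the commands below can be used to influence and inspect PMD placement. The CPU mask is only illustrative; choose cores local to the NUMA node that owns your DPDK NICs:

# Pin PMD threads to specific cores (example mask; adjust to your topology)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x0c

# Show which PMD thread, core, and NUMA node polls each DPDK/vhost-user port
ovs-appctl dpif-netdev/pmd-rxq-show

# Per-PMD cycle statistics, handy for confirming the placement is balanced
ovs-appctl dpif-netdev/pmd-stats-show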

Open vSwitch now version 2.6

OVS 2.6 lays the groundwork for the future performance and virtual networking requirements of NFV deployments, specifically in the ovs-dpdk deployment space. Immediate benefits come from feature currency and initial, basic OVN support. See the upstream release notes for full details.

CloudForms Integration

Red Hat OpenStack Platform 11 remains tightly integrated with CloudForms. It has been fully tested and supports features such as:

  • Tenant Mapping: finds and lists all OpenStack tenants as CloudForms tenants and keeps them in sync. Creating, updating and deleting CloudForms tenants is reflected in OpenStack, and vice versa
  • Multisite support where one OpenStack region is represented as one cloud provider in CloudForms
  • Multiple domains support where one domain is represented as one cloud provider in CloudForms
  • Cinder Volume Snapshot Management can be done at the volume or instance level. A snapshot is a whole new volume and you can instantiate a new instance from it, all from CloudForms

OpenStack Lifecycle: Our First “Sequential” Release

Long Life review …

With OSP 10 we introduced the concept of the Long Life release. Long Life releases allow customers who are happy with their current release and without any pressing need for specific feature updates to remain supported for up to five years. We have designated every 3rd release as Long Life. For instance, versions 10, 13, and 16 are Long Life, while versions 11, 12, 14 and 15 are sequential. Long Life releases allow for upgrades to subsequent Long Life releases (for example, 10 to 13 without stepping through 11 and 12). Long Life releases generally have an 18 month cadence (three upstream cycles) and do require additional hardware for the upgrade process. Also, while procedures and tooling will be provided for this type of upgrade, it is important to note that some outages will occur.

Now, Introducing … Sequential!

Red Hat OpenStack Platform 11 is the first “sequential” release. It is supported for one year and is released immediately into a “Production Phase 2” release classification. All upgrades for this type of release must be done sequentially (i.e. N+1). Sequential releases feature tighter integration with upstream projects and allow customers to quickly test new features and deploy using their own knowledge of continuous integration and agile principles. Upgrades are generally done without major workload interruption; this model typically suits customers with multiple datacenters and/or highly demanding performance requirements. For more details see Red Hat OpenStack Platform Lifecycle (detailed FAQ as pdf) and Red Hat OpenStack Platform Director Life Cycle.

Additional notable new features of version 11

A new Ironic inspector plugin can process Link Layer Discovery Protocol (LLDP) packets received from network switches during introspection. This can significantly help deployers understand the existing network topology and reduces trial-and-error by helping to validate the actual physical network setup presented to a deployment. All data is collected automatically and stored in an accessible format in the Undercloud’s Swift install.
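
Retrieving that data on the director node is straightforward. A minimal sketch, where <node-uuid> is a placeholder for one of your Ironic nodes and the exact layout of the LLDP fields depends on which inspector plugins are enabled:

source ~/stackrc

# Dump the full introspection data stored in the Undercloud's Swift
openstack baremetal introspection data save <node-uuid> | jq '.'

# Narrow it down to the per-interface details, where the switch
# information learned via LLDP is reported
openstack baremetal introspection data save <node-uuid> | jq '.inventory.interfaces'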

There is now full support for collectd agents to be deployed to the Overcloud from director using composable roles. Performance monitoring is now easier to do as collectd joins the other fully supported OpsTools services, availability monitoring (Sensu) and log management (Fluentd), which have been present since version 10.

And please remember, these are agents, not the full server-side implementations. Check out how to implement the server components easily with Ansible by going to the CentOS OpsTools Special Interest Group for all the details.
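
As a rough sketch of how the agent side is enabled at deploy time, an environment file is passed to the overcloud deploy command. Both file names below are assumptions for illustration: check the environments directory shipped with your tripleo-heat-templates for the exact collectd environment, and collectd-parameters.yaml stands in for a file holding your own collectd server settings.

# Hypothetical sketch: verify the exact environment file name in
# /usr/share/openstack-tripleo-heat-templates/environments/ before using it
openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/collectd-environment.yaml \
    -e ~/templates/collectd-parameters.yaml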

Additional features landing as Tech Preview

Tech Preview Features should not be implemented in production. For full details please see: https://access.redhat.com/support/offerings/techpreview/

Octavia

Octavia brings a robust and mature LBaaS v2 API driver to OpenStack and will eventually replace the legacy HAProxy namespace driver currently found in Newton. It will become not only a load balancing driver but also the load balancing API hosting all the other drivers. Octavia is now a top-level project outside of Neutron; for more details see this excellent update talk from the recent OpenStack Summit in Boston.

Octavia implements load balancing via a group of virtual machines (or containers or bare metal servers), collectively known as amphorae, which are managed by the Octavia controller. The controller manages, among other things, the images used for the load balancing engine. In Ocata, amphora image support is introduced for Red Hat Enterprise Linux, CentOS and Fedora. Amphora images utilize HAProxy to implement load balancing. For full details of the design, consult the Component Design document.

To allow Red Hat OpenStack Platform users to try out this new implementation in a non-production environment, operators can deploy it as a Technology Preview with director starting with version 11.

Please note: Octavia’s director-based implementation is currently scheduled for a z-stream release of Red Hat OpenStack Platform version 11. This means that while it won’t be available on the day of the release, it will be added shortly after. However, please track the following bugzilla, as things may change at the last moment and affect this timing.

OpenDaylight

Red Hat OpenStack Platform 11 builds on the ODL support introduced in version 10 by adding deployment of the OpenDaylight Boron SR2 release through director using a composable role.

Ceph block storage replication

The Cinder RADOS block driver (RBD) was updated to support RBD mirroring (promote/demote of a location), allowing customers to more easily manage and replicate their data via the Cinder API and so support essential disaster recovery concepts.
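
A minimal sketch of how this might look from the Cinder CLI, assuming an RBD backend with replication already configured; the backend name tripleo_ceph and the host string are assumptions, so take the real host@backend value from cinder service-list:

# Create a volume type that lands on the replicated RBD backend
cinder type-create replicated
cinder type-key replicated set volume_backend_name=tripleo_ceph \
    replication_enabled='<is> True'

# When the primary site is lost, fail the backend over to the secondary
# cluster (promoting the mirrored images there)
cinder failover-host controller-0@tripleo_ceph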

Cinder Service HA 

Until now the cinder-volume service could run only in Active/Passive HA fashion. In version 11, the Cinder service received numerous internal fixes around locks, job distribution, cleanup, and data corruption protection to allow for an Active/Active implementation. Having a highly available Cinder implementation may be useful for uptime reliability and throughput requirements.

To sum it all up

Red Hat OpenStack Platform 11 brings important enhancements to all facets of cloud deployment, operations, and management. With solid and reliable upgrade logic, enterprises will find moving to the next version of OpenStack easier and smoother, with a lower chance of disruption. The promotion of important features to full production support (GA) keeps installs current and supported, while the introduction of new Technology Preview features gives an accessible glimpse into the immediate future of Red Hat OpenStack Platform.

More info

For more information about Red Hat OpenStack Platform please visit the technology overview page, product documentation, release notes and release announcement.

To see what others are doing with Red Hat OpenStack Platform, check out these use cases.

And don’t forget you can evaluate Red Hat OpenStack Platform for free for 60 days to see all these features in action.

Using OpenStack: Leveraging Managed Service Providers

Since 2011, when OpenStack was first released to the community, the following and momentum behind it have been amazing. In fact, it quickly became one of the fastest growing open source projects in the history of open source. Now, with nearly 700 community sponsors, over 600 different modules, and over 50,000 lines of code contributed, OpenStack has become the default platform of choice for much of today's private and public cloud infrastructure.

This kind of growth doesn’t happen by chance. It’s because businesses and organizations alike have experienced *real* benefits, whether that’s greater efficiency, faster time to market, automated infrastructure management, or simply cost savings, to name a few.

However, as OpenStack technology and the cloud market mature, the ways vendors deliver OpenStack to customers, and the ways businesses choose to consume it, have introduced many new methodologies and options. These options give customers the flexibility to choose the consumption method that best fits their unique business, letting them reap all the benefits of OpenStack with minimal disruption while still adhering to their IT operational goals, policies, and staff capabilities.

One option that has emerged from this maturity is the “managed” cloud, delivered by a managed service provider (MSP). This option allows customers to maintain a private cloud, either on premises or off, while leaving the burden of deployment, configuration, and day-to-day management to a hired, experienced team of experts. And while this does cost a monthly or annual subscription to retain their services, it relieves you of the complexity of having to do this yourself. Many businesses find that their internal IT teams are understaffed, lack the required skills, or are simply better off utilizing their resources elsewhere.

In this case, businesses might want to consider an OpenStack managed service provider to help move their business into the digital age and create modern cloud services to offer their internal end users or external customers.

At Red Hat, we believe OpenStack is a key component of digital transformation, helping move organizations to a modern cloud solution stack. And we’ve worked hard to establish Red Hat OpenStack Platform as an industry standard for private and public cloud infrastructure. As a result, we have hundreds of customers including the likes of BBVA; Cambridge University; FICO; NASA’s Jet Propulsion Laboratory; Paddy Power Betfair; Produban; Swisscom; UKCloud; and Verizon, to name a few. In addition, we’ve spent years building deep, engineering-level partnerships across our partner ecosystem to provide a robust, enterprise-level cloud that is capable of standing up to the rigors of production deployments.

In particular, we’ve been working to establish strong partnerships with our managed service providers, including engineering and product-level integration, as a way for our customers to maintain a consistently high level of quality regardless of how they choose to consume OpenStack. However, we recognize that businesses around the globe operate at different levels and have their own unique preferences for specific and strategic technology partners. So rather than work with only one global provider, we wanted to stick to what Red Hat does best and provide choice to our customers around the globe.

First, we started with the original creators of OpenStack themselves: Rackspace. With Rackspace recognizing the quality and open source leadership Red Hat maintains, we knew we would be able to make OpenStack’s benefits more accessible to customers, and after years of collaboration it made sense to come together on an OpenStack managed service offering. We then continued our collaboration efforts, working more closely with Cisco to release Cisco Metacloud (formerly called Metapod) powered by Red Hat OpenStack Platform, as we know many companies rely on Cisco for their datacenter infrastructure and service needs. And more recently, we announced a joint offering with IBM and their Bluemix Private Cloud with Red Hat technology, which also includes Red Hat Ceph Storage to help customers meet their storage needs at scale.

And while some customers may choose a global service provider like the three I just mentioned, we also recognize that some customers prefer smaller, regional service providers, whether to adhere to security policies or simply to support local businesses. Regardless, we’ve established a large and growing ecosystem of regional managed service providers like UKCloud (UK), Detacon (Saudi Arabia), Swisscom (Switzerland), Epbi (Netherlands), Blackmesh (North America public sector), NEC America (public sector), and more. These regional providers can help you establish a foothold in the digital age by moving to a scalable and more secure private cloud to meet the demands of your customers and support the future growth of your business.

In addition to our expanding ecosystem of MSP partnerships, we’ve also empowered our existing customers with the flexibility of utilizing their existing Red Hat subscriptions with these certified managed service providers, should they choose to. Existing customers can utilize our Cloud Access Program to help maintain business continuity with their current Red Hat Subscriptions.

Our goal is to help businesses like yours with their digital transformation journey to the cloud. Regardless of how you choose to consume the latest software technologies like Red Hat OpenStack Platform or Red Hat Ceph Storage, we work hard to ensure we’re always putting our customer needs first, providing long-term stability with minimal disruption to your business, and building everything on open standards and APIs to provide the flexibility and choice you need to meet the demands of your growing business. To learn more about Red Hat’s cloud technologies or find a certified managed service provider near you, reach out to us anytime. We look forward to helping you achieve your digital transformation goals!