ECO: Edge-Cloud Optimization of 5G applications

In The 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid 2021), Melbourne, Victoria, Australia

Centralized cloud computing with 100+ millisecond network latencies cannot meet the tens-of-milliseconds to sub-millisecond response times required for emerging 5G applications like autonomous driving, smart manufacturing, tactile internet, and augmented or virtual reality. We describe a new, dynamic runtime that enables such applications to make effective use of a 5G network, computing at the edge of this network, and resources in the centralized cloud, at all times. Our runtime continuously monitors the interaction among the microservices, estimates the data produced and exchanged among them, and uses a novel graph min-cut algorithm to dynamically map the microservices to the edge or the cloud to satisfy application-specific response times. Our runtime also handles temporary network partitions, and maintains data consistency across the distributed fabric by using microservice proxies to reduce WAN bandwidth by an order of magnitude, all in an application-specific manner by leveraging knowledge about the application’s functions, latency-critical pipelines and intermediate data. We illustrate the use of our runtime by successfully mapping two complex, representative real-world video analytics applications to the AWS/Verizon Wavelength edge-cloud architecture, and improving application response times by 2x when compared with a static edge-cloud implementation.
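The core placement idea can be illustrated with a standard s-t min-cut: model each microservice as a graph node, connect services by edges weighted with the data they exchange, and add terminal nodes for the edge and the cloud; the minimum cut then splits services between the two tiers while cutting as little inter-service traffic as possible. The sketch below is a minimal, generic illustration of that idea (Edmonds–Karp max-flow in pure Python), not the paper's actual algorithm; the service names and bandwidth figures are made up.

```python
from collections import defaultdict, deque

def min_cut_partition(edges, source, sink):
    """Edmonds-Karp max-flow; returns the set of nodes on the source side
    of a minimum cut.  `edges` maps (u, v) -> capacity; treated as
    undirected, since data flows both ways between microservices."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in edges.items():
        cap[(u, v)] += c
        cap[(v, u)] += c          # undirected: capacity in both directions
        adj[u].add(v)
        adj[v].add(u)
    while True:
        # BFS for an augmenting path with residual capacity
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            break                  # no augmenting path left: flow is maximal
        # Find the bottleneck along the path, then push that much flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
    # Source side of the min cut = nodes reachable in the residual graph.
    seen, q = {source}, deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen and cap[(u, v)] > 0:
                seen.add(v)
                q.append(v)
    return seen

# Toy video-analytics pipeline: weights are MB/s exchanged between
# microservices, plus affinity edges to the EDGE and CLOUD terminals.
flows = {
    ("EDGE", "decode"): 100,      # the camera feed enters at the edge
    ("decode", "detect"): 40,
    ("detect", "track"): 5,
    ("track", "analytics"): 1,
    ("analytics", "CLOUD"): 50,   # long-term storage lives in the cloud
}
edge_side = min_cut_partition(flows, "EDGE", "CLOUD")
print(sorted(edge_side - {"EDGE"}))   # ['decode', 'detect', 'track']
```

With these weights the cheapest edge to cut is the 1 MB/s link between `track` and `analytics`, so the bandwidth-heavy front of the pipeline stays at the edge and only the low-rate analytics output crosses the WAN.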

Watch presentation …

cloud sharing by Creative Stall from the Noun Project

Physics for martial arts students

Right after the kickboxing class, I saw a classmate and friend of mine trying to solve a physics exercise. I studied physics in high school, and I had an exam in college; that was a long time ago, and even then I wasn’t very strong in the subject (I used to soothe myself (or fool myself?) by quoting Linus Torvalds: “While in physics you’re supposed to figure out how the world is made up, in computer science you create the world.” But that’s a different story). Still, I decided that I could help; after all, it didn’t sound too difficult.

The problem

The problem required finding the final velocity (rounded to one decimal digit), v_f, of a ball thrown down from a 40-meter-tall tower with an initial velocity, v_0, of 12 m/s. The exercise book also provided the solution: 30.5 m/s. Read more of this post
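For uniformly accelerated motion the kinematic relation v_f² = v_0² + 2·g·h (equivalently, energy conservation) gives the answer directly. A few lines of Python confirm the book's number:

```python
import math

g = 9.81      # gravitational acceleration, m/s^2
h = 40.0      # height of the tower, m
v0 = 12.0     # initial (downward) velocity, m/s

# v_f^2 = v_0^2 + 2*g*h for constant acceleration g over a drop h
v_f = math.sqrt(v0**2 + 2 * g * h)
print(round(v_f, 1))   # 30.5
```

The exact value is about 30.48 m/s, which rounds to the book's 30.5 m/s.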

First step of my journey for rediscovering the straight path (aka I’m using again GNU Emacs after many years)

“… mi ritrovai per una selva oscura, ché la diritta via era smarrita …” (Divina Commedia – Canto I)

Like Dante in his allegorical journey, I (also) “found myself deep in a darkened forest, for I had lost all trace of the straight path” in the computing world.

The “darkened forest” in which I found myself is the world of proprietary software: I found myself using some of the wrong (as in closed-source) operating systems, programming languages and text editors. Sometimes there is no alternative to using proprietary software (e.g. a compiler for an exotic architecture, or a library that I absolutely need for my day job).

But why wasn’t I using free software for the operating system and (even more so) the text editor, when free software alternatives exist, many of them technically better (way better) than the proprietary software I was using? That’s simple: I just followed the hype. Read more of this post

A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters

In the 2014 IEEE 28th International Parallel and Distributed Processing Symposium

We propose a cluster scheduling technique for compute clusters with Xeon Phi coprocessors. Even though the Xeon Phi runs Linux, which allows multiprocessing, cluster schedulers generally do not allow jobs to share coprocessors because sharing can cause oversubscription of coprocessor memory and thread resources. It has been shown that memory or thread oversubscription on a many-core processor like the Phi results in job crashes or drastic performance loss. We first show that such an exclusive device allocation policy causes severe coprocessor underutilization: for typical workloads, on average only 38% of the Xeon Phi cores are busy across the cluster. Then, to improve coprocessor utilization, we propose a scheduling technique that enables safe coprocessor sharing without resource oversubscription. Jobs specify their maximum memory and thread requirements, and our scheduler packs as many jobs as possible on each coprocessor in the cluster, subject to resource limits. We solve this problem using a greedy approach at the cluster level combined with a knapsack-based algorithm for each node. Every coprocessor is modeled as a knapsack, and jobs are packed into each knapsack with the goal of maximizing job concurrency, i.e., as many jobs as possible executing on each coprocessor. Given a set of jobs, we show that this strategy of packing for high concurrency is a good proxy for (i) reducing makespan, without the need for users to specify job execution times, and (ii) reducing coprocessor footprint, i.e., the number of coprocessors required to finish the jobs without increasing makespan. We implement the entire system as a seamless add-on to Condor, a popular distributed job scheduler, and show makespan and footprint reductions of more than 50% across a wide range of workloads.
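The per-node packing step can be sketched as follows. This is a simple smallest-first greedy heuristic, not the paper's exact knapsack formulation; the job names and the 8 GB / 240-thread capacities are illustrative assumptions.

```python
def pack_coprocessor(jobs, mem_capacity, thread_capacity):
    """Pack jobs onto one coprocessor ("knapsack"), maximizing concurrency
    without oversubscribing memory or threads.  Each job is a tuple
    (name, mem_gb, threads).  Trying smaller jobs first tends to fit
    more of them, which is the concurrency goal described above."""
    placed, mem_used, thr_used = [], 0, 0
    for name, mem, thr in sorted(jobs, key=lambda j: (j[1] + j[2], j[1])):
        # Admit a job only if BOTH resource limits still hold.
        if mem_used + mem <= mem_capacity and thr_used + thr <= thread_capacity:
            placed.append(name)
            mem_used += mem
            thr_used += thr
    return placed

# Hypothetical coprocessor: 8 GB of memory, 240 hardware threads.
jobs = [("A", 6, 120), ("B", 2, 60), ("C", 3, 60), ("D", 1, 30)]
print(pack_coprocessor(jobs, mem_capacity=8, thread_capacity=240))
# ['D', 'B', 'C']  -- three concurrent jobs; adding A would exceed memory
```

A cluster-level scheduler would run this loop greedily over every coprocessor, assigning each remaining job to the first knapsack where it fits.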

Property for Sale: Vari, Syros, Cyclades, Greece

36963 sqft (3434 sqm) of residential land in the locality of Vari (prefecture of Posidonia), Syros island (Cyclades, Aegean Sea), Greece.

The property fronts the street and is composed of two adjacent lots (1666 sqm + 1768 sqm; 17933 sqft + 19031 sqft).

Each lot has a pre-approved buildability of 400 sqm (4305 sqft) for residential purposes, but this can be extended for hotel accommodation.

The property is located 800 m (2625 ft) from the shore of Vari, 5 km (3 mi) from the town of Ermoupoli, 4 km (2.5 mi) from the airport and 5 km from the sea dock.

Please find more info, as well as the contacts of the seller, on http://www.syros-realestate.eu/.

Snapify: capturing snapshots of offload applications on Xeon Phi manycore processors

In Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’14). ACM, New York, NY, USA, 1-12.

Intel Xeon Phi coprocessors provide excellent performance acceleration for highly parallel applications and have been deployed in several top-ranking supercomputers. One popular approach to programming the Xeon Phi is the offload model, where parallel code is executed on the Xeon Phi, while the host system executes the sequential code. However, Xeon Phi’s Many Integrated Core Platform Software Stack (MPSS) lacks fault-tolerance support for offload applications. This paper introduces Snapify, a set of extensions to MPSS that provides three novel features for Xeon Phi offload applications: checkpoint and restart, process swapping, and process migration. The core technique of Snapify is to take consistent process snapshots of the communicating offload processes and their host processes. To reduce the PCI latency of storing and retrieving process snapshots, Snapify uses a novel data transfer mechanism based on remote direct memory access (RDMA). Snapify can be used transparently by single-node and MPI applications, or be triggered directly by job schedulers through Snapify’s API. Experimental results on OpenMP and MPI offload applications show that Snapify adds a runtime overhead of at most 5%, and this overhead is low enough for most use cases in practice.
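Snapify's actual mechanism depends on MPSS internals and RDMA, but the "consistent snapshot of communicating processes" idea can be illustrated with a toy: both sides quiesce their communication, synchronize at a barrier, and only then serialize their state, so neither snapshot captures a message that the other side has not accounted for. Everything below (names, thread-based stand-ins for the host and offload processes) is a hypothetical sketch, not Snapify's API.

```python
import pickle
import threading

barrier = threading.Barrier(2)   # one party for the host, one for the offload side
snapshots = {}

def take_snapshot(role, state, pending_msgs):
    # 1. Drain in-flight messages so nothing is left "in the channel".
    state["inbox"] = state.get("inbox", []) + pending_msgs
    # 2. Synchronize: both sides must quiesce before either dumps state.
    barrier.wait()
    # 3. Serialize a snapshot of this side's state.
    snapshots[role] = pickle.dumps(state)

host = threading.Thread(target=take_snapshot,
                        args=("host", {"step": 7}, ["result_3"]))
phi = threading.Thread(target=take_snapshot,
                       args=("offload", {"step": 7}, []))
host.start(); phi.start(); host.join(); phi.join()

# Both sides snapshotted the same logical step, so the pair is consistent
# and could be restored together (the checkpoint/restart use case).
print(sorted(snapshots))                         # ['host', 'offload']
print(pickle.loads(snapshots["host"])["step"])   # 7
```

The barrier stands in for the coordination protocol; in the real system the snapshots would additionally be moved over RDMA rather than kept in memory.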

Continue reading the complete paper …