Alexander Stolyar: Waiting Rooms in Cyberspace

10/6/2019 Doug Peterson

Stolyar is one of the country’s leading experts in the stability of queues, or jobs lined up to be processed by servers.

Written by Doug Peterson

Imagine a doctor’s waiting room with people coming and going throughout the day.  “Stability” means that at any given time, you might have three or four people waiting to see the doctor, maybe even five. But if the patients begin to back up, filling the room so there isn’t a spare seat or magazine to go around, then you’ve got problems. This is an unstable system, says Alexander Stolyar, ISE professor.

Alexander Stolyar
Alexandr Stolyar
This is also a picture of the kind of work that Stolyar has been pursuing for most of his 30-year career, only the “waiting rooms” in his research are mostly in cyberspace. Stolyar is one of the country’s leading experts in the stability of queues, or jobs lined up to be processed by servers.

“In cloud computing, I look at how to process large amounts of data,” Stolyar says. Thousands of servers in a cloud system accept jobs from computers, so the problem is getting the system to distribute these jobs in an efficient way. This is known as a resource allocation problem. With all of the data flooding the servers in the cloud, how do you dynamically distribute the workload in the simplest and most effective way?

Every time a job arrives, the system may not know which server is the least loaded and can accept a job, Stolyar says. In addition, this may not be easy to find out in real time because the number of servers may be very large.

Stolyar has a solution to this problem, and he calls it “pull-based algorithms.”

Pull-based algorithms make it possible for servers to tell the router, or dispatcher, to send them the load. The servers “pull” the jobs to themselves, rather than wait for the router to “push” the load to them. It would be like office employees asking for more work whenever they are freed up, rather than sitting around waiting for the boss to bring the work to them.

One issue, he adds, is that when you have a large number of servers, a fraction of them will be idle at any given time, so you’ll want to turn them off because it’s not efficient to run more servers than you actually need. But when idle servers turn off, the load time increases when they turn back on. 

“It’s not like a switch where they turn on instantly,” Stolyar says. “They have to warm up before they can start doing things.”

The family of algorithms that Stolyar works on helps to determine which servers to turn off and on—and when to do it.

His algorithms also deal with “stochastic bin packing.” As he explains, some jobs being sent out to servers in the cloud may require a lot of memory but little processing power. Other jobs may require the exact opposite—a lot of processing power but little Stolyar and Wang have developed an algorithm that controls an inventory system for large-scale operations in cases when the “lead time” is random. The lead time is the time from when customers place an order to when they receive it, and Stolyar and Wang’s algorithm determines the optimal time to ship new items.

They have completed the proof of concept, and he says, “Now we have big plans on how to make the new inventory control algorithm practical.” 

In another collaboration, this one with a colleague from Edinburgh, Scotland, Stolyar is working on algorithms for wireless transmission—something he also worked on at Bell Labs.

In wireless systems, he says, multiple transmitters sending data on a shared wireless channel can interfere with each other. So there has to be an algorithm that arbitrates which transmitter will get access to the channel. One of the goals is to maximize throughput, but for many algorithms it’s difficult to determine what the throughput actually is under different conditions.

“Sometimes it’s even difficult to guess what is,” he says. “But our theoretical results provide some estimates.”

In this work, the key word once again is “stability.” As data packets line up, waiting to be processed like patients in a waiting room, how do you handle and distribute the workload?

 “Stability is the ability of a network to handle the workload, and this has been one of my main interests for 30 years,” he says. “It continues to fascinate me.”

Related Links

Share this story

This story was published October 6, 2019.