Mainstream servers are growing increasingly brawny with multicore processors and tremendous memory capacity, but researchers at Carnegie Mellon University and Intel Labs Pittsburgh think 98-pound weaklings of the computing world might be better suited for many of the jobs on the Internet today.
In short, the researchers believe some work can be managed with lower expense and lower power consumption using a cluster of servers built with lower-end processors and flash memory than with a general-purpose server. And these days, with green technology in vogue and power costs no longer an afterthought, efficient computing is a big deal.
"We were looking at efficiency at sub-maximum load. We realized the same techniques could serve high loads more efficiently as well," said David Andersen, the Carnegie Mellon assistant professor of computer science who helped lead the project.
It's not just academic work. Google, Intel, and NetApp are helping to fund the project, and the researchers are talking to Facebook, too. "We want to understand their challenges," Andersen said.
Flash memory and low-end processors
Today's servers store data on a combination of slow but capacious hard drives and fast but expensive memory. In contrast, FAWN--short for Fast Array of Wimpy Nodes--relies on flash memory, a storage technology whose price and performance fall between the two. Its CompactFlash memory cards are more commonly found in higher-end digital cameras, but they are easily repurposed because they communicate over a conventional hard-drive interface.
In addition, the computing nodes in the FAWN system don't use powerful Xeon or Athlon processors. In their place are cheap, relatively anemic models: a five-year-old AMD chip in the first prototype, and in the second-generation design now under way, Intel's Atom, a chip aimed at handheld PCs and eventually mobile phones, Andersen said.
The system uses a front end to communicate with the outside world, and an internal network adds reliability and flexibility. Each processor runs a stripped-down version of Ubuntu Linux.
Commercialization, though, will come from elsewhere.
"I told my students I'd be willing to do a start-up if they wanted to. Collectively we decided finishing a bunch of Ph.D.s is more important," Andersen said.
There are plenty of other server makers in the world, though, and many of them are interested in flash memory packaged densely into what are called solid-state drives. Sun Microsystems has keen interest in flash memory and built a specialized database system for would-be acquirer Oracle. Dell sees great promise in solid-state drives.
And addressing the processor side of FAWN's design, start-up SeaMicro apparently is developing Atom-based servers with numerous processors. Founder Anil Rao is an inventor on a SeaMicro patent application for a computer system in which numerous independent processor modules share access to common resources, including storage, networking, and the boot-up technology called the BIOS.
The key value of FAWN
So where exactly is FAWN useful? Andersen makes no claims that it's good for everything--but the use cases are often core to the companies driving the ongoing Internet revolution.
Specifically, it's good for situations where companies must store many small tidbits of information that are read from the storage system far more often than they are written. Often this data is stored in a form called "key-value pairs," which consist of an indexing key and some associated data: "The key might be 'Dave Andersen update 10,579.' The update value might be 'Back in Pittsburgh.'"
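The key-value model the researchers describe can be sketched in a few lines of Python. The `put` and `get` names below are illustrative, not FAWN's actual interface:

```python
# A minimal key-value store sketch: a key identifies a small record,
# and the value is the associated data. (Illustrative only; FAWN's
# real storage layer is a log-structured datastore on flash.)
store = {}

def put(key, value):
    store[key] = value

def get(key):
    return store.get(key)

# The article's example: a status-update key mapped to its text.
put("Dave Andersen update 10,579", "Back in Pittsburgh")
print(get("Dave Andersen update 10,579"))  # Back in Pittsburgh
```

The workload FAWN targets is read-heavy: `get` is called far more often than `put`, which suits flash memory's fast random reads.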
(Left: The FAWN approach can be adjusted with hard drives or conventional memory to match various sizes of datasets or rates of the queries retrieving that data.)
Systems such as Amazon's Dynamo, LinkedIn's Voldemort, and memcached, which Facebook uses heavily, store data as key-value pairs. Google's MapReduce technology and Hadoop, the Yahoo-backed open-source equivalent, use key-value pairs in processing search-engine data and other tasks.
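To make the connection concrete, here is a single-process sketch of the MapReduce pattern those systems use: a map step emits key-value pairs, and a reduce step combines all values that share a key. This is only an illustration of the idea, not Google's or Hadoop's actual implementation:

```python
from collections import defaultdict

def map_phase(documents):
    # Mappers emit (key, value) pairs; here, (word, 1) for each word.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reducers sum the values for each distinct key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

counts = reduce_phase(map_phase(["fawn cluster", "fawn node"]))
print(counts)  # {'fawn': 2, 'cluster': 1, 'node': 1}
```

In a real deployment the map and reduce steps run in parallel across many machines, which is exactly where a cluster of many small nodes could fit.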
Programming a FAWN cluster would be difficult, but tailoring it for key-value pairs means it can be packaged as a special-purpose server appliance. Customers need not know the details of its inner workings, Andersen said.
Cut the power
These large-scale systems don't come cheap. Besides the hardware, software, and maintenance costs, there's power, too--and companies often must pay for energy twice, in effect, because servers' waste heat means data centers must be cooled down.
The researchers compared how many datastore queries could be accomplished per unit of energy and found FAWN compelling: a conventional server with a quad-core Intel Q6700 processor, 2GB of memory, and an Mtron Mobi solid-state drive measured 52 queries per joule of energy compared to 346 for a FAWN cluster. And tests of a newer design show even more promise: "Our preliminary experience using Intel Atom-based systems paired with SATA-based Flash drives shows that they can provide over 1,000 queries per Joule," the paper said.
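A quick back-of-the-envelope calculation with those reported figures shows the size of the gap:

```python
# Queries per joule, as reported by the researchers.
conventional_qpj = 52    # quad-core Q6700 server with SSD
fawn_qpj = 346           # first-generation FAWN cluster
atom_fawn_qpj = 1000     # preliminary Atom + SATA flash figure

print(f"FAWN vs. conventional: {fawn_qpj / conventional_qpj:.1f}x")       # 6.7x
print(f"Atom FAWN vs. conventional: {atom_fawn_qpj / conventional_qpj:.1f}x")  # 19.2x
```

In other words, the first FAWN prototype answered roughly 6.7 times as many queries per unit of energy as the conventional server, and the Atom-based design roughly 19 times as many.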
The approach can be tailored for different varieties of work. For larger data elements that needn't be accessed as often, FAWN clusters could be built with conventional hard drives. For those with smaller data elements needed even more frequently, FAWNs could use more conventional memory.
There have been server fads before: the early generation of blade servers, which packed lower-end processors tightly together, was a commercial flop. But Internet-facing data centers are a big business these days, and with cloud computing on the rise, that business is only going to get bigger. So perhaps the FAWN approach will catch on, at least in that area.
But if somebody wants to commercialize it, they'd better think about changing the name.
"I worry no manufacturer will ever want to produce a device called a wimpy node," Andersen said. "But we like the name."