Thursday, October 6, 2011

The Thursday Quote - Srikanth Nadhamuni, Vince Beiser

"Technology doesn't scale that elegantly. The problems you have at 100 million are different from problems you have at 500 million."
Srikanth Nadhamuni, Head of Technology for Aadhaar at the Unique Identification (UID) Authority of India, as quoted in "Massive Biometric Project Gives Millions of Indians an ID" by Vince Beiser, Wired magazine, September 2011

So you think your job is hard?

You feel good at the end of the day because you fixed that bug, got those to-do items done, gave as good as you got at the daily status meeting?

Well, give a thought to Srikanth Nadhamuni, whose goal is to "issue identification numbers linked to the fingerprints and iris scans of every single person in India" as Head of Technology for the Aadhaar project.

That's 1,200,000,000 people, give or take, as Vince Beiser explains in his article about Aadhaar. Here's an excerpt for wider context:

The unprecedented scale of Aadhaar's data will make managing it extraordinarily difficult. One of Nadhamuni's most important tasks is de-duplication, ensuring that each record in the database is matched to one and only one person. That's crucial to keep scammers from enrolling multiple times under different names to double-dip on their benefits. To guard against that, the agency needs to check all 10 fingers and both irises of each person against those of everyone else. In a few years, when the database contains 600 million people and is taking in 1 million more per day, Nadhamuni says, they'll need to run about 14 billion matches per second. "That's enormous," he says.

Coping with that load takes more than just adding extra servers. Even Nadhamuni isn't sure how big the ultimate server farm will be. He isn't even totally sure how to work it yet. "Technology doesn't scale that elegantly," he says. "The problems you have at 100 million are different from problems you have at 500 million." And Aadhaar won't know what those problems are until they show up. As the system grows, different components slow down in different ways. There might be programming flaws that delay each request by an amount too tiny to notice when you're running a small number of queries—but when you get into the millions, those tiny delays add up to a major issue. When the system was first activated, Nadhamuni says, he and his team were querying their database, created with the ubiquitous software MySQL, about 5,000 times a day and getting answers back in a fraction of a second. But when they leaped up to 20,000 queries, the lag time rose dramatically. The engineers eventually figured out that they needed to run more copies of MySQL in parallel; software, not hardware, was the bottleneck. "It's like you've got a car with a Hyundai engine, and up to 30 miles per hour it does fine," Nadhamuni says. "But when you go faster, the nuts and bolts fall off and you go, whoa, I need a Ferrari engine. But for us, it's not like there are a dozen engines and we can just pick the fastest one. We are building these engines as we go along."
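To get a feel for where a figure like "14 billion matches per second" could come from, here is a rough back-of-the-envelope sketch. It uses only the two numbers given in the excerpt (600 million records, 1 million new enrollments per day) and assumes every new enrollment is compared against every existing record. The function name and the `comparisons_per_pair` parameter are my own, and the factor of two (one fingerprint comparison plus one iris comparison per pair of people) is purely a guess on my part, not something the article states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def matches_per_second(database_size, enrollments_per_day, comparisons_per_pair=1):
    """Average comparisons per second when every new enrollment
    is checked against every record already in the database."""
    comparisons_per_day = database_size * enrollments_per_day * comparisons_per_pair
    return comparisons_per_day / SECONDS_PER_DAY


# Figures taken from the excerpt: 600 million records, 1 million new people per day.
rate_one = matches_per_second(600_000_000, 1_000_000)

# Assumption (not from the article): two comparisons per pair of people,
# one for fingerprints and one for irises.
rate_two = matches_per_second(600_000_000, 1_000_000, comparisons_per_pair=2)

print(f"{rate_one / 1e9:.1f} billion matches per second")  # 6.9
print(f"{rate_two / 1e9:.1f} billion matches per second")  # 13.9
```

Under that guess the result lands close to the quoted 14 billion. Whether that is how Nadhamuni actually arrived at his number I can't say, but the order of magnitude is clearly not an exaggeration.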

Next week: Barbara Liskov, Valerie Barr
