Alan Bradley: [holds up his pager] I was paged last night.

Sam Flynn: Oh, man, still rocking the pager? Good for you.

– TRON: Legacy

Here’s my card. It’s got my cell number, my pager number, my home number and my other pager number. I never take vacations, I never get sick. And I don’t celebrate any major holidays. 

– Dwight Schrute in NBC’s “The Office”

As Software Eats the World and more and more of our daily activities move online, we depend ever more on IT infrastructure. In a day spent emailing, tweeting, catching up on the news, checking Facebook, shopping on Amazon, watching a movie on Netflix and banking at an ATM, it’s all too easy to forget that underlying all of these “mission-critical” activities are servers, routers, load balancers, switches, storage and millions of lines of code.

In a world where we’ve come to view the Internet as a utility, a major website outage is almost as serious as a power outage — and usually affects far more customers. Amazon Web Services had several major outages in 2012, taking down Netflix, Reddit, Heroku and many other sites in July and December. In October, it was the turn of YouTube, Dropbox, Tumblr and Google AppEngine. GoDaddy’s September outage affected up to 5 million hosted websites and 50 million domain names for six hours.

In addition to negative publicity and customer dissatisfaction, downtime now has an enormous financial cost. A 2010 study reported that U.S. businesses suffer an average of 10 hours of downtime per year, at a cost of $26.5 billion. Another analysis suggests that one hour of downtime costs the average business $300,000. If there had been a major outage on our most recent Black Friday, it would have jeopardized $1 billion in online sales.

Dealing with downtime

Of course, modern IT infrastructure has been built for redundancy and is extensively instrumented. Automated tools such as Nagios, Keynote, New Relic, Pingdom, SolarWinds and Splunk monitor every element of the stack and alert engineers immediately to urgent or emerging issues. In fact, today’s machines are very good at detecting and reporting incidents. It’s when those incidents get handed off to humans for remediation that things sometimes break down — because the humans are still using processes and technology that haven’t changed much in ten to fifteen years.

When I was at Loudcloud back in 2001, everyone carried a pager. A small team in our 24/7 Network Operations Center (NOC) would watch for critical monitoring system alerts on big screens and then page the administrator on duty, no matter what time of the day or night. If the administrator couldn’t resolve the issue, they would escalate to developers, who also wore pagers. The process was labor-intensive and error-prone, involving emails, phone calls, written duty rosters and escalation schedules.

While most other aspects of IT have changed dramatically, incident management in many IT organizations looks remarkably like it did back in 2001. The cloud has done away with the need for many NOCs, and the move to DevOps may mean developers are more directly involved in issue resolution, but the processes are frequently still manual, cumbersome and inefficient. Moreover, today’s large complex systems are never the responsibility of just one person — database administrators, developers, and system administrators all have a role to play — and the more people involved, the more complex and error-prone the process becomes. Reporting of incidents and handoffs from person to person are often done manually via email or SMS. Escalations and problem descriptions are handled via person-to-person phone calls. Engineers consult spreadsheets to see who’s on duty at a particular time. I’m aware of at least one major cloud service provider whose ops people still wear pagers.


Having studied software engineering at the University of Waterloo and then built and supported large-scale systems at, Alex Solomon, Andrew Miklas and Baskar Puvanathasan set out to bring IT incident management into the twenty-first century. The result is PagerDuty, a modern SaaS-based platform for incident tracking, alerting, and on-call management.

In a nutshell, PagerDuty collects alerts from a customer’s existing IT monitoring tools and notifies the on-duty engineer if there’s a problem. PagerDuty doesn’t replace any particular monitoring tool. Instead, it sits on top of the existing monitoring systems and aggregates the alerts they generate in a single place.

[Image: PagerDuty incidents]

PagerDuty allows each engineer to configure his or her own customized notification chain. Engineers can opt to receive incident alerts using any combination of phone calls, SMSes, emails and iOS push notifications. So, for example, you could opt to get a push notification immediately when an incident occurs, then an SMS 2 minutes later, then a phone call 5 minutes after that. PagerDuty also allows the on-call engineer to acknowledge, escalate or resolve a triggered incident directly from his or her mobile phone. The company utilizes multiple redundant data centers and SMS and telephony gateways to guarantee reliable message delivery across more than 100 countries.
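To make the notification chain concrete, here is a minimal sketch of how such a chain could be modeled. It is an illustration only, not PagerDuty’s implementation or API: the `NotificationRule` class and both helper functions are hypothetical names invented for this example, and each delay is assumed to be measured from the previous step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    """One step in an engineer's chain (hypothetical model, not PagerDuty's)."""
    channel: str        # e.g. "push", "sms", "phone"
    delay_minutes: int  # assumed: minutes to wait after the previous step


def build_timeline(rules):
    """Return (minutes_after_incident, channel) pairs for a chain of rules."""
    timeline = []
    elapsed = 0
    for rule in rules:
        elapsed += rule.delay_minutes
        timeline.append((elapsed, rule.channel))
    return timeline


def due_notifications(rules, minutes_since_trigger, acknowledged_at=None):
    """Channels scheduled so far; nothing later than an acknowledgment fires."""
    cutoff = minutes_since_trigger
    if acknowledged_at is not None:
        cutoff = min(cutoff, acknowledged_at)
    return [channel for at, channel in build_timeline(rules) if at <= cutoff]


# The chain from the example above: push at once, SMS 2 minutes later,
# then a phone call 5 minutes after the SMS.
chain = [
    NotificationRule("push", 0),
    NotificationRule("sms", 2),
    NotificationRule("phone", 5),
]

print(build_timeline(chain))
print(due_notifications(chain, minutes_since_trigger=3))
```

In this sketch the phone call lands 7 minutes after the incident, and an engineer who acknowledges within the first two minutes only ever sees the push notification.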

Incidents in PagerDuty are routed according to an escalation policy. A policy specifies how incidents should be escalated within each team. For instance, you can configure a sysadmin policy to route incidents to a primary on-call engineer and automatically escalate the incident to a secondary on-call if the primary doesn’t answer within 20 minutes. Escalations are crucial to incident response because they add redundancy and ensure nothing falls through the cracks.
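An escalation policy like the one described above can be thought of as an ordered list of responders, each with a timeout. The sketch below is a rough illustration under that assumption, not PagerDuty’s actual data model: `EscalationLevel` and `current_responder` are hypothetical names, and the 20-minute timeout for the secondary is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    """One rung of a policy (hypothetical model, not PagerDuty's)."""
    responder: str
    timeout_minutes: int  # how long this level has to acknowledge


def current_responder(levels, minutes_unacknowledged):
    """Return who holds an incident that has gone unacknowledged this long."""
    if not levels:
        raise ValueError("an escalation policy needs at least one level")
    deadline = 0
    for level in levels:
        deadline += level.timeout_minutes
        if minutes_unacknowledged < deadline:
            return level.responder
    # Every level timed out: the incident stays with the last responder.
    return levels[-1].responder


# Primary gets 20 minutes, as in the text; the secondary's timeout is assumed.
sysadmin_policy = [
    EscalationLevel("primary on-call", 20),
    EscalationLevel("secondary on-call", 20),
]

for minutes in (0, 19, 20, 45):
    print(minutes, current_responder(sysadmin_policy, minutes))
```

Keeping the policy as plain data makes the redundancy explicit: adding a third responder is one more entry in the list rather than another phone tree to maintain.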

[Image: PagerDuty escalations]

PagerDuty lets you build different on-call schedules for each specialization within the organization. For example, you can create one schedule for your database administrators, and another for your network engineers. Incidents can be easily configured to alert the appropriate on-call specialist, ensuring that problems are always automatically dispatched to those who are on-duty and best able to handle them. No more spreadsheets!
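As a rough sketch of what replaces the spreadsheet, per-specialization schedules can be modeled as rotations keyed by team. Everything here is an assumption made for illustration: the weekly rotation length, the start date, the engineer names, and the `on_call` and `route` helpers are all hypothetical and not taken from PagerDuty.

```python
from datetime import date

# Hypothetical rotations: one ordered list of engineers per specialization.
schedules = {
    "database": ["dana", "eli"],
    "network": ["noor", "omar", "pat"],
}

# Assumed start of every rotation (a Monday); shifts last one week each.
ROTATION_START = date(2013, 1, 7)


def on_call(rotation, start, day):
    """Pick the engineer on call on `day` for a weekly rotation from `start`."""
    if day < start:
        raise ValueError("day is before the rotation starts")
    weeks_elapsed = (day - start).days // 7
    return rotation[weeks_elapsed % len(rotation)]


def route(team, day):
    """Dispatch an incident to whoever is on call for `team` on `day`."""
    return on_call(schedules[team], ROTATION_START, day)


print(route("database", date(2013, 1, 14)))
print(route("network", date(2013, 1, 21)))
```

The lookup is a pure function of the team and the date, so answering “who is on duty right now?” no longer depends on anyone keeping a roster up to date by hand.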

[Image: PagerDuty on-call schedule]

Getting customer feedback on PagerDuty proved to be very easy, as it turned out that a large majority of our portfolio companies were using the product, and they were overwhelmingly positive about how it has dramatically simplified and improved their IT operations management. In fact, the company already has several thousand paying customers, including web giants such as Microsoft, Electronic Arts, Adobe, Rackspace and Intuit as well as a growing number of enterprise IT organizations. Overall, PagerDuty has achieved a remarkable amount on about $2 million in initial funding, including generating a substantial and rapidly growing amount of recurring revenue. With a market of almost 10 million infrastructure and application specialists worldwide and multiple ways to expand within the multi-billion dollar IT Service Management segment, this company has a lot of potential.

In closing

The world’s inexorable transition to cloud computing and modern large-scale mission-critical IT systems is creating the opportunity for an exciting new generation of software companies like PagerDuty to play a critical role in enabling it. Many Andreessen Horowitz portfolio companies, for example GitHub, Mixpanel, GoodData, CipherCloud and SnapLogic, are members of this class.

As veterans of IT systems management and automation ourselves, we are excited to lead a $10.7 million investment round for PagerDuty and welcome them to the a16z family.

“There is no such thing as a new idea. It is impossible. We simply take a lot of old ideas and put them into a sort of mental kaleidoscope. We give them a turn and they make new and curious combinations.”
—Mark Twain

Although Mark Twain’s statement that “There is no such thing as a new idea” may be an exaggeration, some of the world’s best businesses have been built on leveraging old ideas in a “new and curious combination”.  Some great ideas work spectacularly the first time around, handsomely rewarding the original entrepreneurs.  Others fail or flounder initially, sometimes multiple times, before a combination of the right entrepreneur and the right market and technology conditions unlocks their true potential.  Some of the best ideas of all have a great first run, only to return decades later and succeed all over again—reinvigorated with the latest technology and fresh thinking by a new generation of entrepreneurs who may not even be aware they’re leveraging an old idea.

One of the things I love about the technology business is seeing old ideas made new again. It worked before: will it work again? It didn’t work before: will it work this time? At Andreessen Horowitz, it’s not unusual to see ideas from the first Internet wave of the late ’90s. Pet food or groceries on the web: maybe this time? On the other hand, we just invested in a great idea that goes back almost a century (more on that later). But first, a couple of examples, from 1968 and 1994.

The Information Superhighway Started in…Orlando?

Sometimes a great idea fails to take hold the first time around because the technology doesn’t exist yet to enable it.  Today, we take for granted a virtually infinite selection of online entertainment, information, communications and ecommerce—but the original idea was revolutionary, and unachievable, less than twenty years ago.  I know, because I was there.

In 1994, Time Warner launched the Full Service Network (FSN), a groundbreaking interactive system that promised to give consumers an unlimited choice of home entertainment, information, communications and shopping, starting with 4,000 lucky households in Orlando, Florida.  Customers would cue movies and games on demand, shop the world’s stores from their armchairs, and order Domino’s pizza with a click of the remote control.

Despite the initial fanfare, the project failed due to its reliance on the television as the delivery device, hugely expensive proprietary technology ($5,000 set-top boxes and a massive Silicon Graphics (SGI) server at every head-end), and the impossible challenge of inventing and integrating everything involved (interactive content, user interfaces, ordering and billing systems, and so on) from scratch.

In today’s world of Amazon, Netflix and Xbox, it’s funny to read quotes like this one from a 1994 New York Times article about the FSN (it’s worth reading the whole article to realize how far we’ve come):

There is great uncertainty about which services will be popular and whether they can be offered profitably, particularly exotic services, like interactive video shopping and grippingly realistic on-line video games.

It seems almost laughable now, but back then it didn’t seem so obvious.  As a VP at US WEST, one of the major players in Orlando and other projects, I ended up with responsibility for GOtv, an interactive TV guide to local movies and restaurants developed for this brave new world.  Orlando couch potatoes who actually wanted to go out would reserve a table or buy movie tickets in advance on GOtv – something like a cross between today’s Fandango and OpenTable.  At least that was the vision.  Like the FSN, GOtv was a great idea, but based on a flawed technology paradigm.

Of course, the idea of providing full interactivity and unlimited choice to consumers was a very sound one.  That same year, on the other side of the country, my now partner Marc Andreessen and SGI co-founder Jim Clark founded Netscape.  Within no time, Andreessen, Clark and a whole new generation of Internet entrepreneurs were leveraging affordable PCs, open standards and powerful network effects to create a massive ecosystem of providers and consumers and successfully deliver on the core idea of the Full Service Network.  (As for me, I redirected our visionary but flawed TV efforts to the emerging Internet, and soon entered the action directly by joining @Home Network, the pioneer of today’s high-speed Internet, in 1997.)

The Full Service Network: Great idea, wrong technology paradigm.

Loebel’s Blue Envelope

Unlike the Full Service Network, some ideas now succeeding phenomenally on the Internet also worked spectacularly the first time around, using the technology of the time.  In 1968, an entrepreneur named Terry Loebel invested $500 to mail a “cooperative envelope” with offers from 14 local businesses to 20,000 households in Clearwater, Florida (yes, Florida again!).  His sales staff made a name for themselves by zipping from merchant to merchant on roller skates.  The idea of mailing local offers to consumers proved highly compelling to small businesses.  Loebel’s startup, Valpak, grew explosively to become a marketing behemoth, ultimately mailing 20 billion offers a year in 500 million blue envelopes to 45 million households.

Forty years after Loebel’s first mailing, Chicago entrepreneurs Andrew Mason and Eric Lefkofsky founded Groupon, bringing the concept of targeted local offers to a global Internet market of 2 billion users.  Groupon uses email rather than snail mail, its salespeople connect with local businesses over the phone and Internet rather than on roller skates, and it’s pioneered breakthrough innovations like group buying and daily deals—connecting local businesses and consumers worldwide in ways Loebel could only have dreamed of.

Valpak:  Great idea, great first run, now back in a radically different incarnation for a spectacular second act.

Next up

That almost century-old idea I mentioned earlier?  That’s the topic of tomorrow’s post—and the inspiration for Andreessen Horowitz’s newest investment!