The following document is intended as the general trip report for me at the 21st Systems Administration Conference (LISA 2007) in Dallas, TX from November 10-16, 2007. It is going to a variety of audiences, so feel free to skip the parts that don't concern or interest you.
I woke up before the alarm and managed to get out the door on schedule to head off to MSP airport. Got there, checked my bags, and cleared Security with no troubles. Hung out with someone at the gate who asked me if I were a member of NC Bears; I'm not, but could be, so we had a nice conversation waiting to board. We departed about 10 minutes late due to low air pressure in the tires. Landed on time and managed to get to the hotel by 1pm.
After unpacking and hanging out in the lobby, went out to lunch with Mike to Atomic Sushi in the West End; I had miso soup, a spicy tuna roll, and three pieces of sushi (salmon, tuna, and yellowtail). We walked there and back, then I hung out in and around Landmark Circle in the hotel through registration and until dinner.
For dinner, I went out with Adam, Jeremy, Will, and Tom to Sonny Bryan's Smokehouse for barbecue. The table split an order of their famous onion rings (yummy), and then I had a three-meat combo of beef brisket, sausage, and pork ribs with fries and mac-and-cheese as my sides. Delicious.
After dinner, and a quick call to a local friend's voice mail, I hung out in the atrium bar with all sorts of people, catching up on Stuff and Life and Things, until my brain started shutting down around 11pm, so I headed upstairs, updated this trip report, and crashed.
Today was a free day, with nothing on my schedule. I slept in (slept poorly, mainly the "first night in the new bed" syndrome, but also had some children screaming in the atrium) before heading down to the lobby to Hallway Track. Hung out with Jo Rhett until his room was ready, then with others as folks came and went.
At noon, a bunch of us — Adam, Duncan, Laen, Moose, and I — headed to lunch at the hotel restaurant. I had a very nice club sandwich (bacon, lettuce, tomato, grilled chicken, on ciabatta with basil mayonnaise and a side cucumber salad).
After that, I headed back downstairs for hallway tracking and general schmoozing. After the workshops and tutorials let out, bunches of folks headed out for Ethiopian food, but I waited for and went to RJ's for Mexican food with Frank, Mike, Bob, and Ted. They'd managed to run out of tortilla chips (and they didn't have tortillas and a deep fryer?) so we split a cheese quesadilla appetizer as a table, and I had a really tasty beef burrito with rice and refried beans and a house margarita.
Hung out in the hot tub for an hour after we got back (9-10pm), did a quick run through the gaming space and the laptop lounge to say Hi to folks (including Moose, who was working in the laptop lounge). Went back upstairs, dressed, and schmoozed in the bar until 11:30pm or so. Got caught up on email and headed upstairs to update this trip report.
Today was another free day, with nothing on my schedule. Given the choice between "stay home and jobhunt" and "hang out and network with tens if not hundreds of friends and possible contacts," with the latter costing me only two extra days' hotel and food (less any airline change fees), coming in early was still a no-brainer. I was awake anyhow, so I headed down to breakfast with Amy, Bob, and Strata at my table and Bethlynn, Lee, and Moose at the table behind us. The buffet was mediocre (the eggs tended towards undercooked, and the bacon was on the salty side; when I notice it's salty, you'd better believe it's salty), though the service was generally pretty decent.
Did the hallway track to catch up on email. Around 11:30 I went to Kinko's with Tom to pick up the marketing materials for his April Fools' RFCs book and then we lunched at Gator's (where I had a pretty decent chicken fried steak).
After a quick power nap (and noting that Housekeeping hadn't gotten to my room at all yet), I went back down to the free wireless in the lobby only to be unable to obtain a DHCP address. Rebooting the laptop didn't help; rebooting it a second time worked. Go figure.
I didn't want to do any more walking, so I had dinner at the hotel with 9 others, including David, Matthew, Mike, and Strata; I had a burger and fries. After dinner we schmoozed in the bar for a while, then adjourned to the hot tub until it closed at 10, then back to the bar. Around 11pm I went to check email, update the trip report, take the evening drug regimen plus a muscle relaxant and some high-strength painkillers, and went to bed.
Tuesday started with the Advanced Topics Workshops. Once again, Adam Moskowitz was our host, moderator, and referee. [... The rest of the ATW writeup has been redacted; please check my LJ and my web site for details if you care ...]
After the ATW broke up, I dropped my stuff in my room and went to dinner by taxi van with David, Travis, Chris, Mark, Mike, and Strata at Hoffbrau Steaks in the West End. The concierge gave us coupons for two free appetizers, so we split potato skins and fried jalapeños (and an order of fried pickles that showed up accidentally). I had a t-bone steak and loaded baked potato.
After dinner, we headed back to the hotel on foot to make it to the GBLT[UVWXYZ] BOF (also known as the motss.bof or the Alphabet Soup BOF). We had 22 people come in over the evening and discussed what the purpose(s) of the BOF should be and whether it's outlived its usefulness. The consensus was that it's still useful to help ease newer attendees into groups instead of making them feel lost at a large conference like LISA, and it's a way to help people understand that this is a safe space.
After the BOF I grabbed an ice cream bar from the Cambridge Computing Hostility Suite and adjourned to the LOPSA After Dark suite. I had about a shot of tequila (nice, but it didn't suit my taste for tonight) and a small glass of the lovely dessert wine Travis brought up from Austin for me. The suite itself was too loud and too crowded so I bailed around 10:30pm to finish a first-pass through the ATW notes before bed around midnight.
The conference technical (as opposed to tutorial) sessions began this morning. My day began with the keynote session, which started with the usual statistics and announcements. This was the 21st annual Large Installation System Administration (LISA) conference, dating back to the first workshop in 1987 (75 people). This year we received 55 refereed paper submissions and accepted 22 papers, and had over 1000 registered attendees as of the start of the technical program (96% of last year's attendance).¹
This was followed by thanks to the usual suspects: program committee members, external readers, chairs for IT and Guru tracks, USENIX staff and board, speakers, attendees, sponsors, exhibitors, and vendors. Program Chair Paul Anderson reminded us of the Birds of a Feather (BOF) sessions in the evenings, and the poster sessions on Wednesday and Thursday evenings, which are new for us this year.
This year's awards for Best Paper were both written by students:
- "Application Buffer-Cache Management for Performance: Running the Worlds Largest MRTG," by David Plonka, Archit Guopta, and Dale Carder of the University of Wisconsin at Madison
- "PoDIM: A Language for High-Level Configuration Management," by Thomas Delaet and Wouter Joosen of Katholieke Universiteit Leuven
The SAGE Outstanding Achievement award was given to Æleen Frisch for her contributions as author, mentor, practitioner, and teacher of systems administration. Finally, the annual Chuck Yerkes Award for Mentoring, Participation, & Professionalism went to Paul Lussier and Richard Chycoski.
John Strassner of Motorola was our keynote speaker. Despite some ongoing audio feedback problems, he spoke about autonomic networking and how to get there from here. (For some length of time following the talk, his slides will be available online.) Basically, there are two modes of thought on autonomic computers: HAL 9000, where the computer replaces people, and the computers on Star Trek, where the computer is a tool to help people complete tasks. He went through the definition and functions of autonomic systems; they are self-governing and can self-configure, self-protect, self-heal, and self-optimize. He also discussed how current technology can be used to build autonomic systems, how state machines work in this context, and how context-aware policies apply to objectives. In short, it's not just about technology, and it's harder than it looks.
In the second session, I attended and scribed Carson Gaspar's invited talk, "Deploying Nagios in a Large Enterprise Environment." Carson discussed how a project went from skunk-works to production and how monitoring was explicitly delayed until after an incident. Their Nagios (version 1.x) installation had several initial problems:
- Performance problems — By default, Nagios (pre-3.x) performs active checks, can't exceed about 3 checks per second, and does a fork()/exec() for every statistical sample. Also, the web UI takes a long time to display large or complex configurations (fixed in 2.x).
- Configuration — Configuration files are verbose, even with templates. It's too easy to make typos in the configuration files. Keeping up with a high churn rate in monitored servers was very expensive.
- Availability — Hardware and software failures, building power-downs, patches and upgrades, and who monitors the monitoring system when it's down?
- Integration and automation — Alarms need to integrate to the existing alerting and escalation systems, and need to be suppressed in certain situations (e.g., "building is intentionally powered down"). Provisioning needed to be automatic and integrated with the existing provisioning system.
They solved or worked around these problems by switching from active to passive checks (which took them from 3 to 1800 possible checks per second), splitting the configuration to allow multiple instances of Nagios to run on the same server, deploying highly available Nagios servers (to reduce single points of failure), and generating the configuration files from the canonical data sources (so, for example, any new server is automatically monitored). They also created a custom notification back end to integrate with their Netcool infrastructure and to intelligently suppress alarms (such as during known maintenance windows or during scheduled building-wide power-downs).
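The config-generation idea is worth a sketch: if the canonical inventory is the source of truth, Nagios definitions become a pure function of it. This is a minimal illustration, not Carson's actual tooling; the inventory format, template contents, and names here are all my assumptions.

```python
# Hypothetical sketch: generate Nagios host/service definitions from a
# canonical host inventory, so a newly provisioned server is monitored
# automatically. Templates and inventory schema are invented for the example.

HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    address    {address}
}}
"""

SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    host_name            {name}
    service_description  {service}
    check_command        check_passive
}}
"""

def generate_config(inventory):
    """inventory: iterable of dicts like
    {"name": "web01", "address": "10.0.0.1", "services": ["ssh", "http"]}."""
    parts = []
    for host in inventory:
        parts.append(HOST_TEMPLATE.format(**host))
        for svc in host.get("services", []):
            parts.append(SERVICE_TEMPLATE.format(name=host["name"], service=svc))
    return "\n".join(parts)

if __name__ == "__main__":
    hosts = [{"name": "web01", "address": "10.0.0.1",
              "services": ["ssh", "http"]}]
    print(generate_config(hosts))
```

Regenerating and reloading on every inventory change is what eliminates the "high churn rate" cost: nobody hand-edits the verbose configuration files.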
The monitoring system design criteria specified that it had to be lightweight, with additional agents that are easy to write and easy to deploy; avoid the expensive fork()/exec() calls as much as possible; support callbacks to avoid blocking; support proxy agents to monitor devices the Nagios agent can't run on (such as NetApps); and evaluate all thresholds locally, batching the server updates.
The clients evolved over time; some added features included multiple agent instances, agent instance-to-server mapping, auto-reloading of configuration and modules on update, automatic re-exec of the Nagios agent on update, collecting statistics instead of just alarms, and SASL authentication to monqueue. The servers evolved as well: splitting off instances based on administrative domain (such as production application groups versus developers), high availability, SASL authentication and authorization, and service dependencies.
This project started with a single project of fewer than 200 hosts and was eventually deployed across large sections of the environment. Documentation and internal consultancy are critical for user acceptance, so architect for eventual production adoption across the enterprise. For example, one HP DL385 G1 (2x2.6GHz with 4GB RAM) runs 11 instances with 27000 service checks on 6600 hosts while using no more than 10% CPU and 500MB RAM.
For lunch, I went with Doug, Adam, and Mark R to Novak's Landmark Grill across the street from the hotel for a quick burger. After getting back to the hotel I ran through part of the vendor floor and chatted with some folks about possible job opportunities.
The first afternoon session was the Hit the Ground Running session with five subjects:
- Databases — John Sellens defined what a database is, why you would want one, the types of operations used to access them, and several associated concepts.
- Spam — Chris St. Pierre gave us everything we needed to know about spam in 15 minutes. Rolling your own solution (cheapest) should make use of realtime blacklists (RBLs), greylisting, sanity restrictions, antivirus, and Bayesian filtering. p0f and tarpitting are up-and-coming technologies to help as well.
- Data Centers — Doug Hughes gave a quick overview about the issues you need to consider when building a data center: Power (usually 3-phase), cooling, floor loading, and wire density.
- Active Directory Group Policy — Gerald Carter spoke about group policy, which provides a way for hierarchically layering policies to groups of users for NT, and how we can apply this technology to Unix/Linux systems.
- Autonomic Computing — Glenn Fink provided the 30,000-foot overview to autonomic computing. As we saw at the keynote this doesn't really exist yet (we're about 34% there at best) but he spoke as to when it's coming, how it'll affect us, and what we need to do when it gets here.
During the break, I had a brief talk with USENIX Executive Director Ellie Young about the new Raleigh convention center as a possible venue for conferences (as suggested by my SuperShuttle companion back on Saturday). Since I was there, I also had a small glass of champagne and a wedge of chocolate mocha cake (chocolate cake with mocha mousse and covered in a chocolate ganache) for Tony Del Porto's birthday.
Today's final technical session was Andrew Hume's invited talk, "No Terabyte Left Behind." There's a dilemma: Space is cheap, so users want, get, and use more of it. However, this leads to all sorts of interesting problems, like how to partition and how to back up the disk (especially as you get towards terabytes on the desktop). Traditional tools don't keep up (dump, for example, takes 2.5 days to back up 250GB). Making the space available from servers can be problematic (local or networked file systems and the associated problems with network bandwidth). We've talked about these issues before, but there are still no good solutions.
Let's take a hypothetical example of recording a TiVo-like service without any programming wrappers. Recording everything all the time, for both standard- and high-definition programming, comes to about 1.7 petabytes per year of data, even assuming no new channels get added. This is too big for the desktop, so we'll need to use space in the machine room: a 2U or 3U generic RAID unit at 2-4TB/U costs up to $1,500/TB, and you'd need 133 of them per year. That's 16TB per square foot and 27 feet of aisle space per year, with modest power and cooling. But it's a lot of money and space per year. We can be clever about access patterns: for example, moving the older and less-accessed shows off to tape, or keeping only the first 5 minutes of each show on disk and the rest on tape. A tape library (Andrew's example was LTO-4: 800GB/tape, 120MB/s sustained write, 60-second access; a 2.5PB library costs $172/TB at 41TB per square foot, with expansion units at $7/TB and 79TB/square foot) can still provide every TV show on demand with no user-visible delays. Sounds good, right?
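A quick back-of-the-envelope check makes the disk-versus-tape gap concrete. The per-TB prices and the 1.7PB/year data rate are the figures from the talk; the decimal TB-per-PB convention is my assumption.

```python
# Back-of-the-envelope check of the disk-vs-tape economics from the talk.
# $1,500/TB (generic RAID, worst case) and $172/TB (2.5PB LTO-4 library)
# are the talk's figures; 1.7 PB/year is the stated recording rate.

PB = 1000  # terabytes per petabyte (decimal, as storage vendors count)

data_per_year_tb = 1.7 * PB      # ~1.7 PB of recorded TV per year
disk_cost_per_tb = 1500          # 2U/3U generic RAID, up to $1,500/TB
tape_cost_per_tb = 172           # 2.5PB LTO-4 library, initial purchase

disk_cost = data_per_year_tb * disk_cost_per_tb   # ~$2.55M per year
tape_cost = data_per_year_tb * tape_cost_per_tb   # ~$292K per year

print(f"disk: ${disk_cost:,.0f}/yr  tape: ${tape_cost:,.0f}/yr "
      f"({disk_cost / tape_cost:.1f}x cheaper on tape)")
```

Roughly an order of magnitude in favor of tape, before counting the density and power differences, which is why the tiered disk-plus-tape design is attractive despite the access-latency juggling.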
Wrong. It gets worse when you realize that the media is not infallible. Ignoring the issues with tape (such as oxide decay, hardware becoming obsolete, and so on), we've got problems with disks.
Here's the reality about really using disks, networks, and tapes: Things go bad, trust nothing, and assume everything is out to get you. You don't always get back what you put out. Compute a checksum for the file every time you touch it, even when it's read-only. Yes, it's paranoid, but it's necessary if you really care about the data integrity, especially with regard to disk and tape. He's seeing a failure rate of about one uncorrectable and undetected error every 10 terabyte-years, even in untouched, static files.
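The "checksum every time you touch it" discipline can be sketched in a few lines: record a digest when a file is written, and re-verify on every subsequent read so silent corruption is at least detected. SHA-256 and the function names here are my choices for illustration; the talk doesn't prescribe a particular algorithm or tooling.

```python
# Minimal sketch of checksum-on-every-access for detecting silent
# corruption in static files. Algorithm choice (SHA-256) is an assumption.
import hashlib

def file_digest(path):
    """Compute the SHA-256 digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_digest):
    """Return True iff the file still matches its recorded checksum."""
    return file_digest(path) == expected_digest
```

In practice you'd store the recorded digests somewhere independent of the data they protect (a database or a separate volume), so a single media failure can't take out both the file and the evidence that it changed.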
As disk use grows, everyone will see this problem increasing over time. The issue of uncorrectable and undetected errors is real and needs attention. We need a way to address this problem.
A group of us headed out to dinner at RJ's in the West End again, where I had enchiladas and a mango margarita (yum). Sat through the LOPSA community meeting then played various card games (Fluxx, 1000 Blank White Cards, and Give Me the Brain) in the lobby before heading up to the LOPSA After Dark suite for conversation until finally getting kicked out around 12:30am or so. Headed back to the room, took the evening drugs, prepared for the morning sessions, and crashed around 12:45am.
I managed to wake up shortly before the 8am alarm. I had a breakfast bar in the room before heading to my first session, today's plenary session, Tony Cass on "The LHC Computing Challenge," about CERN's Large Hadron² Collider. I didn't take detailed notes, but he gave an overview of the collider environment, where any given experiment can generate a lot of data, on the order of several hundred good events per second; the raw and processed data will come to around 15PB/yr once they go live. Everything they do is trivially and massively parallel. Challenges they're facing include capacity provisioning across multiple universities' computing grids, especially with 80% of the computing being done off-site; managing the hardware and software involved; data management and distribution, given the quantity of both raw and processed data and the speeds possible for disks, networks, and tapes; and visibility into the service, in terms of whether both the systems and the users understand what's going on and where the problem (if any) is, which requires new visualization tools as well. This is an immensely complex situation with many challenges, most of which have seen considerable progress over the past 6 years, but all of this is before there's real data. Only time will tell whether CERN will be able to quickly identify problems in real time once the system goes live.
During the second session I hallway-tracked and introduced Jessica Cohen to The Funny Music Project (FuMP) and Mark Roth to the Church of the Flying Spaghetti Monster (FSM).
Because the free pizza on the show floor was inedible to me — one had mushrooms, the second had olives, and the third had bell peppers, and all of those are vile and evil and disgusting — I went out to lunch with Ellen and Mike at Novak's again. Hallway-tracked after lunch, and accidentally reminded Carson that he had 20 minutes to build his slides for his 2pm Hit The Ground Running talk.
In lieu of the conference reception ("hoedown"), several of us — Carson, Ellen, Todd, Trey, and I — headed out to the churrascaria Texas de Brazil for "meat on swords." Very good food (in addition to the shrimp, prosciutto, salami, parmesan cheese, green beans, potatoes au gratin, Romaine salad with bacon and blue cheese dressing, pesto tomatoes, lobster bisque, and whatever else I grabbed in small quantities from the salad bar, I had filet mignon, flank steak, top sirloin (the house specialty), leg of lamb, roasted pork loin, parmesan chicken, and chicken wrapped in bacon, plus the garlic mashed potatoes and grilled plantains, plus the triple chocolate cake for dessert) and very speedy service (no sooner did we flip our signs from red to green than we had four or five gauchos delivering slices and hunks of meat). It was not quite as good as Fogo de Chão last year, but quite enjoyable nevertheless.
Got back to the hotel, swung through the BOFs, then headed off to open up the suite for Cat's birthday party at 10pm. A dozen or so of us hung out and chatted, drinking mostly scotch (Geoff brought some cask-strength Australian whiskey that was very smooth and tasty) and port, and nibbling on 88% chocolate. Around 11:30pm I bailed for a quick swing through the LOPSA After Dark party and then to bed.
The first session was a choice between Brad Knowles speaking about NTP and a substitute speaker about power and heat of electronics not keeping up with Moore's Law, and nothing in the second time block (configuration management, litigation, data centers, or VOIP and LDAP) was of interest to me, so I skipped the entire morning and spent it hallway-tracking. Lunch was a mediocre pizza from the hotel, since I didn't want to go to the West End and didn't want to eat at Novak's a third day running.
After lunch I went to Cat Okita's "The Security Butterfly Effect" talk, which was very much entertainment with a side of education instead of the converse. In information security, a change that seems simple may result in serious vulnerabilities, and as the complexity and interdependence of the environments we manage increase, predicting the effects of apparently innocent actions will become infinitely more challenging. Narrowing specialization leads to specialized knowledge. It's easier to let the machine do the simple things (rise of the machines) and to trust the results, and to trust that someone else's assumptions are valid and correct. Cat provided eight stories and asked the audience what starting assumptions, by type — about the environment, about human behaviour, or about blind spots — contributed to the problem.
The conference's closing session began with the usual close-of-conference announcements, then we segued into "Cookin' at the Keyboard" with David Blank-Edelman and Lee Damon. While neither has any formal culinary experience, both like to cook. While David spoke, Lee demonstrated by preparing on-stage, from chopping the vegetables through serving them to David, a tofu and vegetable stir fry and ice cream with homemade hot chocolate sauce.³ David spoke about how system administrators could learn from restaurant cooking procedures.
First, as an appetizer, David spoke about why cooking is hard. The summary is that you're not just applying heat to some food, but you're managing the conditions and there are lots of variables, such as the quality of the ingredients, the temperature and humidity of the air, and the level of heat involved.
Next, as the first course, David discussed recipes. Based on talking with cookbook authors and chefs, he talked about how writing recipes is hard. You never make the same food twice, you can't go into explicit detail on every step (including such things as the suppliers of the food and manufacturers of the stove and pans and so forth) without scaring your audience, and most people don't use common sense in terms of recovery (if a recipe says "cook 10 minutes" they'll cook it for 10 minutes even if it's obviously done after 5). Solutions to these problems are to treat recipes as general guidelines and to never expect someone to duplicate a recipe but instead to approximate it. You also find in cookbooks that they specify common units and time ranges and they provide visual and textual clues to let the reader (cook) make judgements. As you get more experience you can come up with simpler recipes with fewer ingredients to achieve the same or better flavors. Or, the better you get, the simpler it gets. So, learn to simplify: It takes experience. Learn where to cut corners, learn to ask questions, to question every ingredient, and to compromise when necessary.
As the second course, David had talked to several working chefs about working in a world-class kitchen. Starting with a definition of restaurant terms, and continuing through a comparison of trade (such as a burger-flipper or help desk worker) to craft (cook or system administrator) to art (chef or ubergeek), he went through the skills you needed. The chefs agreed that to be a good cook you'd need urgency, the ability to take direction, cleaning as you go, precision, knowing about the subject matter, initiative, focus, and dedication. You also need to be part of a team, be willing to jump in and help when needed, and be able to receive new information and produce with it. These skills, with minor changes from food-industry to technological terms, describe a lot about systems administration.
In the case of cooking, preparation (mise en place) is included in the process. You need to be prepared physically and mentally; you need to know where everything is and have everything to hand, ready when you are, and be as fast and as efficient as possible. As you get more experience you're able to work better under pressure, to help others, and to show your creativity.
Finally, for dessert, David provided an overview of what we as systems administrators can take away from the talk. We need to write better recipes and recipe interpreters, such as configuration management tools. We need to develop our skills and moves better. We need to prepare, work clean, and focus on the task. Finally, like a line cook becoming a chef, we need to chase perfection: Take teaching opportunities but not necessarily every learning opportunity, communicate with your team, document what you do, learn more things, ask for help when you need it, and be able to roll back and start over when you have to.
I went out with Bob, Carson, David, Jessica, Mark, Michael, Peter, and Travis to Y.O. Ranch Steakhouse in the West End. I had a caesar salad and a very tasty buffalo filet mignon with a loaded smoked baked potato. I was amused that after our party of nine was seated, at least two other conference parties of 6 were seated nearby, and over the course of the evening LISA attendees filled up about one third to one half of the restaurant. Our waitress was great, no problem with the orders, drink refills, or the separate-checks request. Tres yummilicious.
We got back to the hotel around 8:15pm and headed up to the combination Scotch BOF and Dead Dog party in the managers' suite.⁴ I stayed there tending bar (for the latter half of it in my formerly-regular bartender drag, though I'd not worn it since at least 2003) and catching up with folks I'd missed for most of the conference until about midnight, then headed back to my own room to do the initial packing and crash.
Today was the travel day to return home. I woke up several times — to bed by 1am, up again at 4am and then at 8am, at which point I gave up. I showered, finished packing, checked out, and headed off to the lobby to hang out before catching a shuttle to DFW. Cleared security, had lunch there (at a sit-down restaurant for much less than it would've cost at the hotel), then read a book until boarding.
The flight itself was uneventful. I'd had the foresight to write down where I'd parked this time, unlike last year (blue ramp, 5th floor, row R, to the right of the exit). Got home reasonably quickly, and unpacked and processed the important postal mail before a quick dinner of pepperoni pizza pockets.
¹ "Registered attendees" is misleading, as it includes people who are registered for one or more tutorials, one or more days of technical sessions, or an exhibition floor booth, and also includes comped registrations in addition to the paid registrations.
² Yes, "hadron" is spelled correctly; the middle two letters are not transposed.
³ Unfortunately, due to liability issues, David and Lee were unable to share the stir fry or the chocolate sauce.
⁴ Since the USENIX staff threw the Dead Dog party on Thursday night as a closed party (staff and volunteers only) instead, the cognoscenti moved the Scotch BOF from Thursday to Friday, and the managers' suite was the effective end-of-conference party.