Tuesday's sessions began with the Advanced Topics Workshop; once again, Adam Moskowitz was our host, moderator, and referee. We started with our usual administrative announcements and the overview of the moderation software for the new folks. Then, we went around the room and did introductions. In representation, businesses (including consultants) only outnumbered universities by about 4 to 1 (up from 2 to 1); over the course of the day, the room included 7 LISA program chairs (past, present, and future, up from 6 last year) and 7 past or present members of the USENIX, SAGE, or LOPSA Boards (down from 9 last year).
Like last year, our first topic was cloud computing. The consensus seemed to be that there's still no single definition for the topic. Most of the technical people present perceived "cloud" to mean "virtualization" (of servers and services), but for nontechnical people or management it seems to mean "somewhere else" as in "not my problem." Regardless of the definition, there are some areas that cloud computing is good for and some it isn't. For example, despite pressure to put everything in the cloud, one company used latency requirements for NFS across the internet to identify something that couldn't work as a cloud service. They can then escalate up the management stack to rearchitect their applications to get away from the "it's always been done that way" mindset. Some environments are using "cloud" as an excuse to not identify requirements. However, even with environment-specific cloud services, providing self-service access (as in, "I need a machine with this kind of configuration") and not having to wait weeks or months for the IT organization to fulfill that is a big win. IT organizations are often viewed as onerous (or obstructionist), so going to the cloud allows the customers to get around those obstructions. One member noted that the concept of cloud as virtualized servers and services isn't new -- look at Amazon and Google for examples -- and yet research is saying "it's all new." In academia, the cloud is "good for funding." (Even virtualization isn't new; this was done on mainframes ages ago.)
That segued to a discussion about how to implement this. We need to consider the security aspect: what's the impact of sending your stuff somewhere else, what are the security models and controls, is old data wiped when you build new machines, is the data encrypted across the net, and so on. There's also the management assumption that services can be moved to the cloud with no expense, no new hardware, no new software, no downtime, and no problems. One tongue-in-cheek suggestion was to relabel and rename your hardware as cloud001, cloud002, and so on. Management needs to be reminded that "something for nothing" isn't true, since you need to pay for infrastructure, bandwidth, staffing, and so on. "Cloud" may save budget on one line item but may increase it on others.
After our morning break, we resumed with a quick poll on smartphone use. Among the 31 people in the room, the breakdown was:
- Android, 11
- Blackberry, 2
- Dumb, 5
- iPhone, 8
- Palm, 3
- Symbian, 1
- No phone, 1
Next we did a lightning round of favorite new-to-you tools this past year. The answers this year ranged from hardware (Android, hammers, iPad, and Kindle) to software (certain Firefox add-ons, Ganeti, Hudson, Papers, Puppet, R, Splunk, and WordPress) to file systems (HadoopFS, SANs, sshfs, and ZFS on FreeBSD), to services (like EC2), as well as techniques (such as saving command history from everywhere).
Our next major discussion topic was careers in general: jobs, interviewing, and hiring. One hiring manager noted they had a lot of trouble finding qualified people for a high-performance computing sysadmin position. Many agreed it's common to get unqualified applicants and to get few women and minorities. Even with qualified applicants (such as senior people for a senior position), it's problematic finding the right fit. Another hiring manager noted they're seeing more qualified applicants now, which is an improvement from 3-4 years ago.
This led to a discussion of gender balance in the field, and sexism in general. The "you need a tougher skin" feedback seems common out in the world, and one participant noted that saying that would be grounds for termination at his employer. Another person hires undergrads at his university to train them as sysadmins, but in 9 years has had only 2 female applicants. Part of the problem is the (American) cultural bias that tends to keep women out of science and technology because "girls don't do that."
One question is whether the problem is finding people or recruiting people who later turn out to be a poor fit. The discussion on interviewing had a couple of interesting tips. If a candidate botches an interview, closing the interview instead of continuing is a courtesy. Not everyone treats "assertive behavior" as indicative of "passion," so watching your communication style is important. Over-assertiveness can be addressed by interpersonal training, and supervisor training to be able to pull someone back is a good idea.
We segued into the fact that senior people need to have an option other than "become a bad manager" for promotions. Most of us in the room have either been or are managers. Several of us see the problem that the technical track has a finite limit and a ceiling; one company has a "senior architect" position that's the technical equivalent of VP. Some think the two-track, technical or management, model is a fallacy; you tend to deal with more politics as you get more senior, regardless of whether you're technical or management.
Next we discussed automation and DevOps. There's a lot of automation in some environments, covering both sysadmin tasks and network tasks, but it's all focused on servers or systems, not on services. Many places have some degree of automation for system builds (desktops if not also servers) and many have some degree of automation for monitoring, with escalations if alerts aren't acknowledged in a timely manner. There's a lot of automated configuration management in general; a quick poll showed 22 of 30 of us think we've made progress with configuration management in the past 5 years. The sense from Sunday's Configuration Management Workshop was that we seem to have the technical piece mostly solved, but now we're fighting the political side. Many people work in siloed environments, which makes automating service creation across teams (such as systems, networks, and databases) difficult.
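The monitoring escalation mentioned above can be sketched in a few lines. This is only an illustration, not any participant's actual setup: the `Alert` type, the `ESCALATION_CHAIN` thresholds, and the role names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical escalation chain: (minutes unacknowledged, who gets notified).
ESCALATION_CHAIN: List[Tuple[float, str]] = [
    (0, "on-call"),
    (15, "secondary on-call"),
    (45, "team lead"),
]


@dataclass
class Alert:
    name: str
    raised_at: float          # minutes on some shared clock
    acknowledged: bool = False


def escalation_target(alert: Alert, now: float,
                      chain: List[Tuple[float, str]] = ESCALATION_CHAIN
                      ) -> Optional[str]:
    """Return who should currently be notified, or None once acknowledged."""
    if alert.acknowledged:
        return None
    age = now - alert.raised_at
    target = None
    # Walk the chain in order; the last threshold already passed wins.
    for threshold, who in chain:
        if age >= threshold:
            target = who
    return target
```

A monitoring loop would call `escalation_target` periodically for each open alert and page whoever it returns; acknowledging the alert stops the escalation.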
One participant noted that many sysadmins have a sense of ownership of their own home-grown tool, which can work against adopting open-source tools. With the move towards common tools -- at the Configuration Management Workshop, 70% of people had deployed tools that weren't home grown -- you can start generalizing and have more open source than customization. But capacity planning is hard with the sprawling environment; you need to have rules to automate when to look for more servers. It was also pointed out that automation can mean more than just "build server" but also "deploy and configure database and application."
We have seen DevOps skyrocket over the past couple of years; finally sysadmins are getting some recognition from developers that these problems are in fact problems. We may be able to steal their tools to help manage it. As sysadmins we need to lose our personal relationships with our servers. We should be writing tools that are glue, not the tools themselves. Moving towards a self-service model (as in the cloud discussion above) is an improvement. Sysadmins often write software but aren't developers; the software may not be portable, or may solve a symptom but not the cause, and so on. Also, many good sysadmins can't write a large solution. There's been a long-standing stand-off between sysadmins and application developers. It's coming to the point where the application developers aren't getting their requirements met by the sysadmins, so the sysadmins need to come up with a better way of managing the application space. The existence of DevOps recognizes how the industry has changed. It used to be that developers wrote shrink-wrapped code that sysadmins would install later. Now we're working together.
One person noted that DevOps is almost ITIL-light. We're seeing ITIL all over; it's mostly sensible, though sometimes it's process for the sake of process. That segues into a big problem of automation -- people don't know what they actually do (as a sysadmin, as purchasing, as hardware deployment, software deployment, and sometimes even the end user); arguably that's a social problem, but it needs to be solved. Beyond that, DevOps is another way of doing fancy configuration management.
It was noted that DevOps is as well-defined as "cloud." Several people distinguish between system administration ("provide a platform") and application administration ("the layer on that platform is working"). We ended with a sanity check: Most of us think, in the general case, that a hypothetical tool can exist that can be complete without requiring wetware intervention.
After our lunch break, we had a discussion on file systems and storage. The discussion included a reminder that RAID-5 isn't good enough for terabyte-sized disks; there's a statistical probability that 2 disks will fail, and the probability of the second disk failing before the first one's done rebuilding approaches unity. RAID-5 is therefore appropriate only in cases of mirrored servers or smaller disks that rebuild quickly, not for large file systems. We also noted that DropBox (among others) is winding up on Important People's machines (such as those of vice presidents and deans) without the IT staff knowing: It's ubiquitous, sharing is trivial, and so on. It's good for collaboration across departments or universities, but making the users aware of the risks is about all we can do. Consensus is that it's good for casual sharing; several recommended preemptive policies to ensure users understand the risks. In writing those policies, consider communications from the source to the target and all places between them, and consider the aspects of discovery (in the legal sense), and whether the data has regulatory requirements for storage and transmission (such as financials, health, student records). Depending on your environment, much of the risk analysis and policy creation may need to be driven by another organization (Risk Management, Compliance, Legal, or Security) and not IT.
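The RAID-5 warning can be made concrete with a back-of-the-envelope model. The sketch below is not something presented at the workshop; it assumes independent disks with exponentially distributed lifetimes, and it adds a second, commonly cited failure mode (an unrecoverable read error while reading the surviving disks) that the discussion did not mention explicitly. All parameter names and values are illustrative assumptions.

```python
import math


def p_second_disk_failure(n_disks: int, mtbf_hours: float,
                          rebuild_hours: float) -> float:
    """P(at least one surviving disk fails outright during the rebuild).

    Assumes independent disks with exponentially distributed lifetimes.
    """
    survivors = n_disks - 1
    return 1.0 - math.exp(-survivors * rebuild_hours / mtbf_hours)


def p_unrecoverable_read(n_disks: int, disk_bytes: int,
                         ure_per_bit: float) -> float:
    """P(at least one unrecoverable read error during the rebuild).

    A RAID-5 rebuild must read every surviving disk in full.
    """
    bits_read = (n_disks - 1) * disk_bytes * 8
    # log1p keeps precision for very small per-bit error rates.
    return 1.0 - math.exp(bits_read * math.log1p(-ure_per_bit))


def p_rebuild_fails(n_disks: int, disk_bytes: int, ure_per_bit: float,
                    mtbf_hours: float, rebuild_hours: float) -> float:
    """Combine both failure modes, treating them as independent."""
    p_disk = p_second_disk_failure(n_disks, mtbf_hours, rebuild_hours)
    p_ure = p_unrecoverable_read(n_disks, disk_bytes, ure_per_bit)
    return 1.0 - (1.0 - p_disk) * (1.0 - p_ure)
```

For example, `p_unrecoverable_read(8, 2 * 10**12, 1e-14)` models an eight-disk array of 2 TB disks with an assumed error rate of one per 10^14 bits, and comes out well above one half. Under this model the risk grows with both disk size and disk count, which is consistent with the advice to reserve RAID-5 for smaller disks that rebuild quickly.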
Our next discussion was a lightning round about what surprises happened at work this year. Answers included coworkers at a new job being intelligent, knowledgeable, and understanding of best practices; how much the work environment, not the technical aspects, matters; IPv6 deployment and the lack of adoption (only 6 people use IPv6 at all and only 3 of them have it near production); moving from Solaris to Linux because the latter is more stable; moving from sysadmin into development; a new office that uses evaporative cooling, which works; Oracle buying Sun and the death of OpenSolaris; organizational changes; project cancellations; and virtualization allowing Security to push services into the DMZ faster than expected.
After the afternoon break, we resumed with a discussion on security. Most think the state of the art in security hasn't changed in the past year. There've been no major incidents, but the release of Firesheep, the Firefox extension to sniff cookies and sidejack connections, is likely to change that. (This ignores the "Why are you using Facebook during the workday or in my classroom" question.) Cross-site scripting is still a problem. Only one person is using NoScript, and only a few people are using some kind of proxy like SOCKS. Most people use Facebook but nobody present uses Facebook Applications; however, the workshop attendees are self-selected security-savvy people. We also noted that parents of young kids have other security problems, and some people don't want to remember One More Password.
Our next topic was the profession of systems administration. We have some well-known voices in the industry represented at the ATW and we asked what they think about the profession. The threats to sysadmins tend to fall into three categories: health, since we've got mostly sedentary jobs and many of us are out of shape; the industry, where there's enough of a knowledge deficit that the government has to step in; and the profession, as sysadmins don't seem to have a lot of credibility. Sysadmins don't have a PR department or someone from whom the New York Times can get a quote. Outsourcing was identified as a problem, since outsourcers tend to have an over-reliance on recipes, playbooks, and scripted responses; this is the best way to head towards mediocrity. It removes critical thinking from the picture and leads to "cargo cult" computing at the institutional level. Junior administrators aren't moving up to the next level. Sysadmin as a profession is past the profitable, cool initial phase and has become a commodity job: it's not new and exciting, and being bored is one of the key aspects. Furthermore, it's not just about the technology, but also about the people (soft) skills: communication and collaboration are tricky and messy but still essential.
It was noted that as a profession we've tried to move away from the sysadmin-as-hero model. Our services are taken for granted and we're only noticed when things go wrong. This seems to be something of a compliment: Train engineers used to be badasses because they were what sat between passengers and death, and computing around the year 2000 was like that. Now, that's untrue; where are the engineers now? ("Rebooting the train" was one wag's response.) Some believe that as individuals we have more power now; one participant attributed this to the fact that what we do can affect so much more of the business than it used to: IT is more fundamental to the business. Siloing is a characteristic of big organizations. To get very big you have to shove people into pigeonholes. Others believe that, in part because of siloing and regulatory requirements, we have less power as individuals, since the power is distributed across multiple groups and never the twain shall meet.
Technology is constantly changing, so the challenges we face today are different than those we faced 5 years ago. As a result we recommend hiring for critical thinking skills. Sysadmins used to be the gatekeepers to technology, but so much is now self-service for end users that this is no longer true. We provide a service our users consume.
We ended the workshop with a quick poll about what's new on our plates in the coming year. Answers included automating production server builds; dealing with the latest buzzwords; diagnosing cloud bits; handling new corporate overlords; improving both people and project management skills; insourcing previously-outsourced services like email, networking, printing, and telecommunications; managing attrition (a 35% retirement rate in one group alone); moving away from local accounts and allowing the central organization to manage them; outsourcing level-1 help desks; simplifying and unifying the environment; training coworkers; and writing software to manage tens to thousands of applications.