Conference Report: 2014 LISA Advanced Topics Workshop

Tuesday's sessions included the 20th annual and final Advanced Topics Workshop; once again, Adam Moskowitz was our host, moderator, and referee. Unlike past years, we ran for only a half day. With only two new participants (both longtime LISA attendees), Adam covered the participants' interface to the moderation software in brief one-on-one sessions over lunch. We started with our usual administrative announcements but mostly skipped introductions. However, Adam noted that two people here were at the first ATW, and that he and I have both been here for the past 18 years (he as moderator and I as scribe). In terms of representation, businesses (including consultants) outnumbered universities by about 2 to 1 (about the same as last year); over the course of the day, the room included 10 LISA program chairs (past, present, and announced future, up from 5 last year) and 9 past or present members of the LOPSA or USENIX boards.

Our first topic, which took two-thirds of the discussion time, was why this was the last ATW.

Of course, since this decision was announced without input from the participants, it... generated a very spirited and passionate discussion (and at times an outright debate). That discussion wandered through what the workshop should be if it were to continue, as well as the future direction of the LISA conference itself. No definitive conclusions were reached, in large part because not all stakeholders were present or represented.

It was argued that the workshop has been successful. The founder, John Schimmel, looked at the conference and identified a problem: More-senior system administrators would only come to LISA (which was then more about training junior administrators) if they were speaking or teaching, and were much less likely to come as an attendee. The workshop was an attempted solution to that problem: Get the more-senior sysadmins present for the workshop, where they could have non-public discussions, without having to step down the language for more-junior sysadmins to understand, and they'd be (and were) much more likely to stick around for the rest of the conference.

It was also argued that there's still value in getting together, even if just "at the bar." Many were quick to point out that it would be much more difficult to sell "I'm meeting with a couple of dozen senior sysadmins at the bar" than "...at the workshop" to their management.

Some of the other points we considered during the discussion included:

It was stressed that all interesting proposals (for papers, talks, tutorials, and workshops) are both welcome and desired. If we were to say "After N years we have a new version of the ATW, called something else," and explain how it would be different, it would be considered. There is a limit on the number of workshops, based both on the number of available rooms and on the number of places any one of us can be at one time. It's not just what would serve USENIX or LISA better but what would serve us (the constituents) better.

As a palate cleanser we went with a lightning round: What's your favorite tool? Answers included Aptly, C++11, csvkit, Chef, Docker, Expensify, Go, Google Docs, Grafana, Graphite, HipChat, JCubed, JIRA, R, Review Board, Sensu, Sinatra, Slack, git and git-annex, logstash, and smartphone-based cameras.

Our next discussion was about platform administrators. With user-level networking and systems becoming one blended platform, are platform admins the new sysadmins? Is this a new tier for provisioning logical load balancers and front and back ends? The sense of the discussion was that it's still sysadmin work, just with a specific focus. It's like any other new technology, and may be due to the extension of virtualization into the network world. The "are we specializing?" question comes up often (for example, storage, network, Windows versus Unix, and so on), and we're still sysadmins.

One participant strongly disagreed, arguing that it's fundamentally different: for the first time, it's straightforward and easy to think of system deployment as a cheap software call or RPC. It's so lightweight in so many ways that it's fundamentally different from early virtualized environments. His business expects to routinely spin up thousands of virtual instances, and how much and how fast you can spin things up (and down again) is a game changer. The other part of it is that the environments they're using for this are fundamentally confused about everything of value, with APIs calling APIs. At some level this is sysadmin work on a new layer, because it's a programmability block model and much of the sysadmin stuff is hidden. What happens when you're repairing a cluster and something says you have to scale out from 200 to 1000? Either "You don't" or "You wait" might be the answer.
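
As an illustration for readers (not anything a participant showed), here's a minimal sketch of what "deployment as a cheap software call" can look like, assuming an AWS environment and Python's boto3 SDK; the region, AMI ID, and instance type are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

    def spin_up(count):
        """Launch `count` instances and return their instance IDs."""
        resp = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # placeholder AMI
            InstanceType="t2.micro",           # placeholder instance type
            MinCount=count,
            MaxCount=count,
        )
        return [inst["InstanceId"] for inst in resp["Instances"]]

    def spin_down(instance_ids):
        """Terminate the instances once the surge is over."""
        ec2.terminate_instances(InstanceIds=instance_ids)

    # Scaling from 200 to 1000 becomes a function call, not a purchase order:
    # ids = spin_up(800)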

Another noted that we're systems administrators, not just focused on the single computer (or network, or person), but on the interaction between those systems (computers, networks, people, and so on). Nothing's really changed: We still look at the pieces, the goals, and if it's delivering the product/service as expected.

Two side discussions came out of this as well. First, with virtualization and cloud and *aaS, how many businesses still administer their IT as their core function? Second, sysadmins who won't write code (including shell scripts) will soon be out of a job, since the field is moving towards that: Systems will be built by writing code. With virtualization and APIs, we suspect that most sysadmins will fall into the "services" mode, maintaining services on perhaps-dedicated probably-virtual machines, as opposed to the folks administering the underlying hardware on which the virtualized machines run.

Our next discussion was started with the phrase, "If I had a dollar for every time someone said DevOps was the future...." It took forever for Agile to get into Gartner, but DevOps is there already and, in the speaker's opinion, has jumped the shark in less than two years. DevOps is a horribly abused term, despite being a paradigm shift. At ChefConf, the belief was that DevOps was "software engineers throwing off the yoke of the evil sysadmins [who] have oppressed them for so long." (That's a direct quote from their keynote speaker.) Code needs to be in the realm of infrastructure; what we did 20 years ago won't scale today. There's a huge difference between writing actual code and writing a Ruby file that consists entirely of declarations.
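
To make the declarations-versus-code distinction concrete, here's a small illustrative sketch in Python rather than Ruby; the package names and the Debian-style dpkg/apt-get commands are assumptions for illustration only. The first half is pure declaration (just data); the second half is actual code that has to inspect the system and converge its state:

    # Declarations only: this is data, and nothing here executes any logic.
    desired_packages = [
        {"name": "nginx", "state": "installed"},
        {"name": "ntp", "state": "installed"},
    ]

    # Actual code: imperative logic that checks current state and acts on it.
    import subprocess

    def installed(pkg):
        """True if dpkg reports the package as installed (Debian-style assumption)."""
        return subprocess.run(["dpkg", "-s", pkg], capture_output=True).returncode == 0

    def converge(packages):
        """Install any declared package that is missing."""
        for pkg in packages:
            if pkg["state"] == "installed" and not installed(pkg["name"]):
                subprocess.run(["apt-get", "install", "-y", pkg["name"]], check=True)

    converge(desired_packages)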

Another participant's company has some developers who do sysadmin work as well, but not all of the developers there have the background, and he doesn't trust them all to do it: their sysadmins are developers, but not all of their developers are sysadmins.

One participant has been going to DevOps and infrastructure-as-code meetups for a while now, and says it's like SAGE-AU and Sun Users' Group repeating the same mistakes all over again.

Even now, everyone still has a different definition of what DevOps means, though most could agree it's not a tool, position, mechanism, or process but a culture: having the operations folks and the engineers talk to each other while the product is being written, as well as after operations has it in production. There's a feedback loop through the entire life cycle. But having "a DevOps team" misses the point; it's about not isolating teams.

We had a brief conversation on recruiting. How do you find and entice qualified people to jump ship to a new company? The participant who raised the question has problems finding candidates who want to come to his company at all. The only response was that sometimes you simply can't; one participant noted he had turned down a great job because its location was unpleasant enough to be a show-stopper.

We then discussed what tools people are using to implement things within a cloud infrastructure. One participant is entirely in AWS, for example. Do you do it manually or through automation, and what do you use to track and manage things? One participant snarked that he'd have an answer next year.
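
As one hypothetical example of "tracking things" through automation in an all-AWS shop, the sketch below uses boto3 to inventory running instances by a "Service" tag; the tag name and region are assumptions, not anything a participant reported using:

    from collections import defaultdict
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # placeholder region

    def inventory_by_service():
        """Group running instance IDs by their (assumed) Service tag."""
        groups = defaultdict(list)
        paginator = ec2.get_paginator("describe_instances")
        pages = paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        for page in pages:
            for reservation in page["Reservations"]:
                for inst in reservation["Instances"]:
                    tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
                    groups[tags.get("Service", "untagged")].append(inst["InstanceId"])
        return dict(groups)

    for service, ids in sorted(inventory_by_service().items()):
        print("%s: %d instances" % (service, len(ids)))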

Another is about to start moving away from the AWS API to Terraform (written in Go), which supports several different cloud vendors and has a modular plug-in system. Beyond that it depends on what you're trying to do.

Yet another says part of this is unanswerable because it depends on the specific environment. His environment is in the middle of trying to deploy OpenStack storage, and most of the tools can't work because they reflect the architectural confusion thereof. They have used ZeroMQ for monitoring and control because of its scalability (to a million servers, which is what they call a medium-sized application). Precious few libraries can handle that level. (That's the number thrown around by HPC, too.)
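
For readers unfamiliar with the pattern, here's a minimal PyZMQ PUB/SUB sketch of many servers fanning monitoring data in to a single collector; the host name, port, and message format are assumptions, and a real deployment at that scale would add intermediaries, batching, and authentication:

    import time
    import zmq

    def publisher(host="collector.example.com", port=5556):
        """Runs on each monitored server: publish a metric line every 10 seconds."""
        ctx = zmq.Context.instance()
        sock = ctx.socket(zmq.PUB)
        sock.connect("tcp://%s:%d" % (host, port))
        while True:
            # topic-style message: "cpu <hostname> <value> <timestamp>"
            sock.send_string("cpu web042 0.37 %d" % int(time.time()))
            time.sleep(10)

    def collector(port=5556):
        """Runs centrally: bind, subscribe to everything, and process metric lines."""
        ctx = zmq.Context.instance()
        sock = ctx.socket(zmq.SUB)
        sock.bind("tcp://*:%d" % port)
        sock.setsockopt_string(zmq.SUBSCRIBE, "")   # no topic filter
        while True:
            print(sock.recv_string())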

Once you care about speed and latency and measurements you can make a better judgement of how much to spin up to handle those requirements and whether physical or virtual is the right answer for your environment.

Our final discussion topic was on getting useful information from monitoring data. One participant loves Graphite. He has a new hammer, so everything looks like a thumb, and he's been trying to get more and more into it... and now that he's taken the stats classes he needs more low-level information so he can draw correlations... and eventually move data out of the system. What are others doing with their statistics? What are you using to gather, store, and analyze data? In general, R and Hadoop are good places to start, and there's an open source project called Imhotep for large-scale analytics. Several others noted they use Graphite as a front end to look at the data. Spark is useful for real-time and streaming work. Nanocubes can do real-time manipulation of the visualization of a billion-point data set. Messaging buses discussed included RabbitMQ and ZeroMQ.
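
For anyone who hasn't fed data into Graphite: on the gathering side, carbon accepts a plaintext "metric value timestamp" line over TCP (port 2003 by default). A minimal sketch, with a placeholder host and metric name:

    import socket
    import time

    CARBON_HOST = "graphite.example.com"   # placeholder hostname
    CARBON_PORT = 2003                     # carbon's default plaintext-protocol port

    def send_metric(path, value, timestamp=None):
        """Send one data point to carbon as a single plaintext line over TCP."""
        ts = int(timestamp if timestamp is not None else time.time())
        line = "%s %s %d\n" % (path, value, ts)
        with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
            sock.sendall(line.encode("ascii"))

    # Example: record a request latency so it can later be pulled into R for analysis.
    send_metric("app.web01.request_latency_ms", 42.0)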

How does this help? In one environment, they used the collected metrics to move a data center from Seattle to San Jose, and the 95th percentile improved a lot. Another noted that Apple determined that the transceiver brand makes a huge difference in performance.

We wrapped up with the traditional lightning round asking what we'd be doing in the next year. Answers included an HPC system with 750K cores and an 80PB file system, automation and instrumentation, chainsaws and hunting rifles in Alaska, enabling one's staff, encouraging people to create and follow processes, exabyte storage, functional programming, Hadoop, home automation, Impala, infrastructure, learning a musical instrument, merging an HPC-focused staff into the common IT group, moving from GPFS to something bigger, network neutrality, organizing a street festival and writing the mobile app therefor, packaging and automated builds, producing a common environment across any type of endpoint device, R, scaling product and infrastructure (quadrupling staff), Spark, trying to get the company to focus on managing problems rather than incidents, and updating the Cloud Operational Maturity Assessment.

Our moderator thanked the participants, past and present, for being the longest-running beta test group for the moderation software. The participants thanked Adam for moderating ATW for the past 18 years.




Last update Feb01/20 by Josh Simon (<jss@clock.org>).