DevOpsDays MSP 2015

09 July 2015

Contents
The Conference
Talks
Ignite Talks
Open Spaces

The Conference

DevOpsDays MSP is an annual 2-day conference in Minneapolis, Minnesota. This is the second year of the conference.

DevOps is *deep breath* the study and practice of combining the classically separate professions of development and operations (my definition, which is probably different from everyone else’s).

The term ‘DevOps’ is largely credited to Patrick Debois in 2009, though the concepts and practices arguably predate the term.

The following are my personal notes from the conference, summarized in my own words, and sometimes with my own biases and omissions. This is not a transcript and it is not complete. Where possible, the original sources should be referenced.

Talks

Devops: The Missing Pieces

Katherine Daniels - talk

slides

Great overview of devops at Etsy with some actionable advice that every company should consider.

DevOps History
- Used to throw code over the wall, and it didn’t go well
- Patrick Debois organized the first DevOpsDays conference in 2009
  - Ops and Devs should work together, communicate more
DevOps @ Etsy
- “Who is in charge of DevOps?” – “We are all in charge of DevOps.”
- No DevOps team, no DevOps engineers.
- We’re all in charge of devops.
Bootcamping
- Embedding (especially new) employees into other teams
- 1-6 weeks, 1-3 bootcamps per person
- Meaningful contributions with other teams
- Empathy, inter-team understanding
Designated Ops
- Every team has a designated ops engineer
- Even non-dev teams may have an ops person
- Designated, but not dedicated
- Primary, secondary, advisory roles - avoids burnout, spreads load
- Attend other team’s meetings
  - Expose operational ideas, concerns, early
  - Monitoring, operational thinking, fewer surprises
Pair Opsing
- A part of Designated Ops
- Similar to pair programming
- Once a week, or every couple weeks
- Refactoring ops code, changing alerting
- Two-way knowledge sharing
Intersectionality
- The study of intersections between forms or systems of oppression, discrimination
- Accessibility, real name policies, B corps, side effects of businesses
- Rock stars, big egos, bad
- Looking to build orchestras, rather than rock stars
- When hiring, look for diversity and team players

Why you should care about DevOps in the public sector

Joshua Zimmerman - talk

Challenges in public sector technology and how DevOps can help.

The public sector consists of organizations owned or administered by the government
Adam Jacob’s Kung Fu Talk recommended
Why care about the public sector? Easy example: healthcare.gov
Lacking the agency to fix a problem makes it feel like it isn’t your problem.
Gov’t applications, web apps, have issues. They’re aware of it.
- “This is what my tax dollars pay for.” common complaint
We should be able to improve these applications.
Why devops?
- Improve state of public infrastructure
- Doesn’t require money! It’s culture
Bureaucracy
- Mark Schwartz How DevOps Can Fix Federal Government recommended
- Devops is about bureaucracy
- Fix bureaucracy through devops culture and improve services
- Structure can be a barrier to devops
- Historical baggage - commonalities were not identified early
  - i.e. central IT department may not exist
  - Lots of institutionalized silos
- At UW Madison, ~140 IT units on campus(!)
- Structure and politics are hard
- John Maeda -
  - Silo approach - cooperation - working together independently
  - devops approach - collaboration - working together dependently
Metrics for success may differ from private sector
- May never feel like you have succeeded
- Time is often a bigger limitation than money
How can everyone help?
- Devops conferences help - affordable and distributed
- Devops resources online also contribute
- Devops community needs better outreach to other tech communities
Your words matter. Choose them wisely. [use inclusive language]
- Don’t assume the goal is to make money. For a public institution, that’s not important.
- Give everyone a voice (Thanks devopsdays)
- Your problems are not unique
Takeaways
- Keep trying to understand us
- Team building without assuming hiring and firing are options

Rolling Your Own vs SaaS: Tradeoffs & Considerations

Colleen Velo - talk

This detail-packed talk shows an overview of what particular pieces of software a large company is using, and why.

Bloom Health
- Private health exchange
- HIPAA compliant, PHI data
- Entire infra in public cloud infrastructure (AWS)
Definitions used in this talk
- SaaS - Software as a Service
- Roll your own - write your own custom software privately
- Open Source - Freely available software
- Commercial (self hosted) - purchased software but administered locally
Considerations on DIY vs SaaS
- Cost, support, staffing, company policies, security
- HIPAA security
  - data must be encrypted in transit and at rest
  - principle of least privilege
I excluded a bunch of slides on pros, cons and use cases of each of SaaS, Commercial, Open Source, and Roll-your-own. Watch the talk for full details.
Bloom’s approach to SaaS
- AWS
  - CloudTrail and CloudCheckr - great for auditing access
  - Previously used Trusted Advisor instead of CloudCheckr
  - CloudFormation to manage deployments
  - ElastiCache instead of sticky sessions
  - Planning to use RDS for MySQL
    - Currently still using MySQL master/slave
    - RDS MySQL now HIPAA compliant
  - AWS HIPAA compliant page recommended
- Jira, Confluence, HipChat
- Docker Hub private registry
  - building their own private registry
- GitHub in the cloud
- Monitoring
  - Stackdriver for system metrics, integrated with AWS
  - New Relic for application monitoring, JVM stack, groovy/tomcat
  - Pingdom for monitoring endpoints
  - PagerDuty for alerting
Bloom’s approach to commercial (self-hosted) software
- Splunk for log aggregation
  - Splunk enterprise security
  - Great integration with 3rd party products
- Casper Suite for Mac Provisioning
  - Security policies
  - Audit trail
- Jira ticketing system for PHI data - self hosted
Bloom’s approach to “roll your own” (write your own solution)
- SFTP file exchange due to HIPAA compliance
- Dynamic service discovery
  - In 2013, not many options available
  - BHStore, based on Redis and publish/subscribe
  - Moving over to Consul (HashiCorp)
    - Multi data center support
Bloom’s approach to open source
- Vagrant for development/testing
- Chef Solo, migrating to SaltStack and Docker
- Graphite for monitoring historical application metrics
- Packer for AMIs
Takeaways
- Go to meetups to discuss solutions and experiences

Helping Developers Monitor Their Own Applications

Luke Francl - talk

DevOps from a developer perspective; how to get developer buy-in. Warning, actual code is presented and discussed!

Swiftype - search as a service company using API or web crawler
Luke is developer, admits to not knowing a lot about operations
–Bunch of really funny joke slides excluded–
Thought devops was just an ops thing..
Monitoring
- Make it easy for devs to add monitoring (or it may not get done)
- Example shown of Ruby integration with Nagios
  - Allows monitoring metrics and thresholds to be defined and implemented directly in code
  - See slides for particular technical details
- With this glue, “Monitoring is addictive” for development
- Developers are subject matter experts on their application. Give them the tools to implement their own monitoring.
- Open sourced Ruby/Nagios glue on GitHub
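
The idea of defining metrics and thresholds directly in code can be sketched roughly as follows. This is not the Ruby glue shown in the talk (I did not reproduce that); it is a minimal Python illustration, and the `Check` class and the `index_queue_depth` metric are hypothetical names. The only thing borrowed from Nagios is the standard plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


@dataclass
class Check:
    """A metric plus thresholds, declared next to the application code (hypothetical)."""
    name: str                  # metric name chosen by the developer
    read: Callable[[], float]  # function that returns the current value
    warn_at: float             # value at or above which the check warns
    crit_at: float             # value at or above which the check is critical

    def run(self) -> Tuple[int, str]:
        """Return (exit_code, status_line) in the style of a Nagios plugin."""
        try:
            value = self.read()
        except Exception as exc:  # a failing probe is reported as UNKNOWN
            return UNKNOWN, f"UNKNOWN - {self.name}: {exc}"
        if value >= self.crit_at:
            code = CRITICAL
        elif value >= self.warn_at:
            code = WARNING
        else:
            code = OK
        return code, f"{STATUS_NAMES[code]} - {self.name}={value}"


# Hypothetical example: a developer declares a queue-depth check.
queue_depth = Check("index_queue_depth", read=lambda: 120.0, warn_at=100, crit_at=500)
code, line = queue_depth.run()
print(line)
```

Keeping the threshold next to the code that produces the metric is what makes it cheap for a developer to add a check, which seemed to be the main point of the talk.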
“DevOps needs developers”
- Development used to be pretty awesome, throwing code over the wall
- Need developer buy-in. It can’t just be about ops
- Providing infrastructure for developers is a powerful way to win over development
- Make it easy for developers. They’ll understand the value.
- Moving towards having developers on call, but not there yet
Takeaways
- Much of this talk was contextual and funny, which I did not attempt to reproduce here
- Much of this talk was code, which I also did not reproduce here
- Make monitoring directly accessible to developers

The New New Software Development Game

Mary Poppendieck - talk

Mary is from the future. You should listen to what she says and buy her books and the books recommended here.

Software is eating the world.
Things to think about
- lower friction
  - Friction is what makes war different in reality vs on paper.
  - Before containerization in shipping, high friction - low friction with containers - Read The Box
- limit risk
Lower friction, limit risk. How?
- Architecture - microservices
  - In the 90’s, centralize everything into few databases - monolithic architecture
  - Microservices, by contrast, decouples all the things and creates a federated architecture
    - Read Building Microservices
    - Small service - does one thing well. Independently deployable.
    - Small team - end to end responsibility. END TO END. On call, monitoring, QA, deploying, etc
    - Practices
      - No central databases
      - Extensive automation and monitoring
      - Double Mock Contract Testing
      - Smart versioning services
      - Canary releasing
    - Examples: Amazon, Netflix, Spotify, Gilt
    - Risks..
      - Dependency hell.. how is this different than objects?
      - Is it right for your domain?
        
        Yes, if you have high volume
      - Do you understand the domain?
        
        Often start monolithic and move to microservices when it makes sense
        
        Get boundaries right first, hard to refactor later
      - Can you maintain strict discipline?
        
        Restrict interaction to hardened interfaces
        
        Teams maintain situational awareness of their services and of each service’s consumers and providers
- Architecture - containers
  - Pack dependent code into containers
  - Build once, run anywhere
  - Consistency
  - Isolation
  - Easy to use (esp. Docker)
  - Better server utilization
- Architecture - testing
  - Use smart testing strategies
  - Better mocking - e.g. github.com/realestate-com-au/pact
Dealing with monoliths
- You don’t have to have microservices, e.g. Facebook
- Antipatterns
  - “Smash!” - large infrequent releases - Guarantees failure
- Best practice
  - Poke, test, fix.. small iterations over time
  - Continuous Delivery
    - Not new - 2010 idea. If you’re not doing this yet, you don’t care about stability, reliability, predictability. Least dangerous approach.
    - Must have test-driven development process
    - Tight collaboration between customer-facing and delivery people
    - Cross functional teams, including product, QA, and ops
    - Automated build, testing, DB migration, deployment
    - Incremental dev on mainline with continuous integration
    - Branches are not CD. Always deploy to trunk/master
    - Release is done by a switch. Deployment happens all the time.
    - Software is always production ready
    - Releases tied to business needs, not operational constraints
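
"Release is done by a switch" is usually implemented with some kind of feature flag: the new code is deployed but stays dark until the flag is turned on. A minimal sketch, assuming an in-memory flag store; `FeatureFlags`, `checkout_total`, and the `new_discount` flag are hypothetical names and not something shown in the talk.

```python
from typing import Iterable, List, Optional


class FeatureFlags:
    """Tiny in-memory flag store (hypothetical; real systems persist flags)."""

    def __init__(self, enabled: Optional[Iterable[str]] = None) -> None:
        self._enabled = set(enabled or [])

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_total(prices: List[float], flags: FeatureFlags) -> float:
    """Both code paths are deployed; the flag decides which one is released."""
    subtotal = sum(prices)
    if flags.is_enabled("new_discount"):
        # New behavior: already in production, but dark until switched on.
        return round(subtotal * 0.9, 2)
    return subtotal


flags = FeatureFlags()
before = checkout_total([10.0, 20.0], flags)  # old behavior
flags.enable("new_discount")                  # the "release" is this switch
after = checkout_total([10.0, 20.0], flags)   # new behavior, no deploy needed
print(before, after)
```

The deploy and the release become two separate events, so the release can be timed to a business need and reversed by flipping the flag back.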
Organization
- Dev and ops are different - Read Focus
  - Safety-focused goal people
    - prevent failure. i.e. doctors, nurses
    - Duty, obligation
    - Rewards - attention is for bad behavior
    - Nothing going wrong is ideal
    - Limit risk
  - Aspirational goal people
    - create gains
    - explore all the options
    - Rewarded for delivery
    - Lower friction
  - Both important! “Often in a marriage, you’ll have one of each”
- One goal, shared responsibility
  - Who is responsible?
  - “We work together”
  - All of us are responsible.
- Situational awareness - Read This is Lean
  - What makes a great (soccer) team - everyone on the field is aware of everything, all the time. The team with the most situational awareness will win.
  - Wayne Gretzky - skates to where the puck will be

Cheffing Etsy; Do too many cooks really spoil the soup?

Jon Cowie - talk

This was a very detailed and pragmatic talk about how Etsy uses Chef to deliver services.

Do too many cooks really spoil the soup?

What is chef?
- Desired state config management
- Thin server, thick client
- Chef vocab
  - Node - a server being controlled by chef
  - Cookbook - desired state specifications, using recipes
  - Environment - a list of cookbook version constraints
  - Knife - cli for chef server
- There is no magic pill.
- You’re the expert, chef is just a tool, not prescriptive
Chef at Etsy
- Chef server
- ~2000 nodes
- Almost all CentOS, but a couple Mac OS X
- Everything from OS to “below code”
- Chef does not deploy code - Deployinator does
- Single git repo
  - creates 2 sources of truth..
    - Humans talk to git, servers talk to chef server
  - 50 authors in last month
- ~35 chef deploys per day
- Many less-experienced users - trust but verify
Cookbook workflow
- command line review tool - creates pull request, sends it to people
- Push change to server, using internal tool called knife-spork
  - Helps multiple chefs avoid clashing, and gives visibility
- Test change
  - Move node to unconstrained environment
  - knife node flip foo.etsy.com testing
  - Downsides..
    - no unit tests
    - holding cookbook in testing is blocking
    - testing env affects all cookbooks
  - Use chef-whitelist to solve pain points
Monitoring and debugging
- knife-spork and CI job. Integrated with chat when changes are made (irc)
- IRC handler - deals with exceptions, test failures
- Lastrun data - shows other nodes with failures
- Dashboard that shows when deploys happen, and overall chef status provides great visibility
How’s it all going?
- Some pain points
  - Change clashes due to number of chef contributors
  - Confusion over state of changes
  - People forget things
  - Testing pains
- 2016 million dollar workflow (improvement plans)
  - deployinator-based workflow
  - push queue
  - unit tests
  - “try” based testing - ci system lets you run jenkins tests before pushing
  - More like existing CD workflows
  - Basically, require fewer people to have to use chef
Read Customizing Chef (speaker’s book)
Lastly, a “rant” about online harassment

Lets Safely Dance

Andy Fleener - talk

Overview of safety concepts in large and complex systems.

Systems inherently unsafe
“Complex systems are trade-offs between multiple irreconcilable goals (e.g. safety and efficiency)” - Sidney Dekker
Systems cause failures (not people)
Organizations are complex systems
Become comfortable with failure
Foster a culture of learning
Procedure vs Practice
- What’s written down, vs what actually happens
- Mind the gap - understand the difference
  - Study the practice
Organizational change has the biggest impact
Read:

Closing Keynote

Andrew Clay Shafer

Andrew Clay Shafer’s thoughts on devops.

I excluded a long introduction recounting history of devops

Obligatory Deming Quote - A bad system will beat a good person every time
Innovators, Imitators, Idiots. (Don’t be an idiot.)
inputs and outputs - Conway’s law and its impact on org structure
devops - optimizing performance and minimizing suffering.. globally.
Incentives for those wearing pagers vs those paid to ship new features.
- If you wear a pager and are responsible when something breaks, you’ll probably prefer safety
The problem: local rationality (vs global)
- The information we have changes what we see
- Stimulus and response - the system has as big an impact as any individual
- burnout is a feature of a system
  - people in a bad mood have better judgement and attention to detail
  - Perhaps depression isn’t a malfunction, but an adaptation (Scientific American)
    - depression is a feature
“I never wanted to be a programmer”
“Computers are pretty easy.. it just does what I say, that’s pretty awesome”
“I never wanted to be a sys admin”.. “I sure as hell never wanted to be a manager”
There’s a tightrope between Dunning-Kruger and imposter syndrome
Humans are hard-wired to prefer confidence to expertise (see - sales)
Everyone should read this book: Badass: Making Users Awesome - Kathy Sierra
Systems make people awesome. No-one will overcome an unhealthy system.
Build better systems, keep learning, keep helping each other
The punishment for not participating in politics is being ruled by inferiors

Ignite Talks

Ignite talks are 5 minutes each and go very fast. The following are my brief impressions.

Stop Blogging About Women in Tech

Jenna Pederson

6 Actions you can take with regard to women in technology
- Provide a community (for women)
  - e.g. geekettes
- Provide a place to learn
  - e.g. GR8 Ladies
- (Self) Promote
  - e.g. twitter
- Actively Recruit
- Mentor
- Empower
  - e.g. HackTheGap

Effortless

Michael Lanyon - talk

Critical Mass experiment with web performance monitoring
webpagetest.org quantifies the end user experience
RUM - Real User Monitoring
- Navigation Timing - request lifecycle
- github.com/lanyonm/http-stats-collector
- JS error reporting: game changer
- Operationalize it: make it easy
- Chef details on deployment automation
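
To make the Navigation Timing bullet a bit more concrete: the browser exposes millisecond timestamps for each phase of the request lifecycle, and RUM metrics are mostly differences between them. A minimal sketch of the server-side arithmetic, assuming the browser has posted the raw timestamps; the field names follow the W3C Navigation Timing attributes, while the function name, the metric names, and the sample values are made up. The collector linked above may compute things differently.

```python
from typing import Dict


def summarize_navigation_timing(timing: Dict[str, int]) -> Dict[str, int]:
    """Turn raw Navigation Timing timestamps (ms since epoch) into durations."""
    start = timing["navigationStart"]
    return {
        # Time until the first byte of the response arrived.
        "time_to_first_byte_ms": timing["responseStart"] - start,
        # Time until DOMContentLoaded handlers finished.
        "dom_content_loaded_ms": timing["domContentLoadedEventEnd"] - start,
        # Time until the load event finished.
        "page_load_ms": timing["loadEventEnd"] - start,
    }


# Made-up sample, shaped like what a browser would report.
sample = {
    "navigationStart": 1436400000000,
    "responseStart": 1436400000180,
    "domContentLoadedEventEnd": 1436400000950,
    "loadEventEnd": 1436400001400,
}
summary = summarize_navigation_timing(sample)
print(summary)
```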
Have team form a relationship with prod

Vulnerability

Larye Pohlman - talk

Empathy, vulnerability
Leadership is associated with vulnerability
Feel pain, show pain
Awkward moments shared with neighbors to make a point about sharing feelings with strangers

GameOps

Jason Clifford

Board games at work
Great for team building, culture, learning, improving, etc
Board games aren’t what they used to be
Organizing tips:
- Learn well enough to teach it first
- Be inclusive
- Accessible and inviting location
Game recommendations

DevOps and the Enterprise

Jason Walker

Target devops - Empathy Fairness and Contentment
Let time pass (after presenting new ideas)
Equal != Fair
Identify and remove complication

Sports Stats 10

Daniel Willis

Note that Daniel is 12 years old!

Putting the R in sports

Installing R
Type R to start
R variables
Example using R to calculate ERAs
Reading files
Explaining standard deviation is hard (!)
vectors, era plots
Moneyball and sabermetrics
Using Lahman database for historical baseball stats
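
The talk used R, which I did not capture. As a rough equivalent, here is the ERA arithmetic in Python: earned run average is earned runs allowed per nine innings pitched. The pitcher names and numbers are invented, and recording innings as outs (three outs per inning) is my assumption about how a stats table might store them.

```python
import statistics


def era(earned_runs: int, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


def innings_from_outs(outs: int) -> float:
    """Convert outs recorded to innings pitched (three outs per inning)."""
    return outs / 3


# Invented sample data: name -> (earned runs, outs recorded).
pitchers = {"Pitcher A": (50, 450), "Pitcher B": (80, 540)}

eras = {
    name: era(earned, innings_from_outs(outs))
    for name, (earned, outs) in pitchers.items()
}
spread = statistics.pstdev(eras.values())  # population standard deviation
print(eras, spread)
```

The standard deviation here is only the spread of the two made-up ERAs around their mean, which is about as far as a five-minute explanation can go.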

You, Me & StatsD

Mark Morris

Who does ‘tail -f production.log’?
multipurpose tool to gather information from logs without using tail.
Simple example from a shell
Enter statsd. “logging for metrics”
github.com/etsy/statsd
Metrics in buckets
statsd examples
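
I did not write down the examples from the talk, so here is a minimal sketch of what sending a metric to statsd looks like. A metric is a short text line of the form `bucket:value|type` (for example `c` for a counter, `ms` for a timer, `g` for a gauge) sent over UDP, by default to port 8125. The bucket names below are hypothetical, and `format_metric` and `send_metric` are my own helper names, not part of any statsd client library.

```python
import socket
from typing import Optional


def format_metric(bucket: str, value: float, metric_type: str,
                  sample_rate: Optional[float] = None) -> str:
    """Build one statsd line, e.g. 'app.logins:1|c'."""
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None:
        # Tells statsd that only a fraction of events were sent.
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the application never waits on statsd."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical buckets: count a login, time a request.
login_line = format_metric("app.logins", 1, "c")
timing_line = format_metric("app.request_time", 320, "ms")
print(login_line)
print(timing_line)
```

Calling `send_metric(login_line)` would emit the packet. Because it is UDP, nothing breaks in the application if no statsd daemon is listening, which is a large part of why it is so cheap to add metrics this way.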

If you want to have an impact, Devops is not enough

Sara Cowles

Segway example - great tech but little impact
Work hard at work worth doing
option 1: just build it
- bet the farm
option 2: build an MVP (minimum viable product)
- take a gamble
option 3: test assumptions
Be wrong as fast as you can
build -> measure -> learn feedback loop
empathy - the closest thing to a silver bullet

ChatOps

Jason Hand

Email should die
- 28% managing email
- 20% looking for information
20-25% increase in productivity by moving conversations to chat
e.g. trigger a Jenkins build from chat
e.g. incident management
benefits
- learning
- sharing
- speed
- security
- brainstorming
- fun
Private chats are an anti-pattern - use shared spaces
- black box buckets nobody else benefits from
placing tools directly in the middle of conversation
Read ChatOps for Dummies (https://victorops.com/blog/chatops-for-dummies/)
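
To illustrate "placing tools directly in the middle of conversation", here is a minimal sketch of a chat bot that maps messages in a shared room to commands. Everything in it is hypothetical (`ChatBot`, the `!` prefix, the `build` command), and the handler only echoes a reply; a real bot would call the CI server's API at that point.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[List[str]], str]


class ChatBot:
    """Maps chat messages like '!build website' to registered handlers."""

    def __init__(self, prefix: str = "!") -> None:
        self.prefix = prefix
        self._commands: Dict[str, Handler] = {}

    def command(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler under a command name."""
        def register(func: Handler) -> Handler:
            self._commands[name] = func
            return func
        return register

    def handle(self, message: str) -> Optional[str]:
        """Return the bot's reply, or None for ordinary conversation."""
        if not message.startswith(self.prefix):
            return None
        parts = message[len(self.prefix):].split()
        if not parts:
            return None
        name, args = parts[0], parts[1:]
        handler = self._commands.get(name)
        if handler is None:
            return f"unknown command: {name}"
        return handler(args)


bot = ChatBot()


@bot.command("build")
def build(args: List[str]) -> str:
    if not args:
        return "usage: !build <job>"
    # A real bot would trigger the CI job here; this sketch only replies.
    return f"queued build for {args[0]}"


reply = bot.handle("!build website")
print(reply)
```

Because the command and the reply both land in the shared room, everyone sees what was run and learns how to run it, which ties back to the learning and sharing benefits listed above.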

Devops in the Machine

Matt Stratton

Pete Chesbot jokes

Open Spaces

For reference, here’s the list of open spaces.

Wednesday
- Session 1
  - Saltstack Best Practices
  - Empathy/Cybernetics
  - Config and app dep management
  - GameOps
  - Working with product, UX, Marketing, other non-ops non-dev folks
  - CM on Windows
  - Introverts
  - Leadership in Tech
- Session 2
  - DevOps in Dev Environments
  - DevOps at Tiny Company
  - devops for nonprofits, orgs and low budget side projects
  - Training/Getting Buy In Socializing Holistic Thinking
  - DevOps Career Development
  - CI/CD Pipeline Tooling
  - Building self-service IaaS
- Session 3
  - DevOps Crystal Ball
  - Conference Speaking Efforts @ your company
  - Where is SW/Test Departments in DevOps
  - More contributions back to Open Source
  - Security, DevTools & Monitoring
  - Useful bots in chatrooms
  - Blameless Post-mortems
  - Remote Teamwork
Thursday
- Session 1
  - Cookie Ops
  - app “herding” organize and manage state
  - arrested devops podcast
  - monitoring/incident managements
  - education + teaching comp sci
  - scaling elk
  - chatops
- Session 2
  - How to make devops haters devops supporters
  - Empathy: tactics, challenges, stories
  - do your own devopsdays
  - code sync for puppet
  - empowering product and design leaders
  - docker orchestration
  - cross the finish line: devops marathon
  - werewolf
- Session 3
  - sales and marketing’s place in devops
  - kanban vs devops
  - public conference post-mortem
  - cloud foundry
  - being blind / advocate for under represented groups
  - config driven monitoring/testing infrastructure
  - devsecops: doing, dreaming or what’s security doing here
  - mutable vs immutable infra