Hello World Canada: The Rise of AI

Bloomberg Businessweek presents an exclusive premiere of the latest episode of “Hello World,” the tech-travel show hosted by journalist and best-selling author Ashlee Vance and watched by millions of people around the globe. There’s an AI revolution sweeping across the world. Yet few people know the real story about where this technology came from and why it suddenly took off. In this ground-breaking episode of “Hello World,” the story of AI’s rise is told in detail for the first time, as journalist Ashlee Vance heads to the unexpected birthplace of the technology, Canada. (Source: Bloomberg)

https://www.bloomberg.com/api/embed/iframe?id=d68de08e-2860-4f4f-a119-9d9da769ccad

 

Advertisements

Full Cycle Developers at Netflix — Operate What You Build

The year was 2012 and operating a critical service at Netflix was laborious. Deployments felt like walking through wet sand. Canarying was devolving into verifying endurance (“nothing broke after one week of canarying, let’s push it”) rather than correct functionality. Researching issues felt like bouncing a rubber ball between teams, hard to catch the root cause and harder yet to stop from bouncing between one another. All of these were signs that changes were needed.

Fast forward to 2018. Netflix has grown to 125M global members enjoying 140M+ hours of viewing per day. We’ve invested significantly in improving the development and operations story for our engineering teams. Along the way we’ve experimented with many approaches to building and operating our services. We’d like to share one approach, including its pros and cons, that is relatively common within Netflix. We hope that sharing our experiences inspires others to debate the alternatives and learn from our journey.

One Team’s Journey

Edge Engineering is responsible for the first layer of AWS services that must be up for Netflix streaming to work. In the past, Edge Engineering had ops-focused teams and SRE specialists who owned the deploy+operate+support parts of the software life cycle. Releasing a new feature meant devs coordinating with the ops team on things like metrics, alerts, and capacity considerations, and then handing off code for the ops team to deploy and operate. To be effective at running the code and supporting partners, the ops teams needed ongoing training on new features and bug fixes. The primary upside of having a separate ops team was less developer interrupts when things were going well.

When things didn’t go well, the costs added up. Communication and knowledge transfers between devs and ops/SREs were lossy, requiring additional round trips to debug problems or answer partner questions. Deployment problems had a higher time-to-detect and time-to-resolve due to the ops teams having less direct knowledge of the changes being deployed. The gap between code complete and deployed was much longer than today, with releases happening on the order of weeks rather than days. Feedback went from ops, who directly experienced pains such as lack of alerting/monitoring or performance issues and increased latencies, to devs, who were hearing about those problems second-hand.

To improve on this, Edge Engineering experimented with a hybrid model where devs could push code themselves when needed, and also were responsible for off-hours production issues and support requests. This improved the feedback and learning cycles for developers. But, having only partial responsibility left gaps. For example, even though devs could do their own deployments and debug pipeline breakages, they would often defer to the ops release specialist. For the ops-focused people, they were motivated to do the day to day work but found it hard to prioritize automation so that others didn’t need to rely on them.

In search of a better way, we took a step back and decided to start from first principles. What were we trying to accomplish and why weren’t we being successful?

The Software Life Cycle

The purpose of the software life cycle is to optimize “time to value”; to effectively convert ideas into working products and services for customers. Developing and running a software service involves a full set of responsibilities. We had been segmenting these responsibilities. At an extreme, this means each functional area is owned by a different person/role:

AnalyticsAnywhere

SDLC components

These specialized roles create efficiencies within each segment while potentially creating inefficiencies across the entire life cycle. Specialists develop expertise in a focused area and optimize what’s needed for that area. They get more effective at solving their piece of the puzzle. But software requires the entire life cycle to deliver value to customers. Having teams of specialists who each own a slice of the life cycle can create silos that slow down end-to-end progress. Grouping differing specialists together into one team can reduce silos, but having different people do each role adds communication overhead, introduces bottlenecks, and inhibits the effectiveness of feedback loops.

Operating What You Build

To rethink our approach, we drew inspiration from the principles of the devops movement. We could optimize for learning and feedback by breaking down silos and encouraging shared ownership of the full software life cycle:

AnalyticsAnywhere

“Operate what you build” puts the devops principles in action by having the team that develops a system also be responsible for operating and supporting that system. Distributing this responsibility to each development team, rather than externalizing it, creates direct feedback loops and aligns incentives. Teams that feel operational pain are empowered to remediate the pain by changing their system design or code; they are responsible and accountable for both functions. Each development team owns deployment issues, performance bugs, capacity planning, alerting gaps, partner support, and so on.

Scaling Through Developer Tools

Ownership of the full development life cycle adds significantly to what software developers are expected to do. Tooling that simplifies and automates common development needs helps to balance this out. For example, if software developers are expected to manage rollbacks of their services, rich tooling is needed that can both detect and alert them of the problems as well as to aid in the rollback.

Netflix created centralized teams (e.g., Cloud Platform, Performance & Reliability Engineering, Engineering Tools) with the mission of developing common tooling and infrastructure to solve problems that every development team has. Those centralized teams act as force multipliers by turning their specialized knowledge into reusable building blocks. For example:

AnalyticsAnywhere

Empowered with these tools in hand, development teams can focus on solving problems within their specific product domain. As additional tooling needs arise, centralized teams assess whether the needs are common across multiple dev teams. When they are, collaborations ensue. Sometimes these local needs are too specific to warrant centralized investment. In that case the development team decides if their need is important enough for them to solve on their own.

Balancing local versus central investment in similar problems is one of the toughest aspects of our approach. In our experience the benefits of finding novel solutions to developer needs are worth the risk of multiple groups creating parallel solutions that will need to converge down the road. Communication and alignment are the keys to success. By starting well-aligned on the needs and how common they are likely to be, we can better match the investment to the benefits to dev teams across Netflix.

Full Cycle Developers

By combining all of these ideas together, we arrived at a model where a development team, equipped with amazing developer productivity tools, is responsible for the full software life cycle: design, development, test, deploy, operate, and support.

AnalyticsAnywhere

Full cycle developers are expected to be knowledgeable and effective in all areas of the software life cycle. For many new-to-Netflix developers, this means ramping up on areas they haven’t focused on before. We run dev bootcamps and other forms of ongoing training to impart this knowledge and build up these skills. Knowledge is necessary but not sufficient; easy-to-use tools for deployment pipelines (e.g., Spinnaker) and monitoring (e.g., Atlas) are also needed for effective full cycle ownership.

Full cycle developers apply engineering discipline to all areas of the life cycle. They evaluate problems from a developer perspective and ask questions like “how can I automate what is needed to operate this system?” and “what self-service tool will enable my partners to answer their questions without needing me to be involved?” This helps our teams scale by favoring systems-focused rather than humans-focused thinking and automation over manual approaches.

Moving to a full cycle developer model requires a mindset shift. Some developers view design+development, and sometimes testing, as the primary way that they create value. This leads to the anti-pattern of viewing operations as a distraction, favoring short term fixes to operational and support issues so that they can get back to their “real job”. But the “real job” of full cycle developers is to use their software development expertise to solve problems across the full life cycle. A full cycle developer thinks and acts like an SWE, SDET, and SRE. At times they create software that solves business problems, at other times they write test cases for that, and still other times they automate operational aspects of that system.

For this model to succeed, teams must be committed to the value it brings and be cognizant of the costs. Teams need to be staffed appropriately with enough headroom to manage builds and deployments, handle production issues, and respond to partner support requests. Time needs to be devoted to training. Tools need to be leveraged and invested in. Partnerships need to be fostered with centralized teams to create reusable components and solutions. All areas of the life cycle need to be considered during planning and retrospectives. Investments like automating alert responses and building self-service partner support tools need to be prioritized alongside business projects. With appropriate staffing, prioritization, and partnerships, teams can be successful at operating what they build. Without these, teams risk overload and burnout.

To apply this model outside of Netflix, adaptations are necessary. The common problems across your dev teams are likely similar — from the need for continuous delivery pipelines, monitoring/observability, and so on. But many companies won’t have the staffing to invest in centralized teams like at Netflix, nor will they need the complexity that Netflix’s scale requires. Netflix’s tools are often open source, and it may be compelling to try them as a first pass. However, other open source and SaaS solutions to these problems can meet most companies needs. Start with analysis of the potential value and count the costs, followed by the mindset-shift. Evaluate what you need and be mindful of bringing in the least complexity necessary.

Trade-offs

The tech industry has a wide range of ways to solve development and operations needs (see devops topologies for an extensive list). The full cycle model described here is common at Netflix, but has its downsides. Knowing the trade-offs before choosing a model can increase the chance of success.

With the full cycle model, priority is given to a larger area of ownership and effectiveness in those broader domains through tools. Breadth requires both interest and aptitude in a diverse range of technologies. Some developers prefer focusing on becoming world class experts in a narrow field and our industry needs those types of specialists for some areas. For those experts, the need to be broad, with reasonable depth in each area, may be uncomfortable and sometimes unfulfilling. Some at Netflix prefer to be in an area that needs deep expertise without requiring ongoing breadth and we support them in finding those roles; others enjoy and welcome the broader responsibilities.

In our experience with building and operating cloud-based systems, we’ve seen effectiveness with developers who value the breadth that owning the full cycle requires. But that breadth increases each developer’s cognitive load and means a team will balance more priorities every week than if they just focused on one area. We mitigate this by having an on-call rotation where developers take turns handling the deployment + operations + support responsibilities. When done well, that creates space for the others to do the focused, flow-state type work. When not done well, teams devolve into everyone jumping in on high-interrupt work like production issues, which can lead to burnout.

Tooling and automation help to scale expertise, but no tool will solve every problem in the developer productivity and operations space. Netflix has a “paved road” set of tools and practices that are formally supported by centralized teams. We don’t mandate adoption of those paved roads but encourage adoption by ensuring that development and operations using those technologies is a far better experience than not using them. The downside of our approach is that the ideal of “every team using every feature in every tool for their most important needs” is near impossible to achieve. Realizing the returns on investment for our centralized teams’ solutions requires effort, alignment, and ongoing adaptations.

Conclusion

The path from 2012 to today has been full of experiments, learning, and adaptations. Edge Engineering, whose earlier experiences motivated finding a better model, is actively applying the full cycle developer model today. Deployments are routine and frequent, canaries take hours instead of days, and developers can quickly research issues and make changes rather than bouncing the responsibilities across teams. Other groups are seeing similar benefits. However, we’re cognizant that we got here by applying and learning from alternate approaches. We expect tomorrow’s needs to motivate further evolution.

Source: Medium.com

How to Become a Data Scientist

AnalyticsAnywhere

All such roads lead to the same destination: a job assembling, analyzing and interpreting large data sets to look for information of interest or value.

Data science encompasses “Big Data,” data analytics, business intelligence and more. Data science is becoming a vital discipline in IT because it enables businesses to extract value about the many kinds and large amounts of data they collect in doing whatever it is that they do. For those who do business with customers, it lets them learn more about those customers.

For those who maintain a supply chain, it helps them to understand more and better ways to request, acquire and manage supply components. For those who follow (or try to anticipate) markets – such as financials, commodities, employment and so forth – it helps them construct more accurate and insightful models for such things. The applications for data science are limited only by our ability to conceive of uses to which data may be put – limitless, in other words.

In fact, no matter where you look for data, if large amounts of information are routinely collected and stored, data science can play a role. It can probably find something useful or interesting to say about such collections, if those who examine them can frame and process the right kinds of queries against that data. That’s what explains the increasing and ongoing value of data science for most companies and organizations, since all of them routinely collect and maintain various kinds of data nowadays.

Basic Educational Background

The basic foundation for a long-lived career in IT for anybody getting started is to pursue a bachelor’s degree in something computing related. This usually means a degree in computer science, management information systems (MIS), computer engineering, informatics or something similar. Plenty of people transition in from other fields, to be sure, but the more math and science under one’s belt when making that transition, the easier that adjustment will be. Given projected shortages of IT workers, especially in high demand subject areas – which not only include data science, but also networking, security, software development, IT architecture and its various specialty areas, virtualization, and more – it’s hard to go wrong with this kind of career start.

For data scientists, a strong mathematics background, particularly in statistics and analysis, is strongly recommended, if not outright required. This goes along naturally with an equally strong academic foundation in computing. Those willing to slog through to a master’s or Ph.D. before entering the workforce may find data science a particularly appealing and remunerative field of study when that slog comes to its end. If so, they can also jump directly into mid- or expert/senior level career steps, respectively.

Early Career Work Focus and Experience

If data science is a long-term goal, the more experience one has in working with data, the better. Traditional paths into data science may start directly in that field, though many IT professionals also cross over from programming, analyst or database positions.

Much of the focus in data science comes from working with so-called “unstructured data” – a term used to describe collections of information usually stored outside a database such as large agglomerations of event or security logs, e-mail messages, customer feedback responses, other text repositories and so forth. Thus, many IT pros find it useful to dig into technologies such as NoSQL and data platforms such as Hadoop, Cloudera and MongoDB. That’s because working with unstructured data is an increasingly large part of what data scientists do. Early-stage career IT pros will usually wind up focusing on programming for big data environments, or working under the direction of more senior staff to groom and prepare big data sets for further interrogation and analysis.

At this early stage of one’s career, exposure to text-oriented programming and basic pattern-matching or query formulation is a must, along with a strong and expanding base of coding, testing and code maintenance experience. Development of basic soft skills in oral and written communications is a good idea, as is some exposure to basic business intelligence and analysis principles and practices. This leads directly into the early-career certifications mentioned in the next section.

Early-Career Network Certifications and Learning

Basic data science training is now readily available online in the form of massively open online courses, or MOOCs. Among the many offerings currently available, the January 2017 Quora article “What is the best MOOC to get started in Data Science?” offers a variety of answers, and lists courses from sources such as Duke (Coursera), MIT, Caltech, and the Indian Institute of Management and Business (edX), Stanford, and more. MS has since instituted a Microsoft Professional Program in Data Science that includes nine courses on a variety of related topics and a capstone project to present a reasonably complete introductory curriculum on this subject matter. (Courses aren’t free, but at $99 each, they are fairly inexpensive.)

Mid-career Work Focus and Experience

Data science is a big subject area, so by the time you’ve spent three to five years in the workforce and have started to zero-in on a career path, you’ll also start narrowing in on one or more data science specialties and platforms. These include areas such as big data programming, analysis, business intelligence and more. Any or all of them can put you into a front-line data science job of some kind, even as you narrow your focus on the job.

This is the career stage at which you’ll develop increasing technical skills and knowledge, as you also start to gain more seniority and responsibility among your peers. Soft skills become more important mid-career as well, because you’ll have to start drawing on your abilities to communicate with and lead or guide others (primarily on technical subjects related to data science and its outputs or results) during this career phase.

Mid-career Network Certifications

This is a time for professional growth and specialization. That’s why there is a much broader array of topics and areas to consider as one digs deeper into data science to develop more focused and intense technical skills and knowledge. Data science-related certifications can really help with this but will require some careful research and consideration. Thus, for example, one person might decide to dig into certifications related to a particular big data platform or toolset – such as the Certified Analytics Professional, MongoDB, Dell/EMC, Microsoft, Oracle or SAS.

This is a point at which one might choose to specialize more in big data programming for Hadoop, Cloudera or MongoDB on the one hand, or in running analyses and interpreting results from specific big data sets on the other. Cloudera covers most of these bases all by itself, which makes its offerings worth checking out: among many other certifications, they have Data Scientist, Data Engineer, Spark and Hadoop Developer and Administrator for Apache Hadoop credentials. There are dozens of Big Data certifications available today, with more coming online all the time, so you’ll have to follow your technical interests and proclivities to learn more about which ones are right for you.

Expert or Senior Level Work Focus and Experience

After 10 or more years in the workforce, it’s time to get serious about data science/Big Data. This is the point at which most IT professionals start reaching for higher rungs on the job role and responsibilities ladder.

Jobs with such titles as senior data analyst, senior business intelligence analyst, senior data scientist, big data platform specialist (where you can plug in the name of your chosen platform in searching for opportunities), senior big data developer, and so forth, represent the kinds of positions that data science pros are likely to occupy at the point on the career ladder. Expert or senior level IT pros will often be spearheading project teams of varying sizes by this point on the career line as well, even if their jobs don’t carry a specific management title or overt management responsibilities. This means that soft skills are even more important with an increasing emphasis on leadership and vision, along with skills in people and project management, plus oral and written communications.

Expert or Senior Level Big Data Certifications

This is the career step at which one typically climbs near or to the top of most technical certification ladders. Many of these credentials – such as the SAS “Advanced Analytics” credentials (four at present) – actually include the term “advanced” or “expert” in their certification monikers.

The SAS Institute and Dell/EMC, in particular, have rich and deep certification programs, with various opportunities for interested data scientists or Big Data folks to specialize and develop their skills and knowledge. Database platform vendors, such as Oracle, IBM and Microsoft are also starting to recognize the potential and importance of Big Data and are adding related elements to their certification programs all the time. Because this field is still relatively young and new cert programs are still coming online, the shape of the high end of the cert landscape for Big Data is very much a work in progress.

Whatever Big Data platform or specialty you choose to pursue, this is the career stage where deep understanding of the principals and practices in the field and an understanding of their business impact and value must begin to combine. It is also where people must focus on their soft skills at the highest level, because senior data scientists or Big Data experts must be able to lead teams of high-level individuals in the organizations they serve, including top executives, high-level managers, and other technical experts and consultants. As you might expect, this kind of work is as much about soft skills in communication and leadership as it is about in-depth technical knowledge and ability.

Continuing Education: Master’s or PhD?

Depending on where you are in terms of work experience, family situation and finances, it may be worth considering a master’s degree with a focus on data science or some other aspect of Big Data as a profound developmental step for career development. For most working adults, this will mean getting into a part-time or online advanced degree program.

Many such programs are available, but you’ll want to consider the name recognition value and the cost of those offerings when choosing a degree plan to pursue. If pursued later in life (after one’s 20s), a Ph.D. is probably only attainable for someone with strong interests in research or teaching. That means a Ph.D. is not an option for most readers unless they plan and budget for a lengthy interruption in their working lives (most doctorate programs require full-time attendance on campus, and take from three to six years to complete).

With proper education, certification, planning and experience, working as a data scientist, or in some other Big Data role, is an achievable goal. It will take at least three to five years for entry-level IT professionals to work their way into such a position (less for those with more experience or an advanced degree in the field), but it’s a job that offers high pay and one that is expected to stay in high demand for the foreseeable future. Because the amount of data stored in the world is only increasing year over year, this appears to be a good specialty area in IT that’s long on opportunity and growth potential.

Source: Business News Daily