The Data Science Process

The Data Science Process is a framework for approaching data science tasks, developed by Joe Blitzstein and Hanspeter Pfister for Harvard's CS 109. The goal of CS 109, according to Blitzstein himself, is to introduce students to the overall process of data science investigation, which provides some insight into the framework itself.


The following is a sample application of Blitzstein and Pfister's framework, covering the skills and tools relevant at each stage, as given by Ryan Fox Squire in his answer:

Stage 1: Ask A Question
Skills: science, domain expertise, curiosity
Tools: your brain, talking to experts, experience

Stage 2: Get the Data
Skills: web scraping, data cleaning, querying databases, CS stuff
Tools: python, pandas

Stage 3: Explore the Data
Skills: Get to know data, develop hypotheses, patterns? anomalies?
Tools: matplotlib, numpy, scipy, pandas, mrjob

Stage 4: Model the Data
Skills: regression, machine learning, validation, big data
Tools: scikit-learn, pandas, mrjob, MapReduce

Stage 5: Communicate the Data
Skills: presentation, speaking, visuals, writing
Tools: matplotlib, adobe illustrator, powerpoint/keynote

Squire then (rightfully) concludes that the data science workflow is a non-linear, iterative process, and that many skills and tools are required to cover the full data science process. Squire adds that he is fond of the Data Science Process because it stresses both the importance of asking questions to guide your workflow and the importance of iterating on your questions and research as you gain familiarity with your data.
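
To make the stages concrete, here is a minimal Python sketch of Stages 2 through 5 using the tools Squire lists (pandas, matplotlib, scikit-learn). The file name customers.csv and its "churned" label column are hypothetical placeholders, not part of Squire's answer:

```python
# A minimal sketch of Stages 2-5, assuming a hypothetical customers.csv
# whose columns are numeric and include a 0/1 "churned" label.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stage 2: Get the Data (and clean it)
df = pd.read_csv("customers.csv").dropna()

# Stage 3: Explore the Data -- summary stats, patterns, anomalies
print(df.describe())
df.hist(figsize=(10, 6))
plt.savefig("exploration.png")

# Stage 4: Model the Data, with held-out validation
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Stage 5: Communicate the Data -- report a validated result
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

In practice the loop then restarts: exploration and modeling usually raise new questions that send you back to Stage 1.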

The Data Science Process is an innovative framework for approaching data science problems. Isn't it?

Source: kdnuggets.com


Top 5 tips for businesses implementing RPA

To remain competitive, businesses must digitalise their operations. One way to speed up this process and improve returns is through robotic process automation.

Robotic process automation (RPA) is gaining popularity as enterprises discover new ways to drive business impact and speed up digital transformation. RPA is software that mimics how humans use applications to process transactions, harness data and communicate with other systems.
RPA can provide businesses with fast returns on investment by automating manual data processes, freeing up employees for more value-added tasks and improving operational and cost efficiencies. For enterprises that are digitally transforming their operations, RPA software is becoming fundamental to improving productivity, compliance and competitive advantage. The New Economy outlines the top five tips businesses should consider for successful RPA implementation:

Start small and learn
By starting their digital transformation with RPA, enterprises can more effectively plan how they will tackle the process. For example, some businesses may look to identify and automate the lengthiest and most repetitive tasks first, before replicating this in other processes. Others may consider automating operations that impact a specific function.
RPA is at its most effective when combined with other technologies, but starting small and automating simple, tedious and repetitive processes is a good way to develop a strategy that can be implemented more broadly later on.

Implement holistically
Robotics not only enables companies to automate time-consuming human processes; it also promotes innovation in terms of how the business is run and what services are offered to customers. For this reason, senior executives should approach RPA implementation holistically, keeping end-to-end processes in mind and looking for opportunities to exploit machine learning and analytics.
For example, an employee might manually open an email and a PDF attachment, review an invoice and enter that information into a software system. RPA streamlines this process, entering invoice amounts into the system faster and more accurately. In doing so, the software creates actionable business insights that can be used to improve process performance even further.
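
As a rough illustration of that invoice flow, the sketch below uses only Python's standard library to poll a mailbox and post an amount to a back-office system. The mail server, credentials, and post_to_erp function are hypothetical stand-ins; commercial RPA platforms package this kind of logic behind configurable workflows.

```python
# A simplified sketch of the invoice flow above, using only the Python
# standard library. Server, credentials, and post_to_erp are hypothetical.
import imaplib
import email
import re

def extract_invoice_amount(text):
    """Pull the first 'Total: <amount>' figure out of the message text."""
    match = re.search(r"Total:\s*\$?([\d,]+\.\d{2})", text)
    return float(match.group(1).replace(",", "")) if match else None

def post_to_erp(amount):
    """Stand-in for the 'enter it into a software system' step."""
    print(f"Posting invoice amount {amount:.2f} to the ERP system")

mail = imaplib.IMAP4_SSL("imap.example.com")      # hypothetical server
mail.login("bot@example.com", "app-password")     # hypothetical credentials
mail.select("INBOX")
_, ids = mail.search(None, '(SUBJECT "Invoice")')
for msg_id in ids[0].split():
    _, data = mail.fetch(msg_id, "(RFC822)")
    msg = email.message_from_bytes(data[0][1])
    for part in msg.walk():
        # A real bot would also run PDF-to-text or OCR on attachments here;
        # this sketch only reads plain-text parts.
        if part.get_content_type() == "text/plain":
            amount = extract_invoice_amount(part.get_payload(decode=True).decode())
            if amount is not None:
                post_to_erp(amount)
```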

Consider your workforce
Traditionally, business process operations are labour-intensive, service-level-focused environments dealing with planned and unplanned peaks, seasonal variations and exhausting end-of-month and end-of-week periods.
When RPA is introduced, this working environment changes dramatically. Automated processes not only work faster than their human counterparts but they can work 24/7 and automatically scale up to deal with peak demand and scale down in periods of lower intensity. There needs to be a shift away from traditional workforce management and rostering, towards a more skilled, customer-focused, value-adding workforce.

Train bots like humans
Treating software bots as human staff is not as far-fetched as it sounds. To achieve the most with RPA, businesses should first give their bots something small and task-orientated to work on. Throughout the bots' lifecycle, it is important to carry out maintenance and reviews, just as you would with human staff evaluations.
Enterprises should make bots just as accountable to the business as the human workforce is. The most successful RPA implementers have maintained this clarity of accountability.

Establish automation governance
Enterprises need to establish automation governance systems as an extension of corporate governance. To ensure robotic processes comply with regulatory controls, businesses must closely monitor and manage digital workers. Businesses must also ensure they have a clear understanding of what robotics laws are in place and how to effectively comply with them.
RPA is an important asset in any enterprise’s digital transformation. By implementing automation with time-intensive, manual, administrative tasks first, companies can learn from RPA-enabled processes and replicate those successes elsewhere in the business. Combining RPA with a broader set of technological tools ultimately improves business outcomes across end-to-end processes.

Source: theneweconomy.com

Top 7 Data Science Use Cases in Finance


In recent years, the ability of data science and machine learning to cope with a number of principal financial tasks has become an especially important point at issue. Companies want to know what improvements these technologies bring and how they can reshape their business strategies.
To help you answer these questions, we have prepared a list of the data science use cases that have the highest impact on the finance sector. They cover very diverse business aspects, from data management to trading strategies, but what they have in common is vast potential to enhance financial solutions.
Automating risk management
Risk management is an enormously important area for financial institutions, responsible for a company's security, trustworthiness, and strategic decisions. The approaches to handling risk management have changed significantly over the past years, transforming the nature of the finance sector. Now more than ever, machine learning models define the direction of business development.
Risks can originate from many sources, such as competitors, investors, regulators, or a company's customers, and they differ in importance and potential losses. The main steps, therefore, are identifying, prioritizing, and monitoring risks, which are perfect tasks for machine learning. Trained on huge amounts of customer data, financial lending records, and insurance results, algorithms can not only improve risk-scoring models but also enhance cost efficiency and sustainability.

Among the most important applications of data science and artificial intelligence (AI) in risk management is assessing the creditworthiness of potential customers. To establish the appropriate credit amount for a particular customer, companies use machine learning algorithms that analyze past spending behavior and patterns. This approach is also useful when working with new customers or those with a brief credit history.
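
A minimal sketch of such a scoring model follows, assuming a hypothetical loan_history.csv with past spending features and a "defaulted" label; real scorecards involve far more careful feature engineering, fairness review, and validation than this illustration:

```python
# A minimal credit-scoring sketch; loan_history.csv, its feature columns,
# and the "defaulted" label are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("loan_history.csv")
features = ["avg_monthly_spend", "num_late_payments", "utilization"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["defaulted"], random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
# The predicted default probability can then drive the credit amount offered.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Validation AUC: {auc:.3f}")
```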

Although the digitalization and automation of risk management processes in finance are in the early stages, the potential is enormous. Financial institutions still need to prepare for this change by automating core financial processes, improving the analytical skills of the finance team, and making strategic technology investments. But as soon as a company starts to move in this direction, the profits will follow.

Managing customer data

For financial firms, data is the most important resource, so efficient data management is a key to business success. Today, financial data is massive in both volume and diversity of structure: from social media activity and mobile interactions to market data and transaction details. Financial specialists often have to work with semi-structured or unstructured data, and processing it manually is a big challenge.

However, it is obvious to most companies that integrating machine learning techniques into the data management process is simply a necessity for extracting real intelligence from data. AI tools, in particular natural language processing, data mining, and text analytics, help transform data into information, contributing to smarter data governance and better business solutions, and, as a result, increased profitability. For instance, machine learning algorithms can analyze the influence of specific financial trends and market developments by learning from customers' historical financial data. Finally, these techniques can be used to generate automated reports.
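
As a toy example of the automated-reporting idea, the snippet below condenses raw records into a monthly digest with pandas; the transactions.csv file and its columns are hypothetical:

```python
# A toy "automated report": condensing raw records into a monthly digest.
# transactions.csv and its date/amount columns are hypothetical.
import pandas as pd

df = pd.read_csv("transactions.csv", parse_dates=["date"])
monthly = df.groupby(df["date"].dt.to_period("M"))["amount"].agg(["count", "sum"])

lines = [f"{month}: {row['count']:.0f} transactions totaling {row['sum']:,.2f}"
         for month, row in monthly.iterrows()]
print("Automated monthly report")
print("\n".join(lines))
```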

Predictive analytics

Analytics is now at the core of financial services, and predictive analytics deserves special attention: it reveals patterns in the data that foresee future events which can be acted upon now. By drawing on social media, news trends, and other data sources, these sophisticated analytics have conquered major applications such as predicting prices and customer lifetime value, future life events, anticipated churn, and stock market moves. Most importantly, such techniques can help answer the complicated question of how best to intervene.
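
For a flavor of what such predictive models look like in practice, here is a small sketch estimating customer lifetime value from behavioral features; the file and column names are hypothetical:

```python
# A small predictive-analytics sketch: estimating customer lifetime value.
# customer_history.csv and all column names are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("customer_history.csv")
X = df[["tenure_months", "monthly_spend", "num_products", "support_calls"]]
y = df["lifetime_value"]

model = RandomForestRegressor(n_estimators=200, random_state=0)
# Cross-validated R^2 hints at how much future value is predictable today.
score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
print(f"Mean cross-validated R^2: {score:.2f}")
```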

Real-time analytics

Real-time analytics fundamentally transforms financial processes by analyzing large amounts of data from different sources, quickly identifying any changes, and finding the best reaction to them. There are three main directions for real-time analytics applications in finance:

Fraud detection

Financial firms are obliged to guarantee the highest level of security to their users. The main challenge for companies is to build a good fraud-detection system while criminals are constantly devising new attacks and setting up new traps. Given this diversity of frauds, only qualified data scientists can create effective algorithms for detecting and preventing anomalies in user behavior or ongoing working processes. For instance, alerts on unusual financial purchases for a particular user, or on large cash withdrawals, will lead to blocking those actions until the customer confirms them. In the stock market, machine learning tools can identify patterns in trading data that might indicate manipulation and alert staff to investigate. The greatest strength of such algorithms, however, is their ability to self-teach, becoming more effective and intelligent over time.
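
One common building block for this kind of system is unsupervised anomaly detection. The sketch below flags unusual transactions with scikit-learn's IsolationForest; the feature set is hypothetical, and production systems combine many such signals with rules and human review:

```python
# Unsupervised anomaly detection over transactions, the kind of model behind
# "unusual purchase" alerts. Feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("transactions.csv")
X = df[["amount", "hour_of_day", "merchant_risk_score"]]

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
df["flagged"] = detector.predict(X) == -1   # -1 marks outliers
# Flagged transactions would be held until the customer confirms them.
print(df[df["flagged"]].head())
```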

Consumer analytics

Real-time analytics also helps with better understanding of customers and effective personalization. Sophisticated machine learning algorithms and customer sentiment analysis techniques can generate insights from clients' behavior, social media interactions, feedback, and opinions, improving personalization and enhancing profits. Since the amount of data is enormous, only experienced data scientists can make a precise breakdown of it.
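
As a small illustration, a bag-of-words sentiment classifier over customer feedback can be assembled in a few lines with scikit-learn; the labeled feedback.csv is a hypothetical input:

```python
# A bag-of-words sentiment classifier over customer feedback.
# feedback.csv with "text" and 0/1 "label" columns is hypothetical.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("feedback.csv")
clf = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
clf.fit(df["text"], df["label"])

print(clf.predict(["The new mobile app makes payments effortless"]))
```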

Algorithmic trading

This area probably sees the biggest impact from real-time analytics, since every second is at stake here. Based on the most recent information from analyzing both traditional and non-traditional data, financial institutions can make beneficial decisions in real time. And because this data is often only valuable for a short time, being competitive in this sector means having the fastest methods of analyzing it.

Further prospects open up when real-time and predictive analytics are combined in this area. It used to be popular practice for financial companies to hire mathematicians who could develop statistical models and use historical data to create trading algorithms that forecast market opportunities. Today, however, artificial intelligence offers techniques that make this process faster and, most importantly, continuously self-improving.

Data science and AI have therefore revolutionized the trading sector, giving rise to algorithmic trading strategies. Most of the world's exchanges use computers that make decisions based on algorithms and adjust strategies to account for new data. Artificial intelligence processes vast quantities of information, including tweets, financial indicators, data from news and books, and even TV programs. Consequently, it understands today's worldwide trends and continuously improves its predictions about financial markets.
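
To ground the idea, here is the textbook toy example of a rule-based strategy: a moving-average crossover signal computed with pandas. The prices.csv file is hypothetical, and real algorithmic trading adds execution, risk controls, and latency engineering far beyond this sketch:

```python
# A moving-average crossover signal, the classic toy trading rule.
# prices.csv with "date" and "close" columns is hypothetical.
import pandas as pd

prices = pd.read_csv("prices.csv", parse_dates=["date"], index_col="date")
fast = prices["close"].rolling(20).mean()
slow = prices["close"].rolling(50).mean()

signal = (fast > slow).astype(int)          # long when fast is above slow
daily_returns = prices["close"].pct_change()
strategy_returns = signal.shift(1) * daily_returns   # act on the next bar
print("Cumulative return:", (1 + strategy_returns.fillna(0)).prod() - 1)
```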

All in all, real-time and predictive analytics are significantly changing the situation across financial areas. With technologies such as Hadoop, NoSQL, and Storm, traditional and non-traditional datasets, and ever more precise algorithms, data engineers are changing the way finance works.

Deep personalization and customization

Firms realize that one of the keys to being competitive in today's market is to raise engagement through high-quality, personalized relationships with their customers. The idea is to analyze the digital client experience and modify it to take the client's interests and preferences into account. AI is making significant strides in understanding human language and emotion, which brings customer personalization to a whole new level. Data engineers can also build models that study consumers' behavior and discover situations where customers need financial advice. Combining predictive analytics tools with advanced digital delivery options can help with this complicated task, guiding the customer to the best financial solution at the most opportune time and suggesting personalized offerings based on spending habits, socio-demographic trends, location, and other preferences.
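
A simple starting point for such personalization is behavioral segmentation. The sketch below clusters customers on spending features with k-means; the profile file and its columns are hypothetical:

```python
# Behavior-based segmentation with k-means, a common first step toward
# personalized offers. customer_profiles.csv and its columns are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("customer_profiles.csv")
X = StandardScaler().fit_transform(
    df[["monthly_spend", "savings_rate", "travel_spend", "age"]])

df["segment"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
# Each segment can then be mapped to tailored product suggestions.
print(df.groupby("segment").mean(numeric_only=True))
```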

Conclusion

For financial institutions, the use of data science techniques provides a huge opportunity to stand out from the competition and reinvent their businesses. Vast amounts of continuously changing financial data create a need to bring machine learning and AI tools into different aspects of the business.

We focused on what are, in our opinion, the top 7 data science use cases in the finance sector, but there are many others that also deserve a mention. If you have any further ideas, please share your vision in the comments section.

Source: activewizards.com

Hello World Canada: The Rise of AI

Bloomberg Businessweek presents an exclusive premiere of the latest episode of “Hello World,” the tech-travel show hosted by journalist and best-selling author Ashlee Vance and watched by millions of people around the globe. There’s an AI revolution sweeping across the world. Yet few people know the real story about where this technology came from and why it suddenly took off. In this ground-breaking episode of “Hello World,” the story of AI’s rise is told in detail for the first time, as journalist Ashlee Vance heads to the unexpected birthplace of the technology, Canada. (Source: Bloomberg)

Video: https://www.bloomberg.com/api/embed/iframe?id=d68de08e-2860-4f4f-a119-9d9da769ccad

Full Cycle Developers at Netflix — Operate What You Build

The year was 2012 and operating a critical service at Netflix was laborious. Deployments felt like walking through wet sand. Canarying was devolving into verifying endurance (“nothing broke after one week of canarying, let’s push it”) rather than correct functionality. Researching issues felt like bouncing a rubber ball between teams, hard to catch the root cause and harder yet to stop from bouncing between one another. All of these were signs that changes were needed.

Fast forward to 2018. Netflix has grown to 125M global members enjoying 140M+ hours of viewing per day. We’ve invested significantly in improving the development and operations story for our engineering teams. Along the way we’ve experimented with many approaches to building and operating our services. We’d like to share one approach, including its pros and cons, that is relatively common within Netflix. We hope that sharing our experiences inspires others to debate the alternatives and learn from our journey.

One Team’s Journey

Edge Engineering is responsible for the first layer of AWS services that must be up for Netflix streaming to work. In the past, Edge Engineering had ops-focused teams and SRE specialists who owned the deploy+operate+support parts of the software life cycle. Releasing a new feature meant devs coordinating with the ops team on things like metrics, alerts, and capacity considerations, and then handing off code for the ops team to deploy and operate. To be effective at running the code and supporting partners, the ops teams needed ongoing training on new features and bug fixes. The primary upside of having a separate ops team was fewer developer interrupts when things were going well.

When things didn’t go well, the costs added up. Communication and knowledge transfers between devs and ops/SREs were lossy, requiring additional round trips to debug problems or answer partner questions. Deployment problems had a higher time-to-detect and time-to-resolve due to the ops teams having less direct knowledge of the changes being deployed. The gap between code complete and deployed was much longer than today, with releases happening on the order of weeks rather than days. Feedback went from ops, who directly experienced pains such as lack of alerting/monitoring or performance issues and increased latencies, to devs, who were hearing about those problems second-hand.

To improve on this, Edge Engineering experimented with a hybrid model where devs could push code themselves when needed, and also were responsible for off-hours production issues and support requests. This improved the feedback and learning cycles for developers. But having only partial responsibility left gaps. For example, even though devs could do their own deployments and debug pipeline breakages, they would often defer to the ops release specialist. The ops-focused people were motivated to do the day-to-day work but found it hard to prioritize automation so that others didn't need to rely on them.

In search of a better way, we took a step back and decided to start from first principles. What were we trying to accomplish and why weren’t we being successful?

The Software Life Cycle

The purpose of the software life cycle is to optimize “time to value”; to effectively convert ideas into working products and services for customers. Developing and running a software service involves a full set of responsibilities. We had been segmenting these responsibilities. At an extreme, this means each functional area is owned by a different person/role:

[Figure: SDLC components]

These specialized roles create efficiencies within each segment while potentially creating inefficiencies across the entire life cycle. Specialists develop expertise in a focused area and optimize what’s needed for that area. They get more effective at solving their piece of the puzzle. But software requires the entire life cycle to deliver value to customers. Having teams of specialists who each own a slice of the life cycle can create silos that slow down end-to-end progress. Grouping differing specialists together into one team can reduce silos, but having different people do each role adds communication overhead, introduces bottlenecks, and inhibits the effectiveness of feedback loops.

Operating What You Build

To rethink our approach, we drew inspiration from the principles of the devops movement. We could optimize for learning and feedback by breaking down silos and encouraging shared ownership of the full software life cycle.

“Operate what you build” puts the devops principles in action by having the team that develops a system also be responsible for operating and supporting that system. Distributing this responsibility to each development team, rather than externalizing it, creates direct feedback loops and aligns incentives. Teams that feel operational pain are empowered to remediate the pain by changing their system design or code; they are responsible and accountable for both functions. Each development team owns deployment issues, performance bugs, capacity planning, alerting gaps, partner support, and so on.

Scaling Through Developer Tools

Ownership of the full development life cycle adds significantly to what software developers are expected to do. Tooling that simplifies and automates common development needs helps to balance this out. For example, if software developers are expected to manage rollbacks of their services, rich tooling is needed that can both detect and alert them of the problems as well as to aid in the rollback.
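
As a toy illustration of that detect-and-remediate loop, the sketch below compares a canary's error rate against the baseline and triggers a rollback past a threshold. Both helper functions are hypothetical stubs, not Netflix's actual tooling:

```python
# A toy detect-and-remediate loop: compare a canary's error rate to the
# baseline and roll back past a threshold. Both helpers are hypothetical
# stubs, not Netflix's actual tooling.
def error_rate(deployment):
    """Stub: a real system would query its metrics/monitoring backend."""
    return {"baseline": 0.002, "canary": 0.011}[deployment]  # fake numbers

def rollback(service):
    print(f"Rolling back {service}")        # stub for the deploy pipeline

def check_canary(service, threshold=3.0):
    base, canary = error_rate("baseline"), error_rate("canary")
    if base > 0 and canary / base > threshold:
        rollback(service)                   # alert fired, remediation automated
    else:
        print(f"{service} canary healthy ({canary:.3%} vs {base:.3%})")

check_canary("edge-api")                    # hypothetical service name
```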

Netflix created centralized teams (e.g., Cloud Platform, Performance & Reliability Engineering, Engineering Tools) with the mission of developing common tooling and infrastructure to solve problems that every development team has. Those centralized teams act as force multipliers by turning their specialized knowledge into reusable building blocks.

Empowered with these tools in hand, development teams can focus on solving problems within their specific product domain. As additional tooling needs arise, centralized teams assess whether the needs are common across multiple dev teams. When they are, collaborations ensue. Sometimes these local needs are too specific to warrant centralized investment. In that case the development team decides if their need is important enough for them to solve on their own.

Balancing local versus central investment in similar problems is one of the toughest aspects of our approach. In our experience the benefits of finding novel solutions to developer needs are worth the risk of multiple groups creating parallel solutions that will need to converge down the road. Communication and alignment are the keys to success. By starting well-aligned on the needs and how common they are likely to be, we can better match the investment to the benefits to dev teams across Netflix.

Full Cycle Developers

By combining all of these ideas together, we arrived at a model where a development team, equipped with amazing developer productivity tools, is responsible for the full software life cycle: design, development, test, deploy, operate, and support.

Full cycle developers are expected to be knowledgeable and effective in all areas of the software life cycle. For many new-to-Netflix developers, this means ramping up on areas they haven’t focused on before. We run dev bootcamps and other forms of ongoing training to impart this knowledge and build up these skills. Knowledge is necessary but not sufficient; easy-to-use tools for deployment pipelines (e.g., Spinnaker) and monitoring (e.g., Atlas) are also needed for effective full cycle ownership.

Full cycle developers apply engineering discipline to all areas of the life cycle. They evaluate problems from a developer perspective and ask questions like “how can I automate what is needed to operate this system?” and “what self-service tool will enable my partners to answer their questions without needing me to be involved?” This helps our teams scale by favoring systems-focused rather than humans-focused thinking and automation over manual approaches.

Moving to a full cycle developer model requires a mindset shift. Some developers view design+development, and sometimes testing, as the primary way that they create value. This leads to the anti-pattern of viewing operations as a distraction, favoring short-term fixes to operational and support issues so that they can get back to their "real job". But the "real job" of full cycle developers is to use their software development expertise to solve problems across the full life cycle. A full cycle developer thinks and acts like an SWE, SDET, and SRE. At times they create software that solves business problems, at other times they write test cases for that software, and at still other times they automate operational aspects of the system.

For this model to succeed, teams must be committed to the value it brings and be cognizant of the costs. Teams need to be staffed appropriately with enough headroom to manage builds and deployments, handle production issues, and respond to partner support requests. Time needs to be devoted to training. Tools need to be leveraged and invested in. Partnerships need to be fostered with centralized teams to create reusable components and solutions. All areas of the life cycle need to be considered during planning and retrospectives. Investments like automating alert responses and building self-service partner support tools need to be prioritized alongside business projects. With appropriate staffing, prioritization, and partnerships, teams can be successful at operating what they build. Without these, teams risk overload and burnout.

To apply this model outside of Netflix, adaptations are necessary. The common problems across your dev teams are likely similar: the need for continuous delivery pipelines, monitoring/observability, and so on. But many companies won't have the staffing to invest in centralized teams like Netflix's, nor will they need the complexity that Netflix's scale requires. Netflix's tools are often open source, and it may be compelling to try them as a first pass. However, other open source and SaaS solutions to these problems can meet most companies' needs. Start with an analysis of the potential value and count the costs, then make the mindset shift. Evaluate what you need and be mindful of bringing in the least complexity necessary.

Trade-offs

The tech industry has a wide range of ways to solve development and operations needs (see devops topologies for an extensive list). The full cycle model described here is common at Netflix, but has its downsides. Knowing the trade-offs before choosing a model can increase the chance of success.

The full cycle model prioritizes a broader area of ownership, using tools to enable effectiveness across those broader domains. Breadth requires both interest and aptitude in a diverse range of technologies. Some developers prefer focusing on becoming world-class experts in a narrow field, and our industry needs those types of specialists for some areas. For those experts, the need to be broad, with reasonable depth in each area, may be uncomfortable and sometimes unfulfilling. Some at Netflix prefer to be in an area that needs deep expertise without requiring ongoing breadth, and we support them in finding those roles; others enjoy and welcome the broader responsibilities.

In our experience with building and operating cloud-based systems, we’ve seen effectiveness with developers who value the breadth that owning the full cycle requires. But that breadth increases each developer’s cognitive load and means a team will balance more priorities every week than if they just focused on one area. We mitigate this by having an on-call rotation where developers take turns handling the deployment + operations + support responsibilities. When done well, that creates space for the others to do the focused, flow-state type work. When not done well, teams devolve into everyone jumping in on high-interrupt work like production issues, which can lead to burnout.

Tooling and automation help to scale expertise, but no tool will solve every problem in the developer productivity and operations space. Netflix has a “paved road” set of tools and practices that are formally supported by centralized teams. We don’t mandate adoption of those paved roads but encourage adoption by ensuring that development and operations using those technologies is a far better experience than not using them. The downside of our approach is that the ideal of “every team using every feature in every tool for their most important needs” is near impossible to achieve. Realizing the returns on investment for our centralized teams’ solutions requires effort, alignment, and ongoing adaptations.

Conclusion

The path from 2012 to today has been full of experiments, learning, and adaptations. Edge Engineering, whose earlier experiences motivated finding a better model, is actively applying the full cycle developer model today. Deployments are routine and frequent, canaries take hours instead of days, and developers can quickly research issues and make changes rather than bouncing the responsibilities across teams. Other groups are seeing similar benefits. However, we’re cognizant that we got here by applying and learning from alternate approaches. We expect tomorrow’s needs to motivate further evolution.

Source: Medium.com