Monetizing Data: 4 Datasets You Need for More Reliable Forecasting

In the era of big data, the focus has long been on data collection and organization. But despite having access to more data than ever before, companies today are reporting a low return on their investment in analytics. Something’s not working. Today, business leaders are caught up in concerns that they don’t have enough data, that it’s not accessible, or that it simply isn’t good enough. Instead of focusing on making data sources bigger or better, companies should be thinking about how they can get more out of the data they already have.

Contrary to popular belief, a high volume of perfect data isn’t necessary to drive strategic insight and action. While that might have been the case with time-series analysis, forecasting using simulation allows companies to do more with less. With simulation software, you aren’t constrained by the hard data points you have for every input; it allows you to enter both qualitative and quantitative information, so you can use human intelligence to make estimates that are later validated for accuracy with observable outcomes. Companies can then use these simulations to test how the market will respond to strategic initiatives by quickly running scenarios before launch. Also, most businesses already have enough collective intelligence within their organization to create a reliable, predictive simulation.

By unifying analytics, building forecasts and accelerating analytic processes, simulation helps companies build a holistic picture of their business to optimize strategy and maximize revenue. Here are the four types of information that companies need to fuel simulation forecasting and monetize their data investments:

1. Sales Data: Define success

The first set of information needed for simulation forecasting is sales data. In building a simulation model, sales data is used to define the market by establishing the outcome you’re trying to influence. That said, simulations can forecast more than revenue – they can also simulate a variety of other outcomes tied to sales, such as new subscribers, website visits, online application submissions or program enrollments. Whatever outcome you’re measuring, it’s helpful to have the information broken out by segment. If you don’t have this level of detail to start, you can continue to integrate new data into the model to make it more comprehensive over time.

2. Competitive Data: Paint a full picture of your market

With simulation forecasting, you are recreating an entire market so you can test how your solution will play out amongst competitors. In order to understand how people within a certain category respond to all of the choices available to them, you will need sales and marketing information for your competition. Competitor data is usually accessible from syndicated sources. If you don’t have access to competitor data, you can use approximate information available from public sources, annual reports or analyses from business experts to build out the competitive market in your simulation.

3. Customer Data: Understand how your consumer thinks

The third area of information needed for simulation is customer intelligence. In order to predict the likelihood a consumer will choose one option instead of another, you need to understand how they think. This requires information around awareness, perceptions and the relative importance of different attributes in driving a decision. These datasets are often collected and available through surveys. But even if there isn’t data from a quantitative study, your brand experts can use their judgment to make initial estimates of these values, and the values will later be verified through calibration and forecasting of observed metrics like sales.
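To make this concrete, here is a toy sketch of the idea described above: expert-estimated attribute importances and brand perceptions are turned into predicted choice shares, which can then be calibrated against observed sales. All brands, weights, and scores below are invented for illustration.

```python
# Illustrative share-of-choice sketch; every number here is a hypothetical
# expert estimate, to be refined by calibration against observed sales.
import math

# Relative importance of each attribute in the purchase decision.
importance = {"price": 0.5, "quality": 0.3, "service": 0.2}

# Perception scores (0-10) for each brand on each attribute; initially these
# can come from surveys or from brand experts' judgment.
perceptions = {
    "OurBrand":    {"price": 6, "quality": 9, "service": 8},
    "CompetitorA": {"price": 8, "quality": 6, "service": 5},
    "CompetitorB": {"price": 7, "quality": 7, "service": 6},
}

def choice_shares(perceptions, importance):
    """Convert perceived utilities into predicted market shares (logit rule)."""
    utility = {
        brand: sum(importance[a] * scores[a] for a in importance)
        for brand, scores in perceptions.items()
    }
    total = sum(math.exp(u) for u in utility.values())
    return {brand: math.exp(u) / total for brand, u in utility.items()}

shares = choice_shares(perceptions, importance)
# Calibration step: compare predicted shares against observed sales data and
# adjust the estimated inputs until the simulation reproduces reality.
print({b: round(s, 3) for b, s in shares.items()})
```

Once calibrated, the same model can be re-run with changed inputs (a price cut, an improved perception score) to test a strategic initiative before launch.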

4. Marketing Data: Evaluate the impact of in-market strategies

Finally, to drive simulation forecasting, companies need data on past marketing activity. This information is essential to understand how messaging in the market has influenced consumer decision making. This can be as simple as marketing investments and impressions broken out by paid, owned and earned activity, or it can be as granular as the tactics and specific media channels within each area.

Once a company identifies sources for these four types of data, it’s time to find an effective way to monetize it. The best way to get value from your big data is to identify unanswered business questions. With simulation forecasting, reliable answers are accessible – and you may need less data than you think to get meaningful, trustworthy insight.

Source: InsideBIGDATA


Understanding Data Roles

With the rise of Big Data has come an accompanying explosion in roles that in some way involve data. Most who are in any way involved with enterprise technology are at least familiar with them by name, but sometimes it’s helpful to look at them through a comprehensive lens that shows us how they all fit together. In understanding how data roles mesh, think about them in terms of two pools: one responsible for making data ready for use, and another that puts that data to use. The latter function includes the tightly woven roles of Data Analyst and Data Scientist, and the former includes such roles as Database Administrator, Data Architect and Data Governance Manager.

Ensuring the data is ready for use

Making Sure the Engine Works.

A car is only as good as its engine, and according to PC Magazine the Database Administrator (DBA) is “responsible for the physical design and management of the database and for the evaluation, selection and implementation of the DBMS.” Techopedia defines the position as one that “directs or performs all activities related to maintaining a successful database environment.” A DBA’s responsibilities include security, optimization, monitoring and troubleshooting, and ensuring the capacity needed to support activities. This of course requires a high level of technical expertise – particularly in SQL, and increasingly in NoSQL. But while the role may be technical, TechTarget maintains that it may also require managerial functions, including “establishing policies and procedures pertaining to the management, security, maintenance, and use of the database management system.”

Directing the Vision. With the database engines in place, the task becomes one of creating an infrastructure for taking in, moving and accessing the data. If the DBA builds the car, then the Enterprise Data Architect (EDA) builds the freeway system, laying the framework for how data will be stored, shared and accessed by different departments, systems and applications, and aligning it to business strategy. Bob Lambert describes the skills as including an understanding of the system development life cycle; software project management approaches; and data modeling, database design, and SQL development. The role is strategic, requiring an understanding of both existing and emerging technologies (NoSQL databases, analytics tools and visualization tools), and how those may support the organization’s objectives. The EDA’s role requires knowledge sufficient to direct the components of enterprise architecture, but not necessarily the practical skills of implementation. Typical responsibilities include determining database structural requirements; defining physical structure and functional capabilities; specifying security, backup and recovery requirements; and installing, maintaining and optimizing database performance.

Creating and Enforcing the Rules of Data Flow. A well-architected system requires order. A Data Governance Manager organizes and streamlines how data is collected, stored, shared/accessed, secured and put to use. But don’t think of the role as a traffic cop: the rules of the road exist not only to prevent ‘accidents’, but also to ensure efficiency and value. The governance manager’s responsibilities include enforcing compliance, setting policies and standards, managing the lifecycle of data assets, and ensuring that data is secure, organized and accessible by, and only by, appropriate users. By doing so, the data governance manager improves decision-making, eliminates redundancy, reduces the risk of fines and lawsuits, and ensures the security of proprietary and confidential information, so that the organization achieves maximum value (and minimum risk). The position implies at least a functional knowledge of databases and associated technologies, and a thorough knowledge of industry regulations (FINRA, HIPAA, etc.).

Making Use of the Data

We create a system in which data is well organized and governed so that the business can make maximum use of it: informing day-to-day processes, and letting data analysts and data scientists derive the insight that improves efficiency and drives innovation.

Understand the past to guide future decisions. A Data Analyst performs statistical analysis and problem solving, taking organizational data and using it to facilitate better decisions on items ranging from product pricing to customer churn. This requires statistical skills, and critical thinking to draw supportable conclusions. An important part of the job is to make data digestible for the C-suite, so an effective analyst is also an effective communicator. Data analysts are sometimes described as “data scientists in training,” and the line between the two roles is often blurred.

Data scientist–Modeling the Future. Data scientists combine advanced mathematical/statistical abilities with advanced programming abilities, including a knowledge of machine learning, and the ability to code in SQL, R, Python or Scala. A key differentiator is that where the Data Analyst primarily analyzes batch/historical data to detect past trends, the Data Scientist builds programs that predict future outcomes. Furthermore, data scientists are building machine learning models that continue to learn and refine their predictive ability as more data is collected.
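As a hypothetical illustration of that division of labor (the churn series and the trend model below are invented for the sketch), the same data might be handled two ways: the analyst summarizes what happened, while the scientist fits a simple model to project what happens next.

```python
# Invented monthly churn rates for the last six months.
monthly_churn = [0.020, 0.022, 0.025, 0.027, 0.030, 0.033]

# Data analyst: summarize historical behavior to inform a decision today.
avg_churn = sum(monthly_churn) / len(monthly_churn)

# Data scientist: fit a least-squares trend line and project next month.
n = len(monthly_churn)
xs = range(n)
x_mean = sum(xs) / n
y_mean = avg_churn
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_churn))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean
next_month_forecast = intercept + slope * n  # predict month 7

print(round(avg_churn, 4), round(next_month_forecast, 4))
```

A real data scientist would of course reach for richer models (and machine learning where warranted); the point is the shift from describing the past to predicting the future.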

Of course, as data increasingly becomes the currency of business, we expect to see more roles develop, and the ones just described evolve significantly. In fact, we haven’t even discussed a role that is now mandated by the EU’s GDPR initiative: the Data Protection Officer, or ‘DPO’.


The Ultimate Data Set


Until recently, using entire populations as data sets was impossible—or at least impractical—given limitations on data collection processes and analytical capabilities. But that is changing.

The emerging field of computational social science takes advantage of the proliferation of data being collected to access extremely large data sets for study. The patterns and trends in individual and group behavior that emerge from these studies provide “first facts,” or universal information derived from comprehensive data rather than samples.

“Computational social science is an irritant that can create new scientific pearls of wisdom, changing how science is done,” says Brian Uzzi, a professor of management and organizations at the Kellogg School. In the past, scientists have relied primarily on lab research and observational research to establish causality and create descriptions of relationships. “People who do lab studies are preoccupied with knowing causality,” Uzzi says. “Computational work says, ‘I know that when you see X, you see Y, and the reason why that happens may be less important than knowing that virtually every time you see X, you also see Y.’”

“Big data goes hand in hand with computational work that allows you to derive those first facts,” Uzzi says. “Instead of trying to figure out how scientists come up with great ideas by looking at 1,000 scientists, you look at 12,000,000 scientists—potentially everyone on the planet. When you find a relationship there, you know it’s universal. That universality is the new fact on which science is being built.”


Computation in the Social Sphere

Studying large data sets for first facts about human behavior has led to striking advances in recent years. Uzzi notes how one particular data set – mobile-phone data – “has taught us very distinctively about human mobility and its implications for economic and social stratification in society.” It has also shed light on how people behave during evacuations and emergency situations, including infectious-disease outbreaks. Knowing how behaviors affect the spread of diseases can help public health officials design programs to limit contagion.

The ability to track the social behavior of large groups has also shifted people’s understanding of human agency. “Until recently, we really believed that each of us made our decisions on our own,” Uzzi says. “Our friends may have influenced us here or there but not in a big way.” But troves of social-media data have shown that people are incredibly sensitive and responsive to what other people do. “That’s often the thing that drives our behavior, rather than our own individual interests or desires or preferences.”

This may change how you think about your consumer behavior, your exercise regimen, or what you tweet about. Researchers like Uzzi are also deeply interested in how this responsiveness influences political behavior on larger issues like global climate change or investments in education systems. Think of it as a shift from envisioning yourself as a ruggedly individual, purely rational, economic person to a sociological person who encounters and engages and decides in concert with others.

One aspect of computational social science—brain science—has already discovered that those decisions are often being made before we even know it. “Brain science has taught us a lot about how the brain reacts to stimuli,” Uzzi says. With the visual part of your brain moving at roughly 8,000 times the speed of the rest of your brain, the visual cortex has already begun processing information—and leaping to certain conclusions—before the rest of your brain ever catches up. And with 40 percent of the brain’s function devoted strictly to visualization, “if you want to get in front of anything that’s going to lead to a decision, an act of persuasion, an in-depth engagement with an idea, it has got to be visual.”

“The really big things are understanding how something diffuses through a population and how opinions change,” Uzzi says. “If you put those two things together, you really have an understanding of mass behavior.”

This increased understanding of both mass and individual behavior presents huge opportunities for businesses, notably in the health sphere. “There is going to be an entirely new ecology of business that goes beyond how we think about health today,” Uzzi says. “For many people, there is no upper threshold on what they will pay for good health and beauty. With health increasingly decentralized to the individual, that’s going to spin off to companies that want to take advantage of this information to help people do things better.”

Scaling from One to Everyone

While gathering data on groups as large as the entire population is beneficial to scientists, marketers, and the like, computational social science has the scalability to allow for practical data generation on an individual level as well. This means that you can be the subject of your own data-rich computational study, without control groups or comparison testing. “You actually generate enough data on yourself, every day, that could be collected, that you can be the subject of a computational study,” Uzzi says.

Developments in the ability to collect and parse data on individuals is one area where computational social science has the potential to transform people’s lives—from providing more information about individuals’ own health to raising their awareness of unconscious biases to showing how their decision-making processes are influenced by others. “It’s going to allow people to personally use data that can help them improve their lives in a way that they never imagined before,” Uzzi says.

For example, using wearable technologies allows for sensor data collection that can include emotional activation and heart-rate monitoring in social interactions, caloric intake, biorhythms, and nervous energy. The crunching of that raw data into actionable information will happen through our machines. If you think you have a close connection to your smartphone and your tablet now, wait until you rely on them to tell you how much that last workout helped – or did not help – you shake off the tension of a long day at the office.

“Our closest partnership in the world is probably going to be our machine that helps us manage all this,” Uzzi says. This can be transformative by making us healthier.

It may make us less discriminatory, too. We all have cognitive biases that lead us to make irrational decisions. These are thought to be hard-wired things we can identify but not necessarily change on our own. Sensor data can provide a feedback loop of how we have acted in the past. This has the potential to improve future decision making. If your sensors pick up signals that show your body acting differently around certain groups, perhaps in ways that you suppress or to which you are oblivious, that may be harder to ignore.

“Our own sense of identity could be greatly shaken by this, or improved, or both.”

Source: Kellogg Insight

The Big (Unstructured) Data Problem


The face of data breaches changed last year. The one that marked that change for me was the breach involving former Secretary of State Colin Powell’s Gmail account. In an attack targeted at the Hillary Clinton campaign, Colin Powell’s emails were posted online for everyone to read. One of them had an attachment listing Salesforce’s acquisition targets and the details of its M&A strategy. Colin Powell, a member of Salesforce’s board, had access, through his personal email account, to sensitive information. When his personal email was hacked, all of that sensitive information was exposed – and blasted out in the headlines.

Corporations are trying to lock down sensitive information, most of it in structured systems and in data centers with a variety of security solutions. As it is getting harder for hackers to get to the data they want, they are finding the weakest path to that data and evolving their attack vector. Unstructured data is that new attack vector.

Most enterprises do not understand how much sensitive data they have, and when we consider how much unstructured data (emails, PDFs and other documents) a typical enterprise has under management, the red flags are clear and present. Analysts at Gartner estimate that upward of 80% of enterprise data today is unstructured. This is a big data problem, to say the least. As the level of unstructured data rises and hackers shift their focus to it, unstructured data is an issue that can no longer be placed on the enterprise IT back burner.

What Exactly Is Unstructured Data?

Unstructured data is any data that resides in emails, files, PDFs or documents. Sensitive unstructured data is usually data that was first created in a protected structured system – SAP Financials, for example – and then exported into an Excel spreadsheet for easier consumption by audiences who are not SAP users.

Let me give you a very common example in any public company: every quarter, a PR department receives the final quarterly financial numbers via email ahead of the earnings announcement in order to prepare a press release. The draft release will be shared via email among a select group within the company before being approved and distributed on the news wires. By pulling that financial information from the ERP system – a system that usually lives behind the corporate firewall, with strong security and identity controls in place and with business owners who govern access to the systems and data within – we have instantly taken that formerly safe data and shared it freely by email as an Excel file.

A hacker could easily try to hack the credentials of a key employee rather than break into the network and tediously make his or her way to the ERP system. The path to getting the coveted earnings data can be easily shortened by focusing on its unstructured form shared via email or stored in files with limited security.

Right now, enterprises are woefully unprepared. Nearly 80% of enterprises have very little visibility into what’s happening across their unstructured data, let alone how to manage it. Enterprises are simply not ready to protect data in this form because they don’t understand just how much they have. Worse yet, they don’t even know what lies within those unstructured data files or who owns these files. Based on a recent survey created by my company, as many as 71% of enterprises are struggling with how to manage and protect unstructured data.

This is especially concerning when we consider the looming General Data Protection Regulation (GDPR) deadline. When that regulation takes effect in May 2018, any consumer data living in these unmanaged files that is exposed during a breach would immediately open the organization up to incredibly steep penalties. While regulations like GDPR put fear into companies, it may be a while before they start to take action. Many companies are struggling to strike the right balance between focusing on reacting to security threats versus time spent evaluating the broader picture of proactively managing risk for their company.

The Path Forward

Enterprises simply cannot afford to ignore the big unstructured data problem any longer. They need an actionable plan, one that starts with this four-step process:

•Find your unstructured data. Sensitive data is most likely spread out across both structured systems (e.g., your ERP application) and unstructured data (e.g., an Excel spreadsheet with exported data from your ERP app) that lives in a file share or in the numerous cloud storage systems companies use today for easier cross-company sharing and collaboration.
•Classify and assign an owner to that data. Not all data has value, but even stale data may still be of a sensitive nature. Take the time to review all data and classify it to help you focus only on the most sensitive areas. Then assign owners to the classified unstructured data. If you do not know whom it belongs to, ask the many consumers of that data; they usually point in the same direction – its natural owner.
•Understand who has access to your data. It’s extremely important to understand who has access to all sensitive company information, so access controls need to be placed on both structured and unstructured data.
•Put parameters around your data. Sensitive data should be accessed on a “need to know” basis, meaning only a select few in the company should have regular access to your more sensitive files, the ones that could have serious consequences if they ended up in the wrong hands.
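As a minimal sketch of the “find and classify” steps above, a first pass over documents might simply flag well-known sensitive-data patterns. The patterns and the `classify()` helper here are illustrative assumptions, not a production scanner (which would also walk file shares and cloud stores, handle binary formats, and far more).

```python
import re

# Hypothetical patterns that commonly indicate sensitive content.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(text):
    """Return the set of sensitive-data categories found in a document."""
    return {name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)}

doc = "Contact jane.doe@example.com; card 4111-1111-1111-1111 on file."
print(sorted(classify(doc)))
```

Even this crude triage turns “we don’t know what lies within those files” into a ranked list of documents that need an owner and access controls.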

With these steps in place, you can prevent anyone within your company from having access to files they don’t need to do their job, and ultimately minimize the risk of a breach. And although there are data access governance solutions that help corporations protect unstructured data, very few enterprises today have such a program in place. Ultimately, these solutions will need to find their way into enterprises as hackers once again change their attack vector to easier prey.

Source: Forbes

Using Cell Phone Data to Predict the Next Epidemic

Whom you call is linked to where you travel, which dictates how viruses spread.

Can big data about whom we call be used to predict how a viral epidemic will spread?

It seems unlikely. After all, viruses do not spread over a cell network; they need us to interact with people in person.

Yet, it turns out that the patterns in whom we call can be used to predict patterns in where we travel, according to new research from Kellogg’s Dashun Wang. This in turn can shed light on how an epidemic would spread.

Both phone calls and physical travel are highly influenced by geography. The further away a shopping mall or post office is from our home, after all, the less likely we are to visit it. Similarly, our friends who live in the neighborhood are a lot likelier to hear from us frequently than our extended family in Alberta.

But Wang and colleagues were able to take this a step further. By analyzing a huge amount of data on where people travel and whom they call, they were able to determine a mathematical relationship between how distance shapes these two very different activities. This understanding provides a framework for using data about long-distance interactions to predict physical ones – and vice versa.

As humans, we do not like to think that someone could anticipate our actions, says Wang, an associate professor of management and organizations. But his evidence says otherwise. “It’s just fascinating to see this kind of deep mathematical relationship in human behavior,” he says.

Wang’s conclusions were based on the analysis of three massive troves of cell phone data collected for billing purposes. The data, from three nations spanning two continents, included geographic information about where cell phone users traveled, as well as information about each phone call placed or received, and how far a user was from the person on the other end of the line.

The discovery of this underlying relationship between physical and nonphysical interactions has significant practical implications. For example, the researchers were able to model the spread of a hypothetical virus, which started in a few randomly selected people and then spread to others in the vicinity, using only the data about the flow of phone calls between various parties. Those predictions were remarkably similar to ones generated by actual information about where users traveled and thus where they would be likely to spread or contract a disease.
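To give a flavor of that kind of modeling (this is an invented toy, not the researchers’ actual model), one can simulate contagion over a contact graph stood up from call records: each step, infected nodes may pass the virus to their contacts. The graph and parameters below are assumptions for the sketch.

```python
import random

random.seed(42)  # fixed seed so the toy run is reproducible

# A who-calls-whom graph standing in for likely physical contact.
calls = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d", "e"],
    "d": ["b", "c", "f"], "e": ["c", "f"], "f": ["d", "e"],
}

def simulate(graph, seed_node, p_transmit=0.5, steps=4):
    """SIR-style spread: each step, infected callers may infect their contacts."""
    infected, recovered = {seed_node}, set()
    for _ in range(steps):
        newly = set()
        for node in sorted(infected):  # sorted for deterministic draw order
            for neighbor in graph[node]:
                if neighbor not in infected | recovered and random.random() < p_transmit:
                    newly.add(neighbor)
        recovered |= infected  # infected nodes recover after one infectious step
        infected = newly
    return infected | recovered  # everyone the outbreak reached

reached = simulate(calls, "a")
print(sorted(reached))
```

The researchers’ insight is that the edge structure of such a graph, inferred purely from call flows, predicts epidemic reach remarkably like a graph built from actual travel data.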

“I think that’s a great example to illustrate the opportunities brought about by big data,” Wang says. “The paper represents a major step in our quantitative understanding of how geography governs the way in which we are connected. These insights can be particularly relevant in a business world that is becoming increasingly interconnected.”

Source: Kellogg Insight

Why Big Data Will Revolutionize B2B Marketing Strategies


B2B, or business-to-business, marketing involves selling a company’s services or products to another company. Consumer marketing and B2B marketing are really not that different. Basically, B2B uses the same principles to market its product, but the execution is a little different. B2B buyers make their purchases based primarily on price and profit potential, while consumers make their purchases based on emotional triggers, status, popularity, and price. B2B is a large industry.

The fact that more than 50 percent of all economic activity in the United States is made up of purchases by institutions, government agencies, and businesses gives you a perspective on the size of this industry. Technological advancements and the internet have given B2Bs new ways to make sense of their big data, learn about prospects, and improve their conversion rates. Innovations such as marketing automation platforms and marketing technology – sometimes referred to as ‘martech’ – will revolutionize the way B2B companies market their products. They will be able to deliver mass personalization and nurture leads through the buyer’s journey.

In the next few years, these firms will be spending 73% more on marketing analytics. What does this mean for B2B marketing? The effects of new technology on B2B marketing will be more pronounced in some key areas. These are:

Lead Generation

In the old days, businesses had to spend fortunes on industry reports and market research to find out how, and to whom, to market their products. They had to build their marketing efforts based on what their existing customer base seemed to like. However, growing access to technology and analytics has made revenue attribution and lead nurturing a predictable, measurable, and more structured process. While demand generation is an abstraction or a form of art (depending on who you ask), lead generation is a repeatable, scientific process. This means less guesswork and more revenue.

Small Businesses

Thanks to the SaaS (software-as-a-service) revolution, technologies once available only to elite firms – revenue reporting, real-time web analytics, and marketing automation – are now accessible and affordable to businesses of all sizes. Instead of attempting to build economies of scale, smaller businesses are using the power of these innovations to give their bigger competitors a run for their money. With SaaS, small businesses can now narrow their approaches and zero in on key accounts.

In the context of business to business marketing, this means that instead of trying to attract unqualified, uncommitted top-tier leads, these companies will go after matched stakeholders and accounts and earn their loyalty by providing exceptional customer experiences.

Data Analytics

A few years ago, data was the most underutilized asset in the hands of a marketer. That has since changed. Marketers are quickly coming to the realization that when it comes to their trade, big data is now more valuable than ever – for measuring results, targeting prospects, and improving campaigns – and they are in search of more ways to exploit it. B2B marketing is laden with new tools that capitalize on data points. These firms use data scraping techniques and tools to customize their sites for their target audiences. Businesses can even use predictive lead scoring to gauge how leads will perform in the future. Apache Kafka, for example, provides a distributed streaming platform for building real-time data pipelines and streaming applications.
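As a hypothetical sketch of predictive lead scoring (the features, weights, and leads below are invented; a real system would learn the weights from historical won/lost deals), a logistic scoring function might look like:

```python
import math

# Feature weights standing in for a model trained on past conversions.
WEIGHTS = {"pages_viewed": 0.08, "emails_opened": 0.15,
           "demo_requested": 2.0, "bias": -3.0}

def lead_score(lead):
    """Logistic score in (0, 1): an estimated probability of conversion."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["pages_viewed"] * lead["pages_viewed"]
         + WEIGHTS["emails_opened"] * lead["emails_opened"]
         + WEIGHTS["demo_requested"] * lead["demo_requested"])
    return 1.0 / (1.0 + math.exp(-z))

hot = {"pages_viewed": 12, "emails_opened": 6, "demo_requested": 1}
cold = {"pages_viewed": 2, "emails_opened": 0, "demo_requested": 0}
print(round(lead_score(hot), 3), round(lead_score(cold), 3))
```

Ranking leads by such a score lets sales teams spend their time on the prospects most likely to convert, rather than working the list top to bottom.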


It has always been hard for firms to calculate their return on marketing investment (ROMI). The integration of marketing automation and CRM has made it easier for B2Bs to track and measure marketing campaign efforts through revenue marketing.

Technological advancements hold exciting possibilities for the B2B industry. In order to exploit this technology and gain a competitive edge, companies have to stay up to date. The risk involved is minimal, so these firms have little to lose by embracing it.

Source: Innovation Management

How To Build A Big Data Engineering Team


Companies are digitizing, pushing all their operational functions and workflows into IT systems that benefit from so-called ‘big data’ analytics. Using this approach, firms can start to analyze the massive firehose stream of data now being recorded by the Internet of Things (IoT), with its sensors and lasers designed to monitor physical equipment. They can also start to ingest and crunch through the data streams being produced in every corner of what is now a software-driven, data-driven business model.

All well and good, but who is going to do all this work? It looks like your company is going to need a data engineering department.

Drinking from the data firehose
As a technologist, writer and speaker on software engineering, Aashu Virmani also holds the role of chief marketing officer at in-database analytics software company Fuzzy Logix (known for its DB Lytix product). Virmani claims that there’s gold in them thar data hills – if we know how to get at it. This is the point where firms start to realize that they need to invest in an ever larger army of data engineers and data scientists.

But who are these engineers and scientists? Are they engineers in the traditional sense with greasy spanners and overalls? Are they scientists in the traditional sense with bad hair and too many ballpoint pens in their jacket pockets? Not as such, obviously, because this is IT.

What's the difference between a data scientist & a data engineer?
“First things first, let’s ensure we understand what the difference between a data scientist and a data engineer really is because, if we know this, then we know how best to direct them to drive value for the business. In the most simple of terms, data engineers worry about data infrastructure while data scientists are all about analysis,” explains Fuzzy Logix’s Virmani.

Boiling it down even more: the data scientist prototypes, and the data engineer deploys.

Is one role more important than the other? That’s a bit like asking whether a fork is more important than a knife. Both have their purposes and both can operate independently. But in truth, they really come into their own when used together.

What makes a good data scientist?
“They (the data scientist) may not have a ton of programming experience but their understanding of one or more analytics frameworks is essential. Put simply, they need to know which tool to use (and when) from the tool box available to them. Just as critically, they must be able to spot data quality issues because they understand how the algorithms work,” said Virmani.

He asserts that a large part of their role is hypothesis testing (confirming or denying a well-known thesis) but the data scientist that knows their stuff will impartially let the data tell its own story.

Virmani continued, “Visualizing the data is just as important as being a good statistician, so the effective data scientist will have knowledge of some visualization tools and frameworks to, again, help them tell a story with the data. Lastly, the best data scientists have a restless curiosity which compels them to try and fail in the process of knowledge discovery.”

What makes a good data engineer?
To be effective in this role, a data engineer needs to know the database technology. Cold. Teradata, IBM, Oracle and Hadoop are all 'first base' for the data engineer you want in your organization.

“In addition to knowing the database technology, the data engineer has an idea of the data schema and organization – how their company’s data is structured, so he or she can put together the right data sets from the right sources for the scientist to explore,” said Virmani.

The data engineer will be utterly comfortable with the 'pre' and 'post' tasks that surround data science. The 'pre' tasks mostly deal with what we call ETL: Extract, Transform, Load.
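The ETL pattern is easiest to see in miniature. This is a deliberately tiny sketch (all table, column and data names invented for the example): raw rows are extracted from a source, cleaned and typed, then loaded into a table the data scientist can query.

```python
import csv
import io
import sqlite3

# Stand-in for a raw source extract; a real pipeline would read from
# files, APIs or an operational database.
RAW = "customer_id,amount\n42,19.99\n43,\n44,5.00\n"

def extract(raw: str) -> list:
    """Extract: parse raw CSV into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list) -> list:
    """Transform: drop rows with missing amounts and cast types."""
    return [(int(r["customer_id"]), float(r["amount"]))
            for r in rows if r["amount"]]

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write clean rows into an analytics table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

Here the row with a missing amount is filtered out in the transform step, so only the two clean rows reach the target table.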

Virmani continued, “Often it may be the case that the data science is happening not in the same platform, but an experimental copy of the database and often in a small subset of the data. It is also frequently the case that IT may own the operational database and may have strict rules on how/when the data can be accessed.  A data science team needs a ‘sandbox’ in which to play – either in the same DB environment, or in a new environment intended for data scientists. A data engineer makes that possible. Flawlessly.”

Turning to 'post' tasks: once the data science happens (say, a predictive model is built that determines which credit card transactions are fraudulent), the process needs to be 'operationalized'. This requires that the analytic model developed by the data scientists be moved from the 'sandbox' environment to the real production/operational database or transaction system. The data engineer is the role that can take the output of the data scientist and put it into production. Without this role, there will be tons of insights (some proven, some unproven) but nothing in production to show whether the model is delivering business value in real time.

OK, so you now understand what 'good' looks like in terms of data scientists and engineers, but how do you set them up for success?

How to make your big data team work
The first and most important factor here is creating the right operational structure to allow both parties to work collaboratively and to gain value from each other. Both roles function best when supported by the other so create the right internal processes to allow this to happen.

Fuzzy Logix's Virmani warns that we should never let this become a tug of war between the CIO and the CDO (chief data officer), where the CDO's organization just wants to get on with the analysis/exploration while the IT team wants to control access to every table/row there is (for what may be valid reasons).

“Next, invest in the right technologies to allow them to maximize their time and to focus in the right areas. For example, our approach at Fuzzy Logix is to embed analytics directly into applications and reporting tools, freeing data scientists up to work on high-value problems,” he said.

Don’t nickel & dime on talent

In speaking to a number of firms in the big data space that are trying to establish big data teams, one final truth resonates: don't nickel and dime on talent. These roles are comparatively new, and if you pay for cheap labor, you're most likely not going to get data engineering or data science gold.

Fuzzy Logix offers in-database and GPU-based analytics solutions built on libraries of over 600 mathematical, statistical, simulation, data mining, time series and financial models. The firm has an arguably non-corporate, relatively realistic take on real-world big data operations, and this conversation hopefully sheds some light on the internal mechanics of a department that a lot of firms are now working to establish.

Source: Forbes