The data science landscape will change post COVID-19. How?


Data science has been playing a vital role in our fight against the COVID-19 outbreak. Data scientists have been helping governments identify trends and extract relevant insights from them. However, we have witnessed a few challenges in this field that need to be addressed. In other words, we might see a few changes in the data science landscape after the crisis is over, and we must be prepared for that.

Advancement In Streaming Analytics
Advanced visualization has helped governments and researchers closely watch the day-to-day developments of COVID-19 and make decisions effectively. Beyond descriptive statistics, analyzing correlations among various factors allows decision-makers to compare and understand the impact of the pandemic. Multiple organizations are processing colossal amounts of data and providing visualizations that demonstrate how the virus spread.

However, such visualizations came out only after months, and by then it was too late for the world to act to contain the virus effectively. Nevertheless, those visualizations were helpful in the decision-making process to further reduce the impact of COVID-19. Had we had such information earlier, it could have helped international institutions, such as the WHO, declare an emergency at a very early stage.

Undoubtedly, we can now get COVID-19 data daily, but this flow of information only began in the last few weeks. Besides, real-time data is still missing due to the lack of integration across electronic health record systems. Therefore, in the future, organizations and governments should come together to build an ecosystem that helps the data science community deliver more value through real-time insights.

Democratization Of Medical Health Records
AI quickly took center stage in sectors such as finance and media, but it was slow to penetrate healthcare due to concerns about misuse of patient health record data. Another reason for keeping AI at bay in healthcare is that an inaccurate prediction can lead a doctor to suggest a fallacious treatment that directly harms the patient.

While these concerns are not unfounded, the contribution of data science in the current pandemic has demonstrated how beneficial leveraging patient data can be. This may encourage experts and decision-makers to build policy frameworks that allow data science to contribute to drug development and other healthcare offerings. AI has made a few breakthroughs in disease prediction and drug discovery, but researchers remain highly critical of its use due to the shortage of diverse data.

Cloud Adoption
The cloud has been driving businesses to scale quickly while cutting operational costs. But some companies have been wary of migrating every business operation to the cloud for privacy reasons. Most importantly, data science departments usually stick to on-premise infrastructure for their mission-critical projects. Such projects have now taken a hit due to city lockdowns. This has prompted companies to move all active projects to the cloud to enable flexible, collaborative work from remote locations.

Robust Natural Language Processing Solutions
Fake news and conspiracy theories about COVID-19 have caused a lot of confusion and hindered governments’ efforts to ensure that people comply with lockdown initiatives. This is not new, as fake news has long abounded on social media and popular chat applications.

Companies have taken various measures: WhatsApp has limited users’ ability to forward messages, and YouTube has reduced recommendations of conspiracy-theory videos. But these steps have not eliminated fake news from social media platforms, given how difficult the underlying natural language processing is. Researchers will likely need to focus extensively on building solutions that precisely identify fake news without human help.
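To make the NLP challenge concrete, here is a toy sketch of misinformation detection framed as supervised text classification. Everything below is invented for illustration: the headlines, labels, and model choice are assumptions, and production systems rely on far larger curated corpora and far more robust models than a bag-of-words classifier.

```python
# Toy sketch: misinformation detection as text classification.
# The headlines and labels are invented; this is not a real detector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

headlines = [
    "Health ministry publishes updated case counts",
    "Researchers report trial results in peer-reviewed journal",
    "Miracle cure doctors don't want you to know about",
    "Secret lab leak proven by anonymous insider",
]
labels = [0, 0, 1, 1]  # 0 = credible, 1 = suspect

# Bag-of-words features + a linear classifier: a deliberately simple baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(headlines, labels)

print(model.predict(["Miracle cure revealed by insider"]))
```

Even on this tiny example, the hard part is not fitting the model but the language itself: sarcasm, paraphrase, and evolving claims are exactly what makes fully automated detection so strenuous.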

Beyond Prediction
Data scientists are predicting the spread of COVID-19 along with information on the number of lives it is likely to affect. However, forecasting with incomplete data can further confuse people during these challenging times. “Existing datasets (on COVID-19) are incredibly biased. For instance, while calculating the mortality rate, normally developers look at the deaths per confirmed case. But, the assumption is that we have captured all of the confirmed cases, which is not valid.

We are bottlenecked by the number of tests, and only the sickest are diagnosed,” says Neil Cheng, Senior Data Scientist at Akamai. The community will have to think beyond prediction and refrain from using whatever data they can get without being critical of the bias in how that information is collected.
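The bias Cheng describes is easy to see with a back-of-the-envelope calculation. The numbers below are entirely hypothetical; the point is only that the naive case fatality rate is inflated whenever testing misses infections.

```python
# Illustrative only: how undercounting confirmed cases inflates the
# naive case fatality rate (CFR). All numbers are hypothetical.
deaths = 300
confirmed_cases = 10_000          # only the sickest get tested
naive_cfr = deaths / confirmed_cases

# Suppose testing captures just 25% of true infections.
ascertainment_rate = 0.25
estimated_true_cases = confirmed_cases / ascertainment_rate
adjusted_cfr = deaths / estimated_true_cases

print(f"Naive CFR:    {naive_cfr:.2%}")     # 3.00%
print(f"Adjusted CFR: {adjusted_cfr:.2%}")  # 0.75%
```

The same death count looks four times deadlier under the naive calculation, which is exactly why forecasts built on "deaths per confirmed case" should be read with care.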

Source: Analytics India Magazine

How to solve a business problem using data


For those of you who know me, you’ll know that I spend a great deal of time focusing on our team’s mission to democratize data. For background, I’ve written and spoken in a number of venues about what a data democracy means and how to execute this vision. The core concept is that data should be easy to access and easy to use for typical subject matter experts. Subject matter experts are folks from a particular field (industry, geo, product, account, etc.), but not necessarily data experts.

My team and I tackle any and all barriers to data access and usability. We’ve had a lot of successes with this. But as those major hurdles are knocked over, I’ve started to notice another, less obvious one: the ability to break down a business problem into a data problem.

I have spoken to countless folks who understand the data available and know how to use our data applications, but get stuck when trying to form a solution plan. In trying to help them, I began thinking about how I typically handle these problems. This is when I had an epiphany. The foundation for everything that I need to know, I learned in elementary school!

I know this sounds somewhat silly. But when thinking through the steps I take to solve a business problem, I realized that I do employ a strategy. The backbone of that strategy is based on the principles of solving a word problem. Yes, that’s right. Does anyone else remember staring at those first complex word problems as a kid and not quite knowing where to start? I do! However, when my teacher provided us with strategies to break the problem down into less intimidating, actionable steps, everything became rather doable. The steps: circle the question, highlight the important information, and cross out unnecessary information. Do these steps, and all of a sudden the problem is simplified and much less scary. What a relief! By employing the same basic strategy, we too can feel that sense of calm when working on a business problem.

Word Problems
Before tackling business problems, let’s review the word problem strategy. Outlined below is your checklist for breaking down a word problem.

Read the problem

Highlight the questions you need to answer

Underline essential information

Cross out everything you don’t need to know

Pick a strategy

Solve the problem

Show your work

Start by reading the problem. Take a simple example: Sally has 4 friends, and each of them needs 3 apples; how many apples does she need? Move on to identifying the parts of the problem. Highlight the question to answer: “How many apples does she need?”. Underline the essential information: “4 friends, 3 apples for each of them”. Then cross out everything we don’t need to know. From there it is pretty straightforward. To solve this problem, we use multiplication. If we have 4 friends and they each need 3 apples (4×3), then we need 12 apples total. Simple! This is the type of problem we can now solve without a checklist, but when word problems were new to us, these strategies helped tremendously.

Business Problems
To solve a business problem, we want to take what we learned above and add a few additional steps.

Timeline and Scoping – We need to treat timeline and scoping information as essential. In grade school word problems, the timeline was typically immediate and the scope was to solve in full. In business, it’s much more fluid. You may have a short deadline that forces you to get a minimum viable product (MVP) out the door as soon as possible. Or this might be a longer-term effort, which allows you to build out a more involved solution.

Data Object Definition – Data object definition must be taken incredibly seriously in business problem decomposition. In word problems you typically deal with straightforward, whole objects, like the apples in our example above. In business problems, object definitions are usually more complex. They require an explicit definition that covers a number of objects, properties, and relationship considerations, and they often involve some data manipulation to construct the object in question.
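As a sketch of what an explicit object definition might look like in code, consider a hypothetical “active customer” object. The properties and the 90-day threshold below are assumptions that a real team would negotiate with stakeholders, not a standard definition.

```python
# A sketch of making a business object explicit. "Active customer" is a
# hypothetical example: its properties and the 90-day rule are assumptions.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Customer:
    customer_id: str
    signup_date: date
    last_order_date: date
    orders_last_90_days: int

def is_active(c: Customer, as_of: date) -> bool:
    """Explicit, agreed-upon definition: ordered within the last 90 days."""
    return (as_of - c.last_order_date) <= timedelta(days=90)

c = Customer("c-001", date(2019, 5, 1), date(2020, 3, 15), 2)
print(is_active(c, date(2020, 4, 1)))  # True: last order was 17 days ago
```

Writing the definition down like this forces the ambiguity out into the open: is “active” about orders, logins, or revenue, and over what window? Those are exactly the questions to settle before any analysis starts.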

Two Way Communication – In word problems, communication is typically one-way: the student reads the question and communicates the answer once they have a solution. With business problems, we want to communicate early and often. The chances of misunderstanding a portion of the ask are high, so it’s important to communicate with stakeholders throughout the solution process to ensure a successful outcome.

We’ll now create the business problem decomposition checklist by including the extra building blocks detailed above.

* New steps are highlighted in bold

Read the problem

Highlight the questions you need to answer

Underline essential information, **including timing information to help scope the problem**

**Box all objects and define them explicitly**

Cross everything out you don’t need to know

Pick a strategy


Solve the problem

Show your work

Once we have a solution, we are going to deliver it in an appropriate manner. In this case, it is likely going to involve some sort of report. Ideally, it will also come with a presentation to get everyone on the same page about the findings.



16 Useful Pieces of Advice for Aspiring Data Scientists

Why is data science sexy? It has something to do with so many new applications and entirely new industries coming into being from the judicious use of copious amounts of data. Examples include speech recognition, object recognition in computer vision, robots and self-driving cars, bioinformatics, neuroscience, the discovery of exoplanets and an understanding of the origins of the universe, and the assembling of inexpensive but winning baseball teams. In each of these instances, the data scientist is central to the whole enterprise. He or she must combine knowledge of the application area with statistical expertise and implement it all using the latest computer science ideas.

“What advice would you give to someone starting out in data science?”

1 — Chris Wiggins, Chief Data Scientist at The New York Times and Associate Professor of Applied Mathematics at Columbia
“Creativity and caring. You have to really like something to be willing to think about it hard for a long time. Also, some level of skepticism. So that’s one thing I like about PhD students — five years is enough time for you to have a discovery, and then for you to realize all of the things that you did wrong along the way. It’s great for you intellectually to go back and forth from thinking “cold fusion” to realizing, “Oh, I actually screwed this up entirely,” and thus making a series of mistakes and fixing them. I do think that the process of going through a PhD is useful for giving you that skepticism about what looks like a sure thing, particularly in research. I think that’s useful because, otherwise, you could easily too quickly go down a wrong path — just because your first encounter with the path looked so promising.
And although it’s a boring answer, the truth is you need to actually have technical depth. Data science is not yet a field, so there are no credentials in it yet. It’s very easy to get a Wikipedia-level understanding of, say, machine learning. For actually doing it, though, you really need to know what the right tool is for the right job, and you need to have a good understanding of all the limitations of each tool. There’s no shortcut for that sort of experience. You have to make many mistakes. You have to find yourself shoehorning a classification problem into a clustering problem, or a clustering problem into a hypothesis testing problem.
Once you find yourself trying something out, confident that it’s the right thing, then finally realizing you were totally dead wrong, and experiencing that many times over — that’s really a level of experience that unfortunately there’s not a shortcut for. You just have to do it and keep making mistakes at it, which is another thing I like about people who have been working in the field for several years. It takes a long time to become an expert in something. It takes years of mistakes. This has been true for centuries. There’s a quote from the famous physicist Niels Bohr, who posits that the way you become an expert in a field is to make every mistake possible in that field.”

2 — Caitlin Smallwood, Vice President of Science and Algorithms at Netflix
“I would say to always bite the bullet with regard to understanding the basics of the data first before you do anything else, even though it’s not sexy and not as fun. In other words, put effort into understanding how the data is captured, understand exactly how each data field is defined, and understand when data is missing. If the data is missing, does that mean something in and of itself? Is it missing only in certain situations? These little, teeny nuanced data gotchas will really get you. They really will.
You can use the most sophisticated algorithm under the sun, but it’s the same old junk-in–junk-out thing. You cannot turn a blind eye to the raw data, no matter how excited you are to get to the fun part of the modeling. Dot your i’s, cross your t’s, and check everything you can about the underlying data before you go down the path of developing a model.
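A minimal sketch of the kind of raw-data audit described above, using pandas on an invented viewing table. The column names and values are hypothetical; the point is simply to check field definitions and missingness before any modeling.

```python
# A minimal raw-data audit before modeling. The DataFrame is invented.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "watch_minutes": [120.0, np.nan, 35.0, 0.0],
    "signup_channel": ["web", "mobile", None, "web"],
})

# How is each field captured, and where is it missing?
print(df.dtypes)
print(df.isna().sum())

# Does a missing watch_minutes mean "no data" or "zero viewing"? Flag it
# explicitly instead of silently filling it in.
df["watch_missing"] = df["watch_minutes"].isna()
print(df["watch_missing"].sum())  # rows that need investigation
```

Note the row where `watch_minutes` is 0.0 versus the row where it is missing: the audit forces you to decide whether those mean the same thing before a model ever sees them.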
Another thing I’ve learned over time is that a mix of algorithms is almost always better than one single algorithm in the context of a system, because different techniques exploit different aspects of the patterns in the data, especially in complex large data sets. So while you can take one particular algorithm and iterate and iterate to make it better, I have almost always seen that a combination of algorithms tends to do better than just one algorithm.”
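As a sketch of this last point, here is a soft-voting ensemble over three different model families using scikit-learn. The synthetic dataset stands in for real data, and nothing here reflects Netflix’s actual systems.

```python
# Sketch: a mix of algorithms combined by soft voting. Each model family
# exploits different structure in the data; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),  # linear boundary
        ("rf", RandomForestClassifier(random_state=0)),  # interactions
        ("nb", GaussianNB()),  # per-feature distributions
    ],
    voting="soft",  # average the predicted probabilities
)
ensemble.fit(X_tr, y_tr)
print(f"Ensemble accuracy: {ensemble.score(X_te, y_te):.2f}")
```

In practice you would compare the ensemble against each member on held-out data; the combination tends to win when the members make different kinds of errors.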

3 — Yann LeCun, Director of AI Research at Facebook and Professor of Data Science/Computer Science/Neuroscience at NYU
“I always give the same advice, as I get asked this question often. My take on it is that if you’re an undergrad, study a specialty where you can take as many math and physics courses as you can. And it has to be the right courses, unfortunately. What I’m going to say is going to sound paradoxical, but majors in engineering or physics are probably more appropriate than say math, computer science, or economics. Of course, you need to learn to program, so you need to take a large number of classes in computer science to learn the mechanics of how to program. Then, later, do a graduate program in data science. Take undergrad machine learning, AI, or computer vision courses, because you need to get exposed to those techniques. Then, after that, take all the math and physics courses you can take. Especially the continuous applied mathematics courses like optimization, because they prepare you for what’s really challenging.
It depends where you want to go because there are a lot of different jobs in the context of data science or AI. People should really think about what they want to do and then study those subjects. Right now the hot topic is deep learning, and what that means is learning and understanding classic work on neural nets, learning about optimization, learning about linear algebra, and similar topics. This helps you learn the underlying mathematical techniques and general concepts we confront every day.”

4 — Erin Shellman, Data Science Manager at Zymergen, Ex-Data Scientist at Nordstrom Data Lab and AWS S3
“For the person still deciding what to study I would say STEM fields are no-brainers, and in particular the ‘TEM ones. Studying a STEM subject will give you tools to test and understand the world. That’s how I see math, statistics, and machine learning. I’m not super interested in math per se, I’m interested in using math to describe things. These are tool sets after all, so even if you’re not stoked on math or statistics, it’s still super worth it to invest in them and think about how to apply it in the things you’re really passionate about.
For the person who’s trying to transition like I did, I would say, for one, it’s hard. Be aware that it’s difficult to change industries and you are going to have to work hard at it. That’s not unique to data science — that’s life. Not having any connections in the field is tough but you can work on it through meet-ups and coffee dates with generous people. My number-one rule in life is “follow up.” If you talk to somebody who has something you want, follow up.
Postings for data scientists can be pretty intimidating because most of them read like a data science glossary. The truth is that the technology changes so quickly that no one possesses experience of everything liable to be written on a posting. When you look at that, it can be overwhelming, and you might feel like, “This isn’t for me. I don’t have any of these skills and I have nothing to contribute.” I would encourage against that mindset as long as you’re okay with change and learning new things all the time.
Ultimately, what companies want is a person who can rigorously define problems and design paths to a solution. They also want people who are good at learning. I think those are the core skills.”

5 — Daniel Tunkelang, Chief Search Evangelist at Twiggle, Ex-Head of Search Quality at LinkedIn
“To someone coming from math or the physical sciences, I’d suggest investing in learning software skills — especially Hadoop and R, which are the most widely used tools. Someone coming from software engineering should take a class in machine learning and work on a project with real data, lots of which is available for free. As many people have said, the best way to become a data scientist is to do data science. The data is out there and the science isn’t that hard to learn, especially for someone trained in math, science, or engineering.
Read “The Unreasonable Effectiveness of Data” — a classic essay by Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira. The essay is usually summarized as “more data beats better algorithms.” It is worth reading the whole essay, as it gives a survey of recent successes in using web-scale data to improve speech recognition and machine translation. Then for good measure, listen to what Monica Rogati has to say about how better data beats more data. Understand and internalize these two insights, and you’re well on your way to becoming a data scientist.”

6 — John Foreman, Vice President of Product Management and Ex-Chief Data Scientist at MailChimp
“I find it tough to find and hire the right people. It’s actually a really hard thing to do, because when we think about the university system as it is, whether undergrad or grad school, you focus in on only one thing. You specialize. But data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.
This is what leads to the big joke that a data scientist is someone who knows more stats than a computer programmer and can program better than a statistician. What is this joke saying? It’s saying that a data scientist is someone who knows a little bit about two things. But I’d say they know about more than just two things. They also have to know how to communicate. They need to know more than just basic statistics; they’ve got to know probability, combinatorics, calculus, etc. Some visualization chops wouldn’t hurt. They also need to know how to push around data, use databases, and maybe even a little OR. There are a lot of things they need to know. And so it becomes really hard to find these people, because they have to have touched a lot of disciplines and be able to speak about their experience intelligently. It’s a tall order for any applicant.
It takes a long time to hire somebody, which is why I think people keep talking about how there is not enough talent out there for data science right now. I think that’s true to a degree. I think that some of the degree programs that are starting up are going to help. But even still, coming out of those degree programs, for MailChimp we would look at how you articulate and communicate to us how you’ve used the data science chops across many disciplines that this particular program taught you. That’s something that’s going to weed out so many people. I wish more programs would focus on the communication and collaboration aspect of being a data scientist in the workplace.”

7 — Roger Ehrenberg, Managing Partner of IA Ventures
“I think the areas with the biggest opportunities also have the most challenges. Healthcare data obviously has some of the biggest issues with PII and privacy concerns. Added to that, you’ve also got sclerotic bureaucracies, fossilized infrastructures, and data silos that make it very hard to solve hard problems requiring integration across multiple data sets. It will happen, and I think a lot of the technologies we’ve talked about here are directly relevant to making health care better, more affordable, and more distributed. I see this as a generational opportunity.
Another huge area in its early days is risk management — whether in finance, trading, or insurance. Incorporating new data sets into risk assessment is a really hard problem — especially when applying these technologies to an industry like insurance, which, like health care, has lots of privacy issues and data trapped within large bureaucracies. At the same time, these old fossilized companies are just now starting to open up and figure out how best to interact with the startup community in order to leverage new technologies. This is another area I find incredibly exciting.
The third area I’m passionate about is reshaping manufacturing and making it more efficient. There has been a trend towards manufacturing moving back onshore. A stronger manufacturing sector could be a bridge to recreating a vibrant middle class in the US. I think technology can help hasten this beneficial trend.”

8 — Claudia Perlich, Chief Scientist at Dstillery
“I think, ultimately, learning how to do data science is like learning to ski. You have to do it. You can only listen to so many videos and watch it happen. At the end of the day, you have to get on your damn skis and go down that hill. You will crash a few times on the way and that is fine. That is the learning experience you need. I actually much prefer to ask interviewees about things that did not go well rather than what did work, because that tells me what they learned in the process.
Whenever people come to me and ask, “What should I do?” I say, “Yeah, sure, take online courses on machine learning techniques. There is no doubt that this is useful. You clearly have to be able to program, at least somewhat. You do not have to be a Java programmer, but you must get something done somehow. I do not care how.”
Ultimately, whether it is volunteering at DataKind to spend your time helping NGOs, or going to the Kaggle website and participating in some of their data mining competitions — just get your hands and feet wet. Especially on Kaggle, read the discussion forums where other people describe the problem, because that is where you learn what people do, what worked for them, and what did not. So anything that gets you actually involved in doing something with data, even if you are not being paid for it, is a great thing.
Remember, you have to ski down that hill. There is no way around it. You cannot learn any other way. So volunteer your time, get your hands dirty in any which way you can think, and if you have a chance to do internships — perfect. Otherwise, there are many opportunities where you can just get started. So just do it.”

9 — Jonathan Lenaghan, Chief Scientist and Senior Vice President of Product Development at PlaceIQ
“First and foremost, it is very important to be self-critical: always question your assumptions and be paranoid about your outputs. That is the easy part. In terms of the skills people need to really succeed in data science, good software engineering skills are essential. So even though we may hire people who come in with very little programming experience, we work very hard to quickly instill in them the importance of engineering, engineering practices, and good agile programming practices. This is helpful to them and to us, as these can all be applied almost one-to-one to data science right now.
If you look at dev ops right now, they have things such as continuous integration, continuous build, automated testing, and test harnesses — all of which map very well from the dev ops world to the data ops (a phrase I stole from Red Monk) world very easily. I think this is a very powerful notion. It is important to have testing frameworks for all of your data, so that if you make a code change, you can go back and test all of your data. Having an engineering mindset is essential to moving with high velocity in the data science world. Reading Code Complete and The Pragmatic Programmer is going to get you much further than reading machine learning books — although you do, of course, have to read the machine learning books, too.”
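As a sketch of what “testing frameworks for all of your data” might look like in practice, here is a lightweight data-quality check that could run in CI whenever code changes. The column names, rules, and thresholds are hypothetical.

```python
# Sketch: automated data tests that can run in CI, in the spirit of
# mapping dev-ops practices onto data ops. Columns and rules are invented.
import pandas as pd

def check_events(df: pd.DataFrame) -> list:
    """Return a list of data-quality failures (empty means all checks pass)."""
    failures = []
    if df["event_id"].duplicated().any():
        failures.append("duplicate event_id")
    if df["latency_ms"].lt(0).any():
        failures.append("negative latency_ms")
    if df["country"].isna().mean() > 0.01:
        failures.append("country >1% missing")
    return failures

df = pd.DataFrame({
    "event_id": [1, 2, 3],
    "latency_ms": [12.5, 40.0, 7.1],
    "country": ["US", "DE", "US"],
})
print(check_events(df))  # an empty list: all checks pass
```

Wired into a test harness, a check like this turns “did my code change break the data?” into an automated question rather than a manual one.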

10 — Anna Smith, Senior Data Engineer at Spotify, Ex-Analytics Engineer at Rent the Runway
“If someone is just starting out in data science, the most important thing to understand is that it’s okay to ask people questions. I also think humility is very important. You’ve got to make sure that you’re not tied up in what you’re doing. You can always make changes and start over. Being able to scrap code, I think, is really hard when you’re starting out, but the most important thing is to just do something.
Even if you don’t have a job in data science, you can still explore data sets in your downtime and can come up with questions to ask the data. In my personal time, I’ve played around with Reddit data. I asked myself, “What can I explore about Reddit with the tools that I have or don’t have?” This is great because once you’ve started, you can see how other people have approached the same problem. Just use your gut and start reading other people’s articles and be like, “I can use this technique in my approach.” Start out very slowly and move slowly. I tried reading a lot when I started, but I think that’s not as helpful until you’ve actually played around with code and with data to understand how it actually works, how it moves. When people present it in books, it’s all nice and pretty. In real life, it’s really not.
I think trying a lot of different things is also very important. I don’t think I’d ever thought that I would be here. I also have no idea where I’ll be in five years. But maybe that’s how I learn, by doing a bit of everything across many different disciplines to try to understand what fits me best.”

11 — Andre Karpistsenko, Data Science Lead at Taxify, Co-Founder and Research Lead at PlanetOS
“Though somewhat generic advice, I believe you should trust yourself and follow your passion. I think it’s easy to get distracted by the news in the media and the expectations presented by the media and choose a direction that you didn’t want to go. So when it comes to data science, you should look at it as a starting point for your career. Having this background will be beneficial in anything you do. Having an ability to create software and the ability to work with statistics will enable you to make smarter decisions in any field you choose. For example, we can read about how an athlete’s performance is improved through data, like someone becoming the gold medalist in the long jump because they optimized and practiced the angle at which they should jump. This is all led by a data-driven approach to sports.
If I were to go into more specific technical advice, then it depends on the ambitions of the person who is receiving the advice. If the person wants to create new methods and tools, then that advice would be very different. You need to persist and keep going in your direction, and you will succeed. But if your intent is to be diverse and flexible in many situations, then you want to have a big toolbox of different methods.
I think the best advice given to me came from a Stanford professor whose course I attended a while ago. He recommended having a T-shaped profile of competence, but with a small second competence next to the core competence, so that you have an alternative route in life if you need or want it. In addition to the vertical stem of single-field expertise, he recommended a horizontal bar of background broad enough that you can work with many different people in many different situations. So while you are at a university, building a T shape with another small competence in it is probably the best thing to do.
Maybe the most important thing is to surround yourself with people greater than you are and to learn from them. That’s the best advice. If you’re in a university, that’s the best environment to see how diverse the capabilities of people are. If you manage to work with the best people, then you will succeed at anything.”

12 — Amy Heineike, Vice President of Technology at PrimerAI, Ex-Director of Mathematics at Quid
“I think perhaps they would need to start by looking at themselves and figuring out what it is they really care about. What is it they want to do? Right now, data science is a bit of a hot topic, and so I think there are a lot of people who think that if they can have the “data science” label, then magic, happiness, and money will come to them. So I really suggest figuring out what bits of data science you actually care about. That is the first question you should ask yourself. And then you want to figure out how to get good at that. You also want to start thinking about what kinds of jobs are out there that really play to what you are interested in.
One strategy is to go really deep into one part of what you need to know. We have people on our team who have done PhDs in natural language processing or who got PhDs in physics, where they’ve used a lot of different analytical methods. So you can go really deep into an area and then find people for whom that kind of problem is important or similar problems that you can use the same kind of thinking to solve. So that’s one approach.
Another approach is to just try stuff out. There are a lot of data sets out there. If you’re in one job and you’re trying to change jobs, try to think whether there’s data you could use in your current role that you could go and get and crunch in interesting ways. Find an excuse to get to try something out and see if that’s really what you want to do. Or just from home there’s open data you can pull. Just poke around and see what you can find and then start playing with that. I think that’s a great way to start. There are a lot of different roles that are going under the name “data science” right now, and there are also a lot of roles that are probably what you would think of data science but don’t have a label yet because people aren’t necessarily using it. Think about what it is that you really want.”

13 — Victor Hu, Head of Data Science at QBE Insurance, Ex-Chief Data Scientist at Next Big Sound
“First is that you definitely have to tell a story. At the end of the day, what you are doing is really digging into the fundamentals of how a system or an organization or an industry works. But for it to be useful and understandable to people, you have to tell a story.
Being able to write about what you do and being able to speak about your work is very critical. Also worth understanding is that you should maybe worry less about what algorithm you are using. More data or better data beats a better algorithm, so if you can set up a way for you to analyze and get a lot of good, clean, useful data — great!”

14 — Kira Radinsky, Chief Scientist and Director of Data Science at eBay, Ex-CTO and Co-Founder of SalesPredict
“Find a problem you’re excited about. For me, every time I started something new, it’s really boring to just study without having a problem I’m trying to solve. Start reading material and as soon as you can, start working with it and your problem. You’ll start to see problems as you go. This will lead you to other learning resources, whether they are books, papers, or people. So spend time with the problem and people, and you’ll be fine.
Understand the basics really deeply. Understand some basic data structures and computer science. Understand the basis of the tools you use and understand the math behind them, not just how to use them. Understand the inputs and the outputs and what is actually going on inside, because otherwise you won’t know when to apply it. Also, it depends on the problem you’re tackling. There are many different tools for so many different problems. You’ve got to know what each tool can do and you’ve got to know the problem that you’re doing really well to know which tools and techniques to apply.”

15 — Eric Jonas, Postdoc at UC Berkeley EECS, Ex-Chief Predictive Scientist at Salesforce
“They should understand probability theory forwards and backwards. I’m at the point now where everything else I learn, I then map back into probability theory. It’s great because it provides this amazing, deep, rich basis set along which I can project everything else out there. There’s a book by E. T. Jaynes called Probability Theory: The Logic of Science, and it’s our bible. We really buy it in some sense. The reason I like the probabilistic generative approach is you have these two orthogonal axes — the modeling axis and the inference axis. Which basically translates into how do I express my problem and how do I compute the probability of my hypothesis given the data? The nice thing I like from this Bayesian perspective is that you can engineer along each of these axes independently. Of course, they’re not perfectly independent, but they can be close enough to independent that you can treat them that way.
When I look at things like deep learning or any kind of LASSO-based linear regression systems, which is so much of what counts as machine learning these days, they’re engineering along either one axis or the other. They’ve kind of collapsed that down. Using these LASSO-based techniques as an engineer, it becomes very hard for me to think about: “If I change this parameter slightly, what does that really mean?” Linear regression as a model has a very clear linear additive Gaussian model baked into it. Well, what if I want things to look different? Suddenly all of these regularized least squares things fall apart. The inference technology just doesn’t even accept that as a thing you’d want to do.”

16 — Jake Porway, Founder and Executive Director of DataKind
“I think a strong statistical background is a prerequisite, because you need to know what you’re doing, and understand the guts of the model you build. Additionally, my statistics program also taught a lot about ethics, which is something that we think a lot about at DataKind. You always want to think about how your work is going to be applied. You can give anybody an algorithm. You can give someone a model for using stop-and-frisk data, where the police are going to make arrests, but why and to what end? It’s really like building any new technology. You’ve got to think about the risks as well as the benefits and really weigh that because you are responsible for what you create.
No matter where you come from, as long as you understand the tools that you’re using to draw conclusions, that is the best thing you can do. We are all scientists now, and I’m not just talking about designing products. We are all drawing conclusions about the world we live in. That’s what statistics is — collecting data to prove a hypothesis or to create a model of the way the world works. If you just trust the results of that model blindly, that’s dangerous because that’s your interpretation of the world, and as flawed as it is, your understanding is how flawed the result is going to be.
In short, learn statistics and be thoughtful.”
Data is being generated exponentially, and those who can understand that data and extract value from it are needed now more than ever. These practitioners’ hard-earned lessons, and their evident joy in data and models, will be tremendously useful if you aspire to join the next generation of data scientists.

Source: Medium

Top 10 roles in AI and data science

When you think of the perfect data science team, are you imagining 10 copies of the same professor of computer science and statistics, hands delicately stained with whiteboard marker? I hope not!

Google’s Geoff Hinton is a hero of mine and an amazing researcher in deep learning, but I hope you’re not planning to staff your applied data science team with 10 of him and no one else!

Applied data science is a team sport that’s highly interdisciplinary. Diversity of perspective matters! In fact, perspective and attitude matter at least as much as education and experience.

If you’re keen to make your data useful with a decision intelligence engineering approach, here’s my take on the order in which to grow your team.

#0 Data Engineer

We start counting at zero, of course, since you need to have the ability to get data before it makes sense to talk about data analysis. If you’re dealing with small datasets, data engineering is essentially entering some numbers into a spreadsheet. When you operate at a more impressive scale, data engineering becomes a sophisticated discipline in its own right. Someone on your team will need to take responsibility for dealing with the tricky engineering aspects of delivering data that the rest of your staff can work with.

#1 Decision-Maker

Before hiring that PhD-trained data scientist, make sure you have a decision-maker who understands the art and science of data-driven decision-making.

Decision-making skills have to be in place before a team can get value out of data.

This individual is responsible for identifying decisions worth making with data, framing them (everything from designing metrics to calling the shots on statistical assumptions), and determining the required level of analytical rigor based on potential impact on the business. Look for a deep thinker who doesn’t keep saying, “Oh, whoops, that didn’t even occur to me as I was thinking through this decision.” They’ve already thought of it. And that. And that too.

#2 Analyst

Then the next hire is… everyone already working with you. Everyone is qualified to look at data and get inspired; the only thing that might be missing is a bit of familiarity with software that’s well-suited for the job. If you’ve ever looked at a digital photograph, you’ve done data visualization and analytics.

Learning to use tools like R and Python is just an upgrade over MS Paint for data visualization; they’re simply more versatile tools for looking at a wider variety of datasets than just red-green-blue pixel matrices.
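To make the photo analogy concrete, here is a minimal pure-Python sketch; the 4x4 grayscale “image” is invented for illustration. A photo is just a matrix of numbers, and the first things anyone computes about it are the same things you would compute about any dataset:

```python
from collections import Counter

# A digital photo is just a matrix of numbers. This tiny 4x4 grayscale
# "image" (values 0-255) stands in for a real photograph.
photo = [
    [0, 0, 128, 255],
    [0, 64, 128, 255],
    [64, 64, 192, 64],
    [64, 128, 192, 255],
]

# Flatten the matrix and run the classic first "analyses".
pixels = [p for row in photo for p in row]
histogram = Counter(pixels)              # distribution of brightness values
brightest = max(pixels)                  # the brightest pixel
mean_brightness = sum(pixels) / len(pixels)

print(histogram.most_common(1), brightest, mean_brightness)
```

Pointing matplotlib’s `imshow` at the same matrix would literally render it as a picture, which is all an image viewer does; swapping in any other matrix is the “upgrade over MS Paint” described above.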

If you’ve ever looked at a digital photograph, you’ve done data visualization and analytics. It’s the same thing.

And hey, if all you have the stomach for is looking at the first five rows of data in a spreadsheet, well, that’s still better than nothing. If the entire workforce is empowered to do that, you’ll have a much better finger on the pulse of your business than if no one is looking at any data at all.

The important thing to remember is that you shouldn’t come to conclusions beyond your data. That takes specialist training. Just as with the photo above, here’s all you can say about it: “This is what is in my dataset.” Please don’t use it to conclude that the Loch Ness Monster is real.

#3 Expert Analyst

Enter the lightning-fast version! This person can look at more data faster. The game here is speed, exploration, discovery… fun! This is not the role concerned with rigor and careful conclusions. Instead, this is the person who helps your team get eyes on as much of your data as possible so that your decision-maker can get a sense of what’s worth pursuing with more care.

The job here is speed, encountering potential insights as quickly as possible.

This may be counterintuitive, but don’t staff this role with your most reliable engineers who write gorgeous, robust code. The job here is speed, encountering potential insights as quickly as possible, and unfortunately those who obsess over code quality may find it too difficult to zoom through the data fast enough to be useful in this role.

Those who obsess over code quality may find it difficult to be useful in this role.

I’ve seen analysts on engineering-oriented teams bullied because their peers don’t realize what “great code” means for descriptive analytics. Great is “fast and humble” here. If fast-but-sloppy coders don’t get much love, they’ll leave your company and you’ll wonder why you don’t have a finger on the pulse of your business.

#4 Statistician

Now that we’ve got all these folks cheerfully exploring data, we’d better have someone around to put a damper on the feeding frenzy. It’s safe to look at that “photo” of Nessie as long as you have the discipline to keep yourself from learning more than what’s actually there… but do you? While people are pretty good at thinking reasonably about photos, other data types seem to send common sense out the window. It might be a good idea to have someone around who can prevent the team from making unwarranted conclusions.

Inspiration is cheap, but rigor is expensive.

Lifehack: don’t make conclusions and you won’t need to worry. I’m only half-joking. Inspiration is cheap, but rigor is expensive. Pay up or content yourself with mere inspiration.

Statisticians help decision-makers come to conclusions safely beyond the data.

For example, if your machine learning system worked in one dataset, all you can safely conclude is that it worked in that dataset. Will it work when it’s running in production? Should you launch it? You need some extra skills to deal with those questions. Statistical skills.

If we want to make serious decisions where we don’t have perfect facts, let’s slow down and take a careful approach. Statisticians help decision-makers come to conclusions safely beyond the data analyzed.
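As a sketch of what those “extra skills” look like in practice, suppose a model got 870 of 1,000 held-out predictions right (numbers invented). A statistician reports an interval, not a single accuracy figure:

```python
import math

# Held-out evaluation results (invented for illustration).
correct, n = 870, 1000
p_hat = correct / n

# 95% normal-approximation confidence interval for the true accuracy.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"observed accuracy {p_hat:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
# The honest conclusion is the interval, not the point estimate:
# "it worked in this dataset" generalizes only as far as the interval allows.
```

The interval, not the headline number, is what should inform the launch decision; wider intervals mean you need more evaluation data before concluding anything.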

#5 Applied Machine Learning Engineer

An applied AI / machine learning engineer’s best attribute is not an understanding of how algorithms work. Their job is to use them, not build them. (That’s what researchers do.) Expertise at wrangling code that gets existing algorithms to accept and churn through your datasets is what you’re looking for.

Besides quick coding fingers, look for a personality that can cope with failure. You almost never know what you’re doing, even if you think you do. You run the data through a bunch of algorithms as quickly as possible and see if it seems to be working… with the reasonable expectation that you’ll fail a lot before you succeed. A huge part of the job is dabbling blindly, and it takes a certain kind of personality to enjoy that.

Perfectionists tend to struggle as ML engineers.

Because your business problem’s not in a textbook, you can’t know in advance what will work, so you can’t expect to get a perfect result on the first go. That’s okay, just try lots of approaches as quickly as possible and iterate towards a solution.
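That workflow can be sketched as a loop over candidate models. The toy “models” below are invented stand-ins (a majority-class baseline and two threshold rules); a real engineer would slot in off-the-shelf library algorithms instead:

```python
# Tiny invented dataset: inputs and binary labels.
xs = [1, 2, 3, 8, 9, 10]
ys = [0, 0, 0, 1, 1, 1]

# Candidate "models" -- in practice these would be library algorithms.
candidates = {
    "always_zero": lambda x: 0,
    "threshold_at_5": lambda x: int(x > 5),
    "threshold_at_9": lambda x: int(x > 9),
}

def accuracy(model):
    """Fraction of examples the model gets right."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

# Run everything quickly, keep whatever looks promising, iterate.
scores = {name: accuracy(m) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The point is the shape of the loop: try many approaches fast, expect most to fail, and hand the promising ones to the statistician for rigorous assessment.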

Speaking of “running the data through algorithms”… what data? The inputs your analysts identified as potentially interesting, of course. That’s why analysts make sense as an earlier hire.

Although there’s a lot of tinkering, it’s important for the machine learning engineer to have a deep respect for the part of the process where rigor is vital: assessment. Does the solution actually work on new data? Luckily, you made a wise choice with your previous hire, so all you have to do is pass the baton to the statistician.

The strongest applied ML engineers have a very good sense of how long it takes to apply various approaches.

When a potential ML hire can rank options by the time it takes to try them on various kinds of datasets, be impressed.

#6 Data Scientist

The way I use the word, a data scientist is someone who is a full expert in all of the three preceding roles. Not everyone uses my definition: you’ll see job applications out there with people calling themselves “data scientist” when they have only really mastered one of the three, so it’s worth checking.

Data scientists are full experts in all of the three previous roles.

This role is in position #6 because hiring the true three-in-one is an expensive option. If you can hire one within budget, it’s a great idea, but if you’re on a tight budget, consider upskilling and growing your existing single-role specialists.

#7 Analytics Manager / Data Science Leader

The analytics manager is the goose that lays the golden egg: they’re a hybrid between the data scientist and the decision-maker. Their presence on the team acts as a force-multiplier, ensuring that your data science team isn’t off in the weeds instead of adding value to your business.

The decision-maker + data scientist hybrid is a force-multiplier. Unfortunately, they’re rare and hard to hire.

This person is kept awake at night by questions like, “How do we design the right questions? How do we make decisions? How do we best allocate our experts? What’s worth doing? Will the skills and data match the requirements? How do we ensure good input data?”

If you’re lucky enough to hire one of these, hold on to them and never let them go.

#8 Qualitative Expert / Social Scientist

Sometimes your decision-maker is a brilliant leader, manager, motivator, influencer, or navigator of organizational politics… but unskilled in the art and science of decision-making. Decision-making is so much more than a talent. If your decision-maker hasn’t honed their craft, they might do more damage than good.

Instead of firing an unskilled decision-maker, you can augment them with a qualitative expert.

Don’t fire an unskilled decision-maker, augment them. You can hire them an upgrade in the form of a helper. The qualitative expert is here to supplement their skills.

This person typically has a social science and data background — behavioral economists, neuroeconomists, and JDM psychologists receive the most specialized training, but self-taught folk can also be good at it. The job is to help the decision maker clarify ideas, examine all the angles, and turn ambiguous intuitions into well-thought-through instructions in language that makes it easy for the rest of the team to execute on.

We don’t realize how valuable social scientists are. They’re usually better equipped than data scientists to translate the intuitions and intentions of a decision-maker into concrete metrics.

The qualitative expert doesn’t call any of the shots. Instead, they ensure that the decision-maker has fully grasped the shots available for calling. They’re also a trusted advisor, a brainstorming companion, and a sounding board for a decision-maker. Having them on board is a great way to ensure that the project starts out in the right direction.

#9 Researcher

Many hiring managers think their first team member needs to be the ex-professor, but actually you don’t need those PhD folk unless you already know that the industry is not going to supply the algorithms that you need. Most teams won’t know that in advance, so it makes more sense to do things in the right order: before building yourself that space pen, first check whether a pencil will get the job done. Get started first and if you find that the available off-the-shelf solutions aren’t giving you much love, then you should consider hiring researchers.

If a researcher is your first hire, you probably won’t have the right environment to make good use of them.

Don’t bring them in right off the bat. It’s better to wait until your team is developed enough to have figured out what they need a researcher for. Wait till you’ve exhausted all the available tools before hiring someone to build you expensive new ones.

#10+ Additional personnel

Besides the roles we looked at, here are some of my favorite people to welcome to a decision intelligence project:

  • Domain expert
  • Ethicist
  • Software engineer
  • Reliability engineer
  • UX designer
  • Interactive visualizer / graphic designer
  • Data collection specialist
  • Data product manager
  • Project / program manager

Many projects can’t do without them — the only reason they aren’t listed in my top 10 is that decision intelligence is not their primary business. Instead, they are geniuses at their own discipline and have learned enough about data and decision-making to be remarkably useful to your project. Think of them as having their own major or specialization, but enough love for decision intelligence that they chose to minor in it.

Huge team or small team?

After reading all that, you might feel overwhelmed. So many roles! Take a deep breath. Depending on your needs, you may get enough value from the first few roles.

Revisiting my analogy of applied machine learning as innovating in the kitchen, if you personally want to open an industrial-scale pizzeria that makes innovative pizzas, you need the big team or you need to partner with providers/consultants. If you want to make a unique pizza or two this weekend — caramelized anchovy surprise, anyone? — then you still need to think about all the components we mentioned. You’re going to decide what to make (role 1), which ingredients to use (roles 2 and 3), where to get ingredients (role 0), how to customize the recipe (role 5), and how to give it a taste test (role 4) before serving someone you want to impress, but for the casual version with less at stake, you can do it all on your own. And if your goal is just to make standard traditional pizza, you don’t even need all that: get hold of someone else’s tried and tested recipe (no need to reinvent your own) along with ingredients and start cooking!


The Data Science Process

The Data Science Process is a framework for approaching data science tasks, and was crafted by Joe Blitzstein and Hanspeter Pfister of Harvard’s CS 109. The goal of CS 109, as per Blitzstein himself, is to introduce students to the overall process of data science investigation, a goal which should provide some insight into the framework itself.


The following is a sample application of Blitzstein & Pfister’s framework, regarding skills and tools at each stage, as given by Ryan Fox Squire in his answer:

Stage 1: Ask A Question
Skills: science, domain expertise, curiosity
Tools: your brain, talking to experts, experience

Stage 2: Get the Data
Skills: web scraping, data cleaning, querying databases, CS stuff
Tools: python, pandas

Stage 3: Explore the Data
Skills: Get to know data, develop hypotheses, patterns? anomalies?
Tools: matplotlib, numpy, scipy, pandas, mrjob

Stage 4: Model the Data
Skills: regression, machine learning, validation, big data
Tools: scikit-learn, pandas, mrjob, mapreduce

Stage 5: Communicate the Data
Skills: presentation, speaking, visuals, writing
Tools: matplotlib, adobe illustrator, powerpoint/keynote

Squire then (rightfully) concludes that the data science workflow is a non-linear, iterative process, and that many skills and tools are required to cover the full data science process. Squire also professes that he is fond of the Data Science Process because it stresses both the importance of asking questions to guide your workflow and the importance of iterating on your questions and research as you gain familiarity with your data.
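As an illustration, here is a toy end-to-end pass through the five stages in pure Python, with an invented five-row dataset (in practice stages 2 to 5 lean on tools like pandas, matplotlib, and scikit-learn):

```python
# Stage 1: Ask a question -- "do taller people in this sample weigh more?"

# Stage 2: Get the data (inlined here; normally scraped or queried).
heights = [150, 160, 170, 180, 190]   # cm
weights = [52, 58, 66, 75, 84]        # kg

# Stage 3: Explore the data -- quick summaries instead of a scatter plot.
mean_h = sum(heights) / len(heights)
mean_w = sum(weights) / len(weights)

# Stage 4: Model the data -- least-squares slope of weight on height.
cov = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
var = sum((h - mean_h) ** 2 for h in heights)
slope = cov / var

# Stage 5: Communicate the result in plain language.
print(f"each extra cm of height adds about {slope:.2f} kg in this sample")
```

In a real project the loop would then restart: the slope raises new questions (is the relationship linear? does it hold out of sample?), which is exactly the iteration Squire describes.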

The Data Science Process is a simple yet effective framework for approaching data science problems, and one well worth keeping in mind.


Top 7 Data Science Use Cases in Finance


In recent years, the ability of data science and machine learning to cope with a number of principal financial tasks has become an especially important question. Companies want to know what improvements the technologies bring and how they can reshape their business strategies.
To help answer these questions, we have prepared a list of the data science use cases that have the highest impact on the finance sector. They cover very diverse business aspects, from data management to trading strategies, but what they have in common is the huge prospect of enhancing financial solutions.
Automating risk management
Risk management is an enormously important area for financial institutions, responsible for a company’s security, trustworthiness, and strategic decisions. The approaches to handling risk management have changed significantly over the past years, transforming the nature of the finance sector. As never before, machine learning models today define the vectors of business development.
There are many sources from which risk can come, such as competitors, investors, regulators, or a company’s customers. Risks can also differ in importance and potential losses. Therefore, the main steps are identifying, prioritizing, and monitoring risks, which are perfect tasks for machine learning. Trained on huge amounts of customer data, financial lending, and insurance results, algorithms can not only improve risk scoring models but also enhance cost efficiency and sustainability.



Among the most important applications of data science and artificial intelligence (AI) in risk management is identifying the creditworthiness of potential customers. To establish the appropriate credit amount for a particular customer, companies use machine learning algorithms that analyze past spending behavior and patterns. This approach is also useful when working with new customers or those with a brief credit history.
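As a hedged illustration of scoring from past spending behavior, here is a toy rule-based credit limit; the features, weights, and penalty are invented for demonstration, not a real scorecard (real systems learn such weights from historical lending outcomes):

```python
def credit_limit(monthly_income, avg_monthly_spend, missed_payments):
    """Illustrative credit limit derived from past spending behavior.

    All coefficients are invented for the sketch: half of the customer's
    disposable income, minus a flat penalty per missed payment.
    """
    headroom = monthly_income - avg_monthly_spend   # disposable income
    score = headroom * 0.5 - missed_payments * 200  # penalize arrears
    return max(0, round(score))                     # never offer a negative limit

print(credit_limit(3000, 2000, 0))   # steady spender with clean history
print(credit_limit(3000, 2900, 3))   # thin margin plus missed payments
```

A learned model replaces the hand-set weights with coefficients fitted to repayment data, but the structure, features in, limit out, is the same.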

Although the digitalization and automation of risk management processes in finance are in their early stages, the potential is huge. Financial institutions still need to prepare for this change by automating core financial processes, improving the analytical skills of the finance team, and making strategic technology investments. But as soon as a company starts to move in this direction, the profits will follow.

Managing customer data

For financial firms, data is the most important resource, so efficient data management is a key to business success. Today there is a massive volume of financial data, diverse in structure and origin: from social media activity and mobile interactions to market data and transaction details. Financial specialists often have to work with semi-structured or unstructured data, and processing it manually is a big challenge.

However, it’s obvious to most companies that integrating machine learning techniques into the data management process is simply a necessity for extracting real intelligence from data. AI tools, in particular natural language processing, data mining, and text analytics, help transform data into information, contributing to smarter data governance, better business solutions, and, as a result, increased profitability. For instance, machine learning algorithms can analyze the influence of specific financial trends and market developments by learning from customers’ historical financial data. Finally, these techniques can be used to generate automated reports.

Predictive analytics

Analytics is now at the core of financial services. Predictive analytics deserves special attention: it reveals patterns in the data that foreshadow future events which can be acted upon now. By mining social media, news trends, and other data sources, these sophisticated analytics power the main applications, such as predicting prices, customer lifetime value, future life events, anticipated churn, and stock market moves. Most importantly, such techniques can help answer the complicated question of how best to intervene.

Real-time analytics

Real-time analytics fundamentally transforms financial processes by analyzing large amounts of data from different sources, quickly identifying any changes, and finding the best reaction to them. There are three main directions for real-time analytics in finance:

Fraud detection

Financial firms are obliged to guarantee the highest level of security to their users. The main challenge for companies is to build a good fraud detection system, since criminals are constantly finding new exploits and setting new traps. Only qualified data scientists can create algorithms for detecting and preventing anomalies in user behavior or ongoing working processes amid this diversity of frauds. For instance, alerts on unusual financial purchases for a particular user, or on large cash withdrawals, will lead to blocking those actions until the customer confirms them. In the stock market, machine learning tools can identify patterns in trading data that might indicate manipulation and alert staff to investigate. The greatest strength of such algorithms, however, is their ability to self-teach, becoming more effective and intelligent over time.
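A minimal sketch of the idea, with invented purchase amounts: flag any purchase far outside the customer’s usual spending. A production system would use learned anomaly-detection models rather than this fixed z-score cutoff:

```python
import statistics

# A customer's recent purchases (invented); the last one is suspicious.
purchases = [20, 35, 18, 42, 25, 30, 22, 950]

# Baseline spending profile from the customer's history (all but the last).
mean = statistics.mean(purchases[:-1])
stdev = statistics.stdev(purchases[:-1])

def is_suspicious(amount, threshold=4.0):
    """Flag amounts more than `threshold` standard deviations from the norm."""
    return abs(amount - mean) / stdev > threshold

# Amounts to hold until the customer confirms them.
flags = [amt for amt in purchases if is_suspicious(amt)]
print(flags)
```

The self-teaching property described above corresponds to refitting the baseline (or the learned model) as confirmed-legitimate transactions accumulate.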

Consumer analytics

Real-time analytics also helps with better understanding of customers and effective personalization. Sophisticated machine learning algorithms and customer sentiment analysis techniques can generate insights from clients’ behavior, social media interactions, and their feedback and opinions, improving personalization and increasing profit. Since the amount of data is enormous, only experienced data scientists can make a precise breakdown of it.

Algorithmic trading

This area probably sees the biggest impact from real-time analytics, since every second is at stake here. Based on the most recent information from analyzing both traditional and non-traditional data, financial institutions can make beneficial decisions in real time. And because this data is often valuable for only a short time, being competitive in this sector means having the fastest methods of analyzing it.

Another prospect opens up when combining real-time and predictive analytics in this area. It used to be popular practice for financial companies to hire mathematicians who could develop statistical models and use historical data to create trading algorithms that forecast market opportunities. Today, however, artificial intelligence offers techniques to make this process faster and, especially important, constantly improving.

Data science and AI have therefore made a revolution in the trading sector, launching algorithmic trading strategies. Most of the world’s exchanges use computers that make decisions based on algorithms and correct their strategies as new data arrives. Artificial intelligence tirelessly processes tons of information, including tweets, financial indicators, and data from news, books, and even TV programs. Consequently, it understands today’s worldwide trends and continuously improves its predictions about financial markets.

All in all, real-time and predictive analytics are significantly changing the situation in different financial areas. With technologies such as Hadoop, NoSQL, and Storm, with traditional and non-traditional datasets, and with ever more precise algorithms, data engineers are changing the way finance works.

Deep personalization and customization

Firms realize that one of the keys to being competitive in today’s market is to raise engagement through high-quality, personalized relationships with their customers. The idea is to analyze the digital client experience and modify it to take account of a client’s interests and preferences. AI is making significant improvements in understanding human language and emotion, which brings customer personalization to a whole new level. Data engineers can also build models that study consumers’ behavior and discover situations where customers need financial advice. The combination of predictive analytic tools and advanced digital delivery options can help with this complicated task, guiding the customer to the best financial solution at the most opportune time and suggesting personalized offerings based on spending habits, socio-demographic trends, location, and other preferences.


For financial institutions, data science techniques provide a huge opportunity to stand out from the competition and reinvent their businesses. The vast amounts of continuously changing financial data create a necessity for bringing machine learning and AI tools into different aspects of the business.

We focused on what are, in our opinion, the top 7 data science use cases in the finance sector, but there are many others that also deserve mention. If you have any further ideas, please share your vision in the comment section.

How to Become a Data Scientist


All such roads lead to the same destination: a job assembling, analyzing and interpreting large data sets to look for information of interest or value.

Data science encompasses “Big Data,” data analytics, business intelligence and more. Data science is becoming a vital discipline in IT because it enables businesses to extract value from the many kinds and large amounts of data they collect in doing whatever it is that they do. For those who do business with customers, it lets them learn more about those customers.

For those who maintain a supply chain, it helps them to understand more and better ways to request, acquire and manage supply components. For those who follow (or try to anticipate) markets – such as financials, commodities, employment and so forth – it helps them construct more accurate and insightful models for such things. The applications for data science are limited only by our ability to conceive of uses to which data may be put – limitless, in other words.

In fact, no matter where you look for data, if large amounts of information are routinely collected and stored, data science can play a role. It can probably find something useful or interesting to say about such collections, if those who examine them can frame and process the right kinds of queries against that data. That’s what explains the increasing and ongoing value of data science for most companies and organizations, since all of them routinely collect and maintain various kinds of data nowadays.

Basic Educational Background

The basic foundation for a long-lived career in IT for anybody getting started is to pursue a bachelor’s degree in something computing related. This usually means a degree in computer science, management information systems (MIS), computer engineering, informatics or something similar. Plenty of people transition in from other fields, to be sure, but the more math and science under one’s belt when making that transition, the easier that adjustment will be. Given projected shortages of IT workers, especially in high-demand subject areas – which include not only data science, but also networking, security, software development, IT architecture and its various specialty areas, virtualization, and more – it’s hard to go wrong with this kind of career start.

For data scientists, a strong mathematics background, particularly in statistics and analysis, is strongly recommended, if not outright required. This goes along naturally with an equally strong academic foundation in computing. Those willing to slog through to a master’s or Ph.D. before entering the workforce may find data science a particularly appealing and remunerative field of study when that slog comes to its end. If so, they can also jump directly into mid- or expert/senior level career steps, respectively.

Early Career Work Focus and Experience

If data science is a long-term goal, the more experience one has in working with data, the better. Traditional paths into data science may start directly in that field, though many IT professionals also cross over from programming, analyst or database positions.

Much of the focus in data science comes from working with so-called “unstructured data” – a term used to describe collections of information usually stored outside a database, such as large agglomerations of event or security logs, e-mail messages, customer feedback responses, and other text repositories. Thus, many IT pros find it useful to dig into technologies such as NoSQL and data platforms such as Hadoop, Cloudera and MongoDB. That’s because working with unstructured data is an increasingly large part of what data scientists do. Early-stage career IT pros will usually wind up focusing on programming for big data environments, or working under the direction of more senior staff to groom and prepare big data sets for further interrogation and analysis.

At this early stage of one’s career, exposure to text-oriented programming and basic pattern-matching or query formulation is a must, along with a strong and expanding base of coding, testing and code maintenance experience. Development of basic soft skills in oral and written communications is a good idea, as is some exposure to basic business intelligence and analysis principles and practices. This leads directly into the early-career certifications mentioned in the next section.
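To make the pattern-matching idea concrete, here is a minimal sketch (the log format is invented for illustration) of the kind of early-career task described above: imposing structure on raw event-log lines with a regular expression, then tallying failed events per host:

```python
import re
from collections import Counter

# Invented log lines; real event and security logs vary widely in format.
log_lines = [
    "2017-03-01T10:02:11 host-a LOGIN ok user=alice",
    "2017-03-01T10:02:45 host-b LOGIN fail user=bob",
    "2017-03-01T10:03:02 host-b LOGIN fail user=bob",
    "2017-03-01T10:05:19 host-a DISK warn usage=91%",
]

# A pattern that extracts structure from otherwise free-form text.
pattern = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<event>\S+) (?P<status>\S+)")

failures = Counter()
for line in log_lines:
    m = pattern.match(line)
    if m and m.group("status") == "fail":
        failures[m.group("host")] += 1

print(dict(failures))  # prints {'host-b': 2}
```

The same grooming step – parse, filter, aggregate – scales up to big data platforms, where the regular expression becomes a mapper and the tally becomes a distributed reduce.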

Early-Career Data Science Certifications and Learning

Basic data science training is now readily available online in the form of massive open online courses, or MOOCs. Among the many offerings currently available, the January 2017 Quora article “What is the best MOOC to get started in Data Science?” offers a variety of answers, and lists courses from sources such as Duke (Coursera), MIT, Caltech, the Indian Institute of Management and Business (edX), Stanford, and more. Microsoft has since instituted a Microsoft Professional Program in Data Science that includes nine courses on a variety of related topics and a capstone project, presenting a reasonably complete introductory curriculum on the subject. (Courses aren’t free, but at $99 each, they are fairly inexpensive.)

Mid-career Work Focus and Experience

Data science is a big subject area, so by the time you’ve spent three to five years in the workforce and have started to zero in on a career path, you’ll also start homing in on one or more data science specialties and platforms. These include areas such as big data programming, analysis, business intelligence and more. Any or all of them can put you into a front-line data science job of some kind, even as you narrow your focus on the job.

This is the career stage at which you’ll develop increasing technical skills and knowledge, as you also start to gain more seniority and responsibility among your peers. Soft skills become more important mid-career as well, because you’ll have to start drawing on your abilities to communicate with and lead or guide others (primarily on technical subjects related to data science and its outputs or results) during this career phase.

Mid-career Data Science Certifications

This is a time for professional growth and specialization. That’s why there is a much broader array of topics and areas to consider as one digs deeper into data science to develop more focused and intense technical skills and knowledge. Data science-related certifications can really help with this but will require some careful research and consideration. Thus, for example, one person might decide to dig into certifications related to a particular big data platform or toolset – such as the Certified Analytics Professional, MongoDB, Dell/EMC, Microsoft, Oracle or SAS.

This is a point at which one might choose to specialize more in big data programming for Hadoop, Cloudera or MongoDB on the one hand, or in running analyses and interpreting results from specific big data sets on the other. Cloudera covers most of these bases all by itself, which makes its offerings worth checking out: among many other certifications, they have Data Scientist, Data Engineer, Spark and Hadoop Developer and Administrator for Apache Hadoop credentials. There are dozens of Big Data certifications available today, with more coming online all the time, so you’ll have to follow your technical interests and proclivities to learn more about which ones are right for you.

Expert or Senior Level Work Focus and Experience

After 10 or more years in the workforce, it’s time to get serious about data science/Big Data. This is the point at which most IT professionals start reaching for higher rungs on the job role and responsibilities ladder.

Jobs with such titles as senior data analyst, senior business intelligence analyst, senior data scientist, big data platform specialist (where you can plug in the name of your chosen platform when searching for opportunities), senior big data developer, and so forth, represent the kinds of positions that data science pros are likely to occupy at this point on the career ladder. Expert or senior level IT pros will often be spearheading project teams of varying sizes by this point as well, even if their jobs don’t carry a specific management title or overt management responsibilities. This means that soft skills are even more important, with an increasing emphasis on leadership and vision, along with skills in people and project management, plus oral and written communications.

Expert or Senior Level Big Data Certifications

This is the career step at which one typically climbs near or to the top of most technical certification ladders. Many of these credentials – such as the SAS “Advanced Analytics” credentials (four at present) – actually include the term “advanced” or “expert” in their certification monikers.

The SAS Institute and Dell/EMC, in particular, have rich and deep certification programs, with various opportunities for interested data scientists or Big Data folks to specialize and develop their skills and knowledge. Database platform vendors, such as Oracle, IBM and Microsoft, are also starting to recognize the potential and importance of Big Data and are adding related elements to their certification programs all the time. Because this field is still relatively young and new cert programs are still coming online, the shape of the high end of the cert landscape for Big Data is very much a work in progress.

Whatever Big Data platform or specialty you choose to pursue, this is the career stage where a deep understanding of the principles and practices in the field must begin to combine with an understanding of their business impact and value. It is also where people must exercise their soft skills at the highest level, because senior data scientists or Big Data experts must be able to lead teams of high-level individuals in the organizations they serve, including top executives, high-level managers, and other technical experts and consultants. As you might expect, this kind of work is as much about soft skills in communication and leadership as it is about in-depth technical knowledge and ability.

Continuing Education: Master’s or PhD?

Depending on where you are in terms of work experience, family situation and finances, it may be worth considering a master’s degree with a focus on data science or some other aspect of Big Data as a significant step in career development. For most working adults, this will mean getting into a part-time or online advanced degree program.

Many such programs are available, but you’ll want to consider the name recognition value and the cost of those offerings when choosing a degree plan to pursue. If pursued later in life (after one’s 20s), a Ph.D. is probably only attainable for someone with strong interests in research or teaching. That means a Ph.D. is not an option for most readers unless they plan and budget for a lengthy interruption in their working lives (most doctorate programs require full-time attendance on campus, and take from three to six years to complete).

With proper education, certification, planning and experience, working as a data scientist, or in some other Big Data role, is an achievable goal. It will take at least three to five years for entry-level IT professionals to work their way into such a position (less for those with more experience or an advanced degree in the field), but it’s a job that offers high pay and one that is expected to stay in high demand for the foreseeable future. Because the amount of data stored in the world is only increasing year over year, this appears to be a good specialty area in IT that’s long on opportunity and growth potential.

Source: Business News Daily