Top 10 roles in AI and data science

When you think of the perfect data science team, are you imagining 10 copies of the same professor of computer science and statistics, hands delicately stained with whiteboard marker? I hope not!

Google’s Geoff Hinton is a hero of mine and an amazing researcher in deep learning, but I hope you’re not planning to staff your applied data science team with 10 of him and no one else!

Applied data science is a team sport that’s highly interdisciplinary. Diversity of perspective matters! In fact, perspective and attitude matter at least as much as education and experience.

If you’re keen to make your data useful with a decision intelligence engineering approach, here’s my take on the order in which to grow your team.

#0 Data Engineer

We start counting at zero, of course, since you need to have the ability to get data before it makes sense to talk about data analysis. If you’re dealing with small datasets, data engineering is essentially entering some numbers into a spreadsheet. When you operate at a more impressive scale, data engineering becomes a sophisticated discipline in its own right. Someone on your team will need to take responsibility for dealing with the tricky engineering aspects of delivering data that the rest of your staff can work with.
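To make the small-scale end of that concrete, here is a minimal, hypothetical extract-clean-load step in Python with pandas. The file names and columns are invented for illustration, and a real pipeline at scale involves far more (scheduling, validation, distributed storage).

import pandas as pd

# Extract: pull raw data from wherever it lives (a CSV export in this toy case).
raw = pd.read_csv("raw_sales_export.csv")

# Clean: enforce types and drop obviously broken rows so analysts get usable data.
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
clean = raw.dropna(subset=["order_id", "order_date"]).drop_duplicates("order_id")

# Load: hand off a tidy, columnar file the rest of the team can read quickly.
clean.to_parquet("sales_clean.parquet", index=False)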

#1 Decision-Maker

Before hiring that PhD-trained data scientist, make sure you have a decision-maker who understands the art and science of data-driven decision-making.

Decision-making skills have to be in place before a team can get value out of data.

This individual is responsible for identifying decisions worth making with data, framing them (everything from designing metrics to calling the shots on statistical assumptions), and determining the required level of analytical rigor based on potential impact on the business. Look for a deep thinker who doesn’t keep saying, “Oh, whoops, that didn’t even occur to me as I was thinking through this decision.” They’ve already thought of it. And that. And that too.

#2 Analyst

Then the next hire is… everyone already working with you. Everyone is qualified to look at data and get inspired; the only thing that might be missing is a bit of familiarity with software that’s well-suited for the job. If you’ve ever looked at a digital photograph, you’ve done data visualization and analytics.

Learning to use tools like R and Python is just an upgrade over MS Paint for data visualization; they’re simply more versatile tools for looking at a wider variety of datasets than just red-green-blue pixel matrices.

If you’ve ever looked at a digital photograph, you’ve done data visualization and analytics. It’s the same thing.

And hey, if all you have the stomach for is looking at the first five rows of data in a spreadsheet, well, that’s still better than nothing. If the entire workforce is empowered to do that, you’ll have a much better finger on the pulse of your business than if no one is looking at any data at all.
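Here is a minimal sketch of that "just looking" in Python, with hypothetical file names. It treats a photo as the pixel matrix it already is, then takes the same kind of peek at a table with pandas.

import matplotlib.pyplot as plt
import pandas as pd

# A digital photo is already a dataset: a height x width x channels array of pixel values.
pixels = plt.imread("lake_photo.png")
print(pixels.shape)      # e.g. (768, 1024, 3)
plt.imshow(pixels)
plt.show()

# Peeking at a table is the same move with a more versatile tool.
df = pd.read_csv("sightings.csv")
print(df.head())         # the first five rows, nothing more
print(df.describe())     # quick summary statistics for the numeric columns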

The important thing to remember is that you shouldn’t come to conclusions beyond your data. That takes specialist training. Just as with a grainy “photo” of the lake, here’s all you can say: “This is what is in my dataset.” Please don’t use it to conclude that the Loch Ness Monster is real.

#3 Expert Analyst

Enter the lightning-fast version! This person can look at more data faster. The game here is speed, exploration, discovery… fun! This is not the role concerned with rigor and careful conclusions. Instead, this is the person who helps your team get eyes on as much of your data as possible so that your decision-maker can get a sense of what’s worth pursuing with more care.

The job here is speed, encountering potential insights as quickly as possible.

This may be counterintuitive, but don’t staff this role with your most reliable engineers who write gorgeous, robust code. The job here is speed, encountering potential insights as quickly as possible, and unfortunately those who obsess over code quality may find it too difficult to zoom through the data fast enough to be useful in this role.

Those who obsess over code quality may find it difficult to be useful in this role.

I’ve seen analysts on engineering-oriented teams bullied because their peers don’t realize what “great code” means for descriptive analytics. Great is “fast and humble” here. If fast-but-sloppy coders don’t get much love, they’ll leave your company and you’ll wonder why you don’t have a finger on the pulse of your business.
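As a sketch of what “fast and humble” looks like in practice, here is a quick profiling pass with pandas. The dataset name and columns are hypothetical; the point is coverage, not polish.

import pandas as pd

df = pd.read_csv("transactions.csv")

print(df.shape)                        # how much data is there?
print(df.dtypes)                       # what kinds of columns are these?
print(df.isna().mean().sort_values())  # share of missing values per column

# Skim every column quickly; no polish, just eyes on the data.
for col in df.columns:
    if df[col].dtype == "object":
        print(col, df[col].value_counts().head(), sep="\n")
    else:
        print(col, df[col].describe(), sep="\n")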

#4 Statistician

Now that we’ve got all these folks cheerfully exploring data, we’d better have someone around to put a damper on the feeding frenzy. It’s safe to look at that “photo” of Nessie as long as you have the discipline to keep yourself from learning more than what’s actually there… but do you? While people are pretty good at thinking reasonably about photos, other data types seem to send common sense out the window. It might be a good idea to have someone around who can prevent the team from making unwarranted conclusions.

Inspiration is cheap, but rigor is expensive.

Lifehack: don’t make conclusions and you won’t need to worry. I’m only half-joking. Inspiration is cheap, but rigor is expensive. Pay up or content yourself with mere inspiration.

Statisticians help decision-makers come to conclusions safely beyond the data.

For example, if your machine learning system worked in one dataset, all you can safely conclude is that it worked in that dataset. Will it work when it’s running in production? Should you launch it? You need some extra skills to deal with those questions. Statistical skills.

If we want to make serious decisions without perfect facts, let’s slow down and take a careful approach. Statisticians help decision-makers come to conclusions safely beyond the data analyzed.
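Here is a minimal sketch of that statistical mindset: instead of reporting a single accuracy number from one evaluation set, attach a rough measure of uncertainty to it. The numbers are made up, and the simple normal-approximation interval is only one of many tools a statistician might reach for.

import numpy as np

n_eval = 500      # size of the held-out evaluation set (hypothetical)
n_correct = 430   # how many predictions the model got right on it

p_hat = n_correct / n_eval
# Rough 95% confidence interval for the true accuracy (normal approximation).
se = np.sqrt(p_hat * (1 - p_hat) / n_eval)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"observed accuracy: {p_hat:.3f}")
print(f"rough 95% CI: [{low:.3f}, {high:.3f}]")
# Even this interval only covers data like the evaluation set; whether
# production data looks like that is a separate, explicit assumption.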

#5 Applied Machine Learning Engineer

An applied AI / machine learning engineer’s best attribute is not an understanding of how algorithms work. Their job is to use them, not build them. (That’s what researchers do.) Expertise at wrangling code that gets existing algorithms to accept and churn through your datasets is what you’re looking for.

Besides quick coding fingers, look for a personality that can cope with failure. You almost never know what you’re doing, even if you think you do. You run the data through a bunch of algorithms as quickly as possible and see if it seems to be working… with the reasonable expectation that you’ll fail a lot before you succeed. A huge part of the job is dabbling blindly, and it takes a certain kind of personality to enjoy that.

Perfectionists tend to struggle as ML engineers.

Because your business problem’s not in a textbook, you can’t know in advance what will work, so you can’t expect to get a perfect result on the first go. That’s okay, just try lots of approaches as quickly as possible and iterate towards a solution.
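As a sketch of that try-lots-of-things workflow, here is a quick first pass in scikit-learn over a few off-the-shelf classifiers. The synthetic dataset is a stand-in for whatever inputs you actually have.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data; in practice, use the features your analysts flagged.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# No tuning yet: a fast first pass to see what looks promising.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")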

Speaking of “running the data through algorithms”… what data? The inputs your analysts identified as potentially interesting, of course. That’s why analysts make sense as an earlier hire.

Although there’s a lot of tinkering, it’s important for the machine learning engineer to have a deep respect for the part of the process where rigor is vital: assessment. Does the solution actually work on new data? Luckily, you made a wise choice with your previous hire, so all you have to do is pass the baton to the statistician.

The strongest applied ML engineers have a very good sense of how long it takes to apply various approaches.

When a potential ML hire can rank options by the time it takes to try them on various kinds of datasets, be impressed.

#6 Data Scientist

The way I use the word, a data scientist is someone who is a full expert in all of the three preceding roles. Not everyone uses my definition: you’ll see job applications out there with people calling themselves “data scientist” when they have only really mastered one of the three, so it’s worth checking.

Data scientists are full experts in all three of the previous roles.

This role is in position #6 because hiring the true three-in-one is an expensive option. If you can hire one within budget, it’s a great idea, but if you’re on a tight budget, consider upskilling and growing your existing single-role specialists.

#7 Analytics Manager / Data Science Leader

The analytics manager is the goose that lays the golden egg: they’re a hybrid between the data scientist and the decision-maker. Their presence on the team acts as a force-multiplier, ensuring that your data science team isn’t off in the weeds instead of adding value to your business.

The decision-maker + data scientist hybrid is a force-multiplier. Unfortunately, they’re rare and hard to hire.

This person is kept awake at night by questions like, “How do we design the right questions? How do we make decisions? How do we best allocate our experts? What’s worth doing? Will the skills and data match the requirements? How do we ensure good input data?”

If you’re lucky enough to hire one of these, hold on to them and never let them go.

#8 Qualitative Expert / Social Scientist

Sometimes your decision-maker is a brilliant leader, manager, motivator, influencer, or navigator of organizational politics… but unskilled in the art and science of decision-making. Decision-making is so much more than a talent. If your decision-maker hasn’t honed their craft, they might do more harm than good.

Instead of firing an unskilled decision-maker, you can augment them with a qualitative expert.

Don’t fire an unskilled decision-maker; augment them. You can hire them an upgrade in the form of a helper. The qualitative expert is there to supplement their skills.

This person typically has a social science and data background — behavioral economists, neuroeconomists, and JDM (judgment and decision-making) psychologists receive the most specialized training, but self-taught folk can also be good at it. The job is to help the decision-maker clarify ideas, examine all the angles, and turn ambiguous intuitions into well-thought-through instructions in language that makes it easy for the rest of the team to execute.

We don’t realize how valuable social scientists are. They’re usually better equipped than data scientists to translate the intuitions and intentions of a decision-maker into concrete metrics.

The qualitative expert doesn’t call any of the shots. Instead, they ensure that the decision-maker has fully grasped the shots available for calling. They’re also a trusted advisor, a brainstorming companion, and a sounding board for a decision-maker. Having them on board is a great way to ensure that the project starts out in the right direction.

#9 Researcher

Many hiring managers think their first team member needs to be the ex-professor, but actually you don’t need those PhD folk unless you already know that the industry is not going to supply the algorithms that you need. Most teams won’t know that in advance, so it makes more sense to do things in the right order: before building yourself that space pen, first check whether a pencil will get the job done. Get started first and if you find that the available off-the-shelf solutions aren’t giving you much love, then you should consider hiring researchers.

If a researcher is your first hire, you probably won’t have the right environment to make good use of them.

Don’t bring them in right off the bat. It’s better to wait until your team is developed enough to have figured out what they need a researcher for. Wait till you’ve exhausted all the available tools before hiring someone to build you expensive new ones.

#10+ Additional personnel

Besides the roles we looked at, here are some of my favorite people to welcome to a decision intelligence project:

  • Domain expert
  • Ethicist
  • Software engineer
  • Reliability engineer
  • UX designer
  • Interactive visualizer / graphic designer
  • Data collection specialist
  • Data product manager
  • Project / program manager

Many projects can’t do without them — the only reason they aren’t listed in my top 10 is that decision intelligence is not their primary business. Instead, they are geniuses at their own discipline and have learned enough about data and decision-making to be remarkably useful to your project. Think of them as having their own major or specialization, but enough love for decision intelligence that they chose to minor in it.

Huge team or small team?

After reading all that, you might feel overwhelmed. So many roles! Take a deep breath. Depending on your needs, you may get enough value from the first few roles.

Revisiting my analogy of applied machine learning as innovating in the kitchen, if you personally want to open an industrial-scale pizzeria that makes innovative pizzas, you need the big team or you need to partner with providers/consultants. If you want to make a unique pizza or two this weekend — caramelized anchovy surprise, anyone? — then you still need to think about all the components we mentioned. You’re going to decide what to make (role 1), which ingredients to use (roles 2 and 3), where to get ingredients (role 0), how to customize the recipe (role 5), and how to give it a taste test (role 4) before serving someone you want to impress, but for the casual version with less at stake, you can do it all on your own. And if your goal is just to make standard traditional pizza, you don’t even need all that: get hold of someone else’s tried and tested recipe (no need to reinvent your own) along with ingredients and start cooking!

Source: hackernoon.com

The Data Science Process

The Data Science Process is a framework for approaching data science tasks, crafted by Joe Blitzstein and Hanspeter Pfister of Harvard’s CS 109. The goal of CS 109, according to Blitzstein himself, is to introduce students to the overall process of data science investigation, which should provide some insight into the framework itself.

The following is a sample application of Blitzstein & Pfister’s framework, regarding skills and tools at each stage, as given by Ryan Fox Squire in his answer:

Stage 1: Ask A Question
Skills: science, domain expertise, curiosity
Tools: your brain, talking to experts, experience

Stage 2: Get the Data
Skills: web scraping, data cleaning, querying databases, CS stuff
Tools: python, pandas

Stage 3: Explore the Data
Skills: Get to know data, develop hypotheses, patterns? anomalies?
Tools: matplotlib, numpy, scipy, pandas, mrjob

Stage 4: Model the Data
Skills: regression, machine learning, validation, big data
Tools: scikit-learn, pandas, mrjob, mapreduce

Stage 5: Communicate the Data
Skills: presentation, speaking, visuals, writing
Tools: matplotlib, adobe illustrator, powerpoint/keynote

Squire then (rightfully) concludes that the data science workflow is a non-linear, iterative process and that many skills and tools are required to cover the full data science process. Squire also professes that he is fond of the Data Science Process because it stresses both the importance of asking questions to guide your workflow and the importance of iterating on your questions and research as you gain familiarity with your data.
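As a toy illustration of the stages end to end, here is a hypothetical run in Python using tools from Squire’s list. The question, file name, and columns are invented.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stage 1: ask a question, e.g. "does ad spend predict weekly sales?"
# Stage 2: get the data (a made-up CSV with 'ad_spend' and 'sales' columns).
df = pd.read_csv("weekly_sales.csv")

# Stage 3: explore the data.
df.plot.scatter(x="ad_spend", y="sales")
plt.show()

# Stage 4: model the data, holding out a test set for validation.
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend"]], df["sales"], test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))

# Stage 5: communicate the data: a plot plus a one-line takeaway.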

The Data Science Process is a simple but useful framework for approaching data science problems.

Source: kdnuggets.com

Top 7 Data Science Use Cases in Finance

In recent years, the ability of data science and machine learning to handle a number of core financial tasks has become an especially important issue. Companies want to know what improvements these technologies bring and how they can reshape their business strategies.

To help you answer these questions, we have prepared a list of data science use cases that have the highest impact on the finance sector. They cover very diverse aspects of the business, from data management to trading strategies, but what they have in common is their enormous potential to enhance financial solutions.

Automating risk management

Risk management is an enormously important area for financial institutions, responsible for a company’s security, trustworthiness, and strategic decisions. Approaches to risk management have changed significantly over the past few years, transforming the nature of the finance sector. Today, more than ever, machine learning models define the direction of business development.

Risks can come from many sources, such as competitors, investors, regulators, or a company’s own customers, and they differ in importance and potential losses. The main steps are therefore identifying, prioritizing, and monitoring risks, which are perfect tasks for machine learning. Trained on huge amounts of customer data, lending histories, and insurance results, algorithms can not only improve risk scoring models but also enhance cost efficiency and sustainability.

Among the most important applications of data science and artificial intelligence (AI) in risk management is identifying the creditworthiness of potential customers. To establish the appropriate credit amount for a particular customer, companies use machine learning algorithms that analyze past spending behavior and patterns. This approach is also useful when working with new customers or those with a brief credit history.
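As an illustration of the idea (not a production credit model), here is a hedged sketch of scoring creditworthiness from past behavior with logistic regression. The file, feature names, and label are hypothetical.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("credit_history.csv")   # hypothetical labeled history of past customers
features = ["monthly_spend", "utilization", "late_payments", "account_age_months"]
X, y = df[features], df["defaulted"]     # 1 = defaulted on a past obligation

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# How well the score rank-orders unseen customers.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# A probability of default that can feed a credit-limit decision.
print(model.predict_proba(X_test.head(1))[:, 1])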

Although the digitalization and automation of risk management processes in finance are still in their early stages, the potential is huge. Financial institutions need to prepare for this change by automating core financial processes, improving the analytical skills of the finance team, and making strategic technology investments. Once a company starts to move in this direction, the profits will follow quickly.

Managing customer data

For financial firms, data is the most important resource, so efficient data management is key to business success. Today, financial data is massive in volume and diverse in structure: from social media activity and mobile interactions to market data and transaction details. Financial specialists often have to work with semi-structured or unstructured data, and processing it manually is a big challenge.

For most companies, however, it’s obvious that integrating machine learning techniques into the data management process is simply a necessity for extracting real intelligence from data. AI tools, in particular natural language processing, data mining, and text analytics, help transform data into information, contributing to smarter data governance and better business decisions and, as a result, increased profitability. For instance, machine learning algorithms can analyze the influence of specific financial trends and market developments by learning from customers’ historical financial data. These techniques can also be used to generate automated reports.
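As a small illustration of the text-analytics side, here is a sketch that turns unstructured customer feedback into a structured term matrix with TF-IDF. The example sentences are invented.

from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "Mobile app keeps crashing when I check my balance",
    "Great support, my card issue was resolved in minutes",
    "Fees are too high compared to other banks",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(feedback)        # documents x terms matrix

# The vocabulary and weights are now ordinary, queryable tabular data.
print(X.shape)
print(vectorizer.get_feature_names_out())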

Predictive analytics

Analytics is now at the core of financial services. Predictive analytics deserves special attention: it reveals patterns in the data that foresee future events which can be acted upon now. By drawing on social media, news trends, and other data sources, these sophisticated analytics have taken over key applications such as predicting prices, customer lifetime value, future life events, anticipated churn, and stock market moves. Most importantly, such techniques can help answer the complicated question of how best to intervene.
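As one tiny example of the predictive flavor, here is a back-of-envelope customer lifetime value estimate. The formula is a common simplification and the numbers are made up; real CLV models are usually far more elaborate.

# A simple customer lifetime value estimate with made-up inputs.
avg_purchase_value = 42.0       # average revenue per transaction
purchases_per_year = 6.5        # average purchase frequency
expected_lifespan_years = 4.0   # how long a typical customer stays

clv = avg_purchase_value * purchases_per_year * expected_lifespan_years
print(f"estimated lifetime value: ${clv:,.2f}")   # $1,092.00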

Real-time analytics

Real-time analytics fundamentally transforms financial processes by analyzing large amounts of data from different sources, quickly identifying changes, and finding the best reaction to them. There are three main areas of application for real-time analytics in finance:

Fraud detection

Financial firms are obligated to guarantee the highest level of security to their users. The main challenge is building a good fraud detection system while criminals keep finding new ways in and setting new traps. Given this diversity of fraud, it takes skilled data scientists to create algorithms that detect and prevent anomalies in user behavior or ongoing working processes. For instance, alerts on unusual purchases or large cash withdrawals for a particular user can block those actions until the customer confirms them. In the stock market, machine learning tools can identify patterns in trading data that might indicate manipulation and alert staff to investigate. Best of all, such algorithms can teach themselves, becoming more effective and intelligent over time.
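As a sketch of the anomaly-detection idea (real fraud systems are far richer), here is an isolation forest flagging unusual transactions in simulated data; the features are hypothetical.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Mostly ordinary transactions (amount, hour of day), plus a few odd ones.
normal = np.column_stack([rng.normal(50, 15, 1000), rng.normal(14, 3, 1000)])
odd = np.array([[950, 3], [1200, 4], [800, 2]])
X = np.vstack([normal, odd])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)          # -1 means flagged as anomalous
print("flagged transactions:", int((flags == -1).sum()))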

Consumer analytics

Real-time analytics also helps companies better understand their customers and personalize effectively. Sophisticated machine learning algorithms and customer sentiment analysis techniques can generate insights from clients’ behavior, social media interactions, feedback, and opinions, improving personalization and increasing profit. Since the volume of data is enormous, it takes experienced data scientists to break it down precisely.

Algorithmic trading

This area probably benefits the most from real-time analytics, since every second counts. Based on the latest information from analyzing both traditional and non-traditional data, financial institutions can make beneficial decisions in real time. And because this data is often only valuable for a short time, being competitive in this sector means having the fastest methods of analyzing it.

Another prospect opens up when real-time and predictive analytics are combined in this area. It used to be popular practice for financial companies to hire mathematicians who could develop statistical models and use historical data to create trading algorithms that forecast market opportunities. Today, however, artificial intelligence offers techniques that make this process faster and, crucially, constantly improving.

Data science and AI have therefore revolutionized the trading sector, giving rise to algorithmic trading strategies. Most of the world’s exchanges use computers that make decisions based on algorithms and adjust strategies as new data arrives. Artificial intelligence continuously processes enormous amounts of information, including tweets, financial indicators, data from news and books, and even TV programs. Consequently, it understands today’s global trends and continuously improves its predictions about financial markets.
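As a toy illustration of an algorithmic trading rule (not a real strategy, and certainly not advice), here is a moving-average crossover on simulated prices in pandas.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()

# Hold the asset whenever the fast average sits above the slow one.
position = (fast > slow).astype(int).shift(1).fillna(0)
daily_returns = prices.pct_change().fillna(0)
strategy_returns = position * daily_returns

print("buy-and-hold total return:", (1 + daily_returns).prod() - 1)
print("crossover total return:  ", (1 + strategy_returns).prod() - 1)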

All in all, real-time and predictive analytics are significantly changing the landscape across financial areas. With technologies such as Hadoop, NoSQL, and Storm, traditional and non-traditional datasets, and increasingly precise algorithms, data engineers are changing the way finance works.

Deep personalization and customization

Firms realize that one of the keys to being competitive in today’s market is raising engagement through high-quality, personalized relationships with their customers. The idea is to analyze the digital client experience and modify it to suit each client’s interests and preferences. AI is making significant improvements in understanding human language and emotion, which brings customer personalization to a whole new level. Data engineers can also build models that study consumer behavior and discover situations where customers need financial advice. The combination of predictive analytics tools and advanced digital delivery options can help with this complicated task, guiding customers to the best financial solution at the most opportune time and suggesting personalized offerings based on spending habits, socio-demographic trends, location, and other preferences.
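As a sketch of how such personalization might start, here is a k-means segmentation of customers by invented spending features; the feature choices and the number of segments are arbitrary illustration choices.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Hypothetical features: monthly spend, share spent online, transactions per month.
X = np.column_stack([
    rng.gamma(2.0, 500, 1000),
    rng.uniform(0, 1, 1000),
    rng.poisson(20, 1000),
])

X_scaled = StandardScaler().fit_transform(X)
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# Segment labels can then drive different offers or advice for each group.
print(np.bincount(segments))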

Conclusion

For financial institutions, data science techniques provide a huge opportunity to stand out from the competition and reinvent their businesses. The vast amounts of continuously changing financial data create a need to bring machine learning and AI tools into many different aspects of the business.

We focused on what we consider the top 7 data science use cases in the finance sector, but there are many others that also deserve a mention. If you have further ideas, please share them in the comments.

Source: activewizards.com

How to Become a Data Scientist

There are many roads into data science, but they all lead to the same destination: a job assembling, analyzing and interpreting large data sets to look for information of interest or value.

Data science encompasses “Big Data,” data analytics, business intelligence and more. Data science is becoming a vital discipline in IT because it enables businesses to extract value from the many kinds and large amounts of data they collect in doing whatever it is that they do. For those who do business with customers, it lets them learn more about those customers.

For those who maintain a supply chain, it helps them to understand more and better ways to request, acquire and manage supply components. For those who follow (or try to anticipate) markets – such as financials, commodities, employment and so forth – it helps them construct more accurate and insightful models for such things. The applications for data science are limited only by our ability to conceive of uses to which data may be put – limitless, in other words.

In fact, no matter where you look for data, if large amounts of information are routinely collected and stored, data science can play a role. It can probably find something useful or interesting to say about such collections, if those who examine them can frame and process the right kinds of queries against that data. That’s what explains the increasing and ongoing value of data science for most companies and organizations, since all of them routinely collect and maintain various kinds of data nowadays.

Basic Educational Background

The basic foundation for a long-lived career in IT for anybody getting started is to pursue a bachelor’s degree in something computing related. This usually means a degree in computer science, management information systems (MIS), computer engineering, informatics or something similar. Plenty of people transition in from other fields, to be sure, but the more math and science under one’s belt when making that transition, the easier that adjustment will be. Given projected shortages of IT workers, especially in high demand subject areas – which not only include data science, but also networking, security, software development, IT architecture and its various specialty areas, virtualization, and more – it’s hard to go wrong with this kind of career start.

For data scientists, a strong mathematics background, particularly in statistics and analysis, is strongly recommended, if not outright required. This goes along naturally with an equally strong academic foundation in computing. Those willing to slog through to a master’s or Ph.D. before entering the workforce may find data science a particularly appealing and remunerative field of study when that slog comes to its end. If so, they can also jump directly into mid- or expert/senior level career steps, respectively.

Early Career Work Focus and Experience

If data science is a long-term goal, the more experience one has in working with data, the better. Traditional paths into data science may start directly in that field, though many IT professionals also cross over from programming, analyst or database positions.

Much of the focus in data science comes from working with so-called “unstructured data” – a term used to describe collections of information usually stored outside a database such as large agglomerations of event or security logs, e-mail messages, customer feedback responses, other text repositories and so forth. Thus, many IT pros find it useful to dig into technologies such as NoSQL and data platforms such as Hadoop, Cloudera and MongoDB. That’s because working with unstructured data is an increasingly large part of what data scientists do. Early-stage career IT pros will usually wind up focusing on programming for big data environments, or working under the direction of more senior staff to groom and prepare big data sets for further interrogation and analysis.

At this early stage of one’s career, exposure to text-oriented programming and basic pattern-matching or query formulation is a must, along with a strong and expanding base of coding, testing and code maintenance experience. Development of basic soft skills in oral and written communications is a good idea, as is some exposure to basic business intelligence and analysis principles and practices. This leads directly into the early-career certifications mentioned in the next section.
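As a small example of that text-oriented pattern matching, here is a sketch that pulls failed-login counts out of unstructured log lines with a regular expression; the log format is made up.

import re
from collections import Counter

log_lines = [
    "2024-03-01 12:00:03 WARN  auth failed for user=alice ip=10.0.0.7",
    "2024-03-01 12:00:09 INFO  login ok for user=bob ip=10.0.0.8",
    "2024-03-01 12:01:44 WARN  auth failed for user=alice ip=10.0.0.7",
]

pattern = re.compile(r"auth failed for user=(\w+) ip=([\d.]+)")
failures = Counter()
for line in log_lines:
    match = pattern.search(line)
    if match:
        failures[match.group(1)] += 1

print(failures)   # Counter({'alice': 2})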

Early-Career Certifications and Learning

Basic data science training is now readily available online in the form of massive open online courses, or MOOCs. Among the many offerings currently available, the January 2017 Quora article “What is the best MOOC to get started in Data Science?” offers a variety of answers and lists courses from sources such as Duke (Coursera), MIT, Caltech, the Indian Institute of Management and Business (edX), Stanford, and more. Microsoft has since instituted a Microsoft Professional Program in Data Science that includes nine courses on a variety of related topics and a capstone project, making for a reasonably complete introductory curriculum on the subject. (Courses aren’t free, but at $99 each, they are fairly inexpensive.)

Mid-career Work Focus and Experience

Data science is a big subject area, so by the time you’ve spent three to five years in the workforce and have started to zero in on a career path, you’ll also start narrowing in on one or more data science specialties and platforms. These include areas such as big data programming, analysis, business intelligence and more. Any or all of them can put you into a front-line data science job of some kind, even as you narrow your focus on the job.

This is the career stage at which you’ll develop increasing technical skills and knowledge, as you also start to gain more seniority and responsibility among your peers. Soft skills become more important mid-career as well, because you’ll have to start drawing on your abilities to communicate with and lead or guide others (primarily on technical subjects related to data science and its outputs or results) during this career phase.

Mid-career Certifications

This is a time for professional growth and specialization. That’s why there is a much broader array of topics and areas to consider as one digs deeper into data science to develop more focused and intense technical skills and knowledge. Data science-related certifications can really help with this but will require some careful research and consideration. Thus, for example, one person might decide to dig into certifications related to a particular big data platform or toolset – such as the Certified Analytics Professional, MongoDB, Dell/EMC, Microsoft, Oracle or SAS.

This is a point at which one might choose to specialize more in big data programming for Hadoop, Cloudera or MongoDB on the one hand, or in running analyses and interpreting results from specific big data sets on the other. Cloudera covers most of these bases all by itself, which makes its offerings worth checking out: among many other certifications, they have Data Scientist, Data Engineer, Spark and Hadoop Developer and Administrator for Apache Hadoop credentials. There are dozens of Big Data certifications available today, with more coming online all the time, so you’ll have to follow your technical interests and proclivities to learn more about which ones are right for you.

Expert or Senior Level Work Focus and Experience

After 10 or more years in the workforce, it’s time to get serious about data science/Big Data. This is the point at which most IT professionals start reaching for higher rungs on the job role and responsibilities ladder.

Jobs with such titles as senior data analyst, senior business intelligence analyst, senior data scientist, big data platform specialist (where you can plug in the name of your chosen platform in searching for opportunities), senior big data developer, and so forth represent the kinds of positions that data science pros are likely to occupy at this point on the career ladder. Expert or senior level IT pros will often be spearheading project teams of varying sizes by this point as well, even if their jobs don’t carry a specific management title or overt management responsibilities. This means that soft skills are even more important, with an increasing emphasis on leadership and vision, along with skills in people and project management, plus oral and written communications.

Expert or Senior Level Big Data Certifications

This is the career step at which one typically climbs near or to the top of most technical certification ladders. Many of these credentials – such as the SAS “Advanced Analytics” credentials (four at present) – actually include the term “advanced” or “expert” in their certification monikers.

The SAS Institute and Dell/EMC, in particular, have rich and deep certification programs, with various opportunities for interested data scientists or Big Data folks to specialize and develop their skills and knowledge. Database platform vendors, such as Oracle, IBM and Microsoft are also starting to recognize the potential and importance of Big Data and are adding related elements to their certification programs all the time. Because this field is still relatively young and new cert programs are still coming online, the shape of the high end of the cert landscape for Big Data is very much a work in progress.

Whatever Big Data platform or specialty you choose to pursue, this is the career stage where a deep understanding of the principles and practices in the field must combine with an understanding of their business impact and value. It is also where people must focus on their soft skills at the highest level, because senior data scientists or Big Data experts must be able to lead teams of high-level individuals in the organizations they serve, including top executives, high-level managers, and other technical experts and consultants. As you might expect, this kind of work is as much about soft skills in communication and leadership as it is about in-depth technical knowledge and ability.

Continuing Education: Master’s or PhD?

Depending on where you are in terms of work experience, family situation and finances, it may be worth considering a master’s degree with a focus on data science or some other aspect of Big Data as a profound developmental step for career development. For most working adults, this will mean getting into a part-time or online advanced degree program.

Many such programs are available, but you’ll want to consider the name recognition value and the cost of those offerings when choosing a degree plan to pursue. If pursued later in life (after one’s 20s), a Ph.D. is probably only attainable for someone with strong interests in research or teaching. That means a Ph.D. is not an option for most readers unless they plan and budget for a lengthy interruption in their working lives (most doctorate programs require full-time attendance on campus, and take from three to six years to complete).

With proper education, certification, planning and experience, working as a data scientist, or in some other Big Data role, is an achievable goal. It will take at least three to five years for entry-level IT professionals to work their way into such a position (less for those with more experience or an advanced degree in the field), but it’s a job that offers high pay and one that is expected to stay in high demand for the foreseeable future. Because the amount of data stored in the world is only increasing year over year, this appears to be a good specialty area in IT that’s long on opportunity and growth potential.

Source: Business News Daily