Who oversees the data science process?
At most organizations, three types of managers oversee data science projects:
Business managers: These managers work with the data science team to define the problem and develop a strategy for analysis. They may be the head of a line of business, such as marketing, finance, or sales, and have a data science team reporting to them. They work closely with the data science and IT managers to ensure that projects are delivered.
IT managers: Senior IT managers are responsible for the infrastructure and architecture that support data science operations. They continually monitor operations and resource usage to ensure that data science teams operate efficiently and securely. They may also be responsible for building and updating IT environments for data science teams.
Data science managers: These managers oversee the data science team and its day-to-day work. They are team builders who can balance team development with project planning and monitoring.
But the most important player in this process is the data scientist.
What is a data scientist?
As a specialty, data science is young. It grew out of the fields of statistical analysis and data mining. The Data Science Journal debuted in 2002, published by the International Council for Science: Committee on Data for Science and Technology. By 2008 the title of data scientist had emerged, and the field quickly took off. There has been a shortage of data scientists ever since, even though more and more colleges and universities have started offering data science degrees.
A data scientist’s duties can include developing strategies for analyzing data; preparing data for analysis; exploring, analyzing, and visualizing data; building models with data using programming languages such as Python and R; and deploying models into applications.
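As a rough illustration of how these duties follow one another, here is a minimal Python sketch that uses only the standard library. The records, the column names (`ad_spend`, `sales`), and the helper functions are all hypothetical, and a simple least-squares line stands in for whatever model a real project would call for.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"ad_spend": 1.0, "sales": 3.1},
    {"ad_spend": 2.0, "sales": 4.9},
    {"ad_spend": None, "sales": 5.5},
    {"ad_spend": 3.0, "sales": 7.2},
    {"ad_spend": 4.0, "sales": 8.8},
]

def prepare(records):
    """Prepare: drop records that contain a missing value."""
    return [r for r in records if None not in r.values()]

def explore(records):
    """Explore: compute a small summary for each column."""
    summary = {}
    for key in records[0]:
        values = [r[key] for r in records]
        summary[key] = {
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary

def fit_line(xs, ys):
    """Model: ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_predictor(slope, intercept):
    """Deploy: wrap the fitted model in a function an application can call."""
    return lambda x: slope * x + intercept

clean = prepare(raw)
summary = explore(clean)
slope, intercept = fit_line(
    [r["ad_spend"] for r in clean],
    [r["sales"] for r in clean],
)
predict = make_predictor(slope, intercept)

print(summary)
print(round(predict(5.0), 2))
```

In practice each step would rely on dedicated tooling rather than hand-written helpers; the point here is only the order of the stages and the hand-off from a fitted model to something an application can call.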
The data scientist doesn’t work solo. In fact, the most effective data science is done in teams. In addition to a data scientist, this team might include a business analyst who defines the problem, a data engineer who prepares the data and manages how it is accessed, an IT architect who oversees the underlying processes and infrastructure, and an application developer who deploys the models or outputs of the analysis into applications and products.
Challenges of implementing data science projects
Despite the promise of data science and huge investments in data science teams, many companies are not realizing the full value of their data. In their race to hire talent and create data science programs, some companies have experienced inefficient team workflows, with different people using different tools and processes that don’t work well together. Without more disciplined, centralized management, executives might not see a full return on their investments.
This chaotic environment presents many challenges.
Data scientists can’t work efficiently. Because access to data must be granted by an IT administrator, data scientists often have long waits for data and the resources they need to analyze it. Once they have access, members of the data science team might analyze the data using different, and possibly incompatible, tools. For example, a data scientist might develop a model using the R language, but the application it will be used in is written in a different language. This is why it can take weeks, or even months, to deploy models into useful applications.
Application developers can’t access usable machine learning. Sometimes the machine learning models that developers receive are not ready to be deployed in applications. Because access points can be inflexible, models can’t be deployed in all scenarios, and scalability is left to the application developer.
IT administrators spend too much time on support. Because of the proliferation of open source tools, IT can have an ever-growing list of tools to support. A data scientist in marketing, for example, might be using different tools than a data scientist in finance. Teams might also have different workflows, which means that IT must continually rebuild and update environments.
Business managers are too removed from data science. Data science workflows are not always integrated into business decision-making processes and systems, making it difficult for business managers to collaborate knowledgeably with data scientists. Without better integration, business managers find it difficult to understand why it takes so long to go from prototype to production—and they are less likely to back the investment in projects they perceive as too slow.
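One way to soften the language mismatch noted above, where a model is built in one language and the consuming application is written in another, is to hand over the fitted parameters in a language-neutral format. The following is a minimal sketch under stated assumptions, not a prescribed method: the linear model, the field names, and the functions `export_model` and `score` are all hypothetical, and JSON is only one possible interchange format.

```python
import json

def export_model(coefficients, intercept):
    """Data science side: serialize fitted parameters to a JSON string."""
    return json.dumps({
        "type": "linear",
        "coefficients": coefficients,  # feature name -> weight
        "intercept": intercept,
    })

def score(model_json, features):
    """Application side: load the parameters and compute a prediction.

    Any language with a JSON parser could implement this same logic.
    """
    model = json.loads(model_json)
    if model["type"] != "linear":
        raise ValueError(f"unsupported model type: {model['type']}")
    total = model["intercept"]
    for name, weight in model["coefficients"].items():
        total += weight * features[name]
    return total

# Hypothetical weights and inputs, chosen only for illustration.
payload = export_model({"ad_spend": 2.0, "discount": -0.5}, intercept=1.0)
print(score(payload, {"ad_spend": 3.0, "discount": 4.0}))  # prints 5.0
```

Because the application only reads the exported parameters, the data science team can keep working in its preferred language. The trade-off is that this approach only suits models simple enough to be reimplemented on the application side.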