Is scalability a data scientist’s problem or an engineering problem?

Shadi Balandeh
1 min read · Jun 11, 2024


Should data scientists worry about how their solutions scale, or is it an engineering problem to solve?

Some believe that data scientists should only focus on the business problem, conduct research, experiment with different solutions, and then deliver the best one.

They argue that it’s not the data scientist’s primary responsibility to be concerned about how the solution would scale, as that is an engineering problem to solve.

I disagree.

I believe that how a solution scales is part of delivering the “best solution.”

Data scientists must consider scalability in their recommendations and then work closely with engineers to implement and scale the model effectively.

Here are a few causes I’ve seen preventing a data science solution from scaling:

  • using a lot of Python for data wrangling when SQL works better,
  • algorithms with high time and space (memory) complexity,
  • computational strain (e.g., running locally on a laptop in SAS, or lacking the budget for sufficient CPU/GPU resources),
  • high inference latency when real-time prediction is required,
  • inefficient or unoptimized code,
  • data privacy and security constraints,
  • and monolithic (tightly coupled) rather than modular designs.
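The first point can be sketched with a minimal, hypothetical example (the table name, columns, and data here are made up for illustration): pulling every row into Python and looping over it moves the whole dataset to the client, while pushing the aggregation into SQL lets the database do the work and only the summary crosses the wire.

```python
# Hypothetical illustration: push aggregation into SQL instead of
# pulling every row into Python. Table and columns are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("west", 10.0), ("west", 5.0), ("east", 7.5)],
)

# Scales poorly: transfers every row to the client, then loops in Python.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# Scales better: the database aggregates; only the summary comes back.
totals_sql = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)

assert totals == totals_sql  # same answer, very different data movement
```

On a three-row toy table the difference is invisible, but on billions of rows the first version is often the reason a prototype never makes it to production.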

What do you think?


Shadi Balandeh

AI and Data Science Manager | AI & Data Literacy Educator | Scientific Data-Driven Decision Making Advocate | Mom