A construction worker might find a shovel helpful for digging the foundation of a house. But a shovel won’t get him very far once he starts putting up the walls.

Roger Peng, professor of Biostatistics at Johns Hopkins, writes that there is no single best tool for doing data science because the design process forces scientists through divergent and convergent phases in which “a tool that is very useful in one phase can be less useful or even detrimental in other phases.”

Peng explains further on the Simply Statistics blog:

Much of the software that I use for data analysis, when I reflect on it, is primarily designed for the divergent phases of analysis. Software is inherently designed to help you do things faster so that you can “analyze at the speed of thought”. Good plotting and data wrangling software lets you do those activities faster. Good modeling software lets you execute and fit models faster. In both situations, fast iteration is important so that many options can be created for consideration.

Software for convergent phases of analysis are lacking by comparison. While there are quite a few good tools for visualization and report writing in Phase 4, I can’t think of a single data analytic software tool designed for Phase 2 when specifying the problem. In particular, for both Phases 2 and 4, I don’t see many tools out there for helping you to choose between all the options that you create in the divergent phases. I think data scientists may need to look outside their regularly scheduled programming to find better tools for those phases. If no good tools exist, then that might make for a good candidate for development.