The IBM Data Science Experience grew from our attempts to understand data science as outsiders: as designers wanting to build a tool for data scientists.
Problem: Data Scientists lack a holistic experience that adapts to their unique workflow.
Goal: To build a revolutionary new experience for Data Scientists to research, get inspired, create, and troubleshoot their work.
Role: I was the Visual Designer working with four UX Designers and one Researcher. I had a hand in almost everything, especially the marketing page and the content cards. I also made the launch video!
Awards: Our product received awards from Red Dot and Fast Company (see below).
Without connecting people to data, it’s just a bunch of stuff
The practice of data science is intricate and, to us, fascinating. A solo data scientist has to form a relevant hypothesis, find a corresponding data set, clean it, and repeatedly build and refine a model to prove or disprove that hypothesis.
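The cycle described above can be sketched in a few lines of Python. This is a minimal, illustrative example, not the team's actual workflow: the data, the hypothesis ("more study hours predicts passing"), and the model choice are all stand-ins.

```python
# A hedged sketch of the cycle: clean a data set, build a model,
# test it, and (if unconvinced) repeat. All data here is toy data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for a real data set, including one messy row.
df = pd.DataFrame({
    "hours":  [1, 2, 3, 4, 5, 6, 7, 8, None, 10],
    "passed": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Clean: drop rows with missing values.
df = df.dropna()

# Build and test a model against the hypothesis.
X, y = df[["hours"]], df["passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = LogisticRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)
# If the score is unconvincing, the loop restarts: revisit the
# hypothesis, the data, or the model, and try again.
```

The point is the shape of the loop, not the specific libraries: every step can send the data scientist back to an earlier one.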
As designers and outsiders to the field, we were curious how data scientists distill something interesting from inchoate data. This curiosity catapulted us into a months-long research endeavor. We synthesized research conducted in our studios all over the world and had conversations with every data scientist we could find. This included hundreds of interviews, dozens of contextual inquiries, and the production of countless research artifacts. We were astounded by the practice we uncovered, and inspired by its creativity. We came to understand data science as storytelling — an act of cutting away the meaningless, and finding humanity in a series of digits.
The chart above was an attempt to organize and rationalize the data science process, with metrics detailing opportunities for improvement.
Finding our principles
Current tools only address single facets of data science — which means data scientists must toggle back and forth between research and development. Data Shaper is for cleaning data, Jupyter is for modeling, and Matplotlib is for visualizing. These tools are designed to serve a linear process, but a data scientist's process is not linear; it's cyclical.
From this model, our first design principle emerged: a holistic approach to enable data scientists. As mentioned above, much of our research involved contextual inquiries. We watched a data scientist build a pipeline — sourcing assets from the web, comparing his code to others', and constantly jumping from tool to tool. We loved this part of the research, as it helped us understand that each facet of the process calls for its own kind of supporting material.
We saw him use dozens of assets of many different types. We watched him organize and name them. At any given point, he needed a tutorial, an academic paper, or a data set to move to the next step in his process, and each of these assets had to be saved and interacted with in a different environment. The process he used to manage his resources helped us establish a tentative system for artifact classification.
Turning principles into practice
We wanted to create an interface that was open and dynamic, just like the modeling process we observed. We determined that our concept must allow data scientists to converse, learn, and research in the context of their software. We knew our design had to operate as a toolbox that was more dynamic than a mere collection of software applications. In addition to providing data scientists with the full scope of software products they need to complete their process, we needed to address their need to validate and advance their work through research.
This helped us design one of our first concepts: the maker palette. This feature developed from the idea that the community is a tool — just as important as a notebook or data set. It receives the same design treatment as any other resource — it appears in a panel that can be opened and closed at will. The benefit is that it's not specific to a file format or tool, so it can be accessed from any part of the interface.
In the maker palette, a data scientist can find data sets, access papers, view tutorials, and compare their code to others'. When they're uninspired or stuck, the community acts as peer, tool, and teacher all at once.
The practice of data science centers on building a pipeline: a sequence of algorithms that process and learn from data. As we watched data scientists build their pipelines in notebooks, we likened the process to building a wall around a garden, brick by brick. Each brick must be tested to see if it fits within the bricks that preceded it.
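The brick metaphor maps cleanly onto how pipelines look in code. Below is a hedged sketch using scikit-learn's `Pipeline` (one common way to express such a sequence, not necessarily what the data scientists we observed used): each named step must accept the output of the step before it, just as each brick must fit the ones already laid. The steps and data are illustrative.

```python
# Each "brick" is a named step; fitting the pipeline runs them
# in sequence, so each must be compatible with its predecessor.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scale", StandardScaler()),      # brick 1: normalize features
    ("reduce", PCA(n_components=2)),  # brick 2: compress them
    ("model", LogisticRegression()),  # brick 3: learn from them
])

# Toy data: 100 samples, 5 features, a simple separable label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe.fit(X, y)          # fits each step in order
acc = pipe.score(X, y)  # swap any brick and refit to iterate
```

Replacing one step (say, the model) and refitting is the code-level equivalent of testing a new brick against the wall.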
The brick-building metaphor inspired the form of our design. We translated the modularity of pipeline construction into a card design paradigm for the interface. Having a uniform treatment for a variety of content types allowed us to streamline the search for resources. A key capability of the maker palette was displaying mixed content in one place: the data scientist can search for any type of asset inside their workspace, then review and reference it in a single, cohesive environment.
Content Card Iterations
The card-in-panel format gives the data scientist the ability to quickly test a variety of assets in their work. They can make off-the-cuff adjustments without committing time to deep research or additional tools. They can repeatedly complete the cycles of their work — ask, build, test, refine — in one unified experience.
In data scientists, we see ourselves
In IBM Design, we often discuss "the loop": the practice of continuously refining an idea through research and testing. Like the scientific method, we form a hypothesis, develop prototypes, test them, make observations, and adjust. As software designers, we're constantly trying to find the storyline in "stuff." Much like data scientists, we sift through the extraneous to find the human elements in products and processes. At the beginning, data science seemed complex and distant; now, after all our research and a little self-reflection, it seems strangely familiar.
Read the full blog post here!
In The News
"This will not only enable more advanced analytics, it will help us to reimagine how we manage our organizations and compete in the marketplace." – Forbes
"Data Science Experience is designed to speed and simplify the process of embedding data and machine learning into cloud applications." – PCWorld
"IBM's new Data Science Experience is a native Apache Spark platform for data scientists and developers." – eWEEK