Conjecture Cards - Agile Research Project Management

Project managing research activities is hard: the open-ended nature of research makes it too easy to meander aimlessly through the available time and budget. Good project management won't help you find the solution to the problem, but it may stop you wasting time getting to a conclusion. In the competitive world of commercial data science, project management could be the difference between market success and obscurity.

Photo by Eden Constantino on Unsplash

Background

Some of the history has been simplified to avoid allowing the original project and corporate complexity to detract from the key points.

In 2017 we started our first project with data as a primary component, and data science as a necessary skill. We were not creating a new type of model, but we were aiming to deliver a product based on a function for which no pre-trained model existed at the time. We started hiring a "mixed bag" of PhD data scientists and dived in.

Two main problems started cropping up:

It took a very long time to gather enough data and get it into a usable format. This meant that a data scientist was often waiting for data before they could start exploring the problem. We were lucky enough to have an Ops team that provided labelling, but it was still a slow process.
Data science tasks never finished. They sat at 80% complete for weeks and sometimes months. Some of this was obviously down to naivety on our part, but even when we had clear objectives there was no end to the research tasks.

The main team, mostly software engineers, were working in a typical agile way with stand-ups and sprint ceremonies. The data scientists participated at first, but when they found themselves making the same report each day and carrying the same task every sprint, they started describing the process as "pointless". We had to agree; their work had become disconnected. The data scientists started leaving (or being poached), and we then had a new problem: unfinished research tasks were regressing as the experience of what had already been tried left the company.

We needed to find ways to:

Build and retain knowledge of the research tasks as they progressed
Integrate upstream data dependencies into the project management process and be able to work with early results
Give the data scientists a sense of achievement even when they were unable to fully solve a problem

Conjecture Records

To build and retain knowledge we started using Conjecture Records. These are like Architecture Decision Records, but for research conjectures in a problem space that can be explored, then proven or disproven through experimentation. The records were stored in version control and used to share what could be or had been investigated, which made them ideal for new starters and rotating researchers. The standard software engineering practices of PRs and peer review were used to promote good engineering and research hygiene. The collection of Conjecture Records can be considered the Conjecture Space of the product or problem.
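
As a rough illustration of a record that sits comfortably in version control, here is a minimal Python sketch. The field names, the status values, the numbered file naming and the text layout are all assumptions made for this example (loosely borrowed from how Architecture Decision Records are often kept), not a prescribed format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values, mirroring "explored, proven or disproven".
    UNEXPLORED = "unexplored"
    EXPLORING = "exploring"
    PROVEN = "proven"
    DISPROVEN = "disproven"


@dataclass
class ConjectureRecord:
    # All field names are illustrative assumptions.
    number: int
    title: str
    statement: str
    status: Status = Status.UNEXPLORED
    findings: list[str] = field(default_factory=list)

    def filename(self) -> str:
        """Build a stable, sortable file name for the repository."""
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.md"

    def render(self) -> str:
        """Render the record as plain text, ready to commit and review in a PR."""
        lines = [
            f"# {self.number}. {self.title}",
            "",
            f"Status: {self.status.value}",
            "",
            "## Conjecture",
            self.statement,
            "",
            "## Findings",
        ]
        lines += [f"- {finding}" for finding in self.findings] or ["- (none yet)"]
        return "\n".join(lines) + "\n"


# Example with made-up content.
record = ConjectureRecord(
    number=7,
    title="Weekend signups churn faster",
    statement="Customers who sign up at the weekend churn sooner than weekday signups.",
)
print(record.filename())
print(record.render())
```

Keeping each record as one small text file means a change of status or a new finding shows up as an ordinary diff for peer review.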

These records were a great improvement. They consolidated the knowledge around solving the research problem, so we knew when we needed to find new ideas, when we needed more data/processing power, when we could try combinations of approaches, and best of all - we knew when to stop. With these we could demonstrate the current state of the research in an easily digestible format, and avoid wasting effort repeating activities. What they didn't provide on their own was the agile project management.

With Conjecture Records in use, daily stand-ups were less pointless, but they were still a series of "I am trying this" or "I am trying that" statements. We were reminded of the Yoda quote "Do. Or do not. There is no try." Our Kanban board just had "Doing" and "Done", so we needed a way to describe the trying as doing. The solution was Conjecture Cards.

Introducing Conjecture Cards

A Word On Conjectures

Conjectures are not Hypotheses. Many people confuse the two words, but the subtle difference is that a Conjecture does not have to be testable or based on evidence, which makes it ideal for brainstorming or back-of-a-beer-mat thinking. By providing supporting evidence or data you can turn a Conjecture into a working Hypothesis, and build production systems around it. Anyone can think of and propose Conjectures, whatever their level of experience or education; the rigour and scientific training needed to test and prove a Conjecture is the harder part.

Riemann Zeta Zeros by Jgmoxness

Golden Rules

Anyone can propose a Conjecture
No Conjecture should be dismissed or downplayed without evidence

Requirements of Agile

Conjecture Cards need to:

Define units of work around research tasks that fit into common Agile Project Management processes (e.g., Kanban, Scrum, etc.)
Be easy to understand, even if the concept being referenced is more complicated
Be something that only gets done once
Have a verifiable deliverable

Conjecture Cards In A Nutshell

They are cards because they exist in the project management process the same way as any software engineering task. They start in the same backlog and migrate through the process the same as any other task. They exist alongside data engineering and software engineering tasks. The prioritisation of cards as tasks uses the same techniques, although the facets are different.

Conjecture Cards can be compared to Agile Spike Stories in that each step is a time-boxed activity designed to add knowledge to the Conjecture Space. We have used Conjecture Cards to go a level deeper than Spike Stories and tailor their use to the common steps involved in Data Science and ML Applications.

We use the following Conjecture Card archetypes:

Peripheral - looking outside the current Conjecture Space
Evidence focused - adding data to Conjectures, looking for evidence both to support and refute
Operational - adding software, systems and quality control to validate Conjecture related solutions at production scale

Here are some examples of Conjecture Card titles:

"Search for new papers on <subject>" [Peripheral]
"Read & Report on <specific paper>" [Peripheral]
"Replicate relevant outcomes from <specific blog post>" [Peripheral]
"Find & Ingest datasets related to <conjecture>" [Evidence]
"Extract <quality level> features relevant to <conjecture> from <dataset>" [Evidence]
"Run & Analyse EDA for <dataset> related to <conjecture>" [Evidence]
"Test <conjecture> to <quality level> with features from <dataset>" [Evidence]
"Explain & Present how <conjectures> can be used to deliver <improvement>" [Operational]
"Analyse & Predict performance of <conjectures> to solve <problem>" [Operational]
"Review architecture for <solution> based on <conjectures>" [Operational]
"Implement <solution> in <framework>" [Operational]
"Define drift alerting criteria for <solution>" [Operational]

You might notice the concept of a "quality level" in these examples. This is a way to differentiate between fast, early versions and extensive, complete versions. Versions can be "fastest", "easiest", "cheapest", "highest precision", "highest recall", "stratified", "down-sampled production", etc. The same Conjecture may be tested to different quality levels in a series of cards. Taking several passes at a potential solution gradually raises confidence (and expectations, at a manageable speed) and reduces the risk of an expensive failure. If a test seems too big for a single cycle, consider a shorter test (e.g., fewer epochs) rather than a longer cycle.
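
The card titles and quality levels above can be treated as fill-in templates. The sketch below shows one way that might look in Python; the template keys, the class and function names, and the example values ("C12", "sample-A") are hypothetical and only there to make the idea concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Archetype(Enum):
    PERIPHERAL = "Peripheral"
    EVIDENCE = "Evidence"
    OPERATIONAL = "Operational"


# A few of the example titles, rewritten as str.format templates.
# The dictionary keys are hypothetical short names.
TEMPLATES = {
    "read_paper": (Archetype.PERIPHERAL, "Read & Report on {paper}"),
    "ingest": (Archetype.EVIDENCE, "Find & Ingest datasets related to {conjecture}"),
    "test": (
        Archetype.EVIDENCE,
        "Test {conjecture} to {quality_level} with features from {dataset}",
    ),
    "drift": (Archetype.OPERATIONAL, "Define drift alerting criteria for {solution}"),
}


@dataclass(frozen=True)
class ConjectureCard:
    title: str
    archetype: Archetype

    def label(self) -> str:
        """Format the card the way the examples are listed: "title" [Archetype]."""
        return f'"{self.title}" [{self.archetype.value}]'


def make_card(template: str, **slots: str) -> ConjectureCard:
    """Fill a title template; a missing slot raises KeyError."""
    archetype, pattern = TEMPLATES[template]
    return ConjectureCard(pattern.format(**slots), archetype)


# The same conjecture tested to two quality levels, as a series of cards.
for level in ("fastest", "highest precision"):
    card = make_card("test", conjecture="C12", quality_level=level, dataset="sample-A")
    print(card.label())
```

Because the quality level is just another slot in the title, a second, more thorough pass at the same Conjecture becomes a new card rather than a reopened one.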

You may also notice that a number of these Conjecture Cards reference upstream and downstream Data Engineering and Architecture activities, and in these cases the Data Scientists may pair up with other specialists to deliver the tasks. Having these tasks in the same backlog process puts them into plain sight of the whole team and encourages supportive behaviours.

There should be at least one Conjecture Card in progress or in the backlog for each unexplored or positively proven Conjecture. If a card is blocked, perhaps waiting for data or shared resources, make sure the tasks to unblock it are clear. Each Conjecture Card moves through the project management system into the done state; the exact process will vary depending on your Agile approach. As it does so, the Conjecture Space (we used a git repo) is updated by a peer-reviewed PR containing the knowledge gained and any notebooks or code produced while working on the card. Data Scientists present what they have done and participate in the retrospectives the same as the rest of the team - this builds shared understanding and group ownership. At the end of the cycle, add one or more cards for the same Conjecture to the backlog to keep the momentum.
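
The "at least one card per live Conjecture" rule is simple enough to check automatically. The following Python sketch assumes conjecture states and board columns are plain strings, and that each card carries the identifier of the Conjecture it belongs to; those names and values are illustrative assumptions, not the layout of any particular tracking tool.

```python
from dataclasses import dataclass

# Assumed vocabulary for this sketch.
LIVE_CONJECTURE_STATES = {"unexplored", "proven"}
OPEN_CARD_COLUMNS = {"backlog", "in progress"}


@dataclass(frozen=True)
class Card:
    title: str
    conjecture_id: str
    column: str


def conjectures_without_cards(
    conjectures: dict[str, str], cards: list[Card]
) -> list[str]:
    """Return ids of live conjectures with no card in the backlog or in progress."""
    covered = {card.conjecture_id for card in cards if card.column in OPEN_CARD_COLUMNS}
    return sorted(
        conjecture_id
        for conjecture_id, state in conjectures.items()
        if state in LIVE_CONJECTURE_STATES and conjecture_id not in covered
    )


# Example with made-up data: C2 is proven but its only card is already done.
conjectures = {"C1": "unexplored", "C2": "proven", "C3": "disproven"}
cards = [
    Card("Run & Analyse EDA", "C1", "backlog"),
    Card("Test to fastest quality level", "C2", "done"),
]
print(conjectures_without_cards(conjectures, cards))  # ['C2']
```

A check like this could be run at the end of each cycle as a prompt for which cards to add next.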

The last general point to make is that there is no 1:1 relationship between Conjectures and data-driven solutions. Conjecture testing delivers incrementally better features and models as the research continues, but the value and priority of those cards change as the product and solutions mature. A typical business solution will involve multiple conjectures.

Things To Try

We have tried some extras to help us, with mixed results. Here are the highlights:

Estimation - it seems obvious to estimate the time to complete the card but, as with software estimation, it didn't actually help much.
Learning Points - like story points, but they represent the knowledge and learning value of the activity; this was useful for prioritisation.
Cost Points - like story points, but they represent the cost of the activity; e.g. buying licensed data, labelling data, running GPU training clusters, experimentation in production; this was useful for managing shared budgets and resources.
Balanced Archetypes - the core of the effort is in the Evidence focused type of activities, but without regular Peripheral activity you may run out of ideas, and without at least one solution being pushed into Operational activities you risk ivory tower accusations and under-delivery to the wider business.
White Paper Solutions - writing a White Paper to represent the collection of conjectures that constitute a solution. The principle is that this goes beyond the internal needs and could form the basis of conference presentations and thought leadership. The jury is still out on this one.
Cards to compare Conjectures - review existing Conjectures in the light of new ideas for Conjectures and unexpected testing outcomes of existing Conjectures. For example, a new way to extract features may come to light while investigating a newer conjecture that could help with other conjecture testing (e.g., using transformers for NLP, or diffusion instead of convolution). Going back over stale areas of the Conjecture Space has value.
Wild Card - picking a (semi) random card every now and then. It can be too easy to get stuck in a chain of thought and so new perspectives sometimes help.
MLOps Cards - adding structure and tools to the workspace; shaping the deployment of data lakes, model/feature stores and other automated solutions to help the team.
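
To make the Learning Points and Cost Points idea more concrete, here is a small Python sketch of one possible way to use them together when filling a cycle. How the two scores should be combined is not something this post prescribes; ranking by learning per unit of cost and filling a cost budget greedily is an assumption chosen for simplicity, and all names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointedCard:
    title: str
    learning_points: int  # knowledge and learning value of the activity
    cost_points: int      # e.g. licensed data, labelling, GPU time


def plan_cycle(cards: list[PointedCard], cost_budget: int) -> list[PointedCard]:
    """Greedily pick cards with the best learning-per-cost ratio within budget."""
    ranked = sorted(
        cards,
        key=lambda card: card.learning_points / max(card.cost_points, 1),
        reverse=True,
    )
    chosen: list[PointedCard] = []
    spent = 0
    for card in ranked:
        if spent + card.cost_points <= cost_budget:
            chosen.append(card)
            spent += card.cost_points
    return chosen


# Example with made-up scores and a cost budget of 6 points.
backlog = [
    PointedCard("Test conjecture to highest precision", 8, 5),
    PointedCard("Read & Report on a paper", 3, 1),
    PointedCard("Label a larger dataset", 5, 5),
]
for card in plan_cycle(backlog, cost_budget=6):
    print(card.title)
```

A greedy pass like this is not an optimal allocation, but it is easy to explain in a planning session, which matters more for a shared budget than squeezing out the last point.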

Does It Work?

Fast forward to 2023: new teams, new company, same principle. Yes, it does work, but it is still early days and this approach is far from being well tested. Product managers, junior software developers and other experts regularly contribute conjectures. Integration with the other skills areas is working well enough. We still wait for data, but there is plenty of work on the board in the meantime. Most Data Scientists like the structured approach. Recently we have not had a "failed solution", but we have had to downgrade expectations in a small number of projects based on disappointing test results during the low-sunk-cost Evidence phase.

Conclusion

So, when the CEO comes to you to ask, in a very Richard Scarry kind of way, "What Do Your Data Scientists Do All Day?", you can show your Conjecture Cards and demonstrate progress, though not necessarily success.

Photo by Tim Collins on Unsplash

Research is very non-linear, and this process gives just enough structure to mix broad and deep scopes of work in the same context as the rest of the product development. You still need the people skills to manage the talent, the deep pockets to pay for the compute and the resourcefulness to accumulate the data for the projects to succeed, but at least you might not squander the opportunity through poor project management.

Author: Hugh Reid, Infer Systems Ltd.

Infer Systems Tech Blog
