How can effective data management accelerate product development?

Product Development Data management November 4, 2025 11 min read

Product development is fundamentally a knowledge function that thrives when high-quality data from diverse sources is available in an easy-to-access, easy-to-consume and consistent format. This article discusses how GenAI can speed up product development by using RAG (retrieval-augmented generation) solutions to extract knowledge efficiently from unstructured data sources, and tools such as Augmend to improve the quality of (semi-)structured data.

R&D is fundamentally a knowledge function. Its success depends on collecting, creating and translating scientific, technical, and market knowledge into a novel, compliant, and profitable product.

However, this knowledge is complex. It must be drawn from many directions: consumers, customers, suppliers, competitors, regulators, collaborators and, of course, the organization's own internal knowledge. And it resides in two places: in the heads of individuals (scientists, operators) and in data.

Why is R&D data not in good shape?

Product development professionals intrinsically value data, but practical constraints on tools, software skills, available time and the sheer scale of information make it difficult for them to manage their data effectively. The advent of large language models (LLMs, e.g., ChatGPT) provides a golden opportunity for product and process developers to up their game in using their data effectively.

Product development deals with both structured and unstructured data. The table below summarizes the most critical product development processes and the typical data associated with them.

| Product development process | Structured data | Unstructured data |
| --- | --- | --- |
| 1. Trends & Opportunity Identification | Market size & growth rate data, competitor sales figures | Scientific literature (full journal article text) and reports, patent filings, consumer research, futurist reports, regulatory trend filings |
| 2. Idea Generation & Concept Screening | Idea scorecards (feasibility, ROI metrics), consumer survey quantitative results | Consumer feedback, qualitative research, ideation workshop transcripts, brainstorming notes |
| 3. Product Formulation & Ingredient Discovery | Raw material/component databases (cost, composition), physical/chemical testing results | Lab notebooks, supplier technical specifications |
| 4. Testing, Prototyping, & Iteration | Sensory panel scores, product measurements, accelerated stability studies | Lab notebooks, focus groups, field trial reports |
| 5. Scale-up and Process Optimization | Pilot/commercial batch records (yield, cycle time), IoT data, simulation results | Operator notes on pilot runs (free text), deviation/incident narratives during trials |
| 6. Value Improvement (Cost/Value Engineering) | Current bill of materials (BOM) costs, production scrap/rework rates, competitor product teardown, specifications | Customer complaint narratives, value improvement ideation output |
| 7. Innovation Project and Portfolio Mgmt. | Stage-gate metrics, budget allocation, resource capacity, projected TTM | Stage-gate decision documents, risk assessments |

The unstructured data challenge in product development

Unstructured data is relevant in all stages of product development, but especially so in the fuzzy front-end processes of trend identification and idea/concept generation. Think of patents, regulations, reports and scientific literature: the vast majority of this information is unstructured, i.e., text and sometimes images. Cutting through this ocean of information to reach actionable insights has always been challenging, and it is here that experts are truly valuable. With LLMs, however, this data can now be explored very efficiently, and many vendors have started to offer chatbot-like products (based on the RAG pattern we discussed elsewhere).
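To make the RAG pattern a little more concrete, here is a minimal sketch of its retrieval step. It is an illustration under simplifying assumptions, not how any particular vendor product works: the document chunks are hypothetical, a simple bag-of-words cosine similarity stands in for a real embedding model, and the final call to an LLM is left out, so the code only builds the prompt that would be sent.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical chunks, standing in for passages cut from patents and reports.
chunks = [
    "Patent filing describes an emulsifier blend that improves the shelf life of plant-based creamers.",
    "Market report: demand for low-sugar beverages grew in the last two years.",
    "Regulatory update: new labelling requirements for allergens in bakery products.",
]

question = "Which emulsifier improves shelf life of plant-based creamers?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)  # this prompt would then be sent to an LLM
print(prompt)
```

In a real system the word-count vectors would typically be replaced by embeddings and a vector index, but the flow stays the same: retrieve the most relevant chunks, then let the LLM answer from them.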

Going from relevant chunks of information in documents to a product development decision may still require a mental leap. For example, a set of internal reports and papers may contain measurement data on raw materials and products, but because the associated contextual data, such as process conditions, is spread out across them, it is hard to come to conclusions quickly. If this data could be structured into targeted data sets, decisions would become much more robust.
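As a rough illustration of what structuring narrative data into a targeted data set can look like, the sketch below pulls a measurement and its process context out of free-text report sentences. Everything in it is an assumption made for the example: the sentences, the batch naming, the field names and the regular expression are hypothetical, and in practice an LLM-based extractor would likely take the place of the hand-written pattern.

```python
import re

# Hypothetical report sentences; real reports would be far messier.
sentences = [
    "Batch A12 showed a viscosity of 350 mPa.s at 25 C after homogenization.",
    "For batch B07 the viscosity was 410 mPa.s at 40 C.",
    "The team discussed packaging options for the next trial.",
]

# Captures the batch id, the viscosity value and the measurement temperature.
PATTERN = re.compile(
    r"[Bb]atch\s+(?P<batch>\w+).*?viscosity\s+(?:of|was)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*mPa\.s\s+at\s+(?P<temp>\d+(?:\.\d+)?)\s*C\b"
)

def extract_records(lines):
    """Turn matching sentences into structured records, keeping the source text."""
    records = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            records.append({
                "batch": match.group("batch"),
                "viscosity_mPa_s": float(match.group("value")),
                "temperature_C": float(match.group("temp")),
                "source": line,  # keep provenance so the value can be traced back
            })
    return records

records = extract_records(sentences)
for record in records:
    print(record["batch"], record["viscosity_mPa_s"], record["temperature_C"])
```

Keeping the source sentence next to each extracted value is a deliberate choice: it lets a product developer verify any record against the original report before basing a decision on it.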

The structured data challenge in product development

A substantial portion of R&D data is structured, particularly past the fuzzy front end: measurements of raw materials, intermediates, packaging and finished products, as well as process conditions. Structured data poses several challenges:

  • Manual data entry is time-consuming, leading to data that may not be professionally digitized
  • Data may be available in inconsistent formats or just in narratives instead of structured formats
  • Data tends to be siloed into various buckets so connecting data is hard
  • Data may not have consistent terminology across product developers or time
  • Data may be structured but context may be unstructured (e.g., certificates of analysis, CoAs, from suppliers)

This latter part of product development relies heavily on structured data. In fact, the whole process relies on cycles of data generation and decision making.

The introduction of software systems such as LIMS or lab notes helps to structure the data, but a lot of data is still left in a messy format simply because teams are working to tight deadlines. Tools that can create structured datasets from the relevant data can be really helpful.

Case in Point: Time to market (TTM) killed by a lack of complete structured data records

The fastest way to do any product development project is not to have to do one at all! This may be possible only on rare occasions, but most companies have accumulated a substantial amount of data from their prior work. However, this data is spread across multiple reports, databases and loose documents. If existing structured datasets can be cleaned and enriched with contextual data, a lot of experimentation can be avoided.

As an example, let us look at a typical incremental innovation project. Product development is called upon to deliver a minor improvement to a product. Often, multiple teams have already done a lot of relevant experimentation. But if those teams do not use the same terminology for their materials and measurements, the data cannot readily be put together even though it is structured! If this data can be harmonized, and more context added, for example from raw material specifications, it might become possible to find the solution directly or with minimal extra experimentation.
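The harmonization step in this example can be sketched in a few lines. This is only a sketch under stated assumptions: the synonym map, the ingredient and measurement names, and the values are all hypothetical, and it does not describe how Augmend itself works. It simply shows why two structured datasets cannot be combined until their terminology agrees.

```python
from collections import defaultdict

# Hypothetical synonym map: variant name -> canonical name.
CANONICAL = {
    "nacl": "sodium chloride",
    "salt": "sodium chloride",
    "sodium chloride": "sodium chloride",
    "visc.": "viscosity",
    "viscosity": "viscosity",
}

def canonical(name):
    """Map a name to its canonical form; unknown names are only normalized."""
    key = name.strip().lower()
    return CANONICAL.get(key, key)

def harmonize(records):
    """Return copies of the records with canonical ingredient and measurement names."""
    return [
        {
            **record,
            "ingredient": canonical(record["ingredient"]),
            "measurement": canonical(record["measurement"]),
        }
        for record in records
    ]

# Two teams recorded the same kind of measurement under different names.
team_a = [{"ingredient": "NaCl", "measurement": "Visc.", "value": 320.0}]
team_b = [{"ingredient": "Salt ", "measurement": "viscosity", "value": 335.0}]

combined = harmonize(team_a + team_b)

# After harmonization, values from both teams fall under one key.
grouped = defaultdict(list)
for record in combined:
    grouped[(record["ingredient"], record["measurement"])].append(record["value"])

print(dict(grouped))
```

Without the mapping, the two records would sit under different keys and look like unrelated experiments. Building and maintaining the synonym map is the hard part in practice, which is where tooling can help.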

The data consistency and completeness problem can be tackled from two ends. At the front end, manual entry can be eliminated at the point of data entry, so that data is captured in a structured format at the time of collection or generation, e.g., from supplier documents or operator pilot batch records. At the back end, data terminology, for example ingredient and method names, can be harmonized where necessary, so that as much historical data as possible can be aggregated to draw conclusions from.

Augmend is designed to help in these situations. You can use it to automate manual data entry so that you collect more high-quality structured data, and you can use it to parse your historical data and clean and enrich it.

Do you want to see Augmend in action?

If you're looking to accelerate your product development by managing data more effectively, Augmend might be the solution you've been waiting for. Click the "Tackle your challenges" button in the navigation bar to discuss your data challenges and check whether we could solve them using Augmend.
