CAMELOT AI Driven Data Extraction Tool (CADET)
Data processing begins with data entry. This first step is already a common pain point for businesses and often one of their most time-consuming activities. Today, companies handle data entry manually, either in-house or through a shared service center. Data is often entered from heterogeneous sources: structured and unstructured documents in different languages and formats, including pictures, PDFs, drawings and e-mails. This makes processes not only costly and time-consuming, but also error-prone, which results in poor data quality. Humans are not infallible, and their errors negatively impact processes that depend heavily on accurate data.
We have built a data extraction and annotation tool that is able to retrieve and process information from both unstructured and structured documents – be that a text file, an image or a PDF file. Here is how it works:
As seen above, human-in-the-loop validation is part of our solution. Including the validation step is crucial – this way the human is not removed from the equation and remains an integral part of the process by verifying the automated extraction results. Most importantly, the feedback from human validation is fed back into the algorithm's learning, continuously improving the results.
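The validation loop described above can be sketched as follows. This is a minimal illustration, not CADET's implementation: the regex-based `extract_fields`, the `FeedbackStore` class and the `reviewer` callback are hypothetical names, and two regular expressions stand in for the real extraction model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Extraction:
    field_name: str
    value: str


@dataclass
class FeedbackStore:
    """Collects validated results so they can later be used for retraining."""
    records: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, extraction: Extraction, validated_value: str) -> None:
        self.records.append(
            (extraction.field_name, extraction.value, validated_value)
        )

    def corrections(self) -> List[Tuple[str, str, str]]:
        # Only the cases where the human changed the extracted value.
        return [r for r in self.records if r[1] != r[2]]


def extract_fields(text: str) -> List[Extraction]:
    """Toy extractor: two regexes stand in for the real extraction model."""
    patterns = {
        "invoice_number": r"Invoice\s+No\.?\s*[:#]?\s*(\w+)",
        "total": r"Total\s*:?\s*([\d.,]+)",
    }
    results = []
    for name, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            results.append(Extraction(name, match.group(1)))
    return results


def process_document(
    text: str,
    reviewer: Callable[[Extraction], str],
    store: FeedbackStore,
) -> Dict[str, str]:
    """Extract, let a human confirm or correct each value, keep the feedback."""
    validated = {}
    for extraction in extract_fields(text):
        value = reviewer(extraction)   # human confirms or corrects
        store.add(extraction, value)   # feedback kept for later learning
        validated[extraction.field_name] = value
    return validated


# Example: the reviewer normalizes the total and confirms everything else.
def demo_reviewer(extraction: Extraction) -> str:
    return "1250.00" if extraction.field_name == "total" else extraction.value


store = FeedbackStore()
result = process_document("Invoice No: A123\nTotal: 1.250,00", demo_reviewer, store)
```

In this sketch every extracted value passes through the reviewer, and `store.corrections()` returns only the values the human changed, which is the part a learning step would typically focus on.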
The primary benefit of the CADET tool is a shorter overall processing time and fewer costly man-hours – the time saved can be redirected to activities with higher value-generating potential. As the quality of data entered into the system improves, the risk of disrupting processes that depend heavily on precise data becomes very low. Moreover, digitized information extraction leads to a harmonized data handling process, better insights and better analytics.
CADET originates from the procurement field – it started with a PoC project addressing the need to automate processes related to the handling of complex materials data. The project freed up valuable time for high-value activities – you can read more about it in this SAP Business Transformation Study. Today, CADET can also be applied to other use cases across the whole value chain – here are some of them:
- Customer service – ticket creation from free text inquiries;
- Accounting – invoice and reporting processes;
- HR – processing resumes and job descriptions.
Rule Mining Service
Another problematic area in data management is analysis. The main obstacle to generating meaningful insights is often data quality – inconsistencies, incorrect planning parameters or incorrect and slow data entry, as we have seen in the previous case. In addition, data analysis itself, the derivation of rules, and their validation and implementation are labor-intensive and costly, especially if done by an expert.
Luckily, nowadays we have rule mining technology at our disposal. The CIDS rule mining service extracts business logic from available data in the form of rules. It is built on two pillars: statistical analysis applied to a large dataset and automated derivation of rules and data patterns.
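To make the two pillars concrete, here is a minimal sketch of one common way to derive rules from data: counting how often attribute values occur together and keeping the "if A then B" pairs whose support and confidence exceed chosen thresholds. This is an illustration of the general technique, not the CIDS service itself; the function name `mine_rules`, the thresholds and the sample attributes are assumptions.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=0.3, min_confidence=0.9):
    """Derive rules of the form (attribute, value) -> (attribute, value).

    support    = share of all records containing both sides of the rule
    confidence = share of records with the left side that also have the right side
    """
    total = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for record in records:
        items = list(record.items())
        item_counts.update(items)
        # Count every ordered pair of attribute values within one record.
        pair_counts.update(permutations(items, 2))

    rules = []
    for (left, right), count in pair_counts.items():
        support = count / total
        confidence = count / item_counts[left]
        if support >= min_support and confidence >= min_confidence:
            rules.append((left, right, support, confidence))
    # Most reliable rules first.
    return sorted(rules, key=lambda rule: (-rule[3], -rule[2]))


# Hypothetical master data records.
records = [
    {"material_type": "RAW", "procurement_type": "external"},
    {"material_type": "RAW", "procurement_type": "external"},
    {"material_type": "RAW", "procurement_type": "external"},
    {"material_type": "FINISHED", "procurement_type": "in-house"},
    {"material_type": "FINISHED", "procurement_type": "external"},
]
rules = mine_rules(records)
```

With these sample records and thresholds, the only rule that survives is "material_type = RAW implies procurement_type = external". The reverse direction is rejected because only three of the four externally procured materials are raw materials.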
What gains can you expect from using the service? Besides improved data quality, the consistency of your master data will improve – rule mining facilitates the identification and correction of inconsistencies from an end-to-end perspective. Greater transparency of data patterns will generate valuable and reliable insights, strengthening decision-making processes. Rule mining can also reduce complexity and speed up processes such as supply chain setup, data population or quality monitoring.
Rule mining is a prominent field with numerous potential use cases:
- Data analytics and insights
- Value prediction for attribute population – faster and more accurate data entry through automated value suggestions
- Outlier detection
- Pattern recognition
Figure 2: Rule Mining Service: Architecture
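Two of the use cases listed above, value prediction and outlier detection, can be illustrated by applying already mined rules to records. The sketch below assumes rules are simple pairs of (attribute, value) conditions and consequences; `RULES`, `suggest_values` and `find_outliers` are hypothetical names chosen for this example.

```python
# Each rule: if the left attribute has this value, expect the right one.
RULES = [
    (("material_type", "RAW"), ("procurement_type", "external")),
]


def suggest_values(record, rules):
    """Value prediction: propose values for attributes that are still empty."""
    suggestions = {}
    for (if_attr, if_val), (then_attr, then_val) in rules:
        if record.get(if_attr) == if_val and record.get(then_attr) is None:
            suggestions[then_attr] = then_val
    return suggestions


def find_outliers(records, rules):
    """Outlier detection: report filled-in values that contradict a rule."""
    outliers = []
    for index, record in enumerate(records):
        for (if_attr, if_val), (then_attr, then_val) in rules:
            actual = record.get(then_attr)
            if (
                record.get(if_attr) == if_val
                and actual is not None
                and actual != then_val
            ):
                outliers.append((index, then_attr, actual, then_val))
    return outliers


new_record = {"material_type": "RAW", "procurement_type": None}
suggestion = suggest_values(new_record, RULES)

existing = [
    {"material_type": "RAW", "procurement_type": "external"},
    {"material_type": "RAW", "procurement_type": "in-house"},
]
outliers = find_outliers(existing, RULES)
```

A flagged record is not necessarily wrong; in practice it would be a candidate for review rather than an automatic correction.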
Vendor Hierarchy Optimization Service
We have come to realize that many companies struggle with poor spend reporting at the aggregate vendor family level. The root cause lies in the current approach to vendor aggregation – most businesses rely on irregular manual checks that are time-consuming, and thus costly, as well as subject to human error. As a result, poor data quality hinders meaningful insights into vendors.
To address this problem, we came up with a solution that automatically enriches vendor information by using publicly available data. The tool can be seamlessly integrated into existing ERP/MDG systems to update relevant datasets with newly retrieved and validated information. This can be done in two ways:
- through regularly scheduled update jobs that account for changes in existing vendor hierarchies;
- whenever a new vendor is entered into the system.
Additionally, bulk uploads are possible – retrieving the hierarchy information can be done for a set of multiple vendors at once.
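The update paths described above (on vendor creation, on a schedule, and in bulk) can be sketched as follows. This is an illustration under stated assumptions: `PUBLIC_REGISTRY` is an in-memory dictionary standing in for a public data source, and the `Vendor` class and all function names are hypothetical, not part of any ERP/MDG API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Vendor:
    vendor_id: str
    name: str
    parent_company: Optional[str] = None


# Hypothetical stand-in for a public data source (normally an external service).
PUBLIC_REGISTRY: Dict[str, str] = {
    "Acme Logistics GmbH": "Acme Holding AG",
    "Acme Tools Ltd": "Acme Holding AG",
}


def lookup_parent(vendor_name: str) -> Optional[str]:
    return PUBLIC_REGISTRY.get(vendor_name)


def enrich_vendor(
    vendor: Vendor,
    lookup: Callable[[str], Optional[str]] = lookup_parent,
) -> bool:
    """Update the vendor's parent company; return True if anything changed."""
    parent = lookup(vendor.name)
    if parent is None or parent == vendor.parent_company:
        return False
    vendor.parent_company = parent
    return True


def enrich_bulk(vendors: Iterable[Vendor]) -> List[str]:
    """Bulk path: enrich many vendors at once, return the IDs that changed."""
    return [v.vendor_id for v in vendors if enrich_vendor(v)]


def on_vendor_created(vendor: Vendor, master_data: Dict[str, Vendor]) -> None:
    """Event path: enrich a vendor at the moment it enters the system."""
    enrich_vendor(vendor)
    master_data[vendor.vendor_id] = vendor


def scheduled_refresh(master_data: Dict[str, Vendor]) -> List[str]:
    """Scheduled path: re-check all existing vendors for hierarchy changes."""
    return enrich_bulk(master_data.values())


master_data: Dict[str, Vendor] = {}
on_vendor_created(Vendor("V1", "Acme Logistics GmbH"), master_data)
# A legacy record that was stored without hierarchy information.
master_data["V2"] = Vendor("V2", "Acme Tools Ltd")
changed = scheduled_refresh(master_data)
```

All three paths share the same `enrich_vendor` step, so a vendor is treated identically regardless of how the update was triggered, and the scheduled job only reports vendors whose hierarchy information actually changed.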
What are the benefits of this solution? Firstly, it significantly improves data quality, as the automated process reduces human error and does so in less time and at lower cost. Secondly, the tool creates a single source of information on vendor hierarchies, helping you eliminate the data silos related to vendor data entries. All of this provides an aggregated view of vendor hierarchies that can serve as a baseline for enhanced decision-making and value-generating activities in spend analysis, compliance and controlling.
Finally, the service has the potential to scale in multiple dimensions:
- Result improvement – including multiple sources for the retrieval of the same information piece
- Feature enrichment – including different data sources to enrich the data object with other relevant information
- Horizontal enlargement – applying the underlying logic (pooling publicly available data sources to enrich data on a certain data object) to other data objects and use cases